Remove post processor selection
Solve #179 (closed)
@tiago.pereira do you think you can review this?
We plan to add S-norm, adaptive top-norm, and ZT-norm later. Without this change, each normalization would add about 5 lines to this `__call__`, since each needs a specific check and requires passing different samples. So this change should make the code more succinct and easier to extend with other normalizations.

Later we could probably move the choice of normalization type into the config file, which would require many more changes, like lines 340-355 in `entry_point.py`. This is only a long-term plan :)

Edited by Yu Linghu

So, the goal here is to have a unified API for the post-processors. This would enable us to add different post-processors in different code packages (e.g. in `bob.bio.demographics`). Currently, this is not possible since we need to know which kind of post-processor we have and select it via `if/else` (which is bad code style anyway).

I would recommend replacing the list with a dictionary, so that it is clear which element is which. Keyword arguments and `**kwargs` would also be useful, but this might break the API of an sklearn transformer.

```diff
         )

         # Training the score transformer
-        if isinstance_nested(
-            self.post_processor, "estimator", ZNormScores
-        ) or isinstance(self.post_processor, ZNormScores):
+        try:
             self.post_processor.fit(
-                [post_process_samples, biometric_references]
+                [post_process_samples, biometric_references, probe_samples]
```

Maybe it would be a good idea to have a dictionary here rather than a list, so that possible future changes can incorporate different information, too.
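For illustration, a dictionary-based `fit` input could look roughly like this (a sketch only; `MockZNormScores` and the key names are made up for this example, not the actual bob.bio implementation):

```python
# Sketch: a dict makes each fit() input explicit by name, so a
# post-processor can pick what it needs and ignore the rest.
# MockZNormScores and the key names below are hypothetical.
class MockZNormScores:
    def fit(self, X):
        self.cohort_ = X["biometric_references"]
        # Keys a given post-processor does not need can simply be absent.
        self.probes_ = X.get("probe_samples")
        return self


post_processor = MockZNormScores()
post_processor.fit(
    {
        "post_process_samples": [0.1, 0.2],
        "biometric_references": [0.3, 0.4],
        "probe_samples": [0.5],
    }
)
```

Adding a new normalization then only means reading one more key, instead of changing the positional meaning of a list for every caller.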
I am not sure if we break the API when we use keyword arguments instead, which would be the preferred option:

```python
self.post_processor.fit(
    post_process_samples=post_process_samples,
    biometric_references=biometric_references,
    probe_samples=probe_samples,
)
```

and let the post-processor select what it needs, e.g., by ignoring parameters via `**kwargs`.
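A hedged sketch of what such a signature could look like (`MockTNorm` is a made-up name for illustration, not the real transformer):

```python
class MockTNorm:
    """Hypothetical post-processor that only needs the cohort references."""

    def fit(self, post_process_samples=None, biometric_references=None, **kwargs):
        # probe_samples (and any future parameter) lands in **kwargs
        # and is silently ignored by transformers that don't need it.
        self.cohort_ = biometric_references
        return self


tnorm = MockTNorm()
tnorm.fit(
    post_process_samples=[0.1, 0.2],
    biometric_references=[0.3, 0.4],
    probe_samples=[0.5],  # ignored by this transformer
)
```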
I agree that a dictionary would be better, and I think we should be careful about feeding our transformers increasingly complex `X` inputs, as is done right now.

I am mostly brainstorming here, but to keep in touch with the API I think a good rule of thumb would be to ensure proper functioning of the `fit_transform` method from sklearn (if possible). This means in particular that the `X` input to `fit` and to `transform` should remain structurally the same. It doesn't look to me like that is the case in the current implementation (also before this merge request), at least looking at the TNorm transformer: `fit` currently takes a list of 3 independent sets of Samples (probe samples and 2 cohorts) while `transform` looks to simply take a list of raw scores to transform.

In that context, I am thus wondering if we could pass the cohort samples as additional `**fit_params`?

Edited by Laurent COLBOIS
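To sketch that idea (with a hypothetical class, not the real TNorm transformer): `fit(X, **fit_params)` and `transform(X)` would then share the same `X` structure, a plain list of raw scores, while the cohort arrives as a fit parameter:

```python
import statistics


class MockTNormScores:
    """Hypothetical transformer following the sklearn convention
    fit(X, y=None, **fit_params); X is structurally the same for
    fit and transform (a list of raw scores)."""

    def fit(self, X, y=None, cohort_samples=None):
        # The cohort is passed as a fit parameter, not packed into X.
        self.mean_ = statistics.mean(cohort_samples)
        self.std_ = statistics.pstdev(cohort_samples)
        return self

    def transform(self, X):
        return [(score - self.mean_) / self.std_ for score in X]


scores = [1.0, 2.0, 3.0]
tnorm = MockTNormScores().fit(scores, cohort_samples=[0.0, 2.0, 4.0])
normalized = tnorm.transform(scores)
```

Inside a sklearn `Pipeline`, such a parameter can be routed to one step with the step-prefixed syntax, e.g. `pipeline.fit(X, tnorm__cohort_samples=...)`.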
mentioned in merge request !296 (merged)
> I am mostly brainstorming here, but to keep in touch with the API I think a good rule of thumb would be to ensure proper functioning of the `fit_transform` method from sklearn (if possible). This means in particular that the `X` input to `fit` and to `transform` should remain structurally the same.

Fixed in !296 (merged). Please open a new issue to discuss this API further.