
Remove post processor selection

Closed Yu Linghu requested to merge post-processor-select into master
1 unresolved thread

Activity

  • @tiago.pereira do you think you can review this?

  • @ydayer or @lcolbois do you think you can review this? I am not familiar with this part of the code.

  • From what I understand, this aims to enable the same call signature for the Z-score normalization and T-score normalization transformers.

    Is there a specific reason (i.e. a specific score normalization scheme) that requires this change?

  • Author Developer

    We plan to add S-norm, adaptive top-norm, and ZT-norm later. Without this change, each normalization would add around 5 lines to this __call__, since each needs a specific check and requires different samples to be passed. So this change should make the code more succinct and easier to extend for other normalizations.

    Later on, we could also move the choice of normalization type into the config file, which would require many more changes (like lines 340-355 in the entry_point.py). This is only a long-term plan :)

    Edited by Yu Linghu
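To make the point above concrete, here is a minimal, hypothetical sketch (the class names mirror the real transformers, but the bodies are stand-ins) contrasting the per-scheme if/else pattern with a unified call signature:

```python
# Hypothetical stand-ins for the real transformers, just to show the shape.
class ZNormScores:
    def fit(self, X):
        self.X_ = X
        return self

class TNormScores:
    def fit(self, X):
        self.X_ = X
        return self

def fit_post_processor(post_processor, post_process_samples,
                       biometric_references, probe_samples):
    # Old pattern: each normalization needs its own branch and sample choice.
    if isinstance(post_processor, ZNormScores):
        post_processor.fit([post_process_samples, biometric_references])
    elif isinstance(post_processor, TNormScores):
        post_processor.fit([post_process_samples, probe_samples])
    # ...every future scheme (S-norm, ZT-norm, top-norm) adds another branch.

def fit_post_processor_unified(post_processor, post_process_samples,
                               biometric_references, probe_samples):
    # Unified signature: a single call, and the transformer itself
    # selects the samples it needs.
    post_processor.fit([post_process_samples, biometric_references,
                        probe_samples])
```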
  • I can have a look at it later today/tomorrow.

  • So, the goal here is to have a unified API for the post-processors. This would enable us to add different post-processors in different code packages (e.g. in bob.bio.demographics). Currently, this is not possible, since we need to know which kind of post-processor we have and select it via if/else (which is bad code style anyway).

    I would recommend replacing the list with a dictionary, so that it is clear which element is which. Keyword arguments and **kwargs would also be useful, but these might break the API of an sklearn transformer.

        )

        # Training the score transformer
    -   if isinstance_nested(
    -       self.post_processor, "estimator", ZNormScores
    -   ) or isinstance(self.post_processor, ZNormScores):
    +   try:
            self.post_processor.fit(
    -           [post_process_samples, biometric_references]
    +           [post_process_samples, biometric_references, probe_samples]
  • Maybe it would be a good idea to have a dictionary here rather than a list, so that possible future changes can incorporate different information, too.

    I am not sure if we break the API when we use keyword arguments instead, which would be the preferred option:

    self.post_processor.fit(
        post_process_samples=post_process_samples, 
        biometric_references=biometric_references, 
        probe_samples=probe_samples
    )

    and let the post-processor select what it needs, e.g., by ignoring parameters via **kwargs.
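A minimal sketch of the dictionary idea (class and key names are illustrative, not the actual bob.bio API): every post-processor receives the same dict and picks out only the entries it needs, so the call site never has to branch on the post-processor type.

```python
import numpy as np

class ZNormScores:
    # Hypothetical dict-based post-processor: it reads only the entries
    # it needs from X and silently ignores the rest.
    def fit(self, X):
        cohort = np.asarray(X["post_process_samples"])
        self.mean_ = cohort.mean()
        self.std_ = cohort.std()
        return self

    def transform(self, scores):
        return (np.asarray(scores) - self.mean_) / self.std_

post_processor = ZNormScores()
post_processor.fit({
    "post_process_samples": [0.0, 2.0, 4.0],
    "biometric_references": [],  # unused by Z-norm, but always present,
    "probe_samples": [],         # so the call site never needs if/else
})
normalized = post_processor.transform([2.0, 4.0])
```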

  • I agree that a dictionary would be better, and I think we should be a bit careful about feeding our transformers increasingly complex X inputs, as is done right now.

    I am mostly brainstorming here, but to stay consistent with the sklearn API, I think a good rule of thumb would be to ensure that the fit_transform method from sklearn works properly (if possible).

    This means in particular that the X input to fit and to transform should remain structurally the same. That does not look to be the case in the current implementation (also before this merge request), at least for the TNorm transformer: fit currently takes a list of 3 independent sets of Samples (probe samples and 2 cohorts), while transform appears to simply take a list of raw scores to transform.

    In that context, I am thus wondering if we could pass the cohort samples as additional **fit_params?

    Edited by Laurent COLBOIS
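A minimal sketch of the **fit_params idea, assuming sklearn-style transformers (the class and parameter names are hypothetical): X stays structurally the same for fit and transform (a plain list of raw scores), and the cohort arrives as a fit parameter, which sklearn's TransformerMixin.fit_transform forwards to fit:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class TNormScores(TransformerMixin, BaseEstimator):
    """Hypothetical T-norm-like transformer; cohort passed via fit params."""

    def fit(self, X, y=None, cohort_samples=None):
        # X is a list of raw scores, structurally the same as in transform();
        # the cohort statistics come from **fit_params instead of from X.
        cohort = np.asarray(cohort_samples)
        self.mean_ = cohort.mean()
        self.std_ = cohort.std()
        return self

    def transform(self, X):
        return (np.asarray(X) - self.mean_) / self.std_

# TransformerMixin.fit_transform forwards keyword fit params to fit(),
# so fit and transform both receive a plain list of scores:
t = TNormScores()
out = t.fit_transform([1.0, 2.0, 3.0], cohort_samples=[0.0, 2.0, 4.0])
```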
  • Amir MOHAMMADI mentioned in merge request !296 (merged)

  • I am mostly brainstorming here but to keep in touch with the API I think a good rule of thumb would be to ensure proper functioning of the fit_transform method from sklearn (if possible). This means in particular that the X input to fit and to transform should remain structurally the same.

    Fixed in !296 (merged). Please open a new issue to discuss this API further.
