Wrappers and aggregators
Hi guys, here is an awesome update of this MR, which is ready to be merged into dask-pipelines.
Below is the list of features:
- Ported the legacy transformers (Preprocessor, Extractor and Algorithm) to the new aggregators API
- Created some easy-to-use wrappers to wrap these legacy objects
- Removed all traces of non-picklable objects from our design. Allowing them was a terrible idea that brought more problems than it solved. It's a lesson learned that shouldn't be repeated.
- Made an effort to make things picklable in the most relevant packages (bob.bio.face, bob.learn.em, bob.learn.linear, ...)
- Rewrote part of the BiometricAlgorithm class and created aggregators to handle dask and checkpoint support (https://gitlab.idiap.ch/bob/bob.bio.base/-/blob/wrapper-api/bob/bio/base/pipelines/vanilla_biometrics/wrappers.py#L122 and https://gitlab.idiap.ch/bob/bob.bio.base/-/blob/wrapper-api/bob/bio/base/pipelines/vanilla_biometrics/wrappers.py#L12, respectively)
- Detached the score-writing mechanism from BiometricAlgorithm. This is useful if we want to write scores for different "analyzers". We now have 2 score writers:
  - FourColumnsWriter: This one is our link to the past (e.g. our plotting scripts)
  - CSVWriter: Writes, on one line, (i) the metadata from the biometric reference, (ii) the metadata from the probe and (iii) the score. This is useful when you want to analyze certain cohorts based on metadata (e.g. demographics, device model, etc.). Even the vulnerability tests can benefit from it: we could write one protocol containing all probes (genuines and PAs) and make an analyzer that filters this information from the CSV.
  - ... you could imagine writing a SQLWriter in case you have large-scale datasets and want to take advantage of some SQL features (indexes, optimized aggregation functions, etc.)
- Created better tests
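
To illustrate the picklability points in the list above, here is a minimal round-trip check using only the standard `pickle` module. The helper `is_picklable` and the two example callables are hypothetical and not part of any bob package; this is just a sketch of the kind of test that can catch non-picklable objects early.

```python
import pickle
from functools import partial


def is_picklable(obj):
    """Return True if obj survives a pickle round trip (hypothetical helper)."""
    try:
        pickle.loads(pickle.dumps(obj))
    except Exception:
        return False
    return True


# A lambda cannot be pickled by the standard pickle module ...
scale_lambda = lambda x: 2 * x
# ... while a partial over a built-in function can.
parse_binary = partial(int, base=2)

lambda_ok = is_picklable(scale_lambda)
partial_ok = is_picklable(parse_binary)
```

A check like this could be run over every transformer in a test suite, which is one way of keeping non-picklable members from creeping back in.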
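
The idea of a checkpointing wrapper mentioned in the list above can be sketched as follows. This is not the code in wrappers.py: `Scaler`, `CheckpointWrapper`, the `transform` method name and the single-file checkpoint layout are all assumptions made for illustration.

```python
import os
import pickle
import tempfile


class Scaler:
    """Toy transformer standing in for a legacy object (hypothetical)."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, samples):
        return [s * self.factor for s in samples]


class CheckpointWrapper:
    """Hypothetical wrapper that caches the output of transform() on disk."""

    def __init__(self, estimator, path):
        self.estimator = estimator
        self.path = path
        self.computed = 0  # how many times the wrapped estimator really ran

    def transform(self, samples):
        # Reuse the checkpoint if it already exists
        if os.path.exists(self.path):
            with open(self.path, "rb") as f:
                return pickle.load(f)
        # Otherwise compute, then save the result for next time
        result = self.estimator.transform(samples)
        self.computed += 1
        with open(self.path, "wb") as f:
            pickle.dump(result, f)
        return result


with tempfile.TemporaryDirectory() as checkpoint_dir:
    wrapped = CheckpointWrapper(Scaler(2), os.path.join(checkpoint_dir, "scaled.pkl"))
    first = wrapped.transform([1, 2, 3])   # computed and written to disk
    second = wrapped.transform([1, 2, 3])  # loaded from the checkpoint
```

Keeping the caching logic in a wrapper means the wrapped object needs no knowledge of checkpoints, which is the separation the wrappers in this MR aim for.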
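
To make the cohort-analysis use case for the CSV output more concrete, here is a small sketch using only the standard `csv` module. The column names (`reference_gender`, `probe_device`, etc.) and the functions `write_score_rows` and `select_cohort` are assumptions for illustration, not the actual CSVWriter format or API.

```python
import csv
import io

# Hypothetical columns: reference metadata, probe metadata, then the score
FIELDNAMES = ["reference_id", "reference_gender", "probe_id", "probe_device", "score"]


def write_score_rows(score_rows, stream):
    """Write one line per comparison to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(score_rows)


def select_cohort(stream, **conditions):
    """Return the scores of all rows whose metadata match the conditions."""
    reader = csv.DictReader(stream)
    return [
        float(row["score"])
        for row in reader
        if all(row[key] == value for key, value in conditions.items())
    ]


rows = [
    {"reference_id": "r1", "reference_gender": "f", "probe_id": "p1", "probe_device": "phone", "score": 0.91},
    {"reference_id": "r1", "reference_gender": "f", "probe_id": "p2", "probe_device": "laptop", "score": 0.34},
    {"reference_id": "r2", "reference_gender": "m", "probe_id": "p3", "probe_device": "phone", "score": 0.12},
]

buffer = io.StringIO()  # stands in for a score file on disk
write_score_rows(rows, buffer)
buffer.seek(0)
phone_scores = select_cohort(buffer, probe_device="phone")
```

An analyzer for the vulnerability case could work the same way, filtering on a column that marks each probe as genuine or PA.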
Things to be done:
- A user guide is still missing, but I have an idea for how to make things more understandable in it.
- I still need to move forward with this package for my work. For instance, I have a pipeline for score normalization stashed here.
I think that's it.
Edited by Tiago de Freitas Pereira