This is a duplicate of a discussion on the biometric ML in Jan 2017, where Guillaume Heusch @heusch agreed to lead this with the help of Hannah @hmuckenhirn.
Currently we are going to get some support from the devel team led by Samuel @samuel.gaist, so it will be good to sync with him as well.
I have the impression that we are still a little bit behind with respect to the harmonisation of performance reporting as discussed before the Bob refactoring last year.
We are still reporting error rates and plots with FRR/FAR/SFAR and EER, and inconsistently FMR/FNMR/IAPMR/APCER …
We should converge on a harmonisation that follows current practices, which increasingly align with ISO.
I know all the elements are in our hands (Tiago for CMCs, Amir for IAPMR and nice scatter plots with decision, …). See some examples attached.
We need a documented package with examples, usable by anyone, showing how to report results from a set of scores produced by our biometric and PAD experiments.
More specifically, we need:

- FNMR (or GMR = 1-FNMR) vs FMR instead of FAR/FRR when we report biometric performance (authentication task) in tables (FNMR @ FMR=0.1% or smaller), DET and ROC (EPC case to be discussed)
- TPIR/rank when we report biometric performance (identification task) in tables (TPIR @ FPIR=0.1%) and CMC
- nice bar plots of score distributions for biometric recognition (Genuine, Zero-effort Impostor)
- nice bar plots of score distributions for biometric recognition and PA (Genuine, Zero-effort Impostor, PA) with IAPMR
- APCER/BPCER instead of FAR/FRR when we report PAD performance in tables, DET and ROC
- nice bar plot of score distributions for PAD (BonaFide, PA)
- EPSC for biometric recognition and PAD
- scatter plots for bi-modal biometric recognition
- scatter plots for biometric recognition and PAD
Additionally we would need a routine to compute the statistical significance.
A summary of this performance reporting is provided in the attached document (section 4), prepared with our SWAN partners, along with references to ISO documents (which can also be found in our biometrics group directory /idiap/group/biometric/standards/ISO-IEC/, e.g. ISO-IEC-19795-1).
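As a concrete illustration of the table entries requested above (FNMR @ FMR = 0.1%), such numbers can be computed directly from arrays of genuine and impostor scores. This is a minimal numpy sketch, not the bob.measure API; the function name and the tie-handling convention are mine:

```python
import numpy as np

def fnmr_at_fmr(genuine, impostor, fmr_target=0.001):
    """Return (threshold, FNMR) at the given target FMR.

    Assumes that higher scores mean a better match and that a trial
    is accepted when its score is at or above the threshold.
    """
    impostor = np.sort(np.asarray(impostor, dtype=float))
    # number of impostor scores we may accept without exceeding the target
    n_allowed = int(np.floor(fmr_target * len(impostor)))
    if n_allowed > 0:
        threshold = impostor[len(impostor) - n_allowed]
    else:
        threshold = impostor[-1] + 1e-8  # above every impostor score
    fnmr = float(np.mean(np.asarray(genuine, dtype=float) < threshold))
    return threshold, fnmr
```

The same scan generalises to the other fixed-rate table entries (e.g. BPCER @ APCER) by swapping which class defines the threshold.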
Hi @sebastien.marcel, I have been working on this through my book chapter revision. I have been iterating on the design of this in #37 (closed), but if @heusch wants to do it we should definitely coordinate.
@amohammadi It would have been nice to have a better idea and understanding of what is done and what remains before Friday's meeting, but it can wait if you prefer. Alternatively, I can drop by your office so that you can show me what you have done so far? Cheers.
Hi @tgentilhomme unfortunately I am sick today and I am working from home.
I am trying to come up with a concrete plan of what needs to be done. In the meantime, you can look at #37 (closed) where I explain what I am planning to do. The issues that we have are referenced in #37 (closed). By reading the issues, you may be able to understand the problems that we have. You can also ask your technical questions in #37 (closed).
There should be guides, code snippets, and helper functions for all the possible plots which would allow people to easily write their own plotting scripts.
All plotting functionality must be written in a modular and re-usable way (e.g., the plots should not be implemented as click commands; the click commands should call these functions).
The output of scripts should be capturable to a log file.
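A minimal sketch of the separation described above: the plotting logic is a plain, importable function, and the command-line entry point is only a thin wrapper around it. All names here are hypothetical, and argparse is used only to keep the sketch self-contained; in Bob the wrapper would be a click command calling the same function:

```python
import argparse

import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

def plot_roc(fpr, tpr, label=None, axes=None):
    """Reusable plotting helper: draws one ROC curve on the given axes.

    Returns the axes object so callers can compose several curves,
    tweak styling, or embed the plot in a larger figure.
    """
    if axes is None:
        axes = plt.gca()
    axes.plot(fpr, tpr, label=label)
    axes.set_xlabel("False Match Rate")
    axes.set_ylabel("1 - False Non-Match Rate")
    if label:
        axes.legend()
    return axes

def load_curve(path):
    # assumed file format: two whitespace-separated columns, FMR and 1-FNMR
    data = np.loadtxt(path)
    return data[:, 0], data[:, 1]

def main(argv=None):
    # Thin command-line wrapper: it only parses options and delegates to the
    # reusable functions above.  A click command would do exactly the same.
    parser = argparse.ArgumentParser()
    parser.add_argument("scores", nargs="+")
    parser.add_argument("--output", default="roc.pdf")
    args = parser.parse_args(argv)
    for path in args.scores:
        plot_roc(*load_curve(path), label=path)
    plt.savefig(args.output)
```

Because `plot_roc` returns the axes and never touches argv or files, it is directly reusable from notebooks and other scripts, and its output can be redirected or logged by the caller.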
What Sébastien wanted maps to:

- `bob bio evaluate`: FNMR (or GMR = 1-FNMR) vs FMR instead of FAR/FRR when we report biometric performance (authentication task) in tables (FNMR @ FMR=0.1% or smaller), DET and ROC (EPC case to be discussed)
- `bob bio evaluate`: TPIR/rank when we report biometric performance (identification task) in tables (TPIR @ FPIR=0.1%) and CMC
- `bob bio evaluate`: nice bar plots of score distributions for biometric recognition (Genuine, Zero-effort Impostor)
- `bob pad evaluate`: nice bar plots of score distributions for biometric recognition and PA (Genuine, Zero-effort Impostor, PA) with IAPMR
- `bob pad evaluate`: APCER/BPCER instead of FAR/FRR when we report PAD performance in tables, DET and ROC
- `bob pad evaluate`: nice bar plot of score distributions for PAD (BonaFide, PA)
- `bob pad eps`: EPSC for biometric recognition and PAD
- `bob fusion scatter`: scatter plots for bi-modal biometric recognition
- `bob fusion scatter`: scatter plots for biometric recognition and PAD
Hey guys, as discussed we thought it would be a good idea for you to post some interesting plots and/or tables that you have produced before, so that we know what we are going to produce by the end of this. Please attach them as PNG images here so we can look at them without downloading them.
For example here is an ROC image that I generated for my paper.
Development:
Evaluation:
There are two important details here.
The threshold is chosen on the development set (vertical line) and using the same threshold, evaluation points are drawn in the evaluation ROC figure.
The colors and line styles are chosen so that the figure is still readable when you print it in grayscale. (This is something very specific to this plot, though; I don't think this can or should be done for every figure.)
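The two details above can be sketched like this (hand-made numbers standing in for real systems; nothing here is the actual plotting code): the operating point fixed by the dev-set threshold is drawn as a marker on the eval ROC, and line styles are varied so the curves stay distinguishable in grayscale:

```python
import matplotlib
matplotlib.use("Agg")  # render to file, no display needed
import matplotlib.pyplot as plt

# Hypothetical numbers: for each system, an eval-set ROC curve plus the
# operating point obtained by applying that system's dev-set threshold
# to the eval scores.
systems = {
    "System A": ([0.0, 0.1, 0.3, 1.0], [0.0, 0.7, 0.9, 1.0], (0.1, 0.7)),
    "System B": ([0.0, 0.2, 0.5, 1.0], [0.0, 0.6, 0.85, 1.0], (0.2, 0.6)),
}
# Distinct line styles, not just colours, keep the curves apart in grayscale.
linestyles = ["-", "--", ":", "-."]

fig, ax = plt.subplots()
for (name, (fmr, tmr, dev_point)), ls in zip(systems.items(), linestyles):
    ax.plot(fmr, tmr, linestyle=ls, label=name)
    # black marker at the operating point fixed on the development set
    ax.plot(*dev_point, marker="o", color="black")
ax.set_xlabel("False Match Rate")
ax.set_ylabel("1 - False Non-Match Rate")
ax.legend()
fig.savefig("roc_eval.png")
```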
Regarding the tasks #38 (comment 26050), I need your green light and inputs for the following:
#25 (closed) Move the biometric-related functionality of bob.measure to bob.bio.base:

- move the bob.measure scripts (all of them?) to the bob.bio.base scripts directory
- move bob.measure's load.py and openbr.py to bob.bio.base utils or tools (or somewhere else?)
- move some (which ones?) or all of the functions from __init__.py to a file (with what name?) in bob.bio.base (in what directory?)? If some functions in __init__.py have to stay in bob.measure, wouldn't it be better to move them into another file (what name?)? It is a little bit strange to have these kinds of functions in __init__.py, no?
Regarding #38 (comment 26097), it would be great if you can give a list of all the plots/histo/tables you want, with an illustrative picture as suggested by @amohammadi and also test data or even a link to a potentially existing test function.
> move the bob.measure scripts (all of them?) to the bob.bio.base scripts directory
No, please just remove them for now. We'll provide `bob measure evaluate`, which does what these scripts do but using a different score file format.
> move bob.measure's load.py and openbr.py to bob.bio.base utils or tools (or somewhere else?)
Yes please create a folder called bob.bio.base.score for now.
> move some (which ones?) or all of the functions from __init__.py to a file (with what name?) in bob.bio.base (in what directory?)? If some functions in __init__.py have to stay in bob.measure, wouldn't it be better to move them into another file (what name?)? It is a little bit strange to have these kinds of functions in __init__.py, no?
I don't think we are moving the functions from here. You need to ask this on #25 (closed).
Basically move it into bob.measure?
Yes with tests and documentation, please. You need to explain where this particular confidence interval can be used and where it cannot be used.
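As an illustration only (the actual interval, its assumptions, and their documentation are exactly what is being asked for above), a percentile-bootstrap sketch for an error rate could look like:

```python
import numpy as np

def bootstrap_ci(errors, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for an error rate.

    `errors` is a 0/1 array (1 = misclassified trial).  Caveat: this
    assumes independent trials, so it is NOT valid when trials are
    strongly correlated (e.g. many scores from the same subject).
    """
    rng = np.random.default_rng(seed)
    errors = np.asarray(errors)
    rates = [rng.choice(errors, size=len(errors), replace=True).mean()
             for _ in range(n_resamples)]
    return (np.quantile(rates, alpha / 2), np.quantile(rates, 1 - alpha / 2))
```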
> Regarding #38 (comment 26097), it would be great if you could give a list of all the plots/histograms/tables you want, with an illustrative picture as suggested by @amohammadi, and also test data or even a link to a potentially existing test function.
This is a bit hard to say now. Let's start with biometric plots that are implemented in evaluate.py in bob.bio.base. Sebastien gave a list of the plots too in this issue.
@theophile.gentilhomme's work on this is converging nicely. Because we wanted to release a version of Bob before merging his work, all his work is being merged into branches named theo. However, you should treat these merge requests as if they were being merged into master. So, if you want to have some say in them, please go ahead and review his merge requests (even the merged ones). After Bob is released, we will just merge the theo branches into master (in bob.measure, bob.bio.base, and bob.pad.base).
Some examples of plots. Let me know whether this matches your expectations. Note that the commands come with numerous options that can change the display of the plots.
Bob measure commands
`bob measure roc .../data/{dev,test}-{1,2}.txt -la 0.4,0.1`
`bob measure det .../data/{dev,test}-{1,2}.txt -la 0.4,0.1 -ts SysA,SysB`
`bob measure epc .../data/{dev,test}-{1,2}.txt -ts SysA,SysB`
`bob measure hist .../data/{dev,test}-{1,2}.txt -ts SysA,SysB`
`bob measure metrics .../data/{dev,test}-{1,2}.txt -ts SysA,SysB`
Bob measure also comes with an `evaluate` command that generates several plots and metrics at once, and a `gen` command that generates fake scores in the input formats supported by bob.measure.
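A sketch of what a `gen`-like helper might produce, assuming the simple two-column format (a ±1 label followed by the score) that bob.measure reads; the distributions and the function name are made up:

```python
import numpy as np

def gen_scores(path, n=500, separation=2.0, seed=0):
    """Write a fake two-column score file: a label (-1 for impostor,
    1 for genuine) followed by the score.

    The two classes are drawn from shifted normal distributions;
    `separation` controls how well the fake system appears to perform.
    """
    rng = np.random.default_rng(seed)
    with open(path, "w") as f:
        for score in rng.normal(0.0, 1.0, n):         # impostor trials
            f.write(f"-1 {score:.6f}\n")
        for score in rng.normal(separation, 1.0, n):  # genuine trials
            f.write(f"1 {score:.6f}\n")

# two fake systems' worth of files, different seeds so they differ
for i, name in enumerate(("dev-1.txt", "test-1.txt")):
    gen_scores(name, seed=i)
```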
I have encountered an issue with bob bio evaluate and bob measure metrics commands and I hope this is a correct place to report it.
`bob bio evaluate`: when given two score files, dev and eval, it computes the EER and MIN-HTER thresholds independently on each file. However, it should compute a threshold on the dev scores and apply that same threshold to the eval scores, for both the EER and MIN-HTER scenarios.
`bob measure metrics`: I thought maybe the other command would work. I gave the same dev and eval score files to this command and got an error telling me that the file must be a two-column file, with the first column containing -1 or 1 and the second column the scores. I am a bit confused. Do we use two-column files anywhere?
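A numpy sketch of the expected behaviour for the MIN-HTER case (not the actual command's code): the threshold is minimised on the dev scores only and then applied unchanged to the eval scores:

```python
import numpy as np

def min_hter_threshold(genuine, impostor):
    """Threshold minimising (FMR + FNMR) / 2 on this (development) set."""
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    candidates = np.sort(np.concatenate([genuine, impostor]))
    hters = [(np.mean(impostor >= t) + np.mean(genuine < t)) / 2
             for t in candidates]
    return candidates[int(np.argmin(hters))]

def hter(genuine, impostor, threshold):
    """HTER of a score set at a fixed, externally chosen threshold."""
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    return (np.mean(impostor >= threshold) + np.mean(genuine < threshold)) / 2

# correct behaviour: the threshold comes from dev and is applied to eval, e.g.
#   thr = min_hter_threshold(dev_genuine, dev_impostor)
#   reported = hter(eval_genuine, eval_impostor, thr)
```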
I just noticed that this package still reports errors following biometrics naming conventions. IMO, biometrics-related stuff (e.g., the distinction between FAR and FMR) should be moved up to bob.bio.base. Here, you should have things that look more like https://en.wikipedia.org/wiki/Precision_and_recall.
To reply correctly, I need to know if there are any practical differences between FAR and FMR, and FRR and FNMR. If there aren't, I would report, on a given threshold:
- FPR -> False Positive Rate (spell it out so there is no confusion)
- FNR -> False Negative Rate
- Precision
- Recall
- F1-Score
And that is it
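The five quantities above, at a fixed threshold, can be sketched in plain Python (this is illustrative, not the bob.measure implementation; it assumes a trial is accepted when its score is at or above the threshold):

```python
def binary_metrics(scores, labels, threshold):
    """FPR, FNR, precision, recall and F1-score at a fixed threshold.

    `labels` holds True for positive (genuine) trials.
    """
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l)
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and not l)
    fn = sum(1 for s, l in zip(scores, labels) if s < threshold and l)
    tn = sum(1 for s, l in zip(scores, labels) if s < threshold and not l)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "False Positive Rate": fp / (fp + tn),
        "False Negative Rate": fn / (fn + tp),
        "Precision": precision,
        "Recall": recall,
        "F1-Score": 2 * precision * recall / (precision + recall),
    }
```

(The sketch leaves out the degenerate cases where a denominator is zero, which a real implementation would have to handle.)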
An option could allow you, for example, to replace the values above by something more digestible for biometrics, say --biometrics and then the program prints:
- False Acceptance Rate
- False Rejection Rate
- Half-Total Error Rate
The thresholding should also be configurable if that is not already the case: it should be possible to say "report me all values when FPR is set to 10%" or "report me all values when FAR is set to 0.01%" or "report me all values at the Equal-Error Rate":
- `bob measure metrics dev-1.txt` (as per above, use the Equal-Error Rate to calculate the threshold, which should also be reported)
- `bob measure metrics --biometrics dev-1.txt` (as per above)
- `bob measure metrics --far=0.0001 --biometrics dev-1.txt` (reports values for an FAR of 0.0001 (0.01%); no minimisation takes place)
- `bob measure metrics --criterion=minhter dev-1.txt` (reports values using FPR/FNR terminology, with the threshold calculated by minimising the HTER on the set)
For obvious reasons, options such as --criterion and --far should be mutually exclusive. As it is currently coded, it is confusing that you should pass --criterion=far --far-value=0.0001. It would be easier to say --far=0.0001 and that is it. If the user passes both, then an error is raised.
It is important that this program is very clear about metrics being used, so I would avoid any acronyms during the error reporting. It is OK to have acronyms on the option names, but documentation should be explicit.
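The mutual exclusivity requested above can be sketched with argparse's mutually exclusive groups (the real commands use click, where the same check would live in a callback; option names are illustrative):

```python
import argparse

def make_parser():
    """CLI sketch in which --far and --criterion cannot be combined."""
    parser = argparse.ArgumentParser(description="metrics reporting sketch")
    # argparse itself raises an error if both options of the group are given
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--far", type=float,
                       help="report all values at this false acceptance rate "
                            "(no minimisation takes place)")
    group.add_argument("--criterion", choices=["eer", "minhter"],
                       help="minimise this criterion on the set to pick "
                            "the threshold")
    parser.add_argument("scores", help="path to a score file")
    return parser
```

With this layout, `--far=0.0001` alone is enough, and passing both `--far` and `--criterion` aborts with a usage error, as requested.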