As Bob continues on its trajectory of serving more types of systems, it makes less and less sense to keep biometrics-related functionality inside this package. I'm proposing we move it into bob.bio.base, which is where it should have been in the first place.
A few things that come to mind:
All score loading/saving functionality
OpenBR exchange support
The scripts, which are tuned for biometrics-style reporting (and can only load biometric score files)
Not sure about all the identification stuff, maybe generic enough to keep here?
Thanks for your feedback.
We could, but I think we should leave it like it is. Moving code would break a lot of related packages, for example our reproducible research package.
I thought that you people would also have other things to do rather than shifting code from package to package. I do have. If you insist on moving the code, please do not count on me.
I understand your concerns regarding API breaking. Nevertheless, that is true for any API change we make; even recent changes in other packages affect things (e.g. all the database manipulations). If we start on those grounds, then basically we should decree a total API freeze forever ;-)
Which "reproducible research" package are you refering to, specifically? I'd like to give a look at that.
My main concern is that we have biometrics-related functionality in a very low-level package. Every time somebody wants to change something related to biometrics - or fix something - we need to re-issue a new version of this package, and that implies a whole release cycle for Bob! This is not a reasonable situation. Take, for example, the ISO standardization of plots - which, BTW, we're supposed to implement soon.
I thought that, of all involved, you'd be the most excited about this, since in the longer term it would make your life a bit easier. Today we have "competing" solutions for biometrics evaluation (evaluate.py in bob.bio.base and the scripts here). Which should people use? Which shall we maintain as a group?
Deprecation doesn't have to be done in a single step and we can take the long route. My understanding is that it makes no sense to keep such high-level interfaces in this low-level package.
For example, one of the latest open source packages that we have: https://pypi.python.org/pypi/bob.chapter.FRICE (for which I don't have access to the GitLab repo anymore) contains code that directly interfaces bob.measure. You might want to check bob/chapter/FRICE/tools.py. It basically uses bob.measure.load.split_four_column to load the score files, which you want to move into bob.bio.base. I am not saying that it is impossible to move that code, I am just saying that I don't have time for that.
It is true that we have several scripts, for example, to plot ROC and DET curves. I implemented my own way of plotting in bob.bio.base.scripts.evaluate since bob.measure had some odd parameters that I did not like. From my point of view, we can remove those scripts (including compute_perf.py, apply_threshold.py, eval_threshold.py, recurse_results.py and plot_cmc.py). Also, the complete bob.measure.plot sub-package can be removed; the plotting functions are IMHO not flexible enough. Examples of how to generate these plots using matplotlib should rather be given in the documentation. That way, people can modify their plots as they want (e.g., using log or non-log-scale x-axes, or putting the legends where they want). This would also make the package itself independent of matplotlib.
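For illustration, a minimal sketch of what such a documentation example could look like, assuming the 4-column score format and the existing bob.measure.load.split_four_column and bob.measure.roc functions (the file name and labels are made up):

```python
import matplotlib.pyplot as plt
import bob.measure

# load a 4-column score file into impostor (negative) and genuine (positive) scores
negatives, positives = bob.measure.load.split_four_column('scores-dev')

# sample the ROC at 100 operating points; rows are FAR and FRR
far, frr = bob.measure.roc(negatives, positives, 100)

plt.plot(far, 1.0 - frr, label='my system')  # verification rate vs. FAR
plt.xscale('log')                            # log or linear x-axis, as preferred
plt.xlabel('False Acceptance Rate')
plt.ylabel('Verification Rate')
plt.legend(loc='lower right')
plt.savefig('roc.pdf')
```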
Indeed, in the above-mentioned open-source package, I am not even using the evaluate.py to plot, but I generate plots that are highly specialized. The evaluate.py only serves as an example (and a quick way) to generate some kind of plots. I would never put one of these plots directly into a publication.
The OpenBR functionality can be moved to bob.bio.base -- you might want to create a new sub-directory in bob/bio/base for the evaluation-related functions. Also, the DIR plots should be integrated into evaluate.py, but so far I didn't have the time or need to do that.
I think it might be a good idea to maintain the package, as we already have reports of people trying to use it (with the latest package versions) and failing: https://groups.google.com/forum/#!topic/bob-devel/ZHw_kqz_bWw
I can maintain that package so it works with the latest versions of the bob packages. However, I don't think it needs to be part of the nightly tests...
Can you ask the system guys and @sebastien.marcel why I am blocked, and remove the block in case there is no reason? Thanks.
@andre.anjos it is a little vague exactly what should be moved here. Could you please narrow it down?
Guys, I am going ahead with moving this code, but I will not remove it from bob.measure; instead, it will raise a deprecation warning.
So bob.measure will work as is for now.
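Just to illustrate the idea, a hypothetical sketch of how a moved function could remain importable from bob.measure while warning users (the bob.bio.base module path below is only a guess, not the final location):

```python
import warnings

def split_four_column(filename):
    """Deprecated alias kept in bob.measure for backwards compatibility."""
    warnings.warn(
        'bob.measure.load.split_four_column has moved; please import it '
        'from bob.bio.base instead',
        DeprecationWarning, stacklevel=2)
    # delegate to the relocated implementation (module path is illustrative)
    from bob.bio.base.score.load import split_four_column as _moved
    return _moved(filename)
```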
One of the major goals that I have is to mix compute_perf.py and evaluate.py into one script. So if you have ideas or opinions, you should say them now.
Also, looking at several papers, it seems that ROC, DET, and EPC curves can be used for any binary classification task (e.g. Ivana's paper: "Biometrics evaluation under spoofing attacks"). So I am not sure the core functionality behind those should be removed. The only thing that changes across these plots in different applications is their title and x and y labels. The same can be said even for the EPSC framework, which IMHO can be used for any classification task with 3 classes.
There should be one application that produces a full set of default plots that are standard for biometric experiments
This application should be intuitive (avoid non-optional options for example) and do simple things in a simple manner
Examples in the command-line --help output are more than welcome
The script should be as simple as possible: just parse and validate the command line and send the options to a library function that actually does the job (see the sketch after this list)
Building on top of the library function should be possible (i.e., it should be as modular and re-usable as possible). Possibly, having individual functions à la bob.measure that work with scores is best
An extra app should be in charge of combining scores into plots - somebody needs to think about how to do this and provide an easy, intuitive way to do it. This functionality can optionally be combined with the first script if that makes sense to the implementor.
Remember: easy should be easy; it is OK to read more for complex tasks
Any option parser (docopt or argparse) should be OK to implement the above
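A minimal sketch (purely illustrative, not the actual bob.bio.base script) of the "thin script, reusable library function" layout described above, using argparse:

```python
import argparse

def evaluate(score_files, output='curves.pdf'):
    """Library function doing the actual work; can be reused from other code."""
    # ... load the score files, compute metrics, write the requested plots ...
    raise NotImplementedError

def main(argv=None):
    parser = argparse.ArgumentParser(
        description='Evaluate biometric score files',
        epilog='Example: evaluate.py -o curves.pdf scores-dev scores-eval')
    parser.add_argument('scores', nargs='+', help='score file(s) to evaluate')
    parser.add_argument('-o', '--output', default='curves.pdf',
                        help='where to write the plots [default: %(default)s]')
    args = parser.parse_args(argv)
    # the script only parses/validates the command line and delegates to the library
    evaluate(args.scores, output=args.output)

if __name__ == '__main__':
    main()
```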
We want a script to compare biometric systems. This is achieved by compute_perf.py and evaluate.py, but they work differently and there should be only one. I think an interface like this would be helpful:
```
# the script would reside in bob.bio.base

# print and plot evaluations of one system using just one set of scores
$ evaluate.py scores-dev

# print and plot evaluations of one system using development and evaluation scores
$ evaluate.py scores-dev scores-eval

# I also want the interface to integrate easily with bash expansions, so this should work too
$ evaluate.py ~/idiap/user/mobio/vgg-cosine/male/nonorm/scores-{dev,eval}

# with an option to state that evaluation scores are given as well (more on this later)
$ evaluate.py [-e|--evaluation] scores-dev scores-eval

# the interface should also support comparing multiple systems; here we assume all scores are test scores
$ evaluate.py (-m|--multiple-systems) sys1-scores sys2-scores sys3-scores ...
$ evaluate.py -m ~/idiap/user/mobio/{vgg-cosine,rankone}/male/nonorm/scores-eval

# comparing several systems with dev and eval scores
$ evaluate.py (-m|--multiple-systems) (-e|--evaluation) sys1-dev sys1-eval sys2-dev sys2-eval ...
$ evaluate.py -me ~/idiap/user/mobio/{vgg-cosine,rankone}/male/nonorm/scores-{dev,eval}

# the legends are extracted from the differences between paths, or can be provided explicitly
$ evaluate.py -me ~/idiap/user/mobio/{vgg-cosine,rankone,isv}/male/nonorm/scores-{dev,eval} [-l|--legends] VGG RANKONE ISV
```
All plots would be saved in one file, curves.pdf, and the metrics would be printed. Options will be provided to enable/disable different plots/metrics.
A neat option would be to give the console to the user (ipdb.set_trace()?) after each plot, so that the user can run custom commands like changing the axis range, labels, legends, etc.
In my eyes, the current version of evaluate.py is doing a good job. I never liked compute_perf.py.
I don't see why we should differentiate between --multiple-systems and a single system. In fact, it might be a good idea to integrate it more with the directory structure of scores that the bob.bio packages provide. For example, we might have options called --systems and --protocol, which automatically resolve into results/<system>/<protocol>/{no,zt}norm/scores-{dev,eval}, check which of the files are available, and perform the corresponding plots. The --protocol option might even be optional and determined automatically, e.g., if only one <protocol> directory is inside all <system> directories.
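Something along the lines of this rough sketch (function and argument names are purely illustrative) is what I have in mind for resolving those options into the score files that actually exist:

```python
import glob
import os

def find_score_files(results_dir, systems, protocol=None):
    """Map each system to whichever scores-{dev,eval} files actually exist."""
    found = {}
    for system in systems:
        # results/<system>/<protocol>/{no,zt}norm/scores-{dev,eval}
        pattern = os.path.join(results_dir, system, protocol or '*',
                               '*norm', 'scores-*')
        found[system] = sorted(f for f in glob.glob(pattern) if os.path.isfile(f))
    return found
```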
I would particularly like it if you could specify which plots you want to have. Most users want either a DET, a ROC, or a CMC curve. Especially when evaluating large score files, ROC and CMC both take some time to compute.
And people want to include the plots directly into publications. I know that it is easy to do with a multi-page PDF in LaTeX, but other people might not know the corresponding LaTeX options.
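(For reference, the graphicx package can pull a single page out of a multi-page PDF, e.g. \includegraphics[page=2]{curves.pdf}, but not everybody knows that.)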
So far, I have tried to make command-line options available for changing the main parts of the plots.
However, we might also want to add a command-line parameter that allows specifying code (or a script) to be evaluated after plotting -- this would provide the same functionality as giving the console to people.
If we want to go with @amohammadi's proposal, I remember that there is an input command in Python which will allow you to write code. I have not worked with it yet, but I guess that this could be a way to go (no ipdb is required in this case). However, I would prefer the command-line option -- as this would allow writing the full command line into a (shell) script, and recreating the plot in exactly the same way would be as easy as calling the script again.
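To make the idea concrete, a rough sketch of what such a command-line hook could look like (the --custom-command option name is only an example):

```python
import matplotlib.pyplot as plt

def apply_custom_command(command):
    """Evaluate user-supplied code after plotting, e.g. to tweak axes or legends.

    Example: --custom-command "ax.set_xlim(1e-4, 1); ax.legend(loc='best')"
    """
    if command:
        # expose the current figure and axes to the user-supplied code
        exec(command, {'plt': plt, 'fig': plt.gcf(), 'ax': plt.gca()})
```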
In my eyes, the current version of evaluate.py is doing a good job. I never liked compute_perf.py.
I like evaluate.py because it offers more functionality and compute_perf.py because it offers a clean interface with good examples.
I believe there should be one with all the chocolate.
I don't see why we should differentiate between --multiple-systems and a single system.
Because "This application should be intuitive (avoid non-optional options for example) and do simple things in a simple manner" and "easy should be easy; it is OK to read more for complex tasks". In the simplest case, you just want to evaluate one system so you would do evaluate.py scores-dev or evaluate.py scores-dev scores-eval but how would the script know if you want to evaluate two systems using only development scores: evaluate.py sys1-dev sys2-dev? hence the --multiple-systems option. If you think about it, when you want to compare two systems, you would expect things to be more complex. Hence searching for a parameter like --multiple-systems would be expected from the user.
And people want to include the plots directly into publications. I know that it is easy to do with a multi-page PDF in LaTeX, but other people might not know the corresponding LaTeX options.
Multi-page pdf files are nothing new; both scripts do that.
I would particularly like it if you could specify which plots you want to have. Most users want either a DET, a ROC, or a CMC curve. Especially when evaluating large score files, ROC and CMC both take some time to compute.
That's a good point. But the problem with the current version of evaluate.py is that it puts each of these in a different PDF file, so you would have to specify --det mobio-rankone-vgg-isv-nonorm.det.pdf --roc mobio-rankone-vgg-isv-nonorm.roc.pdf --cmc mobio-rankone-vgg-isv-nonorm.cmc.pdf, which becomes quite annoying, and all of these PDF files are already multi-page too. Why not something like: evaluate.py --det --roc --cmc --output=mobio-rankone-vgg-isv-nonorm.curves.pdf -me ~/idiap/user/mobio/{vgg-cosine,rankone}/male/nonorm/scores-{dev,eval}
However, we might also want to add a command-line parameter that allows specifying code (or a script) to be evaluated after plotting -- this would provide the same functionality as giving the console to people.
how would the script know if you want to evaluate two systems using only development scores: evaluate.py sys1-dev sys2-dev? Hence the --multiple-systems option.
But this doesn't solve the problem. What if you have 4 systems with only dev files: evaluate.py -m sys1-dev sys2-dev sys3-dev sys4-dev? How should the script differentiate this from evaluate.py -m sys1-dev sys1-eval sys2-dev sys2-eval?
I don't really think that there is a way around having two options, one for dev and one for eval.
With bob.bio.base!68 (merged) I have removed all of bob.bio.base's dependencies on bob.measure.plot. Any biometrics-related plot in bob.measure.plot (which I think is all of it) can now be removed.
I see now. You could make an argument that, while the CMC is currently used only in biometrics, the base implementation and generic plotting could stay in bob.measure in case someone wants to use it for something else. See: https://stats.stackexchange.com/a/142617
I was thinking the same thing for the EPSC. The Expected Performance and Spoofability Curve (EPSC) could potentially be applied to any problem that has one positive class and two negative classes. So it might be a good idea to keep the generic EPSC implementation here as well.
I think if it can be used in other applications, it would be better to keep it here. Tests are already working with the new generic input file format. I guess the same question stands for the open set rates?
@theophile.gentilhomme I am not sure how the CMC curve can be created from the generic input file format (I am not sure what it stores, though). To compute CMC curves, you need to at least store the probe filename, since the rank needs to be computed per probe; just splitting into positives and negatives is not sufficient.
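To illustrate the point with made-up numbers: the CMC input cannot be a flat (negatives, positives) split, it needs one pair of score sets per probe, so that the rank of the genuine score can be computed within each probe's candidate list. A small self-contained sketch (not the bob.measure implementation):

```python
import numpy as np

# one (negatives, positives) pair per probe (the values are made up)
cmc_scores = [
    (np.array([0.1, 0.3, 0.2]), np.array([0.5])),   # probe 1
    (np.array([0.6, 0.4]),      np.array([0.3])),   # probe 2
]

def recognition_rate(cmc_scores, rank=1):
    """Fraction of probes whose best genuine score is within the top `rank`."""
    hits = 0
    for negatives, positives in cmc_scores:
        # rank of the best positive among all scores for this probe
        r = 1 + np.sum(negatives >= positives.max())
        hits += (r <= rank)
    return hits / float(len(cmc_scores))

print(recognition_rate(cmc_scores, rank=1))  # 0.5 in this toy example
```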