Incorporate a general overview of biometric verification and illustrate biometric verification experiment flow in bob.bio.base doc
fixes #73 (closed)
Quoted documentation (diff context):

> * Biometric recognition (steps 4 to 8)
> * Evaluation (step 9)
>
> The communication between two steps is file-based, usually using a binary HDF5_ interface, which is implemented in the :py:class:`bob.io.base.HDF5File` class. The output of one step usually serves as the input of the subsequent step(s). Depending on the algorithm, some of the steps are not applicable/available. E.g. most of the feature extractors do not need a special training step, or some algorithms do not require a subspace projection. In these cases, the corresponding steps are skipped. ``bob.bio`` takes care that the correct files are always forwarded to the subsequent steps.
>
> "Biometric verification" refers to the process of confirming that an individual is who they say they are, based on their biometric data. This implies that we have access to both the person's biometric data and their identity (e.g., a numerical ID, name, etc.).
>
> A biometric verification system has two stages:
>
> 1. **Enrollment:** A person's biometric data is added to the system's biometric database alongside the person's ID.
> 2. **Verification:** A person's biometric data is compared to the biometric data with the same ID in the system database, and a match score is generated. The match score tells us how similar the two biometric samples are. Based on a match threshold, we then decide whether or not the two biometric samples come from the same person (ID).

Comment on the **Enrollment** point: This is not true. For example, if you have a computer login application based on biometrics, which requires both a username and a biometric in place of a password, you need to store a database of biometric models (assuming that the verification is done on a server, which deals with multiple people).

Comment on the **Verification** point: This is true for identification (1-N comparison), but not for verification (1-1 comparison). In a verification scenario, I would be trying to prove to the system that I am person A. So, I would provide my ID ("A") and my biometric sample. The system would then look for "A" in the database and check that my provided sample matches the one stored for "A". If they match, the system decides that I am indeed person A.
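The 1-1 vs. 1-N distinction described in this comment can be sketched in a few lines of Python (toy code with a made-up string-based `similarity`; a real system compares feature vectors, not strings):

```python
# Toy sketch of the two comparison modes: verification (1-1) checks a probe
# against the template of the *claimed* identity only, while identification
# (1-N) searches the whole template database for the best match.

def similarity(template, probe):
    """Stand-in matcher: fraction of positions where the strings agree."""
    matches = sum(1 for a, b in zip(template, probe) if a == b)
    return matches / max(len(template), len(probe))

database = {"A": "fingerprint-A", "B": "fingerprint-B"}  # enrolled templates

def verify(claimed_id, probe, threshold=0.95):
    """Verification: 1-1 comparison against the claimed ID's template."""
    return similarity(database[claimed_id], probe) >= threshold

def identify(probe):
    """Identification: 1-N comparison, returning the best-matching ID."""
    return max(database, key=lambda subject: similarity(database[subject], probe))
```

With this sketch, `verify("A", "fingerprint-B")` fails (the claimed template and the probe differ), while `identify("fingerprint-B")` returns `"B"`.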
Unfortunately, the term verification is not used correctly inside the updated documentation. I don't know why you replaced the (more generic) term "Recognition" with "Verification". `bob.bio.base` can do both identification and verification; it is simply a matter of how the score files are evaluated. It is true that the script is called `verify.py`, but this does not mean that it is limited to verification. We might call the script differently, if you prefer that, but do not use different terms for Recognition. I haven't found the time to read through the complete docs. I will do that as soon as I find the time; I am buried with work right now. I hope I can come up with some updates tomorrow.
@mguenther As I mentioned in my comments in issue #73 (closed), the reason I used the term "verification" is that our script is called `verify.py`. While this may not be a big deal and perhaps doesn't warrant changing the name, it is a little confusing for people to whom the distinction between identification and verification matters. I'd be happy to leave the term "recognition" if you honestly think it won't be confusing for people, but my changes were suggested simply to be as consistent and true to biometrics terminology as possible. In fact, if I were to be really picky, I would suggest that the script shouldn't be called `verify.py` OR `identify.py` OR `recognise.py`, because none of these three things actually happens in the script. The script finishes with scoring, but scoring does not make the final decision on a person's identity - this happens in Evaluation. Hence why I'm keen to clarify ... let me know your thoughts.
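To make that separation concrete, here is an illustrative sketch (not the actual `verify.py` internals; all names are made up) of scoring producing raw numbers and evaluation turning them into decisions:

```python
# Illustrative only: the scoring step emits raw match scores and makes no
# decisions; only the evaluation step applies a threshold to accept/reject.

def score_probes(model, probes, matcher):
    """Scoring step: one raw score per probe, no accept/reject yet."""
    return [matcher(model, probe) for probe in probes]

def decide(scores, threshold):
    """Evaluation step: thresholding the scores yields the actual decisions."""
    return [score >= threshold for score in scores]

# Toy matcher: higher (less negative) means more similar.
def matcher(model, probe):
    return -abs(model - probe)

scores = score_probes(5.0, [5.1, 9.0], matcher)  # approximately [-0.1, -4.0]
decisions = decide(scores, threshold=-0.5)       # [True, False]
```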
added 1 commit
- 85241e73 - Changed all mention of "verification" to the more general term "recognition"
The manual looks awesome and I like very much the diagrams - they are helpful. Here are some small comments.
- The main index.rst file has a few typos:
- "A databases..."
- "tolls" instead of "tools"
- "If you are interested, please continue reading:" - I wonder what is the purpose of this sentence? ;-)
- Installation instructions
- Mostly outdated - we should start with the "conda-based" installation, then keep the "pip-based" one. Remove zc.buildout, as that is only for development
- What are "some default biometric recognition databases"? Maybe the word "default" or this sentence would need revision
- Instead of "original" data, I'd prefer the term "raw" data as to be compatible with the bob.db.base guide
- If the installation is done with "conda" or "pip" the directory "./bin/" will not exist - please remove all references to this
- For generating the documentation, I think we should remove the "./bin/sphinx-build" stuff from the installation instructions. That should only be used in dev mode.
- "Running biometric verification experiments"
- We need to clarify "verification" versus "recognition" here - I kind of agree with @mguenther on this
- Overall, I'm a bit concerned about the growing number of command-line options one needs to master to conduct an experiment (in a flexible way). Annotated configuration files can be used as resources as well and would allow the complete specification of a pipeline without command-line options. I think that, in this documentation, we should default to this and just say that everything would be possible with command-line options as well.
That is it for the time being.
- Changing the name of the script to `bio_toolchain.py` would be fine with me. Indeed, `bob.bio` can do both verification and identification (and even classification, as required by some databases' protocols such as LFW or YouTube), and how to evaluate is decided after the experiment is run.
- I can see your point that you think feature projection is part of the feature extraction process. Your intuition might be correct, but from the implementation details, it is rather bound to the algorithm. In fact, feature projection is just a way to speed up processing. For example, with LDA projection, you could always project the features during enrollment and during scoring, but that would take some time, as you would handle the same probe files several times. The same applies to ISV, where we use the projection step to pre-compute some information, which needs to be computed only once per (probe) sample. Hence, performing the projection before enrollment or scoring just saves time.
- Now, ISV can be applied to any type of (two-dimensional) features. We use, for example, DCT-block features for faces, and MFCC coefficients for speaker recognition. Implementing the ISV-specific projection in the feature extraction would not make sense, since we would need to implement the same technique several times. Also, the DCT-block and MFCC features might be used by other algorithms, which might require a different kind of projection, if any. Hence, feature projection has been designed to be part of the `Algorithm`, not of the `Extractor`. Only the `Algorithm` knows which kind of projection it needs.
- I can also see your confusion about the `Eigenfaces` extractor. Indeed, in there the "projection" step is part of the extraction, which is a little bit messed up. The two main purposes of the `Eigenfaces` extractor are: a) that people can see how to train an extractor (none of our other extractors is trainable), and b) that eigenfaces are a default feature reported in the literature, which might be used in other `Algorithm`s, which in turn might even define their own way of feature projection, i.e., projecting the eigenface features further into their own projection space. In fact, other feature extractors also perform some kind of projection; for example, the DCT-block extractor projects face images into DCT components (ehm, is the discrete cosine transform a projection?). The only difference there is that the DCT-block projection is fixed and not learned from data. However, we should stick to the term projection only in combination with the `Algorithm`, i.e., in order not to confuse readers more than necessary.
- I still have not found the time to read the full documentation; I just had a look at the new figures that you added. Although it is a bit late now (and I am sorry that I didn't remember it earlier), I wanted to let you know that I have generated a different graphic that explains the complete biometric recognition toolchain. You can find it in figure 14 (page 50) of http://publications.idiap.ch/downloads/reports/2013/Gunther_Idiap-RR-13-2017.pdf I also have the figure as a plain PDF or PNG, if you want to integrate it.
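The projection-as-precomputation argument above can be illustrated with a toy sketch (plain Python; a single direction vector stands in for a learned LDA/ISV projection, and all names are made up):

```python
# Toy illustration of why projection is a separate, cached step: each sample
# is projected exactly once, instead of being re-projected for every
# model/probe comparison during enrollment and scoring.

direction = [0.6, 0.8]  # pretend this projection was learned during training

def project(feature):
    """Projection step: run once per sample (bob.bio stores the result in a file)."""
    return sum(d * f for d, f in zip(direction, feature))

def score(projected_model, projected_probe):
    """Scoring operates on already-projected data; nothing is recomputed."""
    return -abs(projected_model - projected_probe)

probes = [[1.0, 2.0], [3.0, 1.0]]
projected_probes = [project(p) for p in probes]  # one projection per probe
# Without the separate step, score() would have to call project() on the same
# probe once per model it is compared against.
```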
In response to @andre.anjos:
- The sentence "If you are interested, please continue reading:" is left over from me. Feel free to remove it.
- Since installation is now done via pip in most cases, we should get rid of the `zc.buildout` section. Maybe move the example of adding more packages of the `bob.bio` framework into the `pip` section. I also agree to remove the `sphinx-build` stuff. You are already reading the documentation, so why would you want to re-generate it? I guess/hope that this is detailed in the installation guide of Bob, anyway.
- Identification and verification (as well as classification) are types of recognition. How to evaluate is decided only when calling the `evaluate.py` script, which currently supports both identification (recognition rate and CMC for closed-set, and DIR for open-set identification) and verification (ROC, DET, EER, HTER, EPC). For classification (e.g. for LFW), where the mean and standard deviation of classification accuracy should be computed, there is no simple script at the moment.
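As a sketch of what the verification-style evaluation computes, a minimal approximate EER could look like this (illustrative code only, not how `evaluate.py` or bob.measure actually implement it):

```python
# Simplified verification evaluation: FAR/FRR at a threshold, plus a scan
# over candidate thresholds to approximate the Equal Error Rate (EER).

def far_frr(negatives, positives, threshold):
    """FAR: fraction of impostor scores accepted; FRR: fraction of genuine rejected."""
    far = sum(1 for s in negatives if s >= threshold) / len(negatives)
    frr = sum(1 for s in positives if s < threshold) / len(positives)
    return far, frr

def approx_eer(negatives, positives):
    """Use every observed score as a candidate threshold; return the mean of
    FAR and FRR at the threshold where they are closest."""
    def gap(t):
        far, frr = far_frr(negatives, positives, t)
        return abs(far - frr)
    best = min(negatives + positives, key=gap)
    far, frr = far_frr(negatives, positives, best)
    return (far + frr) / 2

negatives = [0.1, 0.2, 0.3, 0.4]   # impostor (non-match) scores
positives = [0.35, 0.6, 0.7, 0.8]  # genuine (match) scores
```

Here the threshold 0.4 balances the two error rates, giving an EER of 0.25.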
I also agree that we should make the configuration file for an experiment the default way of running an experiment. Please make sure to document that command-line options override any option in the configuration file (I am not sure about the `verbose` option, i.e., whether it would reset the verbosity level or *add* more verbosity).
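The requested override semantics could be sketched like this (a generic `argparse` example; the option names and the dict-based config are made up, since bob.bio.base actually uses Python configuration modules):

```python
# Generic sketch of "command line overrides configuration file" precedence.
import argparse

def resolve_options(config, argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--preprocessor", default=None)
    parser.add_argument("-v", "--verbose", action="count", default=0)
    args = parser.parse_args(argv)

    options = dict(config)               # start from the configuration file
    if args.preprocessor is not None:    # an explicit CLI value wins
        options["preprocessor"] = args.preprocessor
    # For verbosity, *add* the CLI level to the configured one (one of the
    # two behaviours debated above; resetting would be the alternative).
    options["verbose"] = config.get("verbose", 0) + args.verbose
    return options

config = {"preprocessor": "face-crop", "verbose": 1}  # made-up config values
```

With `--preprocessor landmark -vv` on the command line, the preprocessor is overridden and the verbosity becomes 3; with no arguments, the configured values survive untouched.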
@andre.anjos Thank you for your feedback. Just to clarify, I only worked on the part from "Running Biometric Recognition Experiments" to "Running Experiments (part I)" (but not including the running experiments part). I think the remainder of bob.bio.base's documentation could also be enhanced so that it is a bit clearer, but I did not have time to tackle this during the Hackathon.
@mguenther Thank you for your feedback and for the detailed explanation of your thinking behind the projection bit. As for the diagrams, I had a look at yours and I think it's very nice. However, I wanted to relate the diagrams of the flow of a biometric recognition experiment in bob.bio.base to my diagram of a typical biometric recognition system, so I wanted to show how the 4 stages (Preprocessing, Feature extraction, Matching, and Decision making) are set up in bob.bio.base. I also wanted to show where the outputs are stored, hence the folder/directory pictures. Taking this into account, I would like to leave my diagrams in there (not least because they took a long time to draw!), but I'm open to feedback from you (and others) after you have fully read my section (from "Running Biometric Recognition Experiments" to "Running Experiments (part I)", but excluding the running experiments part). We could always incorporate your diagram at a later stage. For example, we discussed with @onikisins and @akomaty the need to explain in more detail (and diagrammatically) how the different classes are linked/called when we run `verify.py` - it could be useful to have your diagram, or something similar, somewhere in this section.
Guys, @vkrivokuca's contribution is extremely important here. Thank you @vkrivokuca for making this effort.
A few points:
- Installation instructions are based on conda now: !72 (merged) You don't see it here because this branch is based on an older version of master. If you still see problems with it, open another issue and discuss it separately.
- The goal of this merge request is to talk about biometric experiments without explaining what happens in the code which I think is quite good so far. This is not meant to refactor or rename things in the bob.bio framework. If you really believe something should be changed, then you should implement it as well and make sure everything works.
- once we have high-level diagrams and explanations, this can help us create robust toolchains in the BEAT platform in future.
- @vkrivokuca I think you came down from your high-level and conceptual explanation to implementation details of bob.bio.base a little early. You could have explained things a bit more without talking about extractors and algorithms. I'll discuss this with you when I get back in the afternoon. For example, you could have just skipped talking about the projection step since that's just implementation details.
- @vkrivokuca the diagrams are really good, but I would rather not see `extractor` or `algorithm` boxes in there; I'd rather call them just `feature extraction` and `matcher`.
- No mention of buildout or buildout-related things must exist in the docs. This is explained here: bob#241 (closed). @akomaty took care of this in all the other documentation.
I have read through the documentation, and applied some small fixes. First of all, @vkrivokuca thank you for your work. The time you spent is highly appreciated.
I agree with @amohammadi that the installation instructions should be updated. However, on another point I disagree with @amohammadi. He said that you should talk about `feature extraction` and `matcher`, while I think we should not use different words for the same things. `Feature Extraction` might be OK, while I would prefer `Extractor`, as this is the terminology used throughout the documentation. Anyway, `matcher` would be misleading here. In particular, @vkrivokuca uses the `Algorithm` for projection, enrollment and scoring, while the "matcher" (from Fig. 1) only performs the scoring part.

I have some more things to note for the installation section:
- In the Installation/Databases section, the link to the `verification database` simply links to the package list of Bob. I don't think that this link is good. I had a different link to a list of biometric databases in the documentation of `bob.db.verification.utils` (http://pythonhosted.org/bob.db.verification.utils/implemented.html). However, it seems that this is gone now. Maybe we can update this list and move it to the `bob.bio.base` documentation, i.e., as a separate `.rst` file?
- In the same section, there is still a `./bin/databases.py` command, which should be updated. What about calling this script `bio_databases.py` (a change inside the `setup.py` should be sufficient), i.e., to make it distinguishable from other scripts with the same name (for example, `pad_databases.py`, if existent)?
- In the note below, there is a link to a file (bob/bio/base/test/utils.py), which links to a local file (that needs to be edited), which is only valid when installed via `buildout`. I am not sure how to link these files when installed via `pip`. I think we should completely remove this note.
- As @andre.anjos mentioned, the section on "Generate this documentation" can also be removed.
For the experiment section, I think there are some points to be changed:
- In the first paragraph, you mix up identification and verification. In the Enrollment point you talk about a model database (identification), while in the Recognition point you talk about a decision threshold, which is verification. I have added a few phrases there that clarify this.
- In figure 1, you talk about the Template Database. Theoretically, this is the correct term, but in the `bob.bio` packages we call the enrolled templates `models` throughout. Maybe you should change the Template Database to Model Database. However, as mentioned in the point above, this is only valid for identification. An even better solution would be to simply call it Model, which works for both identification and verification.
- In the same figure 1, you have a "Decision Maker", which is -- once again -- only valid for verification. Maybe in figure 1 you should stick to one of the two, either verification or identification. As the adaptation to verification is simpler (just replace the Template Database with the Enrolled Model), I would suggest doing verification in figure 1. This would also correspond better to the description below figure 1.
- Below that, when you talk about the Template Database, you refer to the enrolled model as "extracted feature set" or "template". While template is correct (and you can leave the term here), the enrolled model is not simply an "extracted feature set", but can be much more complicated. Hence, you should call it a "model" instead (which is also the terminology that the reader will find later on).
- Figure 4 is misleading, as the algorithm is not linear (for example, scoring is not only based on the models). Instead of having an overview of the Algorithm, just have the (optional) projection step in Figure 4. Figures 5 and 6 better explain the enrollment and scoring steps, and they are not misleading.
- In the "Feature Extraction" subsection, I have removed the sentence "For example, only a few points in a person's face (e.g., eyes, nose, mouth, chin) are actually used for recognition purposes." which is not correct. In most cases, features are extracted from the full (preprocessed) face image, not only from eyes and nose.
- In "Decision Making" and in Figure 7, you only talk about measures and curves for verification. However, there are more ways to evaluate, for example Recognition Rates (RR) and Cumulative Match Characteristics (CMC) for closed-set identification, as well as Detection & Identification Rate (DIR) plots for open-set identification. It would be great to incorporate these, too. Otherwise, people might think that `bob.bio` is limited to verification only.
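For contrast with the verification curves, the closed-set identification measures mentioned here could be sketched as follows (illustrative code, not bob.measure's implementation):

```python
# Illustrative closed-set identification evaluation: the rank of the correct
# model per probe, the rank-1 recognition rate, and the CMC curve values.

def rank_of_correct(scores, correct_id):
    """scores maps model ID -> similarity for one probe; rank 1 means the
    correct model scored highest."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(correct_id) + 1

def cmc(ranks, max_rank):
    """CMC: fraction of probes identified at rank <= r, for r = 1..max_rank."""
    return [sum(1 for r in ranks if r <= k) / len(ranks)
            for k in range(1, max_rank + 1)]

# Two probes scored against a three-model gallery; the correct identity is "A".
ranks = [
    rank_of_correct({"A": 0.9, "B": 0.4, "C": 0.1}, "A"),  # best match: rank 1
    rank_of_correct({"A": 0.5, "B": 0.7, "C": 0.2}, "A"),  # second best: rank 2
]
recognition_rate = cmc(ranks, 3)[0]  # rank-1 recognition rate
```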
added 1 commit
- 2404614e - Added some phrases to separate identification and verification
Now, reading further in the documentation, I think we should have @vkrivokuca's explanations in a separate `.rst` file, not on top of how to run an experiment. This page is IMHO getting too long. What about a page explaining "The Structure of a Biometric Recognition System"?

added 16 commits
- 2404614e...e396e970 - 13 commits from branch `master`
- 188f4b88 - Modified experiments.rst to add overview on biometric verification + illustratio…
- 7b885940 - Changed all mention of "verification" to the more general term "recognition"
- f0289d57 - Added some phrases to separate identification and verification
Hi, I rebased this branch onto master, since the installation instructions were changed in master and are now based on conda. Even though I kept saying the installation instructions were changed in !72 (merged) and already merged into master, you guys kept commenting here on the old installation instructions, which btw is not the goal of this merge request :)