
Incorporate a general overview of biometric verification and illustrate biometric verification experiment flow in bob.bio.base doc

Merged Vedrana KRIVOKUCA requested to merge verification-overview into master

fixes #73 (closed)

Merge request reports


Activity

  • Vedrana KRIVOKUCA changed title from Modified experiments.rst to Incorporate a general overview of biometric verification and illustrate biometric verification experiment flow in bob.bio.base doc

36 * Biometric recognition (steps 4 to 8)
37 * Evaluation (step 9)
38
39 The communication between two steps is file-based, usually using a binary HDF5_ interface, which is implemented in the :py:class:`bob.io.base.HDF5File` class.
40 The output of one step usually serves as the input of the subsequent step(s).
41 Depending on the algorithm, some of the steps are not applicable/available.
42 E.g., most feature extractors do not need a special training step, and some algorithms do not require a subspace projection.
43 In these cases, the corresponding steps are skipped.
44 ``bob.bio`` ensures that the correct files are always forwarded to the subsequent steps.
18 "Biometric verification" refers to the process of confirming that an individual is who they say they are,
19 based on their biometric data. This implies that we have access to both the person's biometric data and
20 their identity (e.g., a numerical ID, name, etc.).
21
22 A biometric verification system has two stages:
23
24 1. **Enrollment:** A person's biometric data is added to the system's biometric database alongside the person's ID.
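The file-based hand-over between steps described in the diff above can be sketched as follows. This is a minimal illustration using the generic `h5py` package as a stand-in; the actual toolchain uses the :py:class:`bob.io.base.HDF5File` class, whose API differs:

```python
import os
import tempfile

import h5py  # stand-in for bob.io.base.HDF5File in this sketch
import numpy as np

# One step of the toolchain (e.g. feature extraction) writes its output...
features = np.arange(6, dtype="float64").reshape(2, 3)
path = os.path.join(tempfile.mkdtemp(), "features.hdf5")
with h5py.File(path, "w") as f:
    f["array"] = features

# ...and the subsequent step reads that file back as its input.
with h5py.File(path, "r") as f:
    loaded = f["array"][()]

assert np.array_equal(loaded, features)
```

The point is only the hand-over pattern: each step reads the previous step's file and writes its own, so skipped steps simply mean the next step reads an earlier file.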
  • For verification, there is no biometric database of models. A database of models is only created for identification.

  • This is not true. For example, if you have a computer login application based on biometrics, which requires both a username and a biometric in place of a password, you need to store a database of biometric models (assuming that the verification is done on a server, which deals with multiple people).

    25 2. **Verification:** A person's biometric data is compared to the biometric data with the same ID in the system database, and a match score is generated. The match score tells us how similar the two biometric samples are. Based on a match threshold, we then decide whether or not the two biometric samples come from the same person (ID).
    • This is also not true. A person's biometric data is compared to an enrolled model, independently of whether it has the same ID or not.

    • This is true for identification (1-N comparison), but not for verification (1-1 comparison). In a verification scenario, I would be trying to prove to the system that I am person A. So, I would provide my ID ("A") and my biometric sample. The system would then look for "A" in the database and check that my provided sample matches the one stored for "A". If they match, the system decides that I am indeed person A.

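The 1-1 verification flow described in the reply above (provide an ID and a sample, look up the model stored under that ID, compare, decide against a threshold) can be sketched as follows. The distance-based scoring, the stored vectors and the threshold value are all illustrative, not bob.bio's actual implementation:

```python
import numpy as np

# Hypothetical enrolled models: one stored feature vector per identity.
enrolled = {"A": np.array([1.0, 2.0, 3.0]), "B": np.array([9.0, 9.0, 9.0])}

def match_score(probe, model):
    # Negative Euclidean distance: a higher score means more similar.
    return -float(np.linalg.norm(probe - model))

def verify(claimed_id, probe, threshold=-1.0):
    # 1-1 comparison: only the model stored under the claimed ID is used.
    return match_score(probe, enrolled[claimed_id]) >= threshold

probe = np.array([1.1, 2.0, 2.9])
print(verify("A", probe))  # genuine claim  -> True
print(verify("B", probe))  # impostor claim -> False
```

Identification would instead compare the probe against every enrolled model (1-N) and pick the best-scoring identity.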
  • Unfortunately, the term verification is not correctly used in the updated documentation. I don't know why you replaced the (more generic) term "Recognition" with "Verification". bob.bio.base can do both identification and verification. It is simply a matter of how the score files are evaluated.

    It is true that the script is called verify.py. But this does not mean that it is limited to verification. We might call the script differently if you prefer that. But do not use different terms for Recognition.

    I haven't found the time to read through the complete docs. I will do that as soon as I find the time. I am buried with work right now. I hope I can come up with some updates tomorrow.

  • @mguenther As I mentioned in my comments in issue #73 (closed), the reason I used the term "verification" is that our script is called verify.py. While this may not be a big deal and perhaps doesn't warrant changing the name, it is a little confusing for people to whom the distinction between identification and verification matters. I'd be happy to leave the term "recognition" if you honestly think it won't be confusing, but my changes were only suggested to be as consistent and true to biometrics terminology as possible. In fact, if I were to be really picky, I would suggest that the script shouldn't be called verify.py OR identify.py OR recognise.py, because none of these three things actually happens in the script. The script finishes with scoring, but scoring does not make the final decision on a person's identity - that happens in Evaluation. That's why I'm keen to clarify ... let me know your thoughts.

  • The verify.py script is just running a toolchain - why not call it toolchain.py instead?

  • In that case, it should be called bio_toolchain.py or something like that, since this script runs a specific verification/recognition kind of toolchain, not a generic one.

  • added 1 commit

    • 85241e73 - Changed all mention of "verification" to the more general term "recognition"

    Compare with previous version

  • The manual looks awesome and I like very much the diagrams - they are helpful. Here are some small comments.

    1. The main index.rst file has a few typos:
      • "A databases..."
      • "tolls" instead of "tools"
      • "If you are interested, please continue reading:" - I wonder what is the purpose of this sentence? ;-)
    2. Installation instructions
      • Mostly outdated - we should start with the "conda-based" installation then keep the "pip-based" one. Remove zc.buildout as that is only for development
      • What are "some default biometric recognition databases"? Maybe the word "default" or this sentence would need revision
      • Instead of "original" data, I'd prefer the term "raw" data as to be compatible with the bob.db.base guide
      • If the installation is done with "conda" or "pip" the directory "./bin/" will not exist - please remove all references to this
      • For generating the documentation, I think we should remove the "./bin/sphinx-build" stuff from the installation instructions. That should only be used in dev mode.
    3. "Running biometric verification experiments"
      • We need to clarify "verification" versus "recognition" here - I kind of agree with @mguenther on this
      • Overall, I'm a bit concerned about the growing number of command-line options one needs to master to conduct an experiment (in a flexible way). Annotated configuration files can be used as resources as well and would allow the complete specification of a pipeline without command-line options. I think that, in this documentation, we should default to this and just say that everything is also possible with command-line options.

    That is it for the time being.
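The configuration-file-first approach suggested above could look roughly like this. The option names, the config format and the override logic are hypothetical illustrations, not the actual bob.bio.base behaviour:

```python
import argparse
import types

# A hypothetical configuration file: experiment options as plain Python
# variables. The option names here are illustrative, not the real ones.
CONFIG_SOURCE = """
preprocessor = "face-crop"
algorithm = "pca"
verbose = 1
"""

def load_config(source):
    # Execute the configuration file's source in a fresh module namespace.
    mod = types.ModuleType("experiment_config")
    exec(source, mod.__dict__)
    return mod

parser = argparse.ArgumentParser()
parser.add_argument("--algorithm", default=None)
parser.add_argument("--preprocessor", default=None)
args = parser.parse_args(["--algorithm", "lda"])  # simulated command line

cfg = load_config(CONFIG_SOURCE)
# Command-line options take precedence over the configuration file.
algorithm = args.algorithm if args.algorithm is not None else cfg.algorithm
preprocessor = args.preprocessor if args.preprocessor is not None else cfg.preprocessor
print(algorithm, preprocessor)  # -> lda face-crop
```

With this pattern the config file fully specifies the pipeline, and any command-line option given explicitly overrides the corresponding config value.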

  • @vkrivokuca

    1. Changing the name of the script to bio_toolchain.py would be fine with me. Indeed, bob.bio can do both verification and identification (and even classification as required by some databases' protocols such as LFW or YouTube), and how to evaluate is decided after the experiment is run.

    2. I can see your point that you think feature projection is part of the feature extraction process. Your intuition might be correct, but from an implementation point of view it is rather bound to the algorithm. In fact, feature projection is just a way to speed up processing. For example, with LDA projection, you could always project the features during enrollment and during scoring, but that would take some time, as you handle the same probe files several times. The same applies to ISV, where we use the projection step to pre-compute some information that needs to be computed only once per (probe) sample. Hence, performing the projection before enrollment or scoring just saves time.

    3. Now, ISV can be applied to any type of (two-dimensional) features. We use, for example, DCT-block features for faces and MFCC coefficients for speaker recognition. Implementing the ISV-specific projection in the feature extractor would not make sense, since we would need to implement the same technique several times. Also, the DCT-block and MFCC features might be used by other algorithms, which might require a different kind of projection, if any. Hence, feature projection has been designed to be part of the Algorithm, not of the Extractor. Only the Algorithm knows which kind of projection it needs.

    4. I can also see your confusion about the Eigenfaces extractor. Indeed, in there the "projection" step is part of the extraction, which is a little bit messed up. The two main purposes of the Eigenfaces extractor are: a) that people can see how to train an extractor (none of our other extractors is trainable), and b) that eigenfaces are a default feature reported in the literature, which might be used in other Algorithms, which in turn might even define their own way of feature projection, i.e., projecting the eigenface features further into their own projection space. In fact, other feature extractors also perform some kind of projection; for example, the DCT-blocks extractor projects face images into DCT components (ehm, is the discrete cosine transform a projection???). The only difference there is that the DCT-block projection is fixed and not learned from data. However, we should stick to the term projection only in combination with the Algorithm, in order not to confuse readers more than necessary.

    5. I still have not found the time to read the full documentation. I just had a look at the new figures that you added. Although it is a bit late now (and I am sorry that I didn't remember it earlier), I wanted to let you know that I have generated a different graphic that explains the complete biometric recognition toolchain. You can find it in figure 14 (page 50) of http://publications.idiap.ch/downloads/reports/2013/Gunther_Idiap-RR-13-2017.pdf. I also have the figure as plain PDF or PNG if you want to integrate it.

    In response to @andre.anjos:

    1. The sentence "If you are interested, please continue reading:" is left over from me. Feel free to remove it.
    2. Since installation is now done via pip in most cases, we should get rid of the zc.buildout section. Maybe move the example of adding more packages of the bob.bio framework into the pip section. I also agree to remove the sphinx-build stuff. You are already reading the documentation, so why would you want to re-generate it? I guess/hope that this is detailed in the installation guide of Bob anyway.
    3. Identification and verification (as well as classification) are types of recognition. How to evaluate is decided only during calling the evaluate.py script, which currently supports both identification (Recognition rate and CMC for closed set, and DIR for open set identification) as well as verification (ROC, DET, EER, HTER, EPC). For classification (e.g. for LFW), where mean and standard deviation of classification accuracy should be computed, there is no simple script at the moment.

    I also agree that we should make the configuration file the default way of running an experiment. Please make sure to document that command-line options override any option in the configuration file (I am not sure about the verbose option, i.e., whether it would reset the verbosity level or *add* more verbosity).
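The time saving that the comment above attributes to a separate projection step can be sketched as follows; the matrix projection and distance-based scoring are illustrative stand-ins for, e.g., an LDA projection:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 2))       # a learned projection matrix (e.g. LDA)
probes = rng.standard_normal((3, 5))  # 3 probe feature vectors
models = rng.standard_normal((4, 2))  # 4 enrolled models in projected space

# Projection step: each probe is projected ONCE, up front...
projected = probes @ W                # shape (3, 2)

# ...so scoring a probe against every model reuses its projected version
# instead of re-projecting the same probe file once per comparison.
scores = np.array([[-np.linalg.norm(q - m) for m in models] for q in projected])
print(scores.shape)  # one score per (probe, model) pair -> (3, 4)
```

Without the dedicated projection step, the `probes @ W` work would be repeated inside every enrollment and scoring call, which is exactly the redundancy the projection stage avoids.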

  • @andre.anjos Thank you for your feedback. Just to clarify, I only worked on the part from "Running Biometric Recognition Experiments" to "Running Experiments (part I)" (but not including the running experiments part). I think the remainder of bob.bio.base's documentation could also be enhanced so that it is a bit clearer, but I did not have time to tackle this during the Hackathon.

  • @mguenther Thank you for your feedback and for the detailed explanation of your thinking behind the projection bit. As for the diagrams, I had a look at yours and I think it's very nice. However, I wanted to relate the diagrams of the flow of a biometric recognition experiment in bob.bio.base to my diagram of a typical biometric recognition system, so I wanted to show how the 4 stages (Preprocessing, Feature extraction, Matching, and Decision making) are set up in bob.bio.base. I also wanted to show where the outputs are stored, hence the folder/directory pictures. Taking this into account, I would like to leave my diagrams in there (not least because they took a long time to draw!), but I'm open to feedback from you (and others) after you have fully read my section (from "Running Biometric Recognition Experiments" to "Running Experiments (part I)", but excluding the running experiments part). We could always incorporate your diagram at a later stage. For example, we discussed with @onikisins @akomaty the need to explain in more detail (+ diagrammatically) how the different classes are linked/called when we run verify.py - it could be useful to have your diagram, or something similar, somewhere in this section?

  • Guys, @vkrivokuca's contribution is extremely important here. Thank you @vkrivokuca for making this effort.

    A few points:

    • Installation instructions are based on conda now: !72 (merged) You don't see it here because this branch is based on an older version of master. If you still see problems with it, open another issue and discuss it separately.
    • The goal of this merge request is to talk about biometric experiments without explaining what happens in the code which I think is quite good so far. This is not meant to refactor or rename things in the bob.bio framework. If you really believe something should be changed, then you should implement it as well and make sure everything works.
      • once we have high-level diagrams and explanations, this can help us create robust toolchains in the BEAT platform in future.
    • @vkrivokuca I think you came down from your high-level and conceptual explanation to implementation details of bob.bio.base a little early. You could have explained things a bit more without talking about extractors and algorithms. I'll discuss this with you when I get back in the afternoon. For example, you could have just skipped talking about the projection step since that's just implementation details.
    • @vkrivokuca the diagrams are really good, but I would rather not see extractor or algorithm boxes in there; I would call them just feature extraction and matcher.
    • No mention of buildout or buildout related things must exist in docs. This is explained here: bob#241 (closed) @akomaty took care of this in all other documentations.
  • I have read through the documentation, and applied some small fixes. First of all, @vkrivokuca thank you for your work. The time you spent is highly appreciated.

    I agree with @amohammadi that the installation instructions should be updated. However, on another point I disagree with @amohammadi. He said that you should talk about feature extraction and matcher, while I think we should not use different words for the same things. Feature Extraction might be OK, while I would prefer Extractor, as this is the terminology used throughout the documentation. Anyway, the matcher would be misleading here. Particularly, @vkrivokuca uses the Algorithm for projection, enrollment and scoring, while the "matcher" (from Fig. 1) only performs the scoring part.

    I have some more things to notice for the installation section:

    1. In the Installation/Databases section, the link to the verification database simply links to the Package list of Bob. I don't think that this link is good. I had a different link to a list of biometric databases in the documentation of bob.db.verification.utils (http://pythonhosted.org/bob.db.verification.utils/implemented.html). However, it seems that this is gone now. Maybe we can update this list and move it to the bob.bio.base documentation, i.e., as a separate .rst file?
    2. In the same section, there is still a ./bin/databases.py command, which should be updated. What about calling this script bio_databases.py (a change inside setup.py should be sufficient), so that it is distinguishable from other scripts with the same name (for example, pad_databases.py, if existent)?
    3. In the Note below, there is a link to a file (bob/bio/base/test/utils.py), which points to a local file (that needs to be edited) and is only valid when installed via buildout. I am not sure how to link these files when installed via pip. I think we should completely remove this note.
    4. As @andre.anjos mentioned, the section on "Generate this documentation" can also be removed.

    For the experiment section, I think there are some points to be changed:

    1. In the first paragraph, you mix up identification and verification. In point Enrollment you talk about a model database (identification), while in Recognition you talk about a decision threshold, which is verification. I have added a few phrases there, which clarify that.
    2. In figure 1, you talk about the Template Database. Theoretically, this is the correct term, but in the bob.bio packages, we call the enrolled templates models throughout. Maybe you should change the Template Database to Model Database. However, as mentioned in the point above, this is only valid for identification. An even better solution would be to simply call it Model, which works for both identification and verification.
    3. In the same figure 1, you have a "Decision Maker", which is -- once again -- only valid for verification. Maybe, in figure 1, you should stick to one of the two, either verification or identification. As the adaptation to verification is simpler (just replace the Template Database with an Enrolled Model), I would suggest doing verification in figure 1. This would also correspond better to the description below figure 1.
    4. Below that, when you talk about the Template Database, you refer to the enrolled model as "extracted feature set" or "template". While template is correct (and you can leave the term here), the enrolled model is not simply an "extracted feature set", but can be much more complicated. Hence, you should call it a "model" instead (which is also the terminology that the reader will find later on).
    5. Figure 4 is misleading, as the algorithm is not linear (for example, scoring is not only based on the models). Instead of having an overview of the Algorithm, just have the (optional) projection step in Figure 4. Figures 5 and 6 better explain the enrollment and scoring steps, and they are not misleading.
    6. In the "Feature Extraction" subsection, I have removed the sentence "For example, only a few points in a person's face (e.g., eyes, nose, mouth, chin) are actually used for recognition purposes." which is not correct. In most cases, features are extracted from the full (preprocessed) face image, not only from eyes and nose.
    7. In "Decision Making" and in Figure 7, you only talk about measures and curves for verification. However, there are more ways to evaluate, for example Recognition Rates (RR), Cumulative Match Characteristics (CMC) for closed-set identification, as well as Detection & Identification Rate (DIR) plots for open-set identification. It would be great to incorporate these, too. Otherwise, people might think that bob.bio would be limited to verification only.
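As a rough illustration of one of the verification measures mentioned above, the EER (Equal Error Rate) can be approximated from genuine and impostor score lists like this; the scores are made up for the sketch and the threshold scan is a naive approximation, not the evaluate.py implementation:

```python
import numpy as np

# Made-up genuine (same-person) and impostor scores from a verification run.
genuine = np.array([0.8, 0.7, 0.9, 0.6])
impostor = np.array([0.3, 0.65, 0.2, 0.5])

def far_frr(threshold):
    far = float(np.mean(impostor >= threshold))  # impostors wrongly accepted
    frr = float(np.mean(genuine < threshold))    # genuine users wrongly rejected
    return far, frr

# Scan thresholds for the point where FAR and FRR are closest: the EER.
thresholds = np.linspace(0.0, 1.0, 101)
rates = [far_frr(t) for t in thresholds]
best = min(rates, key=lambda r: abs(r[0] - r[1]))
eer = (best[0] + best[1]) / 2.0
print(eer)  # -> 0.25 for these scores
```

Closed-set identification measures such as the Recognition Rate or CMC would instead rank, for each probe, the scores against all enrolled models, which is a different computation over the same score files.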
  • added 1 commit

    • 2404614e - Added some phrases to separate identification and verification

    Compare with previous version

  • Now, reading further in the documentation, I think we should have @vkrivokuca's explanations in a separate .rst file, not on top of how to run an experiment. This page is IMHO getting too long. What about a page explaining "The Structure of a Biometric Recognition System"?

  • Amir MOHAMMADI added 16 commits

    added 16 commits

    • 2404614e...e396e970 - 13 commits from branch master
    • 188f4b88 - Modified experiments.rst to add overview on biometric verification + illustratio…
    • 7b885940 - Changed all mention of "verification" to the more general term "recognition"
    • f0289d57 - Added some phrases to separate identification and verification

    Compare with previous version

  • Hi, I rebased this branch onto master, since the installation instructions were changed in master and are now based on conda. Even though I kept saying that the installation instructions were changed in !72 (merged) and already merged into master, you kept commenting here on the old installation instructions, which, by the way, is not the goal of this merge request :)

  • Oh, I wasn't sure. I thought you were talking about the installation instructions in the README.rst. Now I see.

    However, points 3 and 4 of my comments about the installation instructions are still unsolved. Shall I open a new issue for these, or will we solve them in this PR?

  • Please open a new issue and preferably with a new merge request addressing them :)
