Commit 8f5cf008 authored by Tiago de Freitas Pereira

Merge branch 'new-doc' into 'master'

new documentation incl. vanilla-pad

See merge request !79
parents 10b60056 2644fe23
Pipeline #48073 passed with stages
in 11 minutes and 4 seconds
.. vim: set fileencoding=utf-8 :
.. @author: Manuel Guenther <manuel.guenther@idiap.ch>
.. author: Pavel Korshunov <pavel.korshunov@idiap.ch>
.. date: Wed Apr 27 14:58:21 CEST 2016
====================================
User's Guide for PAD File List API
====================================
The low-level Database Interface
--------------------------------
The :py:class:`bob.pad.base.database.FileListPadDatabase` complies with the standard PAD database as described in :ref:`bob.pad.base`.
All functions defined in that interface are properly instantiated, as soon as the user provides the required file lists.
Creating File Lists
-------------------
The initial step for using this package is to provide file lists specifying the ``'train'`` (training), ``'dev'`` (development) and ``'eval'`` (evaluation) sets to be used by the PAD algorithm.
The complete structure of the list base directory (here denoted as ``basedir``) should look like this::

  basedir -- train -- for_real.lst
          |        |-- for_attack.lst
          |
          |-- dev -- for_real.lst
          |       |-- for_attack.lst
          |
          |-- eval -- for_real.lst
                   |-- for_attack.lst

The file lists should contain the following information for PAD experiments to run properly:

* ``filename``: The name of the data file, **relative** to the common root of all data files, and **without** file name extension.
* ``client_id``: The name or ID of the subject whose biometric traces are contained in the data file.
  These names are handled as :py:class:`str` objects, so ``001`` is different from ``1``.
* ``attack_type``: The type of attack, given as a :py:class:`str` object.
  This field appears only in ``for_attack.lst`` files, not in ``for_real.lst`` files.

The following list files need to be created:

- **For real**:

  * *real file*, with default name ``for_real.lst``, in the default sub-directories ``train``, ``dev`` and ``eval``, respectively.
    It is a 2-column file with format:

    .. code-block:: text

      filename client_id

  * *attack file*, with default name ``for_attack.lst``, in the default sub-directories ``train``, ``dev`` and ``eval``, respectively.
    It is a 3-column file with format:

    .. code-block:: text

      filename client_id attack_type

.. note:: If the database does not provide an evaluation set, the ``eval`` files can be omitted.
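As a minimal sketch of the layout described above, the following standard-library Python snippet writes a toy set of list files into a temporary directory. The sample names, the client IDs and the ``print`` attack type are made-up placeholders, and the helper ``write_lists`` is hypothetical; it is not part of ``bob.pad.base``.

.. code-block:: python

   import tempfile
   from pathlib import Path

   # Hypothetical toy entries; file names are relative and carry no extension.
   REAL = [("real/client001_session1", "001"), ("real/client002_session1", "002")]
   ATTACK = [("attack/client001_print1", "001", "print")]


   def write_lists(basedir, groups=("train", "dev", "eval")):
       """Write for_real.lst (2 columns) and for_attack.lst (3 columns) per group."""
       basedir = Path(basedir)
       for group in groups:
           group_dir = basedir / group
           group_dir.mkdir(parents=True, exist_ok=True)
           (group_dir / "for_real.lst").write_text(
               "".join(" ".join(row) + "\n" for row in REAL)
           )
           (group_dir / "for_attack.lst").write_text(
               "".join(" ".join(row) + "\n" for row in ATTACK)
           )
       return basedir


   basedir = write_lists(tempfile.mkdtemp())
   print(sorted(str(p.relative_to(basedir)) for p in basedir.rglob("*.lst")))

Passing ``groups=("train", "dev")`` would produce a layout without the optional ``eval`` files.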
Protocols and File Lists
------------------------
When you instantiate a database, you have to specify the base directory that contains the file lists.
If you have only a single protocol, you could specify the full path to the file lists described above as follows:

.. code-block:: python

   >>> import bob.pad.base.database
   >>> db = bob.pad.base.database.FileListPadDatabase('basedir/protocol')

Next, query the data **without** specifying any protocol:

.. code-block:: python

   >>> db.objects()

Alternatively, if you have more protocols, you could do the following:

.. code-block:: python

   >>> import bob.pad.base.database
   >>> db = bob.pad.base.database.FileListPadDatabase('basedir')
   >>> db.objects(protocol='protocol')

When a protocol is specified, it is appended to the base directory that contains the file lists.
This allows several protocols stored in the same base directory to be used without instantiating a new database.
For instance, given two protocols ``P1`` and ``P2`` (with file lists contained in ``basedir/P1`` and ``basedir/P2``, respectively), the following would work:

.. code-block:: python

   >>> db = bob.pad.base.database.FileListPadDatabase('basedir')
   >>> db.objects(protocol='P1') # Get the objects for the protocol P1
   >>> db.objects(protocol='P2') # Get the objects for the protocol P2

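The path handling described above can be pictured with a small sketch. The helper ``list_directory`` is hypothetical and only mimics the described behaviour (appending the protocol name to the base directory); it is not the actual implementation in ``bob.pad.base``.

.. code-block:: python

   import os


   def list_directory(basedir, protocol=None):
       """Return the directory assumed to hold the train/dev/eval list files."""
       # Without a protocol, the base directory itself holds the lists.
       if protocol is None:
           return basedir
       # With a protocol, its name is appended to the base directory.
       return os.path.join(basedir, protocol)


   print(list_directory("basedir/protocol"))
   print(list_directory("basedir", protocol="P1"))
   print(list_directory("basedir", protocol="P2"))
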
The high-level Database Interface
---------------------------------
The low-level FileList database interface is extended so that file-list databases can be used to run both types of experiments:
vulnerability analysis experiments using the :ref:`bob.bio.base <bob.bio.base>` verification framework,
and PAD experiments using the ``bob.pad.base`` framework.
For instance, provided the lists of files for the database ``example_db`` are in the correct format and located
inside the ``lists`` directory (i.e., inside ``lists/example_db``), the PAD and verification versions of the
database can be created as follows:

.. code-block:: python

   >>> from bob.pad.base.database import HighBioDatabase, HighPadDatabase
   >>> pad_db = HighPadDatabase(db_name='example_db')
   >>> bio_db = HighBioDatabase(db_name='example_db')

......@@ -22,7 +22,7 @@ For any of these parts, several different types are implemented in the ``bob.pad
combination of the five parts can be executed.
For each type, several meta-parameters can be tested.
This results in a nearly infinite amount of possible experiments that can be run using the current setup.
-But it is also possible to use your own database, preprocessor, feature extractor, or PAD algorithm and test this against the baseline algorithms implemented in the our packages.
+But it is also possible to use your own database, preprocessor, feature extractor, or PAD algorithm and test this against the baseline algorithms implemented in our packages.
.. note::
  The ``bob.pad.*`` packages are derived from the `bob.bio.* <http://pypi.python.org/pypi/bob.bio.base>`__ packages, which are designed for biometric recognition experiments.
......@@ -47,10 +47,9 @@ Users Guide
:maxdepth: 2
installation
experiments
implementation
high_level_db_interface_guide
filedb_guide
pad_intro
vanilla_pad_intro
vanilla_pad_features
Reference Manual
......
......@@ -5,15 +5,16 @@
.. _bob.pad.base.installation:
===========================
Installation Instructions
===========================
As noted before, this package is part of the ``bob.pad`` packages, which in
-turn are part of the signal-processing and machine learning toolbox Bob_. To
+turn are part of the signal processing and machine learning toolbox Bob_. To
install Bob_, please read the `Installation Instructions <bobinstall_>`_.
-Then, to install the ``bob.pad`` packages and in turn maybe the database
+Then, to install the ``bob.pad`` packages and in turn, maybe the database
packages that you want to use, use conda_ to install them:
.. code-block:: sh
......@@ -55,23 +56,19 @@ Databases
With ``bob.pad`` you will run biometric recognition experiments using databases that contain presentation attacks.
Though the PAD protocols are implemented in ``bob.pad``, the original data are **not included**.
-To download the original data of the databases, please refer to the according Web-pages.
+To download the original data of the databases, please refer to the corresponding Web-pages.
For a list of supported databases including their download URLs,
please refer to the `spoofing_databases <https://gitlab.idiap.ch/bob/bob/wikis/Packages>`_.
After downloading the original data for the databases, you will need to tell ``bob.pad``, where these databases can be found.
-For this purpose, we have decided to implement a special file, where you can set your directories.
-Similar to ``bob.bio.base``, by default, this file is located in ``~/.bob_bio_databases.txt``, and it contains several lines, each line looking somewhat like:
-.. code-block:: text
+For this purpose, a command exists to define your directories:
-  [DEFAULT_DATABASE_DIRECTORY] = /path/to/your/directory
+.. code-block:: sh
-.. note::
-  If this file does not exist, feel free to create and populate it yourself.
+  $ bob config set bob.db.<dbname> /path/to/the/db/data/folder
-Please use ``./bin/databases.py`` for a list of known databases, where you can see the raw ``[YOUR_DATABASE_PATH]`` entries for all databases that you haven't updated, and the corrected paths for those you have.
+Please use ``resources.py -t database`` for a list of known databases, where you can see the default entries for all databases that you haven't updated, and the corrected paths for those you have.
.. note::
......
.. vim: set fileencoding=utf-8 :
.. author: Yannick Dayer <yannick.dayer@idiap.ch>
.. date: 2020-11-27 15:14:11 +01
.. _bob.pad.base.pad_intro:
=============================================
Introduction to presentation attack detection
=============================================
Presentation Attack Detection, or PAD, is a branch of biometrics aiming at detecting an attempt to dupe a biometric recognition system by modifying the sample presented to the sensor.
The goal of PAD is to develop countermeasures to presentation attacks that can detect whether a biometric sample is a `bonafide` sample or a presentation attack.
For an introduction to biometrics, take a look at the :ref:`documentation of bob.bio.base <bob.bio.base.biometrics_intro>`.
The following introduction to PAD draws on chapters 2.4 and 2.6 of [mohammadi2020trustworthy]_ and on [marcel2014handbook]_.
Presentation attack
===================
Biometric recognition systems contain different points of attack. Depending on the point targeted, an attack is called either direct or indirect.
An indirect attack consists of modifying data after the capture, in any of the steps between the capture and the decision stages. Preventing such attacks falls under classical cybersecurity, hardware protection, and data protection.
Presentation Attacks (PA), on the other hand, are the only direct attacks that can be performed on a biometric system, and countering those attacks is relevant to biometrics.
For a face recognition system, for example, one of the possible presentation attacks would be to wear a mask resembling another individual so that the system identifies the attacker as that other person.
New Presentation Attack Instruments (PAIs) can be developed to defeat existing countermeasures, so the field evolves constantly to adapt to new threats and to anticipate them.
Presentation attack detection
=============================
A PAD system works much like a biometric recognition system, but with the expected ability to identify and reject a sample if it is detected as an attack.
This means that multiple cases are possible and should be detected by a biometric system with PAD:
- A registered subject presents themselves. The captured sample is called a **Genuine** sample and should be accepted by the system (positive).
- An attacker presents themselves without trying to pass for another subject. The sample is categorized as a **ZEI** (**Zero Effort Impostor**) sample and should be rejected by the system (negative).
- The case specific to PAD, as opposed to "standard" biometric systems: an attacker uses a `Presentation Attack Instrument` (`PAI`) to pass as a genuine subject. This is a **PA** (**Presentation Attack**) sample and should be rejected (negative).
The term 'bonafide' is used for biometric samples presented without intention to change their identity (Genuine samples and ZEI samples).

.. figure:: img/pad-classes.png
  :figwidth: 75%
  :align: center
  :alt: Four different samples organized to display the different classes of PAD.

  Categorization of samples in terms of biometric recognition and PAD systems.
  A PAD system makes the distinction between the left samples (`bonafide`, positives) and the right samples (`presentation attack`, negatives).
  In a biometric recognition system, genuine samples are the positives, and both types of impostors are the negatives.

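The categorization above can be summarized in a few lines of Python. This is only an illustrative sketch: the names ``SAMPLE_CLASSES``, ``pad_label`` and ``recognition_label`` are hypothetical and do not come from any ``bob`` package.

.. code-block:: python

   # The three sample classes described above.
   SAMPLE_CLASSES = ("genuine", "zei", "pa")


   def pad_label(sample_class):
       """PAD view: bonafide samples (genuine and ZEI) are positives."""
       return "positive" if sample_class in ("genuine", "zei") else "negative"


   def recognition_label(sample_class):
       """Biometric recognition view: only genuine samples are positives."""
       return "positive" if sample_class == "genuine" else "negative"


   for sample_class in SAMPLE_CLASSES:
       print(sample_class, pad_label(sample_class), recognition_label(sample_class))

The ZEI class is the one on which the two views disagree: it is a positive for the PAD system and a negative for the recognition system.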
Typical implementations of PAD
------------------------------
PAD for face recognition is the most advanced in this field. Face PAD systems can be categorized in several ways:
- **Frame-based vs Video-based**: Some PAD systems classify a sample based on one image, searching for inconsistencies of resolution or lighting, and others base themselves on temporal cues like small movements or blinking.
- **Type of light**: Some PAD systems work on visible light, using samples captured by a standard camera. A more advanced system would require a specific sensor to capture, for example, infrared light.
- **User interaction**: Another way of asserting the authenticity of a sample is to ask the presenting user to respond to a challenge, like smiling or blinking at a specific moment.
PAD systems using a frame-based approach on visible light with no user interaction are the least robust but the most developed, as they can be easily integrated with existing biometric systems.
Evaluation of PAD systems
=========================
To evaluate a biometric system with PAD, a set of samples is fed to the system. Each sample is scored, and a post-processing step is used to analyze those scores.
Licit scenario
--------------
When no PA samples are in the input set (only Genuine and ZEI samples), the situation is the same as a simple biometric experiment and is called a `licit` scenario. See :ref:`biometric introduction<bob.bio.base.biometrics_intro>`.
Spoof scenario
--------------
If no ZEI samples are present in the set (only Genuine and PA samples), the evaluation of a PAD system is seen as a two-class problem, and the same metrics as in a biometric evaluation can be used to assess its performance, where:
- the False Positive Rate is called IAPMR (Impostor Attack Presentation Match Rate),
- the False Negative Rate is called FNMR (False Non-Match Rate).
The ROC and DET can be plotted to represent the performance of the system over a range of operating points.
This two-class case is referred to as the `spoof` scenario.
PAD evaluation
--------------
When both ZEI and PA samples are present in the input set, two possibilities arise.
First, the bonafide (Genuine and ZEI) samples can be treated as `positives` and the PA samples as `negatives`, which shows the ability of the system to detect PA.
The problem becomes binary, allowing the use of metrics similar to the previous ones, albeit under different names:
- the False Positive Rate is named APCER (Attack Presentation Classification Error Rate),
- the False Negative Rate is named BPCER (Bonafide Presentation Classification Error Rate),
- the Half Total Error Rate is named ACER (Average Classification Error Rate).
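A minimal sketch of these three metrics, assuming that a higher score means "more likely bonafide" and that a sample is accepted when its score is at or above a threshold. The score values, the threshold and the function name ``pad_error_rates`` are illustrative assumptions, not the ``bob`` API.

.. code-block:: python

   def pad_error_rates(bonafide_scores, attack_scores, threshold):
       """Return (APCER, BPCER, ACER) at the given threshold."""
       # APCER: fraction of attacks wrongly accepted as bonafide (false positives).
       apcer = sum(s >= threshold for s in attack_scores) / len(attack_scores)
       # BPCER: fraction of bonafide samples wrongly rejected (false negatives).
       bpcer = sum(s < threshold for s in bonafide_scores) / len(bonafide_scores)
       # ACER: average of the two error rates.
       acer = (apcer + bpcer) / 2
       return apcer, bpcer, acer


   bonafide = [0.9, 0.8, 0.7, 0.4]  # made-up scores
   attacks = [0.1, 0.2, 0.6, 0.3]  # made-up scores
   apcer, bpcer, acer = pad_error_rates(bonafide, attacks, threshold=0.5)
   print(apcer, bpcer, acer)

With these toy scores, one attack out of four is accepted and one bonafide sample out of four is rejected, so all three rates equal 0.25.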
The ZEI and PA samples can also be considered two separate negative classes, leading to a ternary classification with one positive class (genuine samples) and two distinct negative classes: ZEI and PA.
The EPS (Expected Performance and Spoofability) framework was introduced to assess the reliability of a biometric system with PAD by defining two parameters determining how much importance is given to each class of samples:
- ω represents the importance of the PA scores against the ZEI scores.
- β represents the importance of the negative classes (PA and ZEI scores) relative to the positive class (Genuine).
From the scores and those two parameters, the following metrics can be measured:
- The weighted error rate for the two negative classes (IAPMR for the PA scores and FMR for the ZEI scores):
:math:`\text{FAR}_\omega = \omega \cdot \text{IAPMR} + (1-\omega) \cdot \text{FMR}`
- The weighted error rate between the previously defined :math:`\text{FAR}_\omega` and the :math:`\text{FNMR}` (between Genuine and both negatives), computed as:
:math:`\text{WER}_{\omega,\beta} = \beta \cdot \text{FAR}_\omega + (1-\beta) \cdot \text{FNMR}`
For given values of ω and β, the decision threshold is chosen by minimizing :math:`\text{WER}_{\omega,\beta}` on the `development set` scores. Using that threshold, the :math:`\text{WER}_{\omega,\beta}` is then computed on the `evaluation set` scores.
.. note:: :math:`\text{HTER}_\omega` is also defined when :math:`\beta = 0.5` : :math:`\text{HTER}_\omega = {\text{FAR}_\omega + \text{FNMR} \over 2}`
The EPSC (Expected Performance and Spoofability Curve) can be plotted to assess the performance of the system across values of ω for a fixed β. It plots the error rate :math:`\text{WER}_{\omega,\beta}` against the weight ω.
The EPSC can also be in 3D if β is not fixed, showing the :math:`\text{WER}_{\omega,\beta}` against both weights ω and β.
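The two weighted error rates can be written down directly from the formulas above. The sketch below assumes that the three base rates (IAPMR, FMR and FNMR) have already been measured; the numeric values and the function names ``far_w`` and ``wer`` are made up for illustration.

.. code-block:: python

   def far_w(iapmr, fmr, omega):
       """Weighted error rate of the two negative classes (PA and ZEI)."""
       return omega * iapmr + (1 - omega) * fmr


   def wer(iapmr, fmr, fnmr, omega, beta):
       """Weighted error rate between FAR_omega and FNMR."""
       return beta * far_w(iapmr, fmr, omega) + (1 - beta) * fnmr


   # Made-up error rates, held fixed for every omega in this sketch.
   iapmr, fmr, fnmr = 0.20, 0.02, 0.05

   # Points of an EPSC-like curve: WER against omega for a fixed beta.
   beta = 0.5
   for omega in (0.0, 0.25, 0.5, 0.75, 1.0):
       print(omega, wer(iapmr, fmr, fnmr, omega, beta))

The base rates are held fixed here for brevity, whereas in practice they depend on the decision threshold in use. With ``beta = 0.5``, ``wer`` reduces to the :math:`\text{HTER}_\omega` of the note above.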
References
==========
.. [mohammadi2020trustworthy] Mohammadi, Amir. **Trustworthy Face Recognition: Improving Generalization of Deep Face Presentation Attack Detection**, 2020, EPFL
.. [marcel2014handbook] Marcel, Sébastien; Nixon, Mark S.; Li, Stan Z. **Handbook of Biometric Anti-Spoofing**, 2014, Springer
.. include:: links.rst
.. vim: set fileencoding=utf-8 :
.. author: Yannick Dayer <yannick.dayer@idiap.ch>
.. date: 2020-11-27 15:26:09 +01
.. _bob.pad.base.vanilla_pad_features:
======================
Vanilla PAD features
======================
Most of the available features are equivalent to the ones defined in :ref:`bob.bio.base.vanilla_biometrics_advanced_features`.
However, there are some variations, and those are presented below.
Database interface
==================
The database interface definition follows closely the one in :ref:`bob.bio.base.database_interface`. However, the naming of the methods to retrieve data is different:
- :py:meth:`database.fit_samples` returns the samples (or delayed samples) used to train the classifier;
- :py:meth:`database.predict_samples` returns the samples that will be used for evaluating the system. This is where the group (`dev` or `eval`) is specified.
Another difference from the ``bob.bio.base`` database interface is the presence of an ``attack_type`` annotation. It stores the type of PAI, which allows each type of attack to be scored separately.
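As an illustration only, a toy in-memory class exposing these two methods could look like the sketch below. The class name ``ToyPadDatabase``, the dictionary-based samples, the ``group`` parameter name and the use of ``None`` as the ``attack_type`` of bonafide samples are assumptions made for this example; the real interface should be checked against the ``bob.pad.base`` reference.

.. code-block:: python

   class ToyPadDatabase:
       """Toy stand-in with the two data-retrieval methods named above."""

       def __init__(self, samples):
           # samples: list of dicts with "group", "path", "subject", "attack_type".
           self._samples = samples

       def fit_samples(self):
           """Samples used to train the classifier."""
           return [s for s in self._samples if s["group"] == "train"]

       def predict_samples(self, group="dev"):
           """Samples used for evaluation, for the requested group."""
           return [s for s in self._samples if s["group"] == group]


   database = ToyPadDatabase([
       {"group": "train", "path": "a", "subject": "1", "attack_type": None},
       {"group": "dev", "path": "b", "subject": "2", "attack_type": "print"},
       {"group": "eval", "path": "c", "subject": "3", "attack_type": None},
   ])
   print(len(database.fit_samples()), len(database.predict_samples("eval")))
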
File list interface
-------------------
A class with those methods returning the corresponding data can be implemented for each dataset, but an easier way to do it is with the `file list` interface.
This allows the creation of multiple protocols and various groups by editing some CSV files.
The dataset configuration file will then be as simple as:

.. code-block:: python

   from bob.pad.base.database import CSVPADDataset

   database = CSVPADDataset("path/to/my_dataset", "my_protocol")

And the command to run an experiment with that configuration on the `svm-frames` pipeline::

   $ bob pad vanilla-pad -d my_db_config_file.py svm-frames -o output_dir

The files must follow this structure and naming:

.. code-block:: text

   my_dataset
   |
   +-- my_protocol
       |
       +-- train
       |   |
       |   +-- for_real.csv
       |   +-- for_attack.csv
       |
       +-- dev
       |   |
       |   +-- for_real.csv
       |   +-- for_attack.csv
       |
       +-- eval
           |
           +-- for_real.csv
           +-- for_attack.csv

The content of the files in the ``train`` folder is used when a protocol contains data for training the classifier.
The files in the ``eval`` folder are optional and are used in case a protocol contains data for evaluation.
These CSV files should contain at least the path to the raw data and an identifier for the subject in the image (the subject field).
The structure of each CSV file should be as below:

.. code-block:: text

   PATH,SUBJECT
   path_1,subject_1
   path_2,subject_2
   path_i,subject_j
   ...

Metadata can be shipped within the Samples (e.g. gender, age, annotations, ...) by adding a column in the CSV file for each metadata field:

.. code-block:: text

   PATH,SUBJECT,TYPE_OF_ATTACK,GENDER,AGE
   path_1,subject_1,A,B,C
   path_2,subject_2,A,B,1
   path_i,subject_j,2,3,4
   ...

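A short standard-library sketch of producing and inspecting such a file. The column names follow the example above; the paths, subjects and metadata values are placeholders, and reading the file back with ``csv.DictReader`` is only a way to inspect it, not a description of how ``CSVPADDataset`` parses it.

.. code-block:: python

   import csv
   import io

   # Placeholder rows following the PATH,SUBJECT,TYPE_OF_ATTACK,... layout above.
   rows = [
       {"PATH": "path_1", "SUBJECT": "subject_1", "TYPE_OF_ATTACK": "print",
        "GENDER": "F", "AGE": "31"},
       {"PATH": "path_2", "SUBJECT": "subject_2", "TYPE_OF_ATTACK": "replay",
        "GENDER": "M", "AGE": "45"},
   ]

   buffer = io.StringIO()  # stands in for a file such as dev/for_attack.csv
   writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
   writer.writeheader()
   writer.writerows(rows)
   print(buffer.getvalue())

   # Read it back: every extra column is available as metadata for its row.
   buffer.seek(0)
   parsed = list(csv.DictReader(buffer))
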
Checkpoints and Dask
====================
In the same way as in :ref:`bob.bio.base <bob.bio.base.vanilla_biometrics_advanced_features>`, it is possible to activate the checkpointing of experiments by passing the ``-c`` (``--checkpoint``) option in the command line.
The Dask integration can also be used by giving a client configuration to the ``-l`` (``--dask-client``) argument.
Basic Idiap SGE configurations are defined by ``bob.pipelines``: ``sge`` and ``sge-gpu``::

   $ bob pad vanilla-pad replay-attack svm-frames -o output_dir -l sge -c