.. vim: set fileencoding=utf-8 :
.. author: Manuel Günther <>
.. author: Pavel Korshunov <>
.. date: Wed Apr 27 14:58:21 CEST 2016
.. _bob.pad.base.experiments:
Running Presentation Attack Detection Experiments
=================================================
Now you are almost ready to run a presentation attack detection (PAD) experiment.
Structure of a PAD Experiment
-----------------------------
Each presentation attack detection experiment that is run with ``bob.pad`` is divided into the following steps:
1. Data preprocessing: Raw data is preprocessed, e.g., for speech, voice activity is detected.
2. Feature extraction: Features are extracted from the preprocessed data.
3. Feature projector training: Models of genuine data and attacks are learnt.
4. Feature projection: The extracted features are projected into corresponding subspaces.
5. Scoring: The spoofing scores for genuine data and attacks are computed.
6. Evaluation: The computed scores are evaluated and curves are plotted.
These 6 steps are divided into four distinct groups:
* Preprocessing (step 1)
* Feature extraction (step 2)
* Attack detection (steps 3 to 5)
* Evaluation (step 6)
The communication between two steps is file-based, usually using a binary HDF5_ interface.
The output of one step usually serves as the input of the subsequent step(s).
Depending on the algorithm, some of the steps are not applicable/available.
``bob.pad`` takes care that always the correct files are forwarded to the subsequent steps.
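The file-based hand-over between steps can be sketched as follows. This is only a toy illustration: the real toolchain exchanges binary HDF5 files, while plain text files and the trivial "preprocessing" and "extraction" functions below are made up for brevity.

```python
import os
import tempfile

# Toy sketch of the file-based hand-over between steps: each stage reads the
# previous stage's output file and writes its own output file.

workdir = tempfile.mkdtemp()

def preprocess(raw_data, out_path):
    # step 1: "preprocessing" -- here simply lower-casing the raw data
    with open(out_path, "w") as f:
        f.write(raw_data.lower())

def extract(in_path, out_path):
    # step 2: "feature extraction" -- here the length of each token
    with open(in_path) as f:
        tokens = f.read().split()
    with open(out_path, "w") as f:
        f.write(" ".join(str(len(t)) for t in tokens))

pre_file = os.path.join(workdir, "preprocessed.txt")
feat_file = os.path.join(workdir, "extracted.txt")
preprocess("Genuine SAMPLE", pre_file)
extract(pre_file, feat_file)
with open(feat_file) as f:
    print(f.read())  # prints "7 6"
```

Because each step only depends on the files written by the previous step, completed steps can be skipped or re-used on a later run.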
.. _running_part_1:
Running Experiments
-------------------
To run an experiment, we provide a generic script ``./bin/``.
To get a complete list of command line options, please run:
.. code-block:: sh

   $ ./bin/ --help
.. note::

   Sometimes, command line options have a long version starting with ``--`` and a short one starting with a single ``-``.
   In this section, only the long names of the arguments are listed; please refer to ``./bin/ --help``.
There are five command line options that are required and sufficient to define a complete presentation attack detection experiment.
These five options are:
* ``--database``: The database to run the experiments on
* ``--preprocessor``: The data preprocessor
* ``--extractor``: The feature extractor
* ``--algorithm``: The presentation attack detection algorithm
* ``--sub-directory``: A descriptive name for your experiment, which will serve as a sub-directory
The first four parameters, i.e., the ``database``, the ``preprocessor``, the ``extractor`` and the ``algorithm`` can be specified in several different ways.
To start, we will use only the registered resources.
These resources define the source code that will be used to compute the experiments, as well as all the meta-parameters of the algorithms (which we will call the *configuration*).
To get a list of registered resources, please call:
.. code-block:: sh

   $ ./bin/
Each package in ``bob.pad`` defines its own resources, and the printed list of registered resources differs according to the installed packages.
If only ``bob.pad.base`` is installed, no databases and no preprocessors will be listed.
.. note::

   You will also find some ``grid`` resources being listed.
   This type of resource will be explained :ref:`later <running_in_parallel>`.
One command line option, which is not required, but recommended, is the ``--verbose`` option.
By default, the algorithms are set up to execute quietly, and only errors are reported.
To change this behavior, you can use the ``--verbose`` option several times to increase the verbosity level to show:
1) Warning messages
2) Informative messages
3) Debug messages
When running experiments, it is a good idea to set verbose level 2, which can be enabled by using the short version: ``-vv``.
So, a typical PAD experiment (in this case, attack detection in speech) would look like the following:
.. code-block:: sh

   $ ./bin/ --database <database-name> --preprocessor <preprocessor> --extractor <extractor> --algorithm <algorithm> --sub-directory <folder_name> -vv
Before running an experiment, it is recommended to add the ``--dry-run`` option, so that the script only prints which steps would be executed, without actually executing them; this way you can make sure that everything works as expected.
The final result of the experiment will be one (or more) score file(s).
Usually, they will be called something like ``scores-dev-real`` for genuine data, ``scores-dev-attack`` for attacks, and ``scores-dev`` for the results combined in one file.
By default, you can find them in a sub-directory of the ``result`` directory, but you can change this location using the ``--result-directory`` command line option.
.. note::

   At Idiap_, the default result directory differs; see ``./bin/ --help`` for your directory.
.. _bob.pad.base.evaluate:
Evaluating Experiments
----------------------
After the experiment has finished successfully, one or more text files containing
all the scores are written. This section presents commands that help to quickly
evaluate a set of scores by generating metrics or plots.
The scripts take as input either a 4-column or a 5-column data format, as
specified in the documentation of the corresponding score loading functions.
Two sets of commands, ``bob pad`` and ``bob vuln``, are available for
presentation attack detection and vulnerability analysis, respectively.
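As a rough illustration of the score files mentioned above, a 4-column line can be parsed as sketched below. The column names and the example lines are assumptions for illustration; the authoritative column semantics are defined by the score loading functions of your bob version.

```python
# Minimal sketch of reading a 4-column score file. Assumed layout
# (one trial per line): <claimed_id> <real_id> <test_label> <score>

def parse_four_column(lines):
    parsed = []
    for line in lines:
        claimed_id, real_id, test_label, score = line.split()
        parsed.append((claimed_id, real_id, test_label, float(score)))
    return parsed

example = [
    "001 001 sample-01 1.25",     # a genuine (bona fide) trial
    "001 attack sample-02 -0.50"  # an attack trial
]
scores = parse_four_column(example)
print(scores[1][3])  # prints -0.5
```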
The ``metrics`` command generates several PAD metrics based on thresholds
selected on the development set (``bpcer20``: when the APCER is set to 5%;
``eer``: when BPCER == APCER; ``min-hter``: when the HTER is minimal) and
applies them to the evaluation set (if provided).
The reported `standard metrics`_ are:
* APCER: Attack Presentation Classification Error Rate
* BPCER: Bona-fide Presentation Classification Error Rate
* HTER (non-ISO): Half Total Error Rate ((BPCER+APCER)/2)
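These error rates can be computed by hand at a fixed decision threshold, as in the sketch below. It assumes the usual convention that a score greater than or equal to the threshold is accepted as bona fide; the example score values are made up.

```python
# Sketch of the reported metrics at a fixed decision threshold.

def pad_metrics(bona_fide_scores, attack_scores, threshold):
    # APCER: fraction of attack presentations wrongly accepted as bona fide
    apcer = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    # BPCER: fraction of bona fide presentations wrongly rejected
    bpcer = sum(s < threshold for s in bona_fide_scores) / len(bona_fide_scores)
    # ACER / HTER-style average of the two error rates
    acer = (apcer + bpcer) / 2.0
    return apcer, bpcer, acer

apcer, bpcer, acer = pad_metrics([0.9, 0.8, 0.2], [0.1, 0.6, 0.3], threshold=0.5)
print(round(apcer, 3), round(bpcer, 3), round(acer, 3))  # prints 0.333 0.333 0.333
```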
For example:
.. code-block:: sh

   $ bob pad metrics -e scores-{dev,eval} --legends ExpA

   Threshold of 11.639561 selected with the bpcer20 criteria
   ======  ========================  ===================
   ExpA    Development scores-dev    Eval. scores-eval
   ======  ========================  ===================
   APCER   5.0%                      5.0%
   BPCER   100.0%                    100.0%
   ACER    52.5%                     52.5%
   ======  ========================  ===================

   Threshold of 3.969103 selected with the eer criteria
   ======  ========================  ===================
   ExpA    Development scores-dev    Eval. scores-eval
   ======  ========================  ===================
   APCER   100.0%                    100.0%
   BPCER   100.0%                    100.0%
   ACER    100.0%                    100.0%
   ======  ========================  ===================

   Threshold of -0.870550 selected with the min-hter criteria
   ======  ========================  ===================
   ExpA    Development scores-dev    Eval. scores-eval
   ======  ========================  ===================
   APCER   100.0%                    100.0%
   BPCER   19.5%                     19.5%
   ACER    59.7%                     59.7%
   ======  ========================  ===================
.. note::

   When evaluation scores are provided, the ``--eval`` option must be passed.
   See ``bob pad metrics --help`` for further options.
Metrics for vulnerability analysis are also available through:
.. code-block:: sh

   $ bob vuln metrics -e .../{licit,spoof}/scores-{dev,test}

   =========  ===================
   None       EER (threshold=4)
   =========  ===================
   APCER (%)  100.0%
   BPCER (%)  100.0%
   ACER (%)   100.0%
   IAPMR (%)  100.0%
   =========  ===================
Customizable plotting commands are available in the :py:mod:`bob.pad.base` module.
They take a list of development and/or evaluation files and generate a single PDF
file containing the plots.
Available plots for PAD are:
* ``hist`` (Bona fide and PA histograms along with threshold criterion)
* ``epc`` (expected performance curve)
* ``gen`` (Generate random scores)
* ``roc`` (receiver operating characteristic)
* ``det`` (detection error trade-off)
* ``evaluate`` (Summarize all the above commands in one call)
Available plots for vulnerability analysis are:
* ``hist`` (Vulnerability analysis distributions)
* ``epc`` (expected performance curve)
* ``gen`` (Generate random scores)
* ``roc`` (receiver operating characteristic)
* ``det`` (detection error trade-off)
* ``epsc`` (expected performance spoofing curve)
* ``fmr_iapmr`` (Plot FMR vs IAPMR)
* ``evaluate`` (Summarize all the above commands in one call)
Use the ``--help`` option on the above-cited commands to find out more about their options.
For example, to generate an EPC curve from development and evaluation datasets:
.. code-block:: sh

   $ bob pad epc -e -o 'my_epc.pdf' scores-{dev,eval}
where ``my_epc.pdf`` will contain EPC curves for all the experiments.
Vulnerability commands require licit and spoof development and evaluation
datasets. For example, to generate an EPSC curve:
.. code-block:: sh

   $ bob vuln epsc -e .../{licit,spoof}/scores-{dev,eval}
.. note::

   The IAPMR curve can be plotted along with EPC and EPSC curves using the
   ``--iapmr`` option. A 3D EPSC can be generated using ``--three-d``. See the
   commands' ``--help`` for further options.
.. _running_in_parallel:
Running in Parallel
-------------------
One important property of the ``./bin/`` script is that it can run in parallel, using either several threads on the local machine, or an SGE grid.
To achieve that, ``bob.pad`` is well-integrated with our SGE grid toolkit GridTK_, which you have installed as a Python package in the :ref:`Installation <bob.pad.base.installation>` section.
The ``./bin/`` script can submit jobs either to the SGE grid, or to a local scheduler, keeping track of dependencies between the jobs.
The GridTK_ keeps a list of jobs in a local database, which by default is called ``submitted.sql3``, but which can be overwritten with the ``--gridtk-database-file`` option.
Please refer to the `GridTK documentation <>`_ for more details on how to use the Job Manager ``./bin/jman``.
Two different types of ``grid`` resources are defined, which can be used with the ``--grid`` command line option.
The first type of resources will submit jobs to an SGE grid.
They are mainly designed to run in the Idiap_ SGE grid and might need some adaptations to run on your grid.
The second type of resources will submit jobs to a local queue, which needs to be run by hand (e.g., using ``./bin/jman --local run-scheduler --parallel 4``), or by using the command line option ``--run-local-scheduler``.
The difference between the two types of resources is that the names of the local resources usually start with ``local-``, while those of the SGE resources do not.
Hence, to run the same experiment as above using four parallel threads on the local machine, re-nicing the jobs to level 10, simply call:
.. code-block:: sh

   $ ./bin/ --database <database-name> --preprocessor <preprocessor> --extractor <extractor> --algorithm <algorithm> --sub-directory <folder_name> -vv --grid local-p4 --run-local-scheduler --nice 10
.. note::

   You might realize that the second execution of the same experiment is much faster than the first one.
   This is because those parts of the experiment that have been successfully executed before (i.e., the according files already exist) are skipped.
   To override this behavior, i.e., to always regenerate all parts of the experiments, you can use the ``--force`` option.
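The re-use logic described in the note can be sketched as a simple decision function: a step only runs when its output file is missing, unless regeneration is forced. The function name and file name below are made up for illustration.

```python
import os
import tempfile

# Sketch of the skip/re-use decision: run a step only when its output file
# is missing, or when --force requests regeneration.

def needs_execution(output_path, force=False):
    return force or not os.path.exists(output_path)

with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "preprocessed.hdf5")
    print(needs_execution(out))              # True: no previous result exists
    open(out, "w").close()                   # pretend the step has run
    print(needs_execution(out))              # False: existing file is re-used
    print(needs_execution(out, force=True))  # True: --force regenerates
```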
Command Line Options to change Default Behavior
-----------------------------------------------
Additionally to the required command line arguments discussed above, there are several options to modify the behavior of the experiments.
One set of command line options change the directory structure of the output.
By default, intermediate (temporary) files are written to the ``temp`` directory; this can be overridden by the ``--temp-directory`` command line option, which expects relative or absolute paths.
Re-using Parts of Experiments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you want to re-use parts of previous experiments, you can specify the directories (which are relative to the ``--temp-directory``, but you can also specify absolute paths):
* ``--preprocessed-data-directory``
* ``--extracted-directory``
* ``--projected-directory``
or even the trained projector, i.e., the result of the projector training:
* ``--projector-file``
For that purpose, it is also useful to skip parts of the tool chain.
To do that you can use:
* ``--skip-preprocessing``
* ``--skip-extraction``
* ``--skip-projector-training``
* ``--skip-projection``
* ``--skip-score-computation``
although by default files that already exist are not re-created.
You can use the ``--force`` argument combined with the ``--skip...`` arguments (in which case the skip is preferred).
To run just a sub-selection of the tool chain, you can also use the ``--execute-only`` option, which takes a list of options out of: ``preprocessing``, ``extraction``, ``projector-training``, ``projection``, or ``score-computation``.
Database-dependent Arguments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Many databases define several protocols that can be executed.
To change the protocol, you can either modify the configuration file, or simply use the ``--protocol`` option.
Some databases define several kinds of evaluation setups.
For example, often two groups of data are defined, a so-called *development set* and an *evaluation set*.
The scores of the two groups will be concatenated into several files called **scores-dev** and **scores-eval**, which are located in the score directory (see above).
In this case, by default only the development set is employed.
To use both groups, just specify ``--groups dev eval`` (of course, you can also only use the ``'eval'`` set by calling ``--groups eval``).
.. include:: links.rst
.. _`standard metrics`:
.. vim: set fileencoding=utf-8 :
.. @author: Manuel Guenther <>
.. author: Pavel Korshunov <>
.. date: Wed Apr 27 14:58:21 CEST 2016
User's Guide for PAD File List API
==================================
The low-level Database Interface
--------------------------------
The :py:class:`bob.pad.base.database.FileListPadDatabase` complies with the standard PAD database as described in :ref:`bob.pad.base`.
All functions defined in that interface are properly instantiated, as soon as the user provides the required file lists.
Creating File Lists
-------------------
The initial step for using this package is to provide file lists specifying the ``'train'`` (training), ``'dev'`` (development) and ``'eval'`` (evaluation) sets to be used by the PAD algorithm.
The summarized complete structure of the list base directory (here denoted as ``basedir``) containing all the files should be like this::

   basedir -- train -- for_real.lst
      |          |-- for_attack.lst
      |-- dev  -- for_real.lst
      |          |-- for_attack.lst
      |-- eval -- for_real.lst
                 |-- for_attack.lst
The file lists should contain the following information for PAD experiments to run properly:

* ``filename``: The name of the data file, **relative** to the common root of all data files, and **without** file name extension.
* ``client_id``: The name or ID of the subject whose biometric traces are contained in the data file.
  These names are handled as :py:class:`str` objects, so ``001`` is different from ``1``.
* ``attack_type``: The type of attack (a :py:class:`str` object).
  This is not contained in ``for_real.lst`` files, only in ``for_attack.lst`` files.
The following list files need to be created:

* The *real file*, with default name ``for_real.lst``, in the default sub-directories ``train``, ``dev`` and ``eval``, respectively.
  It is a 2-column file with the format:

  .. code-block:: text

     filename client_id

* The *attack file*, with default name ``for_attack.lst``, in the default sub-directories ``train``, ``dev`` and ``eval``, respectively.
  It is a 3-column file with the format:

  .. code-block:: text

     filename client_id attack_type
.. note::

   If the database does not provide an evaluation set, the ``eval`` files can be omitted.
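The structure above can be generated and parsed with a few lines of plain Python, as in the sketch below. All file names, client IDs and attack types in it are made up for illustration only.

```python
import os
import tempfile

# Sketch: build the list structure described above and parse it back.

def write_lists(basedir):
    for group in ("train", "dev", "eval"):
        group_dir = os.path.join(basedir, group)
        os.makedirs(group_dir)
        with open(os.path.join(group_dir, "for_real.lst"), "w") as f:
            f.write("data/client1_session1 client1\n")  # filename client_id
        with open(os.path.join(group_dir, "for_attack.lst"), "w") as f:
            # filename client_id attack_type
            f.write("attack/client1_print client1 print\n")

def read_list(basedir, group, name):
    with open(os.path.join(basedir, group, name)) as f:
        return [tuple(line.split()) for line in f if line.strip()]

basedir = tempfile.mkdtemp()
write_lists(basedir)
print(read_list(basedir, "dev", "for_attack.lst"))
# prints [('attack/client1_print', 'client1', 'print')]
```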
Protocols and File Lists
------------------------
When you instantiate a database, you have to specify the base directory that contains the file lists.
If you have only a single protocol, you could specify the full path to the file lists described above as follows:
.. code-block:: python

   >>> db = bob.pad.base.database.FileListPadDatabase('basedir/protocol')
Next, you should query the data *without* specifying any protocol:
.. code-block:: python

   >>> db.objects()
Alternatively, if you have more protocols, you could do the following:
.. code-block:: python

   >>> db = bob.pad.base.database.FileListPadDatabase('basedir')
   >>> db.objects(protocol='protocol')
When a protocol is specified, it is appended to the base directory that contains the file lists.
This allows several protocols that are stored in the same base directory to be used without the need to instantiate a new database.
For instance, given two protocols 'P1' and 'P2' (with filelists contained in 'basedir/P1' and 'basedir/P2', respectively), the following would work:
.. code-block:: python

   >>> db = bob.pad.base.database.FileListPadDatabase('basedir')
   >>> db.objects(protocol='P1') # Get the objects for the protocol P1
   >>> db.objects(protocol='P2') # Get the objects for the protocol P2
The high-level Database Interface
---------------------------------
The low-level FileList database interface is extended so that file-list databases can be used to run both types of experiments:
vulnerability analysis experiments using the verification framework,
and PAD experiments using the ``bob.pad.base`` framework.
For instance, provided that the lists of files for the database ``example_db`` are in the correct format and located
inside the ``lists`` directory (i.e., inside ``lists/example_db``), the PAD and verification versions of the
database can be created as follows:
.. code-block:: python

   >>> from bob.pad.base.database import HighBioDatabase, HighPadDatabase
   >>> pad_db = HighPadDatabase(db_name='example_db')
   >>> bio_db = HighBioDatabase(db_name='example_db')
.. vim: set fileencoding=utf-8 :
.. @author: Olegs Nikisins <>
.. @date: May 2017
High Level Database Interface How-To Guide
==========================================
The *high level database interface* (HLDI) is needed to run biometric experiments using non-filelist databases (e.g., if one wants to use an SQL-based database package).
This tutorial explains how to create a *high level* database
interface, using the ``bob.pad.*`` framework (e.g.
``bob.pad.face``) as an example. The process is similar for verification
frameworks. A high level database interface
is a link between a low level database interface/package (e.g. ``bob.db.replay``) and the
corresponding framework used to run biometric experiments (e.g.
``bob.pad.face``). Generally speaking, the low level interface has lots
of querying options, which are not always used in the corresponding biometric
framework. The high level interface only contains the functionality that
is needed to run biometric experiments. This must-have functionality
is defined in the corresponding base classes and is discussed next.
The first thing you need to do is create a ``*.py`` file containing
your high level implementation, for example in
``bob/pad/face/database/`` for the Replay database. This file
must be placed into the corresponding biometric framework, which in this
case is the ``bob.pad.face`` package. The file **must** contain the
implementation of two classes:
- ``<YourDatabaseName><Bio/Pad/Other>File``
- ``<YourDatabaseName><Bio/Pad/Other>Database``
For example, the names of the above classes for the *Replay* database used in
the ``bob.pad.face`` framework are ``ReplayPadFile`` and ``ReplayPadDatabase``.
Implementation of the ``*File`` class
-------------------------------------
First of all, the ``*File`` class must inherit from the **base file
class** of the corresponding biometric framework. An example:
- ``*File`` class for the Replay database used in PAD (Presentation
  Attack Detection) experiments: ``class ReplayPadFile(PadFile):``
- ``*File`` class for the Biowave V1 database used in verification
  experiments: ``class BiowaveV1BioFile(BioFile):``
The base class defines the elements that must be implemented in the derived
class. For example, the implementation of the ``ReplayPadFile`` class must
set the following elements of the base class: ``client_id``, ``path``,
``attack_type`` and ``file_id``. The corresponding high level
implementation of the ``ReplayPadFile`` class might look as follows:
.. code:: python

   import bob.bio.video

   from bob.pad.base.database import PadFile

   class ReplayPadFile(PadFile):

       def __init__(self, f):
           self.f = f  # here ``f`` is an instance of the File class defined in the low level database interface
           if f.is_real():
               attack_type = None
           else:
               attack_type = 'attack'
           super(ReplayPadFile, self).__init__(client_id=f.client, path=f.path,
                                               attack_type=attack_type, file_id=f.id)

       def load(self, directory=None, extension='.mov'):
           path = self.f.make_path(directory=directory, extension=extension)
           frame_selector = bob.bio.video.FrameSelector(selection_style='all')
           video_data = frame_selector(path)
           bbx_data = self.f.bbx(directory=directory)  # face bounding-box annotations
           return_dictionary = {}
           return_dictionary["data"] = video_data
           return_dictionary["annotations"] = bbx_data
           return return_dictionary
Please note that in our case the ``ReplayPadFile`` also has a
``load()`` method. *Note: the ``load()`` method of the high level
``*File`` class is used by the preprocessor (the very first block in every
biometric pipeline) to read the data from the database.* Not all high
level database interfaces require this method, but let's try to
understand why the ``ReplayPadFile`` class has it. The necessity for
this method comes from the fact that the Replay database contains **video**
files, not images. To understand why the ``load()`` method is needed in the
case of a video-based database, we need to take a look at the inheritance
structure of the class. For the ``ReplayPadFile`` class it looks as follows:

- ``ReplayPadFile`` -> ``bob.pad.base.database.PadFile`` ->
  ```` -> ``bob.db.base.File``
Here the notation ``A`` -> ``B`` means ``A`` inherits from ``B``. Well,
the inheritance is pretty deep, but there is no need to worry about this. The
class of interest for us is ``bob.db.base.File``, which contains the default
file managing methods; these might be overridden if necessary. One of
these methods is ``load()``, which does **not** support video files by default. Since a
different behavior is desired, we need to override it in the high level
implementation of the ``*File`` class, ``ReplayPadFile`` in this case.
In this example the ``load()`` method returns a dictionary, which
contains the video frames and the annotations defining the face bounding
box in each frame. The preprocessor has to be "ready to deal" with that
type of input. With this, we are done configuring the high level
implementation of the ``*File`` class.
Implementation of the ``*Database`` class
-----------------------------------------
The second unit to be implemented in the HLDI is the ``*Database`` class.
First of all, the ``*Database`` class must inherit from the **base
database class** of the corresponding biometric framework. An example:

- ``*Database`` class for the Replay database used in PAD (Presentation
  Attack Detection) experiments: ``class ReplayPadDatabase(PadDatabase):``
- ``*Database`` class for the Biowave V1 database used in verification
  experiments: ``class BiowaveV1BioDatabase(BioDatabase):``
Let's consider an example of the ``ReplayPadDatabase`` class. The implementation might look as follows, but don't dive into the code yet:
.. code:: python

   from bob.pad.base.database import PadDatabase

   # the low level database interface for Replay
   from bob.db.replay import Database as LowLevelDatabase

   class ReplayPadDatabase(PadDatabase):

       def __init__(self, protocol='grandtest', original_directory=None,
                    original_extension=None, **kwargs):
           # here I have said grandtest because this is the name of the default
           # protocol for this database
           self.db = LowLevelDatabase()

           # Since the high level API expects different group names than what the low
           # level API offers, you need to convert them when necessary
           self.low_level_group_names = ('train', 'devel', 'test')  # group names in the low-level database interface
           self.high_level_group_names = ('train', 'dev', 'eval')   # names are expected to be like that in objects() function

           super(ReplayPadDatabase, self).__init__(
               'replay', protocol,
               original_directory=original_directory,
               original_extension=original_extension,
               **kwargs)

       def objects(self, groups=None, protocol=None, purposes=None, model_ids=None, **kwargs):
           # Convert group names to low-level group names here.
           groups = self.convert_names_to_lowlevel(groups, self.low_level_group_names, self.high_level_group_names)
           files = self.db.objects(protocol=protocol, groups=groups, cls=purposes, **kwargs)
           files = [ReplayPadFile(f) for f in files]
           return files

       def annotations(self, file):
           """Do nothing. In this particular implementation the annotations are
           returned in the ``*File`` class above."""
           return None
Instead, let's try to understand why the implementation looks like this. Again, the methods to be implemented are defined by the corresponding base class of our ``*Database`` class.
In the case of PAD ``*Database`` the inheritance structure is as follows:
- ``ReplayPadDatabase`` -> ``bob.pad.base.database.PadDatabase`` -> ```` -> ``bob.db.base.Database``
For the verification database the inheritance would be:
- ``bob.pad.base.database.PadDatabase`` -> ```` -> ``bob.db.base.Database``
For other biometric experiments it might look differently.
In the given example the behavior of the ``ReplayPadDatabase`` class is defined by the ``bob.pad.base.database.PadDatabase`` base class, which states that two methods must be implemented in the high level database implementation: ``objects()`` and ``annotations()``. The ``objects()`` method returns a list of instances of the ``ReplayPadFile`` class. The ``annotations()`` method is empty, since the developer of the code decided to return the annotations in the ``*File`` class. Note: you are not obliged to do it that way, it's just a matter of taste.
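The group-name conversion performed by ``convert_names_to_lowlevel()`` can be illustrated with a standalone sketch. The real implementation lives in the base classes of ``bob.pad.base``; the function below is only a made-up stand-in that reproduces the idea of the mapping.

```python
# Sketch of the group-name conversion: high level names ('train', 'dev',
# 'eval') are mapped onto whatever the low level package uses
# ('train', 'devel', 'test' in the Replay example above).

LOW_LEVEL_GROUPS = ('train', 'devel', 'test')
HIGH_LEVEL_GROUPS = ('train', 'dev', 'eval')

def to_lowlevel(groups, low=LOW_LEVEL_GROUPS, high=HIGH_LEVEL_GROUPS):
    mapping = dict(zip(high, low))
    if groups is None:
        return None
    if isinstance(groups, str):
        return mapping[groups]
    return [mapping[g] for g in groups]

print(to_lowlevel(('dev', 'eval')))  # prints ['devel', 'test']
```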
At this point, having all necessary classes in place, we are done with implementation of the high level database interface!
Just a few small things have to be done to register our high level interface in the corresponding biometric framework.
- First, import your package in the ```` file located in the folder containing the implementation of HLDI: ``from .replay import ReplayPadDatabase``
- Next, create an instance of the ``*Database`` class with default configuration. For example, for the ``ReplayPadDatabase`` class used in ``bob.pad.face`` framework, the default configuration file ``/bob/pad/face/config/database/`` is as follows:
.. code:: python

   from bob.pad.face.database import ReplayPadDatabase

   # The original_directory is taken from the .bob_bio_databases.txt file located in your home directory
   original_directory = "[YOUR_REPLAY_ATTACK_DIRECTORY]"
   original_extension = ".mov"  # extension of the data files

   database = ReplayPadDatabase(
       protocol='grandtest',
       original_directory=original_directory,
       original_extension=original_extension,
   )