Commit e1659aed authored by Amir MOHAMMADI


Merge branch '78-the-documentation-does-not-mention-usage-of-single-configuration-files' into 'master'

Resolve "The documentation does not mention usage of single configuration files"

Closes #78

See merge request !92
parents 4f1ae620 830489b3
@@ -15,9 +15,10 @@ class Preprocessor (object):
writes_data : bool
Select whether the preprocessor actually writes preprocessed images, or simply returns values.
read_original_data: callable
read_original_data: callable or ``None``
This function is used to read the original data from file.
It takes three inputs: A :py:class:`bob.bio.base.database.BioFile` (or one of its derivatives), the original directory (as ``str``) and the original extension (as ``str``).
If ``None``, the default function :py:func:`bob.bio.base.read_original_data` is used.
min_preprocessed_file_size: int
The minimum file size of saved preprocessed data in bytes. If the saved
......
#!/bin/python
# This is an example configuration file that can be used in combination with the parameter_test.py script.
......
@@ -35,7 +35,7 @@ def command_line_config_group(parser, package_prefix='bob.bio.', exclude_resourc
help='A configuration file containing one or more of "database", "preprocessor", '
'"extractor", "algorithm" and/or "grid"')
config_group.add_argument('-H', '--create-configuration-file', metavar='PATH',
help='If selected, an empty configuration file will be created')
help='If selected, an empty configuration file will be created, and no further processing is performed')
config_group.add_argument('-d', '--database', metavar='x', nargs='+',
help='Database and the protocol; registered databases are: %s' % utils.resource_keys(
'database', exclude_resources_from, package_prefix=package_prefix))
@@ -198,8 +198,8 @@ def command_line_parser(description=__doc__, exclude_resources_from=[]):
"database can be processed; missing scores will be NaN.")
flag_group.add_argument('-r', '--parallel', type=int,
help='This flag is a shortcut for running the commands on the local machine with the given amount of '
'parallel threads; equivalent to --grid bob.bio.base.grid.Grid("local", '
'number_of_parallel_threads=X) --run-local-scheduler --stop-on-failure.')
'parallel processes; equivalent to --grid bob.bio.base.grid.Grid("local", '
'number_of_parallel_processes=X) --run-local-scheduler --stop-on-failure.')
flag_group.add_argument('-t', '--environment', dest='env', nargs='*', default=[],
help='Passes specific environment variables to the job.')
......
@@ -59,12 +59,11 @@ def check_file(filename, force, expected_file_size=1):
def read_original_data(biofile, directory, extension):
"""read_original_data(biofile, directory, extension) -> data
This function reads the original data using the given ``biofile`` instance.
"""This function reads the original data using the given ``biofile`` instance.
It simply calls ``load(directory, extension)`` from :py:class:`bob.bio.base.database.BioFile` or one of its derivatives.
**Parameters:**
Parameters
----------
``biofile`` : :py:class:`bob.bio.base.database.BioFile` or one of its derivatives
The file to read the original data.
@@ -76,9 +75,10 @@ def read_original_data(biofile, directory, extension):
The extension of the original data.
Might be ``None`` if the ``biofile`` itself has the extension stored.
**Returns**
Returns
-------
``data`` : object
object:
Whatever ``biofile.load`` returns; usually a :py:class:`numpy.ndarray`
"""
assert isinstance(biofile, database.BioFile)
......
@@ -52,11 +52,11 @@ If a class returns data that is **not** of type :py:class:`numpy.ndarray`, it ov
* ``write_data(data, data_file)``: Writes the given data (that has been generated using the ``__call__`` function of this class) to file.
* ``read_data(data_file)``: Reads the preprocessed data from file.
By default, the original data is read by :py:func:`bob.io.base.load`.
Hence, data is given as :py:class:`numpy.ndarray`\s.
When a different IO for the original data is required (for example to read videos in :py:class:`bob.bio.video.preprocessor.Video`), the following function is overridden:
* ``read_original_data(filename)``: Reads the original data from file.
The preprocessor is also responsible for reading the original data.
How to read original data can be specified by the ``read_original_data`` parameter of the constructor.
The ``read_original_data`` function gets three parameters: the :py:class:`bob.bio.base.database.BioFile` object from the database, the base ``directory`` where to read the data from, and the ``extension`` in which the original data is stored.
By default, this function is :py:func:`bob.bio.base.read_original_data`, which simply calls ``biofile.load(directory, extension)``, so that each database implementation can define an appropriate way of reading or writing its data.
In the rare case that the preprocessor expects the data in a different way, another function can be passed to the constructor, e.g., in the configuration file of an experiment.
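For illustration, here is a minimal sketch of passing a custom ``read_original_data`` function to a preprocessor inside a configuration file; ``MyPreprocessor`` and ``my_package`` are hypothetical placeholders, and the ``make_path`` call assumes the default :py:class:`bob.bio.base.database.BioFile` interface:

.. code-block:: py

   import bob.io.base
   import my_package.preprocessor   # hypothetical package providing MyPreprocessor

   def my_read_original_data(biofile, directory, extension):
       # custom reader: resolve the full file name via the BioFile and load it
       return bob.io.base.load(biofile.make_path(directory, extension))

   # hand the custom reader to the preprocessor constructor
   preprocessor = my_package.preprocessor.MyPreprocessor(
       read_original_data=my_read_original_data
   )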
.. _bob.bio.base.extractors:
@@ -222,28 +222,24 @@ For Bob_'s ZT-norm databases, we provide the :py:class:`bob.bio.base.database.ZT
Defining your own Database
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. note::
If you have your own database that you want to execute the recognition experiments on, you should
first check if you could use the ``Verification File List Database`` interface by defining appropriate
file lists for the training set, the model set, and the probes.
If you have your own database that you want to run recognition experiments on, you should first check whether you can use the `File List Database` interface by defining appropriate file lists for the training set, the model set, and the probes.
Please refer to the :doc:`filelist-guide` documentation for more instructions on how to set up such a database.
For an example, you might want to have a look into the implementation of the `BANCA FileList database <http://gitlab.idiap.ch/bob/bob.bio.spear/tree/master/bob/bio/spear/config/database/banca>`_, where the protocol with the name ``G`` is implemented, and its according `database configuration file <https://gitlab.idiap.ch/bob/bob.bio.spear/blob/master/bob/bio/spear/config/database/banca_audio_G.py>`_.
For an example, you might want to have a look at the implementation of the `Timit FileList database <http://gitlab.idiap.ch/bob/bob.bio.spear/tree/master/bob/bio/spear/config/database/timit>`_, where the protocol with the name ``2`` is implemented, together with the corresponding `database configuration file <https://gitlab.idiap.ch/bob/bob.bio.spear/blob/master/bob/bio/spear/config/database/timit.py>`_.
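As a hedged sketch only, such a database configuration file could look roughly as follows; the positional arguments (file list base directory and database name) and the remaining keywords follow my reading of the :py:class:`bob.bio.base.database.FileListBioDatabase` interface and may need adapting, and all paths and names are illustrative:

.. code-block:: py

   import bob.bio.base

   # illustrative paths and names; the argument layout is an assumption
   database = bob.bio.base.database.FileListBioDatabase(
       '/path/to/my/file-lists',        # base directory containing the list files
       'my-database',                   # a name for the database
       protocol='my-protocol',
       original_directory='/path/to/the/original/data',
       original_extension='.wav'
   )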
To "plug" your own (non-file-list-based) database in this framework you have to write your own database class by deriving :py:class:`bob.bio.base.database.BioDatabase`.
In this case, you have to derive your class from the :py:class:`bob.bio.base.database.BioDatabase`, and provide the following functions:
* ``__init__(self, <your-parameters>, **kwargs)``: Constructor of your database interface.
Please call the base class constructor, providing all the required parameters, e.g., via ``super(<your_db>, self).__init__(**kwargs)``.
Usually, providing ids for the group ``'dev'`` should be sufficient.
* ``objects(self, groups=None, protocol=None, purposes=None, model_ids=None, **kwargs)``
This function must return a list of ``bob.bio.base.database.BioFile`` objects with your data.
The keyword arguments are possible filters that you may use.
The keyword arguments are filters that you should use.
* ``model_ids_with_protocol(self, groups, protocol, **kwargs)``
This function must return a list of model ids for the given groups and given protocol.
In this context, models are basically the "templates" used for enrollment.
Additionally, you can define more lists that can be used for ZT score normalization.
If you don't know what ZT score normalization is, just forget about it and move on.
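Putting these pieces together, a minimal sketch of such a database class could look like the following; the on-disk layout, client ids and constructor defaults are purely illustrative, and a real implementation would honor all keyword filters:

.. code-block:: py

   import os
   import bob.bio.base

   class MyDatabase(bob.bio.base.database.BioDatabase):
       """Hypothetical interface to a database with one sample per client."""

       def __init__(self, original_directory=None, original_extension='.png', **kwargs):
           super(MyDatabase, self).__init__(
               name='my-database',
               original_directory=original_directory,
               original_extension=original_extension,
               **kwargs
           )

       def model_ids_with_protocol(self, groups=None, protocol=None, **kwargs):
           # one model ("template") per client; here the client ids are hard-coded
           return ['client1', 'client2']

       def objects(self, groups=None, protocol=None, purposes=None, model_ids=None, **kwargs):
           # wrap each sample into a BioFile; a real implementation would also
           # distinguish the groups ('world', 'dev', 'eval') and the purposes
           client_ids = model_ids or self.model_ids_with_protocol(groups, protocol)
           return [
               bob.bio.base.database.BioFile(client_id=c, path=os.path.join(c, 'sample'), file_id=c)
               for c in client_ids
           ]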
@@ -258,16 +254,15 @@ If you know and want to use it, just derive your class from :py:class:`bob.bio.b
* ``tmodel_ids_with_protocol(self, protocol=None, groups=None, **kwargs)``
The ids for the T norm models for the given group and protocol.
.. note:
.. note::
For a proper biometric recognition protocol, the identities from the models and the T-Norm models, as well as the Z-probes should be different.
..
For some protocols, a single probe consists of several features, see :ref:`bob.bio.base.algorithms` about strategies how to incorporate several probe files into one score.
If your database should provide this functionality, please overwrite:
For some protocols, a single probe consists of several features, see :ref:`bob.bio.base.algorithms` about strategies how to incorporate several probe files into one score.
If your database should provide this functionality, please overwrite:
* ``uses_probe_file_sets(self)``: Return ``True`` if the current protocol of the database provides multiple files for one probe.
* ``probe_file_sets(self, model_id=None, group='dev')``: Returns a list of lists of :py:class:`bob.bio.base.database.FileSet` objects.
* ``z_probe_file_sets(self, model_id=None, group='dev')``: Returns a list of lists of Z-probe :py:class:`bob.bio.base.database.FileSet` objects (only needed if the base class is :py:class:`bob.bio.base.database.DatabaseZT`).
* ``uses_probe_file_sets(self)``: Return ``True`` if the current protocol of the database provides multiple files for one probe.
* ``probe_file_sets(self, model_id=None, group='dev')``: Returns a list of lists of :py:class:`bob.bio.base.database.BioFileSet` objects.
* ``z_probe_file_sets(self, model_id=None, group='dev')``: Returns a list of lists of Z-probe :py:class:`bob.bio.base.database.BioFileSet` objects.
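A sketch of overriding these functions could look like this; ``MyDatabase`` refers to the hypothetical class sketched above, grouping all probe files of a client into one :py:class:`bob.bio.base.database.BioFileSet` is just one possible strategy, and the ``BioFileSet`` argument layout is an assumption:

.. code-block:: py

   import bob.bio.base

   class MyFileSetDatabase(MyDatabase):

       def uses_probe_file_sets(self):
           # the (hypothetical) protocol combines several files into one probe
           return True

       def probe_file_sets(self, model_id=None, group='dev'):
           # collect the plain probe files and group them by client id;
           # each group then forms one probe (a BioFileSet)
           probes = self.objects(groups=group, purposes='probe',
                                 model_ids=(model_id,) if model_id is not None else None)
           by_client = {}
           for f in probes:
               by_client.setdefault(f.client_id, []).append(f)
           return [bob.bio.base.database.BioFileSet(client_id, files)
                   for client_id, files in sorted(by_client.items())]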
.. _bob.bio.base.configuration-files:
@@ -298,7 +293,6 @@ For example, the configuration file for a PCA algorithm, which uses 80% of varia
algorithm = bob.bio.base.algorithm.PCA(subspace_dimension = 0.8, distance_function = scipy.spatial.distance.cosine, is_distance_function = True)
Some default configuration files can be found in the ``bob/bio/*/config`` directories of all ``bob.bio`` packages, but you can create configuration files in any directory you like.
In fact, since each tool uses a different variable name, you can define a complete experiment in a single configuration file.
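For example, a complete experiment could be described in one file similar to the following sketch; the resource names depend on the packages you have installed and are only illustrative, and the ``sub_directory`` keyword is assumed here:

.. code-block:: py

   import bob.bio.base
   import scipy.spatial

   # all tools of one experiment defined in a single configuration file
   database = 'atnt'                 # a registered database resource (illustrative)
   preprocessor = 'face-detect'      # a registered preprocessor resource (illustrative)
   extractor = 'linearize'           # a registered extractor resource (illustrative)
   algorithm = bob.bio.base.algorithm.PCA(
       subspace_dimension=0.8,
       distance_function=scipy.spatial.distance.cosine,
       is_distance_function=True
   )
   sub_directory = 'pca_experiment'  # where to store intermediate files and results

Such a file can then be handed to ``verify.py`` via its configuration file argument, see :ref:`bob.bio.base.command_line`.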
.. _bob.bio.base.resources:
@@ -324,7 +318,7 @@ Particularly, we use a specific list of entry points, which are:
For each of the tools, several resources are defined, which you can list with the ``resources.py`` command line.
When you want to register your own resource, make sure that your configuration file is importable (usually it is sufficient to have an empty ``__init__.py`` file in the same directory as your configuration file).
Then, you can simply add a line inside the according ``entry_points`` section of the ``setup.py`` file (you might need to create that section, just follow the example of the ``setup.py`` file that you can find online in the base directory of our `bob.bio.base Gitlab page <http://gitlab.idiap.ch/bob/bob.bio.base>`__).
Then, you can simply add a line inside the corresponding ``entry_points`` section of the ``setup.py`` file (you might need to create that section, just follow the example of the ``setup.py`` file `that you can find online on the bob.bio.base Gitlab page <https://gitlab.idiap.ch/bob/bob.bio.base/blob/master/setup.py>`__).
After re-running ``buildout``, your new resource should be listed in the output of ``resources.py``.
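As a sketch, registering a hypothetical preprocessor configuration as a resource named ``my-preprocessor`` could look as follows in your own package's ``setup.py``; the package and module names are placeholders:

.. code-block:: py

   from setuptools import setup, find_packages

   setup(
       name='my.package',            # placeholder package name
       version='1.0.0',
       packages=find_packages(),
       entry_points={
           'bob.bio.preprocessor': [
               # resource name = module path : variable name inside that module
               'my-preprocessor = my_package.config.preprocessor.my_preprocessor:preprocessor',
           ],
       },
   )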
......
@@ -12,33 +12,49 @@ Now that we have learned the implementation details, we can have a closer look i
Running Experiments (part II)
-----------------------------
As mentioned before, running biometric recognition experiments can be achieved using the ``verify.py`` command line.
As mentioned before, running biometric recognition experiments can be achieved using configuration files for the ``verify.py`` script.
In section :ref:`running_part_1`, we have used registered resources to run an experiment.
However, the command line options of ``verify.py`` is more flexible, as you can have three different ways of defining tools:
However, the variables (and also the :ref:`bob.bio.base.command_line` of ``verify.py``) are more flexible, as you can have three different ways of defining tools:
1. Choose a resource (see ``resources.py`` or ``verify.py --help`` or the result of ``verify.py --create-configuration-file`` for a list of registered resources):
.. code-block:: py
algorithm = "pca"
2. Use a (pre-defined) configuration file, see: :ref:`bob.bio.base.configuration-files`.
In case several tools are specified inside the configuration file, only the variable with the matching name will be used.
For example, the file "bob/bio/base/config/algorithm/pca.py" might define several variables, but only the ``algorithm`` variable is used when setting:
1. Choose a resource (see ``resources.py`` or ``verify.py --help`` for a list of registered resources):
.. code-block:: py
.. code-block:: sh
algorithm = "bob/bio/base/config/algorithm/pca.py"
$ verify.py --algorithm pca
3. Instantiate a class and pass all desired parameters to its constructor:
2. Use a configuration file. Make sure that your configuration file has the correct variable name:
.. code-block:: py
.. code-block:: sh
import bob.bio.base
import scipy.spatial
algorithm = bob.bio.base.algorithm.PCA(
subspace_dimension = 30,
distance_function = scipy.spatial.distance.euclidean,
is_distance_function = True
)
$ verify.py --algorithm bob/bio/base/config/algorithm/pca.py
.. note::
When specified on the command line, usually quotes ``"..."`` are required, and the ``--imports`` need to be provided:
3. Instantiate a class on the command line. Usually, quotes ``"..."`` are required, and the ``--imports`` need to be specified:
.. code-block:: sh
.. code-block:: sh
$ verify.py --algorithm "bob.bio.base.algorithm.PCA(subspace_dimension = 30, distance_function = scipy.spatial.distance.euclidean, is_distance_function = True)" --imports bob.bio.base scipy.spatial
$ verify.py --algorithm "bob.bio.base.algorithm.PCA(subspace_dimension = 30, distance_function = scipy.spatial.distance.euclidean, is_distance_function = True)" --imports bob.bio.base scipy.spatial
All these three ways can be used for any of the five command line options: ``--database``, ``--preprocessor``, ``--extractor``, ``--algorithm`` and ``--grid``.
You can even mix these three types freely in a single command line.
All these three ways can be used for any of the five variables: ``database``, ``preprocessor``, ``extractor``, ``algorithm`` and ``grid``.
You can even mix these three types freely in a single configuration file.
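As an illustration, the following single configuration file mixes the three styles; the resource name is an example and needs to be provided by an installed package:

.. code-block:: py

   import bob.bio.base

   database = 'atnt'                                               # 1. a registered resource (illustrative)
   preprocessor = 'bob/bio/base/config/preprocessor/filename.py'   # 2. another configuration file
   algorithm = bob.bio.base.algorithm.PCA(subspace_dimension=30)   # 3. a direct instantiation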
Score Level Fusion of Different Algorithms on the same Database
@@ -57,16 +73,15 @@ Afterwards, the fusion is applied to the ``--dev-files`` and the resulting score
If ``--eval-files`` are specified, the same fusion that is trained on the development set is now applied to the evaluation set as well, and the ``--fused-eval-file`` is written.
.. note::
When ``--eval-files`` are specified, they need to be in the same order as the ``dev-files``, otherwise the result is undefined.
When ``--eval-files`` are specified, they need to be in the same order as the ``--dev-files``, otherwise the result is undefined.
The resulting ``--fused-dev-file`` and ``fused-eval-file`` can then be evaluated normally, e.g., using the ``evaluate.py`` script.
The resulting ``--fused-dev-file`` and ``--fused-eval-file`` can then be evaluated normally, e.g., using the ``evaluate.py`` script.
.. _grid-search:
Finding the Optimal Configuration
---------------------------------
Sometimes, configurations of tools (preprocessors, extractors or algorithms) are highly dependent on the database or even the employed protocol.
Additionally, configuration parameters depend on each other.
``bob.bio`` provides a relatively simple setup that allows you to test different configurations on the same task and find the best set of configurations.
@@ -90,7 +105,7 @@ The configuration file is a common python file, which can contain certain variab
The variables from 1. to 3. usually contain instantiations for classes of :ref:`bob.bio.base.preprocessors`, :ref:`bob.bio.base.extractors` and :ref:`bob.bio.base.algorithms`, but also registered :ref:`bob.bio.base.resources` can be used.
For any of the parameters of the classes, a *placeholder* can be inserted.
By default, these place holders start with a # character, followed by a digit or character.
By default, these place holders start with a ``#`` character, followed by a digit or character.
The variables 1. to 3. can also be overridden by the command line options ``--preprocessor``, ``--extractor`` and ``--algorithm`` of the ``grid_search.py`` script.
The ``replace`` variable has to be set as a dictionary.
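As a sketch only: the nesting shown below (step name, then placeholder, then a mapping from replacement value to result sub-directory) is an assumption based on the description above and on the grid search example file linked further below; the step name ``'preprocess'``, the placeholder ``#1`` and the directory names are illustrative:

.. code-block:: py

   # assumed structure: { step : { placeholder : { replacement value : result sub-directory } } }
   replace = {
       'preprocess': {
           '#1': {
               0.4: 'Dir_a1',
               0.5: 'Dir_a2',
           },
       },
   }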
@@ -130,7 +145,7 @@ In the above example, the results of the experiments will be placed into a direc
.. note::
Please note that we are using a dictionary structure to define the replacements.
Hence, the directories inside the same step might not appear in the same order as written in the configuration file.
For the above example, a directory structure of `results/[...]/Dir_b1/Dir_a1/Dir_c1/[...]`` might be possible as well.
For the above example, a directory structure of ``results/[...]/Dir_b1/Dir_a1/Dir_c1/[...]`` might be possible as well.
Additionally, tuples of placeholders can be defined, in which case the full tuple will always be replaced in one shot.
@@ -147,7 +162,7 @@ Continuing the above example, it is possible to add:
}
.. warning::
*All possible combinations* of the configuration parameters are tested, which might result in a *huge number of executed experiments*.
**All possible combinations** of the configuration parameters are tested, which might result in a **huge number of executed experiments**.
Some combinations of parameters might not make any sense.
In this case, a set of requirements on the parameters can be set, using the ``requirement`` variable.
@@ -165,6 +180,8 @@ If you, e.g., test, which ``scipy.spatial`` distance function works best for you
imports = ['scipy', 'bob.bio.base', 'bob.bio.face']
For a complete example of the grid search configuration file, you might want to have a look into `the actual file that is used to test the grid search <https://gitlab.idiap.ch/bob/bob.bio.base/blob/master/bob/bio/base/test/dummy/grid_search.py>`__.
Further Command Line Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ``grid_search.py`` script has a further set of command line options.
@@ -184,10 +201,8 @@ The ``grid_search.py`` script has a further set of command line options.
- With the ``--executable`` flag, you might select a different script rather than ``bob.bio.base.script.verify`` to run the experiments (such as ``bob.bio.gmm.script.verify_gmm``).
- Finally, additional options might be sent to the ``verify.py`` script directly. These options might be put after a ``--`` separation.
Evaluation of Results
~~~~~~~~~~~~~~~~~~~~~
To evaluate a series of experiments, a special script iterates through all the results and computes EER on the development set and HTER on the evaluation set, for both the ``nonorm`` and the ``ztnorm`` directories.
Simply call:
@@ -203,6 +218,4 @@ Hence, to find the best results of your grid search experiments (with default di
$ collect_results.py -vv --directory results/grid_search --sort --criterion EER --sort-key nonorm-dev
.. include:: links.rst
@@ -10,6 +10,7 @@ IO-related functions
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
bob.bio.base.read_original_data
bob.bio.base.load
bob.bio.base.save
bob.bio.base.load_compressed
......
@@ -94,7 +94,7 @@ setup(
'bob.bio.preprocessor': [
'dummy = bob.bio.base.test.dummy.preprocessor:preprocessor', # for test purposes only
'filename = bob.bio.base.config.preprocessor.filename:preprocessor', # for test purposes only
'filename = bob.bio.base.config.preprocessor.filename:preprocessor',
],
'bob.bio.extractor': [
......