From cadbb8aba08d5862573f42b4aba43434680301e2 Mon Sep 17 00:00:00 2001
From: Olegs NIKISINS <onikisins@italix03.idiap.ch>
Date: Tue, 9 May 2017 15:29:03 +0200
Subject: [PATCH] Added the High Level Database Interface How-To Guide to the
 documentation

---
 doc/high_level_db_interface_guide.rst | 216 ++++++++++++++++++++++++++
 doc/index.rst                         |   1 +
 2 files changed, 217 insertions(+)
 create mode 100644 doc/high_level_db_interface_guide.rst

diff --git a/doc/high_level_db_interface_guide.rst b/doc/high_level_db_interface_guide.rst
new file mode 100644
index 0000000..6457d9c
--- /dev/null
+++ b/doc/high_level_db_interface_guide.rst
@@ -0,0 +1,216 @@
+.. vim: set fileencoding=utf-8 :
+.. @author: Olegs Nikisins <olegs.nikisins@idiap.ch>
+.. @date:   May 2017
+
+
+=============================================
+High Level Database Interface How-To Guide
+=============================================
+
+The *high level database interface* (HLDI) is needed to run biometric experiments using non-filelist databases (e.g. if one wants to use SQL-based database package).
+
+This tutorial explains how to create a *high level* database
+interface, using as an example ``bob.pad.*`` framework (e.g.
+``bob.pad.face``). The process is similar for ``bob.bio`` frameworks,
+e.g. ``bob.bio.face``, ``bob.bio.vein``). High level database interface
+is a link between low level database interface/package (e.g. ``bob.db.replay``) and a
+corresponding framework used to run biometric experiments (e.g.
+``bob.pad.face``). Generally speaking, the low level interface has lot's
+of querying options, which are not always used in the corresponding biometric
+framework. High level interface only contains the functionality, which
+is needed to run biometric experiments. This, must have functionality,
+is defined in the corresponding base classes and is discussed next.
+
+First thing you need to do is to create a ``*.py`` file containing
+your high level implementation, for example:
+``bob/pad/face/database/replay.py`` for the Replay database. This file
+must be placed into corresponding biometric framework, which in this
+case is ``bob.pad.face`` package. The file **must** contain the
+implementation of two classes:
+
+-  ``<YourDatabaseName><Bio/Pad/Other>File``
+-  ``<YourDatabaseName><Bio/Pad/Other>Database``
+
+For example, the names of the above classes for the *Replay* database used in
+the ``bob.pad.face`` framework are: ``ReplayPadFile`` and
+``ReplayPadDatabase``.
+
+Implementation of the ``*File`` class
+---------------------------------------------------
+
+First of all, the ``*File`` class must inherit from the **base file
+class** of the corresponding biometric framework. An example:
+
+-  ``*File`` class for the Replay database used in PAD (Presentation
+   Attack Detection) experiments: ``class ReplayPadFile(PadFile):``
+-  ``*File`` class for the Biowave V1 database used in verification
+   experiments: ``class BiowaveV1BioFile(BioFile):``
+
+Base class defines the elements, which must be implemented in the derived
+class. For example, the implementation of ``ReplayPadFile`` class must
+set the following elements of the base class: ``client_id``, ``path``,
+``attack_type`` and ``file_id``. The corresponding high level
+implementation of the ``ReplayPadFile`` class might look as follows:
+
+.. code:: python
+
+    import bob.bio.video
+
+    from bob.pad.base.database import PadFile
+
+    class ReplayPadFile(PadFile):
+        def __init__(self, f):
+            self.__f = f # here ``f`` is an instance of the File class defined in the low level database interface
+            if f.is_real():
+                attack_type = None
+            else:
+                attack_type = 'attack'
+            super(ReplayPadFile, self).__init__(client_id=f.client, path=f.path,
+                                                attack_type=attack_type, file_id=f.id)
+        def load(self, directory=None, extension='.mov'):
+            path = self.f.make_path(directory=directory, extension=extension)
+            frame_selector = bob.bio.video.FrameSelector(selection_style = 'all')
+            video_data = frame_selector(path)
+            bbx_data = one_file.bbx(directory=directory)
+            return_dictionary = {}
+            return_dictionary["data"] = video_data
+            return_dictionary["annotations"] = bbx_data
+            return return_dictionary
+
+Please, note, that in our case the ``ReplayPadFile`` also has a
+``load()`` method. *Note: the load() method of the high level
+``*File`` class is used by the preprocessor (a very first block in every
+biometric pipeline) to read the data from the database.* Not all high
+level database interfaces require this method, but let's try to
+understand why ``ReplayPadFile`` class has it. The necessity to have
+this method comes from the fact, that Replay database contains **video**
+files, not images. To understand why ``load()`` method is needed in the
+case of video-based database we need to take a look at the inheritance
+structure of the class. For the ``ReplayPadFile`` class it looks as
+follows:
+
+-  ``ReplayPadFile`` -> ``bob.pad.base.database.PadFile`` ->
+   ``bob.bio.base.database.BioFile`` -> ``bob.db.base.File``
+
+Here the notation ``A`` -> ``B`` means ``A`` inherits from ``B``. Well,
+the inheritance is pretty deep, but no need to worry about this. The
+class of interest for us is ``bob.db.base.File`` containing the default
+file managing methods, which might be overridden if necessary. One of
+methods is ``load()`` **not** supporting video files by default. Since a
+different behavior is desired, we need to override it in the high level
+implementation of the ``*File`` class, ``ReplayPadFile`` in this case.
+In this example the ``load()`` method returns the dictionary, which
+contains the video frames, and annotations defining the face bounding
+box in each frame. The preprocessor has to be "ready to deal" with that
+type of input. With this, we are done configuring the high level
+implementation of the ``*File`` class.
+
+Implementation of the ``*Database`` class
+---------------------------------------------------
+
+The second unit to be implemented in HLDI is the ``*Database`` class.
+First of all the ``*Database`` class must inherit from the **base
+database class** of the corresponding biometric framework. An example:
+
+-  ``*Database`` class for the Replay database used in PAD (Presentation
+   Attack Detection) experiments:
+   ``class ReplayPadDatabase(PadDatabase):``
+-  ``*Database`` class for the Biowave V1 database used in verification
+   experiments: ``class BiowaveV1BioDatabase(BioDatabase):``
+
+
+Let's consider an example of the ``ReplayPadDatabase`` class. The implementation might look as follows, but don't dive into the code yet:
+
+.. code:: python
+
+    from bob.pad.base.database import PadDatabase
+
+    class ReplayPadDatabase(PadDatabase):
+
+        def __init__(
+            self,
+            all_files_options={},
+            check_original_files_for_existence=False,
+            original_directory=None,
+            original_extension=None,
+            # here I have said grandtest because this is the name of the default
+            # protocol for this database
+            protocol='grandtest',
+            **kwargs):
+
+            self.db = LowLevelDatabase()
+
+            # Since the high level API expects different group names than what the low
+            # level API offers, you need to convert them when necessary
+            self.low_level_group_names = ('train', 'devel', 'test') # group names in the low-level database interface
+            self.high_level_group_names = ('train', 'dev', 'eval') # names are expected to be like that in objects() function
+
+            super(ReplayPadDatabase, self).__init__(
+                'replay',
+                all_files_options,
+                check_original_files_for_existence,
+                original_directory,
+                original_extension,
+                protocol,
+                **kwargs)
+
+        def objects(self, groups=None, protocol=None, purposes=None, model_ids=None, **kwargs):
+            # Convert group names to low-level group names here.
+            groups = self.convert_names_to_lowlevel(groups, self.low_level_group_names, self.high_level_group_names)
+            files = self.db.objects(protocol=protocol, groups=groups, cls=purposes, **kwargs)
+            files = [ReplayPadFile(f) for f in files]
+            return files
+
+        def annotations(self, file):
+            """
+            Do nothing. In this particular implementation the annotations are returned in the *File class above.
+            """
+            return None
+
+
+Instead, let's try to understand why the implementation looks like this. Again, the methods to be implemented are defined by the corresponding base class of our ``*Database`` class.
+In the case of PAD ``*Database`` the inheritance structure is as follows:
+
+- ``ReplayPadDatabase`` -> ``bob.pad.base.database.PadDatabase`` -> ``bob.bio.base.database.BioDatabase`` -> ``bob.db.base.Database``
+
+For the verification database the inheritance would be:
+
+- ``bob.pad.base.database.PadDatabase`` -> ``bob.bio.base.database.BioDatabase`` -> ``bob.db.base.Database``
+
+For other biometric experiments it might look differently.
+In the given example the behavior of the ``ReplayPadDatabase`` class is defined by the ``bob.pad.base.database.PadDatabase`` base class, which sates that two methods must be implemented in the high level database implementation: ``objects()`` and ``annotations()``. The ``objects()`` method returns a list of instances of ``ReplayPadFile`` class. The ``annotations()`` method is empty, since the developer of the code decided to return the annotations in the ``*File`` class. Note: you are not obliged to do it that way, it's just a matter of taste.
+
+At this point, having all necessary classes in place, we are done with implementation of the high level database interface!
+
+Just a few small things have to be done to register our high level interface in the corresponding biometric framework.
+
+- First, import your package in the ``__init__.py`` file located in the folder containing the implementation of HLDI: ``from .replay import ReplayPadDatabase``
+
+- Next, create an instance of the ``*Database`` class with default configuration. For example, for the ``ReplayPadDatabase`` class used in ``bob.pad.face`` framework, the default configuration file ``/bob/pad/face/config/database/replay.py`` is as follows:
+
+.. code:: python
+
+    # The original_directory is taken from the .bob_bio_databases.txt file located in your home directory
+    original_directory = "[YOUR_REPLAY_ATTACK_DIRECTORY]"
+    original_extension = ".mov" # extension of the data files
+
+    database = ReplayPadDatabase(
+        protocol='grandtest',
+        original_directory=original_directory,
+        original_extension=original_extension,
+        training_depends_on_protocol=True,
+    )
+
+- Finally, in the ``setup.py`` file of the corresponding biometric framework, add the entry pointing to your default configuration. In the case of observed PAD example the code is:
+
+.. code:: python
+
+    entry_points = {
+
+        'bob.pad.database': [
+            'replay = bob.pad.face.config.database.replay:database',
+            ],
+
+    },
+
+That's it! Now we are ready to use our database in the corresponding biometric framework.
\ No newline at end of file
diff --git a/doc/index.rst b/doc/index.rst
index 4a09718..346c3ce 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -49,6 +49,7 @@ Users Guide
     installation
     experiments
     implementation
+    high_level_db_interface_guide
 
 ================
 Reference Manual
-- 
GitLab