Commit 7362631a authored by Zohreh MOSTAANI

[general][doc] added information about json declaration of databases

parent 4d5c0bd5
Pipeline #25375 passed with stages
in 5 minutes and 57 seconds
......@@ -36,7 +36,7 @@ evaluate the performance of this model.
Structure of a database
-----------------------
=======================
A database has the following structure on disk::
......@@ -46,15 +46,17 @@ A database has the following structure on disk::
...
outputN_name.data
For a given database, the BEAT system will typically stores information
For a given database, BEAT will typically store information
about the root folder containing this raw data as well as a description of
it.
.. _beat-system-databases-protocols:
Evaluation protocols
--------------------
====================
A BEAT evaluation protocol consists of several ``datasets``, each datasets
A BEAT evaluation protocol consists of several ``datasets``, each dataset
having several ``outputs`` with well-defined data formats. In practice,
each dataset will typically be used for a different purpose.
......@@ -64,16 +66,97 @@ client-specific model, and one for testing these models.
The training dataset may have two outputs: grayscale images as two-dimensional
arrays of type ``uint8`` and client ids as ``uint64`` integers.
The BEAT system is data-driven, which means that all the outputs of a given
BEAT is data-driven, which means that all the outputs of a given
dataset are synchronized. The way the data is generated by each template
is defined in a piece of code called the ``database view``. It is important
that a database view has a deterministic behavior for reproducibility
purposes.
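As a minimal sketch of what deterministic behavior means in practice, the snippet below builds an index whose order does not depend on the order in which the raw records are discovered. The ``Entry`` tuple, the ``build_index`` helper and the sample records are hypothetical names chosen for illustration; they are not part of the BEAT API.

```python
from collections import namedtuple

# Hypothetical entry type: one row of the index per raw sample.
Entry = namedtuple("Entry", ["client_id", "file_id"])


def build_index(records):
    """Return entries in a fixed order, grouped by client_id then file_id."""
    entries = (Entry(client_id, file_id) for client_id, file_id in records)
    return sorted(entries, key=lambda e: (e.client_id, e.file_id))


# The same records, discovered in two different orders.
records_a = [(2, 11), (1, 7), (1, 3)]
records_b = list(reversed(records_a))

# Both runs produce the same index, so downstream blocks see identical data.
index_a = build_index(records_a)
index_b = build_index(records_b)
print(index_a == index_b)
```

Sorting on an explicit key is one simple way to obtain a reproducible order; any other fixed, documented ordering would serve the same purpose.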
Creating new database views in BEAT
-----------------------------------
Databases in BEAT, like other building blocks, consist of two main components: a JSON declaration and source code (the ``database view``, written in Python). Each component is described below.
.. _beat-system-databases-protocols-json:
JSON declaration
----------------
Each database has a JSON_ declaration. This file describes the protocols, the datasets included in each protocol, the ``database view`` used by each dataset, and more. Here is an example of the JSON_ declaration file for the ``atnt`` database, which has a single protocol named "idiap". This protocol is used for a simple face recognition system and has three datasets: "train", "templates", and "probes".
.. code-block:: javascript

    {
        "description": "The AT&T Database of Faces",
        "protocols": [
            {
                "name": "idiap",
                "sets": [
                    {
                        "name": "train",
                        "outputs": {
                            "client_id": "system/uint64/1",
                            "file_id": "system/uint64/1",
                            "image": "system/array_2d_uint8/1"
                        },
                        "parameters": {},
                        "template": "train",
                        "view": "Train"
                    },
                    {
                        "name": "templates",
                        "outputs": {
                            "client_id": "system/uint64/1",
                            "file_id": "system/uint64/1",
                            "image": "system/array_2d_uint8/1",
                            "template_id": "system/uint64/1"
                        },
                        "parameters": {},
                        "template": "templates",
                        "view": "Templates"
                    },
                    {
                        "name": "probes",
                        "outputs": {
                            "client_id": "system/uint64/1",
                            "file_id": "system/uint64/1",
                            "image": "system/array_2d_uint8/1",
                            "probe_id": "system/uint64/1",
                            "template_ids": "system/array_1d_uint64/1"
                        },
                        "parameters": {},
                        "template": "probes",
                        "view": "Probes"
                    }
                ],
                "template": "simple_face_recognition"
            }
        ],
        "root_folder": "/path_to_db_folder/att_faces"
    }

The JSON_ file for a database has three main fields:
* **description:** A short description of the database.
* **protocols:** A list of protocols defined for the database.
* **root_folder:** The path to the directory where the data is stored.
The "protocols" field is where the datasets for each protocol are defined. In the example above, only one protocol is defined. Implementing a new protocol means adding a new entry to the list of protocols. Each protocol also has three main components:
* **name:** The name of the protocol, which is "idiap" in this case.
* **sets:** The datasets included in this protocol. In this case, the "idiap" protocol consists of three datasets: "train", "templates", and "probes".
* **template:** A short description for the protocol.
Each set in the list of "sets" in the above example is a dataset that is used for a particular purpose. For example, in the case of simple face recognition, the "train" dataset is used for training a model, "templates" is used for making templates for each identity, and "probes" is used to measure the performance of the system. Each set has the following components:
* **name:** The name of the set.
* **outputs:** The outputs provided by the set. Each output has a name and a specific data format which should be taken into consideration when using the data.
* **parameters:** Extra parameters which might be used for specific databases.
* **template:**
* **view:** The ``database view`` that is used to provide these data samples to the system. More information about the implementation of a ``database view`` is given in :ref:`beat-system-databases-protocols-view`.
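The structure described above can be explored with Python's standard ``json`` module. The following is only a sketch: the declaration is a trimmed copy of the example (a single set is kept), and ``summarize`` is a hypothetical helper written for illustration, not a BEAT function.

```python
import json

# Trimmed copy of the example declaration (only the "train" set is kept).
DECLARATION = """
{
  "description": "The AT&T Database of Faces",
  "protocols": [
    {
      "name": "idiap",
      "sets": [
        {
          "name": "train",
          "outputs": {
            "client_id": "system/uint64/1",
            "file_id": "system/uint64/1",
            "image": "system/array_2d_uint8/1"
          },
          "parameters": {},
          "template": "train",
          "view": "Train"
        }
      ],
      "template": "simple_face_recognition"
    }
  ],
  "root_folder": "/path_to_db_folder/att_faces"
}
"""


def summarize(declaration):
    """List (protocol, set, view, output names) for every set declared."""
    rows = []
    for protocol in declaration["protocols"]:
        for dataset in protocol["sets"]:
            rows.append(
                (
                    protocol["name"],
                    dataset["name"],
                    dataset["view"],
                    sorted(dataset["outputs"]),
                )
            )
    return rows


declaration = json.loads(DECLARATION)
for row in summarize(declaration):
    print(row)
```

Walking the declaration this way makes the nesting explicit: a database holds protocols, each protocol holds sets, and each set names its view and its typed outputs.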
.. _beat-system-databases-protocols-view:
Database View
-------------
A ``database view`` is a piece of code that defines how the raw data should be fed
to the system based on defined protocols. Each database view is a class that
......@@ -134,9 +217,16 @@ of an ``index()`` method:
return [Entry(x.client_id, x.id, x.make_path(root_folder, '.pgm')) for x in objs]
The database views that are available in the BEAT platform is using `bob`_ database packages
that have well defined protocols. However defining new database views are not limited to using such
packages.
The database views available in the BEAT platform use `bob`_ database packages
that have well-defined protocols and datasets (e.g. train/dev/test). For more information, see `database interfaces`_. Some examples:
* https://pypi.python.org/pypi/bob.db.atvskeystroke
* https://pypi.python.org/pypi/bob.db.gbu
* https://pypi.python.org/pypi/bob.db.mobio
However, defining new database views is not limited to using such
packages.
The ``get()`` method is used every time a block is fetching raw data from the database.
......@@ -168,7 +258,7 @@ The dataformat for the outputs of database is defined in this method. for exampl
'value': bob.io.base.load(obj.image)
}
More information about the implementation of these two methods can be found `here <https://gitlab.idiap.ch/beat/beat.backend.python/blob/master/beat/backend/python/database.py>`_
To learn more about the underlying source code of these two methods, see `database.py <https://gitlab.idiap.ch/beat/beat.backend.python/blob/master/beat/backend/python/database.py>`_ in ``beat.backend.python``.
......@@ -238,12 +328,13 @@ is ordered in a logical order (here: entries are grouped by ``client_id``).
For each entry in the dataset (represented as a named tuple), all the necessary data is
provided by ``index()``. For performance reasons, it is expected that we don’t need to instantiate ``bob.db.atnt.Database()`` anymore in the ``get()`` method. The user can put any information in the index method, except for the names that are reserved by python named tuple such as c`class`. If the user wants to use such names they should add it to a dictionary before defining the index method.
provided by ``index()``. For performance reasons, it is expected that we don’t need to instantiate ``bob.db.atnt.Database()`` again in the ``get()`` method. The user can put any information in the index, except under names that a Python named tuple cannot use as field names, such as the keyword ``class``. To use such a name, the user should add it to a dictionary before defining the ``index()`` method.
.. code-block:: python
super(All, self)
self.output_member_map = {'class': 'cls'}
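    def __init__(self):
        # Call the parent constructor; a bare super(All, self) does nothing.
        super(All, self).__init__()
        # Expose the output named 'class' through the tuple field 'cls'.
        self.output_member_map = {'class': 'cls'}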
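To see why such a mapping is needed, the sketch below shows that ``collections.namedtuple`` rejects a Python keyword as a field name, and how a name map sidesteps the problem. ``make_entry_type`` is a hypothetical helper written for this illustration; it does not reproduce how BEAT applies ``output_member_map`` internally.

```python
from collections import namedtuple

# A Python keyword is not accepted as a named tuple field.
try:
    namedtuple("Entry", ["client_id", "class"])
    keyword_rejected = False
except ValueError:
    keyword_rejected = True

# Map the output name to a valid field name, as in the constructor above.
output_member_map = {"class": "cls"}


def make_entry_type(output_names, member_map):
    """Build a named tuple type, renaming outputs listed in member_map."""
    fields = [member_map.get(name, name) for name in output_names]
    return namedtuple("Entry", fields)


Entry = make_entry_type(["client_id", "class"], output_member_map)
entry = Entry(client_id=1, cls="genuine")
print(keyword_rejected, Entry._fields)
```

The output keeps its public name ``class`` in the declaration, while the index entries store the value under the field ``cls``.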
Some information from the database can be stored directly in the ``index()``
(in the given example: ``client_id`` and ``file_id``). For others, that require
......
......@@ -19,4 +19,5 @@
.. _beat editor: https://www.idiap.ch/software/beat/docs/beat/docs/new/beat.editor/doc/index.html
.. _bob: https://www.idiap.ch/software/bob/docs/bob/docs/stable/bob/doc/index.html
.. _idiap: http://www.idiap.ch
.. _eigenface: https://en.wikipedia.org/wiki/Eigenface
\ No newline at end of file
.. _eigenface: https://en.wikipedia.org/wiki/Eigenface
.. _database interfaces: http://pythonhosted.org/bob/temp/bob.db.base/doc/index.html
\ No newline at end of file