Merge requests
!15
Documentation for making a new database views in beat
Merged: Flavio TARSETTI requested to merge fix_doc into 1.5.x 6 years ago
@andre.anjos @zmostaani: the documentation for making new database views in beat
Edited 6 years ago by Flavio TARSETTI
doc/index.rst (+157 −0)
@@ -40,6 +40,163 @@ for scientific reports.
This package defines a backend to execute algorithms written in the Python
programming language.
Creating new database views in beat
===================================
To implement a view, write a class that inherits from
``beat.backend.python.database.View`` and implements two methods: ``index()`` and ``get()``.
The `documentation <https://gitlab.idiap.ch/beat/beat.backend.python/blob/master/beat/backend/python/database.py>`_ of those methods is reproduced here.

The ``index()`` function:

.. code-block:: python

    def index(self, root_folder, parameters):
        """Returns a list of (named) tuples describing the data provided by the view.

        The ordering of values inside the tuples is free, but it is expected
        that the list is ordered in a consistent manner (ie. all train images of
        person A, then all train images of person B, ...).

        For instance, assuming a view providing that kind of data:

        ----------- ----------- ----------- ----------- ----------- -----------
        |  image  | |  image  | |  image  | |  image  | |  image  | |  image  |
        ----------- ----------- ----------- ----------- ----------- -----------
        ----------- ----------- ----------- ----------- ----------- -----------
        | file_id | | file_id | | file_id | | file_id | | file_id | | file_id |
        ----------- ----------- ----------- ----------- ----------- -----------
        ----------------------------------- -----------------------------------
        |            client_id            | |            client_id            |
        ----------------------------------- -----------------------------------

        a list like the following should be generated:

        [
            (client_id=1, file_id=1, image=filename1),
            (client_id=1, file_id=2, image=filename2),
            (client_id=1, file_id=3, image=filename3),
            (client_id=2, file_id=4, image=filename4),
            (client_id=2, file_id=5, image=filename5),
            (client_id=2, file_id=6, image=filename6),
            ...
        ]

        DO NOT store images, sound files or data loadable from a file in the list!
        Store the path of the file to load instead.
        """

The ``get()`` function:

.. code-block:: python

    def get(self, output, index):
        """Returns the data of the provided output at the provided index in the list
        of (named) tuples describing the data provided by the view (accessible at
        self.objs)"""

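
The contract between the two methods can be tried out without the platform. The following is only a minimal sketch, not the actual ``View`` implementation: ``ToyView`` and ``build_toy_view()`` are hypothetical names, and it assumes that something stores the list returned by ``index()`` in ``self.objs`` before ``get()`` is called.

.. code-block:: python

    import os
    import tempfile
    from collections import namedtuple

    Entry = namedtuple("Entry", ["label", "path"])


    class ToyView:
        """Hypothetical stand-in for a view; it does not inherit from the real View."""

        def index(self, root_folder, parameters):
            # Only cheap metadata and file paths go into the index, never file contents.
            names = sorted(os.listdir(root_folder))
            return [
                Entry(label=i, path=os.path.join(root_folder, name))
                for i, name in enumerate(names)
            ]

        def get(self, output, index):
            obj = self.objs[index]
            if output == "label":
                return {"value": obj.label}
            elif output == "text":
                # The file is opened only when its data is requested.
                with open(obj.path) as f:
                    return {"value": f.read()}


    def build_toy_view(root_folder):
        # Assumption: the caller (here, this helper) stores the index in ``objs``.
        view = ToyView()
        view.objs = view.index(root_folder, parameters={})
        return view


    # Create two small files to index.
    root = tempfile.mkdtemp()
    for name, content in [("a.txt", "first"), ("b.txt", "second")]:
        with open(os.path.join(root, name), "w") as f:
            f.write(content)

    view = build_toy_view(root)
    first_text = view.get("text", 0)["value"]
    second_label = view.get("label", 1)["value"]

In this sketch the index holds one small tuple per file, and the file contents are read only inside ``get()``.
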
Taking the ``atnt/5`` database as an example, the view named ``Train`` is implemented as follows
(note that each view comes with documentation describing how its different outputs are synchronised):

.. code-block:: python

    from collections import namedtuple

    import numpy as np

    import bob.db.atnt
    import bob.io.base

    from beat.backend.python.database import View


    class Train(View):
        """Outputs:
            - image: "{{ system_user.username }}/array_2d_uint8/1"
            - file_id: "{{ system_user.username }}/uint64/1"
            - client_id: "{{ system_user.username }}/uint64/1"

        One "file_id" is associated with a given "image".
        Several "image" are associated with a given "client_id".

        --------------- --------------- --------------- --------------- --------------- ---------------
        |    image    | |    image    | |    image    | |    image    | |    image    | |    image    |
        --------------- --------------- --------------- --------------- --------------- ---------------
        --------------- --------------- --------------- --------------- --------------- ---------------
        |   file_id   | |   file_id   | |   file_id   | |   file_id   | |   file_id   | |   file_id   |
        --------------- --------------- --------------- --------------- --------------- ---------------
        ----------------------------------------------- -----------------------------------------------
        |                  client_id                  | |                  client_id                  |
        ----------------------------------------------- -----------------------------------------------
        """

        def index(self, root_folder, parameters):
            Entry = namedtuple('Entry', ['client_id', 'file_id', 'image'])

            # Open the database and load the objects to provide via the outputs
            db = bob.db.atnt.Database()
            objs = sorted(db.objects(groups='world', purposes=None),
                          key=lambda x: (x.client_id, x.id))

            # Store only identifiers and the file path, not the image data
            return [Entry(x.client_id, x.id, x.make_path(root_folder, '.pgm'))
                    for x in objs]

        def get(self, output, index):
            # self.objs holds the list of entries returned by index()
            obj = self.objs[index]

            if output == 'client_id':
                return {
                    'value': np.uint64(obj.client_id)
                }

            elif output == 'file_id':
                return {
                    'value': np.uint64(obj.file_id)
                }

            elif output == 'image':
                # The image file is only read when it is requested
                return {
                    'value': bob.io.base.load(obj.image)
                }

Note that:

1) This view exactly matches the example from the documentation of the ``View`` class. In particular,
   ``index()`` returns a list that looks like this:

   .. code-block:: python

       [
           (client_id=1, file_id=1, image=filename1),
           (client_id=1, file_id=2, image=filename2),
           (client_id=1, file_id=3, image=filename3),
           (client_id=2, file_id=4, image=filename4),
           (client_id=2, file_id=5, image=filename5),
           (client_id=2, file_id=6, image=filename6),
           ...
           (client_id=100, file_id=10000, image=filename10000),
       ]

   If there are 10000 images in the dataset, there will be 10000 entries in that list. The platform uses this
   information to split the jobs efficiently across several machines during the experiment. The list is expected
   to follow a logical order (here, entries are grouped by ``client_id``).

2) For each entry in the dataset (represented as a named tuple), ``index()`` provides all the necessary data.
   For performance reasons, ``get()`` is expected not to instantiate ``bob.db.atnt.Database()`` again.

3) You are free to put any information in the index, under whatever field names you want (here, for simplicity,
   the tuple has one field per output of the view, with the same name). The platform doesn’t care.

4) Some data from the database can be stored directly in the index (here: ``client_id`` and ``file_id``).
   For data that requires opening a file, put the filename in the index and process the file later in ``get()``.

5) The implementation of ``get()`` is straightforward: the full index is available as ``self.objs``, so it only
   has to return the data corresponding to the requested output at the given index.

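
The ordering expectation from point 1 can be checked mechanically before indexing a database. The helper below is a hypothetical sketch (``is_grouped_by`` is not part of ``beat.backend.python``); it only tests that entries sharing the same value of a field are contiguous in the list.

.. code-block:: python

    from collections import namedtuple

    Entry = namedtuple("Entry", ["client_id", "file_id", "image"])


    def is_grouped_by(entries, field):
        """Return True if all entries sharing a value of ``field`` are contiguous."""
        seen = set()
        previous = object()  # sentinel that compares unequal to any real value
        for entry in entries:
            value = getattr(entry, field)
            if value != previous:
                if value in seen:
                    return False  # this value already appeared in an earlier run
                seen.add(value)
                previous = value
        return True


    grouped = [Entry(1, 1, "a.pgm"), Entry(1, 2, "b.pgm"), Entry(2, 3, "c.pgm")]
    interleaved = [Entry(1, 1, "a.pgm"), Entry(2, 3, "c.pgm"), Entry(1, 2, "b.pgm")]

Calling ``is_grouped_by(grouped, "client_id")`` returns ``True``, while the interleaved list fails the check because client 1 reappears after client 2.
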
As for the JSON file describing the database, the format hasn’t changed. For an example of the usage of the parameters defined in the
JSON file and given to ``index()``, you can look at ``mnist/4``.
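
As a rough illustration of how such parameters might be used inside ``index()``, the sketch below assumes that ``parameters`` behaves like a dictionary built from the JSON declaration. The ``group`` key, the ``FAKE_DB`` records and the ``ParamView`` class are all hypothetical and are not taken from ``mnist/4``.

.. code-block:: python

    from collections import namedtuple

    Entry = namedtuple("Entry", ["label", "path"])

    # Hypothetical records standing in for a real database query.
    FAKE_DB = [
        {"group": "train", "label": 3, "path": "train/0001.png"},
        {"group": "test", "label": 7, "path": "test/0001.png"},
        {"group": "train", "label": 1, "path": "train/0002.png"},
    ]


    class ParamView:
        """Hypothetical view-like class; it does not inherit from the real View."""

        def index(self, root_folder, parameters):
            # Assumption: ``parameters`` is dict-like and carries a "group" key.
            group = parameters["group"]
            selected = [r for r in FAKE_DB if r["group"] == group]
            # Sort so that entries sharing a label are contiguous.
            selected.sort(key=lambda r: (r["label"], r["path"]))
            return [Entry(r["label"], root_folder + "/" + r["path"]) for r in selected]


    train_index = ParamView().index("/data", {"group": "train"})

With this design, one view class could serve several protocol sets, each declared with a different parameter value.
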
Once the view is written, you must index the database with the command-line tool, using a command like this:

.. code-block:: sh

    ./bin/beat --prefix=… db index mydatabase/1/myview

.. toctree::

   api