.. vim: set fileencoding=utf-8 :

.. Copyright (c) 2016 Idiap Research Institute, http://www.idiap.ch/          ..
.. Contact: beat.support@idiap.ch                                             ..
..                                                                            ..
.. This file is part of the beat.docs module of the BEAT platform.            ..
..                                                                            ..
.. Commercial License Usage                                                   ..
.. Licensees holding valid commercial BEAT licenses may use this file in      ..
.. accordance with the terms contained in a written agreement between you     ..
.. and Idiap. For further information contact tto@idiap.ch                    ..
..                                                                            ..
.. Alternatively, this file may be used under the terms of the GNU Affero     ..
.. Public License version 3 as published by the Free Software and appearing   ..
.. in the file LICENSE.AGPL included in the packaging of this file.           ..
.. The BEAT platform is distributed in the hope that it will be useful, but   ..
.. WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY ..
.. or FITNESS FOR A PARTICULAR PURPOSE.                                       ..
..                                                                            ..
.. You should have received a copy of the GNU Affero Public License along     ..
.. with the BEAT platform. If not, see http://www.gnu.org/licenses/.          ..


.. _beat-system:

===========================
 Getting Started with BEAT
===========================

The BEAT system has certain building blocks used by all the packages in the BEAT
software suite. These are:

* **Data formats**: the specification of data which is transmitted between
  blocks of a toolchain;
* **Libraries**: routines (source-code or binaries) that can be incorporated
  into other libraries or user code on algorithms;
* **Algorithms**: the program (source-code or binaries) that defines the user
  algorithm to be run within the blocks of a toolchain;
* **Databases** and **Datasets**: means to read raw-data from a disk and feed
  into a toolchain, respecting a certain usage protocol;
* **Toolchain**: the definition of the data flow in an experiment;
* **Experiment**: the combination of algorithms, datasets, a toolchain and
  parameters that allows the system to run the prescribed recipe
  to produce displayable results.

.. note::

   All these building blocks are stored in a folder typically named ``prefix``.
   We will get back to this in :ref:`tutorial`.


.. _beat-system-example:

A Simple Example
================

The next figure shows a representation of a very simple toolchain, composed of
only a few color-coded components:

* To the left, the reader can identify two datasets, named ``set`` and ``set2``
  respectively. They emit data (of, at this point, an unspecified type) into
  the following processing blocks;
* Following the datasets, two processing blocks named ``echo1`` and ``echo2``
  receive the input from the dataset and emit data into a third block, named
  ``echo3``;
* The final component receives the inputs emitted from ``echo3`` and is
  called ``analysis``. Because this block has no output, it is considered a
  final block, from which BEAT expects to collect experiment
  results (that, at this point, are also unspecified).

.. Simple toolchain representation for the BEAT platform
.. graphviz:: img/toolchain-triangle.dot

The toolchain only defines the basic data flow and connections that must
be respected by experiments. It does not define the type of data that
is produced or consumed by any of the blocks, nor the algorithms,
databases and protocols to use. From the toolchain description, it is possible
to derive a valid execution order by taking the imposed data flow into
consideration. In this simple example, the datasets called ``set`` and ``set2``
may yield data in parallel, allowing the execution of blocks ``echo1`` and
``echo2``. Block ``echo3`` must come next, before the ``analysis`` block, which
comes last.
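
Deriving an execution order from the data flow amounts to a topological sort
of the toolchain graph. The following is a minimal sketch of that idea, not
the scheduler used by BEAT: the ``TOOLCHAIN`` dictionary and the
``execution_stages`` function are hypothetical names introduced only for this
illustration, and the code relies solely on the standard-library ``graphlib``
module (Python 3.9 or later).

.. code-block:: python

   from graphlib import TopologicalSorter

   # Hypothetical encoding of the example toolchain (an assumption, not the
   # BEAT toolchain format): each block maps to the blocks it reads from.
   TOOLCHAIN = {
       "set": set(),
       "set2": set(),
       "echo1": {"set"},
       "echo2": {"set2"},
       "echo3": {"echo1", "echo2"},
       "analysis": {"echo3"},
   }


   def execution_stages(dependencies):
       """Group blocks into stages; blocks in one stage may run in parallel."""
       sorter = TopologicalSorter(dependencies)
       sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
       stages = []
       while sorter.is_active():
           # Every block whose inputs have all been produced is ready to run.
           ready = sorted(sorter.get_ready())
           stages.append(ready)
           sorter.done(*ready)
       return stages


   if __name__ == "__main__":
       for number, stage in enumerate(execution_stages(TOOLCHAIN), start=1):
           print(number, ", ".join(stage))

Under these assumptions, the first stage contains ``set`` and ``set2``, the
second ``echo1`` and ``echo2``, followed by ``echo3`` and finally
``analysis``, which matches the order described above.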

In typical problems that can be implemented in BEAT, datasets are
composed of multiple instances of raw data. For example, these could be images
for an object recognition problem, speech sequences for a speech recognition
task or model data for biometric recognition tasks. Computing blocks must
process these data by looping on these atomic data samples. The color-coding in
the figure indicates this extra data-flow information: for each dataset in the
drawing, it indicates how blocks loop on their atomic data. For the proposed
toolchain, we can observe that blocks ``echo1``, ``echo3`` and ``analysis``
loop over the "raw" data samples from ``set``, while ``echo2`` loops over the
samples from ``set2``.

The next figure shows a complete experimental setup for the above toolchain.
The input blocks use a given database, called ``simple/1`` (the name is
``simple`` and the version is ``1``), using one of its protocols called
``protocol``. Each input block is bound to a specific dataset within the
database/protocol combination. Both datasets on this database/protocol yield
objects of type ``beat/integer/1`` (a format called ``integer`` from user
``beat``, version ``1``), which are consumed by algorithms running on the next
blocks. The block ``echo1`` uses the algorithm ``user/integers_echo/1`` (an
algorithm called ``integers_echo`` from user ``user``, version ``1``) and
also yields ``beat/integer/1`` objects. The same holds for the algorithm
running on block ``echo2``.

The algorithm for block ``echo3`` cannot be the same: it must deal
with two inputs, generated by blocks looping on different raw data. We will
detail the conceptual differences involved in writing algorithms that are not
synchronized with all of their inputs later. For this introduction, it suffices
to understand that the organization of algorithms in an experiment is
constrained by the requirements of neighboring blocks as well as by the input
and output data flows determined for a given block.

Block ``echo3`` yields elements to the algorithm on the ``analysis`` block,
called ``user/integers_echo_analyzer/1``, which produces a single result named
``out_data`` of type ``int32`` (that is, a signed integer with 32
bits). Algorithms that do not communicate with other algorithms are typically
called ``analyzers``. They are set up at the end of experiments so as to
produce quantifiable results you can use to measure the performance of your
experimental setup.

.. Simple experiment representation for the BEAT platform
.. graphviz:: img/experiment-triangle.dot


.. _beat-system-design:

Design
======

The next figure shows a UML representation of the main BEAT components, showing
some of their interactions and interdependencies. Experiments use algorithms,
datasets and a toolchain in order to define a complete runnable setup. Datasets
are grouped into protocols which are, in turn, grouped into databases.
Algorithms use data formats to define input and output patterns. Most objects
are subject to versioning, possess a name and belong to a specific user. By
concatenating those markers, it is possible to define unique identifiers for all
objects in the platform. The example above contains several such identifiers.

.. High-level component interaction in the BEAT platform core
.. graphviz::

   digraph hierarchy {
     graph [fontname="helvetica", compound=true, splines=polyline]
     node [fontname="helvetica", shape=record, style=filled, fillcolor=gray95]
     edge [fontname="helvetica"]

     subgraph "algorithm_cluster" {
       1[label = "{Dataformat|...|+user\n+name\n+version}"]
       2[label = "{Algorithm|...|+user\n+name\n+version\n+code\n+language}"]
       6[label = "{Library|...|+user\n+name\n+version\n+code\n+language}"]
     }
     subgraph "database_cluster" {
       graph [label=datasets]
       3[label = "{Database|...|+name\n+version}"]
       4[label = "{Protocol|...|+template}"]
       5[label = "Set"]
     }
     subgraph "experiment_cluster" {
       graph [label=experiments]
       7[label = "{Toolchain|+execution_order()|+user\n+name\n+version}"]
       8[label = "{Experiment|...|+user\n+label}"]
     }

     1->1 [label = "0..*", arrowhead=empty]
     2->1 [label = "1..*", arrowhead=empty]
     2->6 [label = "0..*", arrowhead=empty]
     6->6 [label = "0..*", arrowhead=empty]
     4->3 [label = "1..*", arrowhead=odiamond]
     5->4 [label = "1..*", arrowhead=odiamond]
     5->1 [label = "1..*", arrowhead=empty]
     8->7 [label = "1..1", arrowhead=empty]
     8->2 [label = "1..*", arrowhead=empty]
     8->5 [label = "1..*", arrowhead=empty]

   }
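
The identifiers seen in the example, such as ``beat/integer/1`` and
``simple/1``, are built by joining user, name and version with slashes. The
following is a minimal sketch of how such strings could be split and rebuilt.
The ``Identifier`` type and both helper functions are hypothetical names
introduced for this illustration and are not part of the BEAT API. It also
assumes that versions are plain integers and that two-part identifiers (as for
databases) carry no user.

.. code-block:: python

   from typing import NamedTuple, Optional


   class Identifier(NamedTuple):
       """Hypothetical container for the parts of a platform identifier."""

       user: Optional[str]  # None for two-part identifiers such as databases
       name: str
       version: int


   def parse_identifier(text):
       """Split ``user/name/version`` or ``name/version`` into its parts."""
       parts = text.split("/")
       if len(parts) == 3:
           user, name, version = parts
       elif len(parts) == 2:
           user = None
           name, version = parts
       else:
           raise ValueError(f"unexpected identifier: {text!r}")
       return Identifier(user, name, int(version))


   def format_identifier(identifier):
       """Join the parts back into the slash-separated form."""
       parts = [identifier.name, str(identifier.version)]
       if identifier.user is not None:
           parts.insert(0, identifier.user)
       return "/".join(parts)


   if __name__ == "__main__":
       for text in ("beat/integer/1", "user/integers_echo/1", "simple/1"):
           print(parse_identifier(text))

Keeping the user optional lets one function handle both the three-part form
used by data formats and algorithms and the two-part form used by databases.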


The BEAT platform provides a graphical user interface so that you can program
data formats, algorithms and toolchains, and define experiments rather
intuitively. For expert users, we provide a command-line interface to the
platform, allowing such users to create, modify and dispose of such objects
using their own editors. When using BEAT locally, the graphical user interface
is used in parallel with the command-line interface.

BEAT Building Blocks
====================

For developers and programmers, the rest of this guide details each of the
BEAT building blocks, their relationships and how to use the command-line
interface to interact with the platform efficiently.




.. toctree::

    dataformats
    algorithms
    libraries
    toolchains
    experiments
    databases


.. include:: links.rst