.. vim: set fileencoding=utf-8 :

.. Copyright (c) 2016 Idiap Research Institute, http://www.idiap.ch/          ..
.. Contact: beat.support@idiap.ch                                             ..
..                                                                            ..
.. This file is part of the beat.docs module of the BEAT platform.            ..
..                                                                            ..
.. Commercial License Usage                                                   ..
.. Licensees holding valid commercial BEAT licenses may use this file in      ..
.. accordance with the terms contained in a written agreement between you     ..
.. and Idiap. For further information contact tto@idiap.ch                    ..
..                                                                            ..
.. Alternatively, this file may be used under the terms of the GNU Affero     ..
.. Public License version 3 as published by the Free Software and appearing   ..
.. in the file LICENSE.AGPL included in the packaging of this file.           ..
.. The BEAT platform is distributed in the hope that it will be useful, but   ..
.. WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY ..
.. or FITNESS FOR A PARTICULAR PURPOSE.                                       ..
..                                                                            ..
.. You should have received a copy of the GNU Affero Public License along     ..
.. with the BEAT platform. If not, see http://www.gnu.org/licenses/.          ..


.. _beat-system:

===========================
 Getting Started with BEAT
===========================

The BEAT framework describes experiments through fundamental building blocks
(object types):

* **Data formats**: the specification of data which is transmitted between
  blocks of a toolchain;
* **Algorithms**: the program (source-code or binaries) that defines the user
  algorithm to be run within the blocks of a toolchain;
* **Libraries**: routines (source-code or binaries) that can be incorporated
  into other libraries or into user code in algorithms;
* **Databases** and **Datasets**: means to read raw data from a disk and feed
  it into a toolchain, respecting a certain usage protocol;
* **Toolchain**: the definition of the data flow in an experiment;
* **Experiment**: the combination of algorithms, datasets, a toolchain and
  parameters that allows the system to run the prescribed recipe and produce
  displayable results.

Instances of these building blocks are represented in JSON_ using a specific
schema.  Multiple objects can be stored on disk as files following a directory
structure.


.. _beat-system-example:

A Simple Example
================

The next figure shows a representation of a very simple toolchain, composed of
only a few color-coded components:

* To the left, the reader can identify two datasets, named ``set`` and ``set2``
  respectively. They emit data (of, at this point, an unspecified type) into
  the following processing blocks;
* Following the datasets, two processing blocks named ``echo1`` and ``echo2``
  receive the input from the dataset and emit data into a third block, named
  ``echo3``;
* The final component, called ``analysis``, receives the data emitted by
  ``echo3``. Because this block has no output, it is considered a final block,
  from which BEAT expects to collect experiment results (that, at this point,
  are also unspecified).

.. Simple toolchain representation for the BEAT platform
.. image:: img/toolchain-triangle.png

The toolchain only defines the basic data flow and the connections that must
be respected by experiments. It does not define the type of data produced or
consumed by any of its blocks, nor the algorithms, databases or protocols to
use. From the toolchain description, it is possible to derive a valid execution
order by taking the imposed data flow into consideration. In this simple
example, the datasets called ``set`` and ``set2`` may yield data in parallel,
allowing the execution of blocks ``echo1`` and ``echo2``. Block ``echo3`` must
come next, followed by the ``analysis`` block, which comes last.
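
As a minimal sketch of how such an execution order can be derived, the snippet
below encodes the connections of the example toolchain and groups its blocks
into stages with ``graphlib`` from the Python standard library (Python 3.9 or
later). The ``CONNECTIONS`` list and both helper functions are names assumed
for this illustration only; they are not part of the BEAT API, and the way BEAT
itself computes the order may differ.

.. code-block:: python

    from graphlib import TopologicalSorter

    # Connections of the example toolchain as (source, destination) pairs.
    # This flat list is an assumption made for the sketch, not a BEAT format.
    CONNECTIONS = [
        ("set", "echo1"),
        ("set2", "echo2"),
        ("echo1", "echo3"),
        ("echo2", "echo3"),
        ("echo3", "analysis"),
    ]


    def execution_stages(connections):
        """Group blocks into stages; blocks in one stage may run in parallel."""
        predecessors = {}
        for source, destination in connections:
            predecessors.setdefault(source, set())
            predecessors.setdefault(destination, set()).add(source)
        sorter = TopologicalSorter(predecessors)
        sorter.prepare()
        stages = []
        while sorter.is_active():
            # Every block returned here has all of its inputs available.
            ready = sorted(sorter.get_ready())
            stages.append(ready)
            sorter.done(*ready)
        return stages


    def final_blocks(connections):
        """Return the blocks that never act as the source of a connection."""
        sources = {source for source, _ in connections}
        destinations = {destination for _, destination in connections}
        return sorted(destinations - sources)


    stages = execution_stages(CONNECTIONS)
    finals = final_blocks(CONNECTIONS)

For the connections above, ``stages`` holds four groups, in order: the two
datasets, then ``echo1`` and ``echo2``, then ``echo3``, then ``analysis``.
``finals`` contains only ``analysis``, the block without outputs.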

In typical problems that can be implemented in BEAT, datasets are composed of
multiple instances of raw data. For example, these could be images for an
object recognition problem, speech sequences for a speech recognition task or
model data for biometric recognition tasks. Computing blocks must process these
data by looping over these atomic data samples. The color-coding in the figure
indicates this extra data-flow information: for each dataset in the drawing, it
indicates how blocks loop over their atomic data. For the proposed toolchain, we
can observe that blocks ``echo1``, ``echo3`` and ``analysis`` loop over the
"raw" data samples from ``set``, while ``echo2`` loops over the samples from
``set2``.

The next figure shows a complete experimental setup for the above toolchain.
The input blocks use a given database, called ``simple/1`` (the name is
``simple`` and the version is ``1``), using one of its protocols called
``protocol``. Each block is set to a specific data set inside the
database/protocol combination. Both datasets on this database/protocol yield
objects of type ``beat/integer/1`` (a format called ``integer`` from user
``beat``, version ``1``), which are consumed by algorithms running on the next
blocks. The block ``echo1`` uses the algorithm ``user/integers_echo/1`` (an
algorithm called ``integers_echo`` from user ``user``, version ``1``) and
also yields ``beat/integer/1`` objects. The same is valid for the algorithm
running on block ``echo2``.
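
The identifiers quoted above follow a slash-separated pattern. The snippet
below is a minimal sketch of how such strings could be split into their parts,
assuming only the layouts seen in this section: ``name/version`` for databases
and ``user/name/version`` for data formats and algorithms. ``Identifier`` and
``parse_identifier`` are hypothetical helpers written for this example, not
functions provided by BEAT.

.. code-block:: python

    from typing import NamedTuple, Optional


    class Identifier(NamedTuple):
        """Parts of an object identifier; ``user`` is None for databases."""

        user: Optional[str]
        name: str
        version: int

        def __str__(self):
            parts = [self.name, str(self.version)]
            if self.user is not None:
                parts.insert(0, self.user)
            return "/".join(parts)


    def parse_identifier(text):
        """Split ``[user/]name/version`` into an ``Identifier``."""
        parts = text.split("/")
        if len(parts) not in (2, 3) or not all(parts):
            raise ValueError(f"cannot parse identifier {text!r}")
        *owner, name, version = parts
        user = owner[0] if owner else None
        # int() raises ValueError if the version is not a number.
        return Identifier(user, name, int(version))


    dataformat = parse_identifier("beat/integer/1")
    database = parse_identifier("simple/1")

Here ``dataformat`` has the user ``beat``, the name ``integer`` and the version
``1``, while ``database`` has no user. Converting either value back with
``str()`` gives the original string.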

The algorithm for block ``echo3`` cannot be the same: it must deal with two
inputs, generated by blocks looping over different raw data. The conceptual
differences involved in writing algorithms that are not synchronized with all
of their inputs are detailed later in this guide. For this introduction, it
suffices to understand that the organization of algorithms in an experiment is
constrained by the requirements of neighboring blocks as well as by the input
and output data flows determined for a given block.

Block ``echo3`` yields elements to the algorithm on the ``analysis`` block,
called ``user/integers_echo_analyzer/1``, which produces a single result named
``out_data`` of type ``int32`` (that is, a 32-bit signed integer). Algorithms
that produce no output for other algorithms are typically called
``analyzers``. They are set up at the end of experiments to produce
quantifiable results you can use to measure the performance of your
experimental setup.

.. Simple experiment representation for the BEAT platform
.. image:: img/experiment-triangle.png


.. _beat-system-design:

Design
======

The next figure shows a UML representation of the main BEAT components, with
some of their interactions and interdependencies. Experiments use algorithms,
data sets and a toolchain to define a complete runnable setup. Data sets are
grouped into protocols which are, in turn, grouped into databases. Algorithms
use data formats to define input and output patterns. Most objects are subject
to versioning, possess a name and belong to a specific user. By concatenating
those markers, it is possible to define unique identifiers for all objects in
the platform. The example above contains several such identifiers, such as
``beat/integer/1``.

.. High-level component interaction in the BEAT platform core
.. graphviz:: 

    digraph hierarchy {
      graph [fontname="helvetica", compound=true, splines=polyline]
      node [fontname="helvetica", shape=record, style=filled, fillcolor=gray95]
      edge [fontname="helvetica"]

      subgraph "algorithm_cluster" {
        1[label = "{Dataformat|...|+user\n+name\n+version}"]
        2[label = "{Algorithm|...|+user\n+name\n+version\n+code\n+language}"]
        6[label = "{Library|...|+user\n+name\n+version\n+code\n+language}"]
      }
      subgraph "database_cluster" {
        graph [label=datasets]
        3[label = "{Database|...|+name\n+version}"]
        4[label = "{Protocol|...|+template}"]
        5[label = "Set"]
      }
      subgraph "experiment_cluster" {
        graph [label=experiments]
        7[label = "{Toolchain|+execution_order()|+user\n+name\n+version}"]
        8[label = "{Experiment|...|+user\n+label}"]
      }

      1->1 [label = "0..*", arrowhead=empty]
      2->1 [label = "1..*", arrowhead=empty]
      2->6 [label = "0..*", arrowhead=empty]
      6->6 [label = "0..*", arrowhead=empty]
      4->3 [label = "1..*", arrowhead=odiamond]
      5->4 [label = "1..*", arrowhead=odiamond]
      5->1 [label = "1..*", arrowhead=empty]
      8->7 [label = "1..1", arrowhead=empty]
      8->2 [label = "1..*", arrowhead=empty]
      8->5 [label = "1..*", arrowhead=empty]

    }
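
The containment shown on the database side of the diagram (sets grouped into
protocols, protocols grouped into databases) can be sketched with plain Python
dataclasses. This is only an illustration under stated assumptions: the class
and attribute names below are chosen for this example, with values taken from
the ``simple/1`` database described earlier, and they do not reflect how BEAT
stores these objects internally.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import List


    @dataclass
    class Dataset:
        """A set inside a protocol; it yields objects of one data format."""

        name: str
        dataformat: str  # identifier such as "beat/integer/1"


    @dataclass
    class Protocol:
        """A usage protocol grouping one or more sets."""

        name: str
        sets: List[Dataset] = field(default_factory=list)


    @dataclass
    class Database:
        """A named, versioned database grouping one or more protocols."""

        name: str
        version: int
        protocols: List[Protocol] = field(default_factory=list)

        def find_set(self, protocol_name, set_name):
            """Return the set called ``set_name`` in the given protocol."""
            for protocol in self.protocols:
                if protocol.name != protocol_name:
                    continue
                for dataset in protocol.sets:
                    if dataset.name == set_name:
                        return dataset
            raise KeyError(f"{protocol_name}/{set_name}")


    # The database used by the example experiment of the previous section.
    simple = Database(
        name="simple",
        version=1,
        protocols=[
            Protocol(
                name="protocol",
                sets=[
                    Dataset("set", "beat/integer/1"),
                    Dataset("set2", "beat/integer/1"),
                ],
            )
        ],
    )

Calling ``simple.find_set("protocol", "set2")`` returns the second set, whose
``dataformat`` attribute is ``beat/integer/1``. Asking for a set that does not
exist raises ``KeyError``.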


The BEAT platform provides a graphical user interface so that you can program
data formats, algorithms and toolchains, and define experiments, rather
intuitively. For expert users, we provide a command-line interface to the
platform, which allows them to create, modify and dispose of such objects using
their own editors. When using BEAT locally, the graphical user interface is
used in parallel with the command-line interface.

BEAT Building Blocks
====================

For developers and programmers, the rest of this guide details each of the
BEAT building blocks, their relationships and how to use the command-line
interface to interact with the platform efficiently.




.. toctree::

    dataformats
    algorithms
    libraries
    toolchains
    experiments
    databases


.. include:: links.rst