Commit 0edcbd49 authored by André Anjos

Merge branch 'docs' into 'master'

merge new documentation to master

See merge request !265
parents b0b627f2 c66e0439
Pipeline #25511 passed
#!/usr/bin/env python
# vim: set fileencoding=utf-8 :
###############################################################################
# #
# Copyright (c) 2016 Idiap Research Institute, http://www.idiap.ch/ #
# Contact: beat.support@idiap.ch #
# #
# This file is part of the beat.web module of the BEAT platform. #
# #
# Commercial License Usage #
# Licensees holding valid commercial BEAT licenses may use this file in #
# accordance with the terms contained in a written agreement between you #
# and Idiap. For further information contact tto@idiap.ch #
# #
# Alternatively, this file may be used under the terms of the GNU Affero #
# Public License version 3 as published by the Free Software and appearing #
# in the file LICENSE.AGPL included in the packaging of this file. #
# The BEAT platform is distributed in the hope that it will be useful, but #
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY #
# or FITNESS FOR A PARTICULAR PURPOSE. #
# #
# You should have received a copy of the GNU Affero Public License along #
# with the BEAT platform. If not, see http://www.gnu.org/licenses/. #
# #
###############################################################################
import os
import sys
import pkg_resources
# -- General configuration -----------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
needs_sphinx = '1.3'
# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = [
    'sphinx.ext.todo',
    'sphinx.ext.coverage',
    'sphinx.ext.ifconfig',
    'sphinx.ext.autodoc',
    'sphinx.ext.autosummary',
    'sphinx.ext.doctest',
    'sphinx.ext.intersphinx',
    'sphinx.ext.napoleon',
    'sphinx.ext.viewcode',
    'sphinxcontrib.programoutput',
]
import sphinx
# compare parsed versions, not raw strings (string comparison breaks for
# e.g. "1.10" < "1.4.1")
if pkg_resources.parse_version(sphinx.__version__) >= pkg_resources.parse_version("1.4.1"):
    extensions.append('sphinx.ext.imgmath')
else:
    extensions.append('sphinx.ext.pngmath')
# Always includes todos
todo_include_todos = True
# Create numbers on figures with captions
numfig = True
# If we are on OSX, the 'dvipng' path may be different
dvipng_osx = '/opt/local/libexec/texlive/binaries/dvipng'
if os.path.exists(dvipng_osx):
    pngmath_dvipng = dvipng_osx
    imgmath_dvipng = dvipng_osx
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix of source filenames.
source_suffix = '.rst'
# The encoding of source files.
#source_encoding = 'utf-8-sig'
# The master toctree document.
master_doc = 'index'
top_module = 'beat'
# General information about the project.
project = u'BEAT Web Application'
authors = u'Idiap Research Institute'
import time
copyright = u'%s, Idiap Research Institute, Switzerland' % time.strftime('%Y')
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# Grab the setup entry
distribution = pkg_resources.require('beat.web')[0]
# The short X.Y version.
version = distribution.version
# The full version, including alpha/beta/rc tags.
release = distribution.version
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#language = None
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
#today = ''
# Else, today_fmt is used as the format for a strftime call.
#today_fmt = '%B %d, %Y'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = []
# The reST default role (used for this markup: `text`) to use for all documents.
#default_role = None
# If true, '()' will be appended to :func: etc. cross-reference text.
#add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
#add_module_names = True
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
#show_authors = False
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# A list of ignored prefixes for module index sorting.
#modindex_common_prefix = []
# -- Autodoc settings ---------------------------------------------------
autodoc_member_order = 'bysource'
# -- Options for HTML output ---------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
import sphinx_rtd_theme
html_theme = 'sphinx_rtd_theme'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#html_theme_options = {}
# Add any paths that contain custom themes here, relative to this directory.
#html_theme_path = []
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
#html_title = None
# A shorter title for the navigation bar. Default is the same as html_title.
#html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
#html_logo = None
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
#html_favicon = None
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
#html_static_path = ['_static']
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
#html_last_updated_fmt = '%b %d, %Y'
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
#html_use_smartypants = True
# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
#html_additional_pages = {}
# If false, no module index is generated.
#html_domain_indices = True
# If false, no index is generated.
#html_use_index = True
# If true, the index is split into individual pages for each letter.
#html_split_index = False
# If true, links to the reST sources are added to the pages.
#html_show_sourcelink = True
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
#html_show_sphinx = True
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
#html_show_copyright = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
#html_use_opensearch = ''
# This is the file name suffix for HTML files (e.g. ".xhtml").
#html_file_suffix = None
# Output file base name for HTML help builder.
htmlhelp_basename = 'BEATdoc'
# -- Options for LaTeX output --------------------------------------------------
latex_elements = {
    # The paper size ('letterpaper' or 'a4paper').
    'papersize': 'a4paper',
    # The font size ('10pt', '11pt' or '12pt').
    #'pointsize': '10pt',
    # Additional stuff for the LaTeX preamble (raw string to keep the
    # backslash intact).
    'preamble': r'\setcounter{tocdepth}{4}',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, documentclass [howto/manual]).
latex_documents = [
    (master_doc, top_module + '.tex', project, authors, 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
#latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
#latex_use_parts = False
# If true, show page references after internal links.
#latex_show_pagerefs = False
# If true, show URL addresses after external links.
#latex_show_urls = False
# Documents to append as an appendix to all manuals.
#latex_appendices = []
# If false, no module index is generated.
#latex_domain_indices = True
# -- Options for manual page output --------------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
    (master_doc, top_module, project, authors, 1),
]
# If true, show URL addresses after external links.
#man_show_urls = False
# -- Options for Texinfo output ------------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
    (master_doc, top_module, project, authors, top_module, project,
     'Miscellaneous'),
]
# Documents to append as an appendix to all manuals.
#texinfo_appendices = []
# If false, no module index is generated.
#texinfo_domain_indices = True
# How to display URL addresses: 'footnote', 'no', or 'inline'.
#texinfo_show_urls = 'footnote'
def smaller_than(v1, v2):
    """Compares scipy/numpy version numbers"""

    c1 = v1.split('.')
    c2 = v2.split('.')[:len(c1)]  # clip to the compared version
    for i in range(len(c2)):
        n1 = c1[i]
        n2 = c2[i]
        try:
            n1 = int(n1)
            n2 = int(n2)
        except ValueError:
            n1 = str(n1)
            n2 = str(n2)
        # decide as soon as a component differs (the original fell through
        # and mis-ordered cases like 1.9 vs 2.0)
        if n1 < n2:
            return True
        if n1 > n2:
            return False
    return True
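# Quick sanity checks of the helper above (illustrative; these hold for the
# implementation as written):
#   smaller_than('1.4.1', '1.5.z')  -> True   (1.4 < 1.5)
#   smaller_than('1.6.0', '1.5.z')  -> False  (1.6 > 1.5)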
# Some name mangling to find the correct sphinx manuals for some packages
numpy_version = __import__('numpy').version.version
if smaller_than(numpy_version, '1.5.z'):
numpy_version = '.'.join(numpy_version.split('.')[:-1]) + '.x'
else:
numpy_version = '.'.join(numpy_version.split('.')[:-1]) + '.0'
numpy_manual = 'http://docs.scipy.org/doc/numpy-%s/' % numpy_version
# For inter-documentation mapping:
intersphinx_mapping = {
    'http://docs.python.org/%d.%d/' % sys.version_info[:2]: None,
    numpy_manual: None,
    'http://matplotlib.sourceforge.net/': None,
}
.. vim: set fileencoding=utf-8 :
.. _beat_web:
======================
BEAT Web Application
======================
This documentation includes information about the BEAT platform.
For users
=========
.. toctree::
   :maxdepth: 1
   :titlesonly:

   user/index.rst
For developers
==============
.. toctree::
   :maxdepth: 1
   :titlesonly:

   admin/index.rst
   api/index.rst
@@ -27,478 +27,9 @@
Algorithms
============
Graphically represented, :ref:`toolchains` look like a set of interconnected
blocks. As illustrated in the figure below, each block can accommodate one
*Algorithm*, along with the necessary input and output interfaces. We also
refer to the inputs and outputs collectively as *endpoints*.
.. image:: img/block.*
Typically, an algorithm will process data units received at the input
endpoints, and push the relevant results to the output endpoint. Each algorithm
must have at least one input and at least one output. The links in a toolchain
connect the output of one block to the input of another effectively connecting
algorithms together, thus determining the information-flow through the
toolchain.
Blocks at the beginning of the toolchain are typically connected to datasets,
and blocks at the end of a toolchain are connected to analyzers (special
algorithms with no output). The |project| platform is responsible for
delivering inputs from the desired datasets into the toolchain and through your
algorithms. This drives the synchronization of information-flow through the
toolchain. Flow synchronization is determined by data units produced from a
dataset and injected into the toolchain.
Code for algorithms may be implemented in any programming language supported
by |project|. At present, only two backends have been integrated, supporting
Python and C++; algorithms are therefore expected to be implemented in one of
those languages. (In the future, other backends will be added to |project|.)
Python code implementing a certain algorithm can be created using our
web-based :ref:`algorithm editor`. C++-based algorithms must be compiled using
a provided docker container and uploaded to the platform (see :ref:`binary
algorithms`).
|project| treats algorithms as objects that are derived from the class
``Algorithm`` (in Python) or ``IAlgorithm`` (in C++). To define a new algorithm,
at least one method must be implemented:
* ``process()``: the method that actually processes input and produces
outputs.
The code example below illustrates the implementation of an algorithm (in
Python):
.. code-block:: python
   :linenos:

   class Algorithm:

       def process(self, inputs, outputs):
           # here, you read inputs, process and write results to outputs
           return True
Here is the equivalent example in C++:
.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithm
   {
   public:
       virtual bool process(const InputList& inputs, const OutputList& outputs)
       {
           // here, you read inputs, process and write results to outputs
           return true;
       }
   };
One particularity of the |project| platform is how the data-flow through a
given toolchain is synchronized. The platform is responsible for extracting
data units (images, speech-segments, videos, etc.) from the database and
presenting them to the input endpoints of certain blocks, as specified in the
toolchain. Each presentation of a new data unit at the input of a block can
be thought of as an individual time-unit. The algorithm implemented in a block
is responsible for the synchronization between its inputs and its output. In
other words, every time a data unit is produced by a dataset on an experiment,
the ``process()`` method of your algorithm is called to act upon it.
An algorithm may have one of two kinds of synchronicities: one-to-one, and
many-to-one. These are discussed in detail in separate sections below.
One-to-one synchronization
--------------------------
Here, the algorithm generates one output for every input entity (e.g., image,
video, speech-file). For example, an image-based feature-extraction algorithm
would typically output one set of features every time it is called with a new
input image. A schematic diagram of one-to-one synchronization for an algorithm
is shown in the figure below:
.. image:: img/case-study-1.*
In the configuration shown in this figure, the algorithm-block has two
endpoints: one input and one output. The inputs, the outputs, and the block are
synchronized together (notice the color information). Each red box represents
one input unit (e.g., an image, or a video), that is fed to the input interface
of the block. Corresponding to each input received, the block produces one
output unit, shown as a blue box in the figure.
Example code showing how to implement an algorithm in this configuration is given below:
.. code-block:: python
   :linenos:

   class Algorithm:

       def process(self, inputs, outputs):
           # to read the field "value" on the "in" input, use "data"
           # a possible declaration of "user/format/1" would be:
           # {
           #   "value": ...
           # }
           value = inputs['in'].data.value

           # do your processing and create the "output" value
           output = magical_processing(value)

           # to write "output" into the relevant endpoint use "write"
           # a possible declaration of "user/other/1" would be:
           # {
           #   "value": ...
           # }
           outputs['out'].write({'value': output})

           # No error occurred, so return True
           return True
.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithm
   {
   public:
       virtual bool process(const InputList& inputs, const OutputList& outputs)
       {
           // to read the field "value" on the "in" input, use "data"
           // a possible declaration of "user/format/1" would be:
           // {
           //   "value": ...
           // }
           auto value = inputs["in"]->data<user::format_1>()->value;

           // do your processing and create the "output" value
           auto output = magical_processing(value);

           // to write "output" into the relevant endpoint use "write"
           // a possible declaration of "user/other/1" would be:
           // {
           //   "value": ...
           // }
           auto result = new user::other_1();
           result->value = output;
           outputs["out"]->write(result);

           // No error occurred, so return true
           return true;
       }
   };
In this example, the platform will call the user algorithm every time a new
input block with the format ``user/format/1`` is available at the input. Notice
that no ``for`` loops are necessary in the user code. The platform controls the
looping for you.
A more complex case of one-to-one synchronization is shown in the following figure:
.. image:: img/case-study-2.*
In such a configuration, the platform will ensure that each input unit at the
input-endpoint ``in`` is associated with the correct input unit at the
input-endpoint ``in2``. For example, referring to the figure above, the items
at the input ``in`` could be images, and the items at the input ``in2`` could be
labels, and the configuration depicted indicates that the first two input
images have the same label, say, ``l1``, whereas the next two input images have
the same label, say, ``l2``. The algorithm produces one output item at the
endpoint ``out``, for each input object presented at endpoint ``in``.
Example code implementing an algorithm processing data in this scenario is
shown below:
.. code-block:: python
   :linenos:

   class Algorithm:

       def process(self, inputs, outputs):
           i1 = inputs['in'].data.value
           i2 = inputs['in2'].data.value
           out = magical_processing(i1, i2)
           outputs['out'].write({'value': out})
           return True
.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithm
   {
   public:
       virtual bool process(const InputList& inputs, const OutputList& outputs)
       {
           auto i1 = inputs["in"]->data<user::format_1>()->value;
           auto i2 = inputs["in2"]->data<user::format_1>()->value;
           auto out = magical_processing(i1, i2);
           auto result = new user::other_1();
           result->value = out;
           outputs["out"]->write(result);
           return true;
       }
   };
You should notice that we still don't require any sort of ``for`` loops! The
platform *synchronizes* the inputs ``in`` and ``in2`` so they are available to
your program as defined by the dataset implementor.
Many-to-one synchronization
---------------------------
Here, the algorithm produces a single output after processing a batch of
inputs. For example, the algorithm may produce a model for a *dog* after
processing all input images for the *dog* class. A block diagram illustrating
many-to-one synchronization is shown below:
.. image:: img/case-study-3.*
Here the synchronization is driven by the endpoint ``in2``. For each data unit
received at the input ``in2``, the algorithm generates one output unit. Note
that, here, multiple units received at the input ``in`` are accumulated and
associated with a single unit received at ``in2``. The user does not have to
handle the internal indexing. Producing output data at the right moment is
enough for the platform to understand that the output is synchronized with ``in2``.
The example below illustrates how such an algorithm could be implemented:
.. code-block:: python
   :linenos:

   class Algorithm:

       def __init__(self):
           self.objs = []

       def process(self, inputs, outputs):
           self.objs.append(inputs['in'].data.value)  # accumulates
           if inputs['in2'].isDataUnitDone():
               out = magical_processing(self.objs)
               outputs['out'].write({'value': out})
               self.objs = []  # reset accumulator for next label
           return True
.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithm
   {
   public:
       virtual bool process(const InputList& inputs, const OutputList& outputs)
       {
           objs.push_back(inputs["in"]->data<user::format_1>()->value); // accumulates
           if (inputs["in2"]->isDataUnitDone())
           {
               auto out = magical_processing(objs);
               auto result = new user::other_1();
               result->value = out;
               outputs["out"]->write(result);
               objs.clear(); // reset accumulator for next label
           }
           return true;
       }

   public:
       std::vector<float> objs;
   };
Here, the units received at the endpoint ``in`` are accumulated as long as the
``isDataUnitDone()`` method attached to the input ``in2`` returns ``False``.
When ``isDataUnitDone()`` returns ``True``, the corresponding label is read
from ``in2``, and a result is produced at the endpoint ``out``. After an output
unit has been produced, the internal accumulator for ``in`` is cleared, and the
algorithm starts accumulating a new set of objects for the next label.
Unsynchronized Operation
------------------------
Not all inputs of a block need to be synchronized together. In the diagram
shown below, the block is synchronized with the inputs ``in`` and ``in2`` (as
indicated by the green circle, which matches the colour of the input lines
connecting ``in`` and ``in2``). The output ``out`` is synchronized with the
block (and, as one can notice looking at the code below, outputs signal after
every ``in`` input). The input ``in3`` is not synchronized with the endpoints
``in`` and ``in2``, nor with the block. A processing block which receives a
previously calculated model and must score test samples is a good example of
this condition. In this case, the user is responsible for reading out the
contents of ``in3`` explicitly.
.. image:: img/case-study-4.*
In this case the algorithm will include an explicit loop to read the
unsynchronized input (``in3``).
.. code-block:: python
   :linenos:

   class Algorithm:

       def __init__(self):
           self.models = None

       def process(self, inputs, outputs):
           # N.B.: this will be called for every unit in `in'

           # Loads the "model" data at the beginning, once
           if self.models is None:
               self.models = []
               while inputs['in3'].hasMoreData():
                   inputs['in3'].next()
                   self.models.append(inputs['in3'].data.value)

           # Processes the current input in `in' and `in2', applying the
           # model/models
           out = magical_processing(inputs['in'].data.value,
                                    inputs['in2'].data.value,
                                    self.models)

           # Writes the output
           outputs['out'].write({'value': out})
           return True
.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithm
   {
   public:
       virtual bool process(const InputList& inputs, const OutputList& outputs)
       {
           // N.B.: this will be called for every unit in `in'

           // Loads the "model" data at the beginning, once
           if (models.empty())
           {
               while (inputs["in3"]->hasMoreData())
               {
                   inputs["in3"]->next();
                   auto model = inputs["in3"]->data<user::model_1>();
                   models.push_back(*model);
               }
           }

           // Processes the current input in `in' and `in2', applying the model/models
           auto out = magical_processing(inputs["in"]->data<user::format_1>()->value,
                                         inputs["in2"]->data<user::format_1>()->value,
                                         models);

           // Writes the output
           auto result = new user::other_1();
           result->value = out;
           outputs["out"]->write(result);
           return true;
       }

   public:
       std::vector<user::model_1> models;
   };
It may happen that you have several inputs which are synchronized together, but
unsynchronized with the block you're writing your algorithm for. In this case,
it is safer to treat inputs using their *group*. For example:
.. code-block:: python
   :linenos:

   class Algorithm:

       def __init__(self):
           self.models = None

       def process(self, inputs, outputs):
           # N.B.: this will be called for every unit in `in'

           # Loads the "model" data at the beginning, once
           if self.models is None:
               self.models = []
               group = inputs.groupOf('in3')
               while group.hasMoreData():
                   group.next()  # synchronously advances the data
                   self.models.append(group['in3'].data.value)

           # Processes the current input in `in' and `in2', applying the model/models
           out = magical_processing(inputs['in'].data.value,
                                    inputs['in2'].data.value,
                                    self.models)

           # Writes the output
           outputs['out'].write({'value': out})
           return True
.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithm
   {
   public:
       virtual bool process(const InputList& inputs, const OutputList& outputs)
       {
           // N.B.: this will be called for every unit in `in'

           // Loads the "model" data at the beginning, once
           if (models.empty())
           {
               auto group = inputs.groupOf("in3");
               while (group->hasMoreData())
               {
                   group->next(); // synchronously advances the data
                   auto model = (*group)["in3"]->data<user::model_1>();
                   models.push_back(*model);
               }
           }

           // Processes the current input in `in' and `in2', applying the model/models
           auto out = magical_processing(inputs["in"]->data<user::format_1>()->value,
                                         inputs["in2"]->data<user::format_1>()->value,
                                         models);

           // Writes the output
           auto result = new user::other_1();
           result->value = out;
           outputs["out"]->write(result);
           return true;
       }

   public:
       std::vector<user::model_1> models;
   };
In practice, encoding your algorithms using *groups* instead of looping over
individual inputs makes the code more robust to changes.
Algorithms are user-defined pieces of software that run within the blocks of a
toolchain. An algorithm can read data on the input(s) of the block and write
processed data to its output(s). For detailed information see
:ref:`beat-system-algorithms`.
.. _Algorithm Editor:
@@ -515,20 +46,6 @@
the following:
.. image:: img/SS_algorithms_info.*
.. note:: **Naming Convention**

   Algorithms are named using three values joined by a ``/`` (slash) operator:

   * **username**: indicates the author of the algorithm
   * **name**: indicates the name of the algorithm
   * **version**: indicates the version (integer starting from ``1``) of the
     algorithm

   Each tuple of these three components defines a *unique* algorithm name
   inside the platform. For a grasp, you may browse `publicly available
   algorithms`_.
Note the search-box and the privacy-filter above the list of algorithms
displayed on the page. You can use these to limit your search. For example,
entering "anjos" in the search-box will allow you to list only those algorithms
@@ -538,13 +55,15 @@
image below.
.. image:: img/SS_algorithms_anjos_search.*
There are several options when defining algorithms. They can be *Analyzer* or
*Splittable*. *Analyzer* algorithms are special algorithms whose purpose is to
generate statistics about the processing results (graphs, means, variances,
etc.). Usually, biometric data processing algorithms are *Splittable*,
indicating
to the platform that these algorithms can be executed in a distributed fashion,
depending on the available computing resources.
There are also two types of algorithms depending on the way they handle data samples that are fed to them. They can be *Sequential* or *Autonomous*. For more information see :ref:`beat-system-algorithms-types`.
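To give a rough idea of the difference, here is a minimal sketch contrasting
the two types, assuming the interfaces described in
:ref:`beat-system-algorithms-types` (the ``data_loaders`` accessors shown are
illustrative and may differ in your platform version):

.. code-block:: python
   :linenos:

   class SequentialAlgorithm:
       """Sequential: the platform calls process() once per data unit"""

       def process(self, inputs, data_loaders, outputs):
           # one synchronized data unit is delivered per call; no loop needed
           value = inputs['in'].data.value
           outputs['out'].write({'value': value})
           return True


   class AutonomousAlgorithm:
       """Autonomous: the algorithm fetches the data itself, at its own pace"""

       def process(self, data_loaders, outputs):
           # the algorithm drives the loop over all available data units
           loader = data_loaders.loaderOf('in')
           for i in range(loader.count()):
               data, _, _ = loader[i]
               outputs['out'].write({'value': data['in'].value})
           return True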
There are two basic ways to create an algorithm on the |project| platform. You
may either start from scratch or fork a new copy of an existing algorithm and edit that.
@@ -560,7 +79,8 @@
You should see a web-page similar to what is displayed below:
.. image:: img/algorithm_new.*
For instructions on how to create an algorithm from scratch, please refer to
the `algorithm editor`_ section and see
:ref:`beat-system-algorithms-definition-code` to understand how to write the
source code for new algorithms.
Edit an existing algorithm
@@ -594,12 +114,14 @@ Editor
To create an algorithm, there are seven sections which are:
* Name: the name of the algorithm.
* Algorithm option: Analyzer or Splittable.
* Algorithm type: Sequential or Autonomous.
* Language: The language used to implement the algorithm (Python or C++).
* Documentation: This is used to describe your algorithm.
* Endpoints: Define the properties of the Input and Output endpoints for this algorithm.
* Parameters: Define the configuration-parameters for the algorithm.
When you have saved the algorithm, you can also add documentation that describes it.
For Python-based algorithms only:
* Libraries: If there are functions in a library, you can add them for the algorithm to use.
@@ -642,13 +164,13 @@
Prerequisite: Configure your command-line client
================================================
In order to ensure that your compiled algorithm will work on the |project|
platform, you must compile it using our docker image
*docker.idiap.ch/beat/beat.env.client*. Once downloaded, you'll need to
configure the command-line tool to access your account on the |project|
platform:
.. code-block:: bash

   $ docker run -ti docker.idiap.ch/beat/beat.env.client:2.0.0r1 bash
   /# cd home
   /home# beat config set user <your_user_name>
   /home# beat config set token "<your_token>"
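Once configured, the ``beat`` command-line tool can exchange contributions
with the platform from inside the container. A hypothetical session (the
algorithm name below is illustrative):

.. code-block:: bash

   # download an algorithm declaration for local editing
   /home# beat algorithms pull <your_user_name>/<algorithm_name>/1

   # upload the locally modified version back to the platform
   /home# beat algorithms push <your_user_name>/<algorithm_name>/1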
@@ -36,7 +36,7 @@
configuration page, the declaration of this experiment is transmitted to the
scheduler, which now must run the experiment until it finishes, you press the
``stop`` button, or an error condition is produced.
As described in the :ref:`beat-system-toolchains` section, the scheduler first breaks
the toolchain into a sequence of executable blocks with dependencies. For
example: block ``B`` must be run after block ``A``. Each block is then
scheduled for execution depending on current resource availability. If no more
@@ -30,37 +30,15 @@
Data formats specify the transmitted data between the blocks of a toolchain.
They describe the format of the data blocks that circulate between algorithms
and formalize the interaction between algorithms and data sets, so they can
communicate in an orderly manner. Inputs and outputs of the algorithms and
datasets **must** be formally declared. Two algorithms that communicate
directly must produce and consume the **same** type of data objects. For more
detailed information see :ref:`beat-system-dataformats`.
A data format specifies a list of typed fields. An algorithm or data set
generating a block of data (via one of its outputs) must fill all the fields
declared in that data format. An algorithm consuming a block of data (via one
of its inputs) must not expect the presence of any other field than the ones
defined by the data format.
The |project| platform provides a number of pre-defined formats to facilitate
experiments. They are implemented in an extensible way. This allows users to
define their own formats, based on existing ones, while keeping some level of
compatibility with other existing algorithms.
.. note:: **Naming Convention**

   Data formats are named using three values joined by a ``/`` (slash)
   operator:

   * **username**: indicates the author of the dataformat
   * **name**: an identifier for the object
   * **version**: an integer (starting from one), indicating the version of
     the object

   Each tuple of these three components defines a *unique* data format name
   inside the platform. For example, ``system/float/1``.
The ``system`` user provides a number of pre-defined formats such as
integers, booleans, floats and arrays (see `here
<https://www.beat-eu.org/platform/dataformats/system/>`_). You may also
browse `publicly available data formats`_ to see all available data formats
from the ``system`` and other users.
@@ -74,218 +52,6 @@
to that data format, as shown in the image below:
.. image:: img/system-defined-info.*
A data format is declared as a JSON_ object with several fields. For example,
the following declaration could represent the coordinates of a bounding box in
a video frame:
.. code-block:: javascript

   {
     "value": [
       0,
       {
         "frame_id": "uint64",
         "height": "int32",
         "width": "int32",
         "top-left-y": "int32",
         "top-left-x": "int32"
       }
     ]
   }
The special field ``#description`` can be used to store a short description of
the declared data format. It is ignored in practice and only used for
informational purposes (see the sketch below). Each field in a declaration has
a well-defined type, as explained next.
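For instance, the bounding-box declaration above could carry such a
description (the description text itself is illustrative):

.. code-block:: javascript

   {
     "#description": "a bounding box in a video frame",
     "value": [
       0,
       {
         "frame_id": "uint64",
         "height": "int32",
         "width": "int32",
         "top-left-y": "int32",
         "top-left-x": "int32"
       }
     ]
   }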
Simple type (primitive object)
==============================
The |project| platform supports the following *primitive* types.
* signed integers: ``int8``, ``int16``, ``int32``, ``int64``
* unsigned integers: ``uint8``, ``uint16``, ``uint32``, ``uint64``
* floating-point numbers: ``float32``, ``float64``
* complex numbers: ``complex64``, ``complex128``
* a boolean value: ``bool``
* a string: ``string``
Aggregation
===========
A data format can be composed of complex objects formed by aggregating other
*declared* types. For example, we could define the positions of the eyes of a
face in an image like this:
.. code-block:: javascript

   {
     "left": "system/coordinates/1",
     "right": "system/coordinates/1"
   }
Arrays
======
A field can be a multi-dimensional array of any other type. Here ``array1`` is
declared as a one dimensional array of 10 32-bit signed integers (``int32``)
and ``array2`` as a two-dimensional array with 10 rows and 5 columns of
booleans:
.. code-block:: javascript

   {
     "array1": [10, "int32"],
     "array2": [10, 5, "bool"]
   }
An array can have up to 32 dimensions. It can also contain objects (either
declared inline, or using another data format). It is also possible to declare
an array without specifying the number of elements in some of its dimensions,
by using a size of 0 (zero). For example, here is a two-dimensional grayscale
image of unspecified size:
.. code-block:: javascript

   {
     "value": [0, 0, "uint8"]
   }
You may also fix the extents of some of the dimensions. For example, here is a
possible representation for a three-dimensional RGB image of unspecified size
(width and height):

.. code-block:: javascript

   {
     "value": [3, 0, 0, "uint8"]
   }
In this representation, the image must have 3 color planes (no more, no less).
The width and the height are unspecified.
.. note:: **Unspecified Dimensions**

   Because of the way the |project| platform stores data, not all combinations
   of unspecified extents will work for arrays. As a rule of thumb, only the
   last dimensions may remain unspecified. These are valid:

   .. code-block:: javascript

      {
        "value1": [0, "float64"],
        "value2": [3, 0, "float64"],
        "value3": [3, 2, 0, "float64"],
        "value4": [3, 0, 0, "float64"],
        "value5": [0, 0, 0, "float64"]
      }

   Whereas these would be invalid declarations for arrays:

   .. code-block:: javascript

      {
        "value1": [0, 3, "float64"],
        "value2": [4, 0, 3, "float64"]
      }
Object Representation
---------------------
As you'll read in our :ref:`Algorithms` section, data is made available to
user algorithms via our backend API. In Python, for example, the |project|
platform uses NumPy_ data types to pass data to and from algorithms. When an
algorithm reads data for which the format is defined like:
.. code-block:: javascript
{
"value": "float64"
}
The field ``value`` of an instance named ``object`` of this format is
accessible as ``object.value`` and will have the type ``numpy.float64``. If the
format were, instead:
.. code-block:: javascript
{
"value": [0, 0, "float64"]
}
It would be accessed in the same way (i.e., via ``object.value``), except that
the type would be ``numpy.ndarray`` and ``object.value.dtype`` would be
``numpy.float64``. Naturally, objects which are instances of a format like
this:
.. code-block:: javascript
{
"x": "int32",
"y": "int32"
}
Could be accessed like ``object.x``, for the ``x`` value and ``object.y``, for
the ``y`` value. The type of ``object.x`` and ``object.y`` would be
``numpy.int32``.
Conversely, if you *write* output data in an algorithm, the type of the output
objects is checked for compatibility with respect to the value declared in the
format. For example, this would be a valid use of the format above, in Python:
.. code-block:: python

   import numpy

   class Algorithm:

       def process(self, inputs, outputs):
           # read data
           # prepares object to be written
           myobj = {"x": numpy.int32(4), "y": numpy.int32(6)}
           # write it
           outputs["point"].write(myobj)  # OK!
           return True
If you try to write a ``float64`` object into a field that is supposed to be
of type ``int32``, an exception will be raised. For example:
.. code-block:: python

   import numpy

   class Algorithm:

       def process(self, inputs, outputs):
           # read data
           # prepares object to be written
           myobj = {"x": numpy.int32(4), "y": numpy.float64(3.14)}
           # write it
           outputs["point"].write(myobj)  # Error: cannot downcast!
           return True
The bottom line is: **all type casting in the platform must be explicit**. It
will not automatically downcast or upcast objects for you, so as to avoid
unexpected precision loss leading to errors.
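For example, performing the cast explicitly in your own code before writing
resolves the problem above (a sketch using the same hypothetical ``point``
format):

.. code-block:: python

   import numpy

   class Algorithm:

       def process(self, inputs, outputs):
           # explicitly downcast the float64 into the declared int32 type
           y = numpy.int32(numpy.float64(3.14))
           myobj = {"x": numpy.int32(4), "y": y}
           # write it
           outputs["point"].write(myobj)  # OK: the cast was explicit
           return True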
Editing Operations
@@ -37,23 +37,6 @@
such as different databases and algorithms. Each experiment has its own
:ref:`toolchains` which cannot be changed after the experiment is created.
Experiments can be shared and forked, to ensure maximum re-usability.
.. note:: **Naming Convention**

   Experiments are named using five values joined by a ``/`` (slash)
   operator:

   * **username**: indicates the author of the experiment
   * **toolchain username**: indicates the author of the toolchain used for
     that experiment
   * **toolchain name**: indicates the name of the toolchain used for that
     experiment
   * **toolchain version**: indicates the version (integer starting from
     ``1``) of the toolchain used for the experiment
   * **name**: an identifier for the object

   Each tuple of these five components defines a *unique* experiment name
   inside the platform. For a grasp, you may browse `publicly available
   experiments`_.
Displaying an existing experiment
@@ -111,7 +94,7 @@
These icons represent the following options (from left to right):
* red cross: delete the experiment
* blue tag: rename the experiment
* gold medal: request attestation
* circular arrow: reset the experiment (if some of the blocks in the
  experiment have been run before, the platform will reuse the cached outputs
  of those blocks)
* ``fork``: fork a new, editable copy of this experiment
* page: add experiment to report
* blue lens: search for similar experiments
@@ -193,22 +176,11 @@ toolchain:
results. Options for this block are similar for normal blocks.
.. note:: **Algorithms, Datasets and Blocks**

   While configuring the experiment, your objective is to fill-in all
   containers defined by the toolchain with valid datasets and algorithms or
   analyzers. As mentioned in :ref:`beat-system-experiments-blocks`, BEAT
   checks that connected datasets, algorithms and analyzers produce or
   consume data in the right format. It only presents options which are
   *compatible* with adjacent blocks.

   For example, if you chose dataset ``A`` for block ``train`` of your
   experiment that outputs objects in the format ``user/format/1``, then the
   algorithm running on the block following ``train`` **must** consume
   ``user/format/1`` on its input. Therefore, the choices for algorithms that
   can run after ``train`` become limited the moment you choose dataset
   ``A``. The configuration system will *dynamically* update to take those
   constraints into consideration every time you make a selection, increasing
   the global constraints for the experiment.
Tip: If you reach a situation where no algorithms are available for a given
block, reset the experiment and try again, making sure the algorithms you'd
@@ -38,8 +38,7 @@
provides an attestation mechanism for your reports (scientific papers,
technical documents or certifications).
This guide contains detailed and illustrated information on how to interact
with the |project| platform using its web interface. It is the primary
resource for information concerning how to use and run evaluations using the
platform. Before you continue with this guide, you should familiarize yourself
with the different components of BEAT (see `Getting Started with BEAT`_).
In order to take full advantage of the guide, we recommend you register_ into
the platform and follow the tutorials in the order defined in this guide.
@@ -31,23 +31,11 @@ Libraries
functions. Instead of re-implementing every function from scratch, you can
reuse functions already implemented by other users and published in the form of
|project| libraries. Similarly, you can create and publish your own libraries
of functions that you consider may be useful to other users. For more
information see :ref:`beat-system-libraries`.

Usage of libraries is encouraged in the |project| platform. Besides saving you
time and effort, this also promotes reproducibility in research.
.. note:: **Naming Convention**

   Libraries are named using three values joined by a ``/`` (slash) operator:

   * **username**: indicates the author of the library
   * **name**: indicates the name of the library
   * **version**: indicates the version (integer starting from ``1``) of the
     library

   Each tuple of these three components defines a *unique* name inside the
   platform. For a grasp, you may browse `publicly available libraries`_.
You can access the Libraries section from your home-page on |project| by
clicking the ``User Resources`` tab and selecting ``Libraries`` from the
drop-down list. You should see a page similar to that shown below:
@@ -98,8 +86,7 @@ To create a library you will need to provide the following information:
Of course, functions implemented in a new library may also call functions from
other shared libraries in the |project| platform. You can indicate the
dependencies on other libraries via the ``External library usage`` section.
To save your work, click on the green ``Save`` button (in the top-right region
of the page). After you have saved your library, you will be able to use
@@ -93,8 +93,7 @@ Script
* Outcome: The library will be saved; Edit and Delete buttons will appear in
  the top-right corner.
11. Say: "To share the library, click on the 'Share' button. A pop-up window
specifying sharing preferences will appear"
11. Say: "To share the library, click on the 'Share' button. A pop-up window specifying sharing preferences will appear"
* Action: Click on the sharing button for a private search (the one you
saved before)
@@ -108,4 +107,4 @@ Script
* Action: Click on the 'Public' radio box, click on 'Share it'
* Outcome: A pop-up window says the Search is now shared.
13. END OF THE CLIP
@@ -51,3 +51,8 @@
.. _numpy: http://www.numpy.org/
.. _our gitlab repository: https://gitlab.idiap.ch/beat/
.. _gnu affero gpl v3 license: http://www.gnu.org/licenses/agpl-3.0.en.html
.. _Getting Started with BEAT: https://www.idiap.ch/software/beat/docs/beat/docs/master/beat/introduction.html
.. _Algorithms: https://www.idiap.ch/software/beat/docs/beat/docs/master/beat/algorithms.html
.. _Experiments: https://www.idiap.ch/software/beat/docs/beat/docs/master/beat/experiments.html#beat-system-experiments
.. _Toolchains: https://www.idiap.ch/software/beat/docs/beat/docs/master/beat/toolchains.html#beat-system-toolchains
.. _Dataformats: https://www.idiap.ch/software/beat/docs/beat/docs/master/beat/dataformats.html#beat-system-dataformats
@@ -35,20 +35,20 @@
reproducible and experiments share parts with each other as much as possible.
For example in a simple experiment, the database, the algorithm, and the
environment used to run the experiment can be shared between experiments.
A fundamental part of the `Experiments`_ in the |project| platform is a
toolchain. You can see an example of a toolchain below:
.. image:: img/toolchain.*
`Toolchains`_ are sequences of blocks and links (like a block diagram)
which represent the data flow on an experiment. Once you have defined the
toolchain against which you'd like to run, the experiment can be further
configured by assigning different datasets, algorithms and analyzers to the
different toolchain blocks.
The data that circulates at each toolchain connection in a configured
experiment is formally defined using `Dataformats`_. The platform
experiment configuration system will prevent you from using incompatible
data formats between connecting blocks. For example, in the toolchain depicted
above, if one decides to use a dataset on the block ``train`` (top-left of the
@@ -68,7 +68,7 @@
toolchain, datasets, and algorithms.
:ref:`faq` for more information.
- You can learn about how to develop new algorithms or change existing ones by
looking at our `Algorithms`_ section. A special kind of algorithms are
result *analyzers* that are used to generate the results of the experiments.
You can check the results of an experiment once it finishes and share the
@@ -110,16 +110,14 @@ Script
to the report (2 experiments matching the analyzer and one failing)
* Outcome: Added 2 (out of 3 in total) experiment(s) to report
10. Say: "Let's get back to our report list we can see that our report has now 3 experiments. It is
now time to add some interesting tables and figures to our report."
10. Say: "Let's get back to our report list we can see that our report has now 3 experiments. It is now time to add some interesting tables and figures to our report."
* Action: click on "User Resources: Reports" point at the 3 experiments in the report, then click
on the report "myfirstreport"
* Outcome: the empty report will be displayed.
* Action: click on "User Resources: Reports" point at the 3 experiments in the report, then click
on the report "myfirstreport"
* Outcome: the empty report will be displayed.
11. Say: "Some general information is displayed such as the unique report id of this report for
review and publication purpose, the date of creation, the status of the report, currently Editable, and the common analyzer among the experiments of the report.
On the right side, 4 action buttons are also displayed in order to
let you delete the report, save the report after doing some additional
changes, lock your report for review and, finally, access the unique report id
link".
@@ -134,14 +132,11 @@ Script
if you wish to export the data in a csv format, you have this possibility with the 'Export Table'
button".
* Action: Click on 'Add a report item', select 'Table/Results' and select
  a few data you would like to see on your figure, then point out with the mouse the different action buttons
* Outcome: a table should appear.
13. Say: "Now let's add figure to this report. Let's click on 'Add a report item' and let's
select Figure and 'scores_distribution' we can see that a figure has appeared below the
previously created table.
13. Say: "Now let's add figure to this report. Let's click on 'Add a report item' and let's select Figure and 'scores_distribution' we can see that a figure has appeared below the previously created table.
Some action buttons let us delete or export the figure in PNG, JPEG or PDF format.
If many plotters or plotter parameters are available, you will have the
possibility to modify these in order to suit your needs with the plot.
@@ -58,7 +58,7 @@
This is a panel with two buttons. The green button which says ``Show``, makes a
pop-up window appear showing your current API token. You may use this token
(64-byte character string) in outside programs that communicate with the
platform programmatically. For example, our command-line interface requires a
token to be able to pull/push contributions for the user (see
:ref:`beat-cmdline-configuration`).
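For example, the command-line client can be pointed at your account with the
token shown here (this mirrors the configuration steps from the compiled
algorithms guide above):

.. code-block:: bash

   $ beat config set user <your_user_name>
   $ beat config set token "<your_token>"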
If your token is compromised, you may change it by clicking on the ``Modify``
button. A pop-up window will appear confirming the modification. You may cancel
@@ -28,94 +28,7 @@
============
Toolchains are the backbone of experiments within the |project| platform. They
determine the data flow for experiments in the |project| platform. For more
information about toolchains see `Toolchains`_.
You can see an example toolchain for a toy-`eigenface`_ system on the image
below.
.. image:: img/eigenfaces.*
From this block diagram, the platform can identify all it requires to conduct
an experiment with this workflow:
* There are three types of blocks:

  1. **Dataset blocks** (light yellow, left side): are the input blocks of a
     toolchain. They only have outputs.
  2. **Regular blocks** (gray): represent processing steps of the toolchain.
  3. **Analysis blocks** (light blue, right side): are the output blocks of a
     toolchain. They only have inputs.

* Each block defines *place holders* for datasets and algorithms to be
  inserted when the user wants to execute an experiment based on such a
  toolchain (see the :ref:`experiments` section).

* Each block is linked to the next one via a **connection**. The sequence of
  blocks in a toolchain and their connectivity defines a natural data flow.
  Data is output by datasets on the left and flows to the right until a
  result is produced.

* Each dataset block (light yellow, left side) defines a unique
  *synchronization channel*, which is encoded in the platform via a color.
  For example, the synchronization channel ``train`` is blue. The
  synchronization channel ``templates`` is green and, finally, the
  synchronization channel ``probes`` is red.

* Each regular or analysis block on the toolchain respects exactly one of
  these synchronization channels. This is indicated by the colored circle on
  the top-right of each block. For example, the block called ``scoring`` is
  said to be *synchronized with* the ``probes`` channel.

  When a block is synchronized with a channel, it means the platform will
  iterate over that channel's contents when calling the user algorithm on
  that block. For example, the block ``linear_machine_training``, on the top
  left of the image, following the dataset block ``train``, is synchronized
  with that dataset block. Therefore, it will be executed as many times as
  the dataset block outputs objects through its ``image`` output. I.e., the
  ``linear_machine_training`` block *loops* or *iterates* over the ``train``
  data.
Notice that the toolchain does not define what an ``image`` will be. That is
defined by the concrete dataset implementation chosen by the user when an
experiment is constructed. The block ``linear_machine_training`` also does not
define which type of images it can input. That is defined by the algorithm
chosen by the user when an experiment is constructed. For example, if the user
chooses a dataset that outputs objects with the data format
``system/array_2d_uint8/1``, then an algorithm that can input those types of
objects must be chosen for the block following that dataset. Don't worry! The
|project| platform experiment configuration will check that for you!
The order of execution can also be inferred from this diagram. We sketched
that for you in this overlay:
.. image:: img/eigenfaces-ordered.*
The backend processing farm will first "push" the data out of the datasets. It
will then run the code on the block ``linear_machine_training`` (numbered 2).
The blocks ``template_builder`` and ``probe_builder`` are then ready to run.
The platform may choose to run them at the *same time* if enough computing
resources are available. The ``scoring`` block runs fourth. The last block
to be executed is the ``analysis`` block. In the figure above, you can also
see marked the channel on which each block *loops*. When you read about
:ref:`Algorithms`, you'll understand, concretely, how synchronization is
handled in algorithm code.
.. note:: **Naming Convention**

   Toolchains are named using three values joined by a ``/`` (slash) operator:

   * **username**: indicates the author of the toolchain
   * **name**: indicates the name of the toolchain
   * **version**: indicates the version (integer starting from ``1``) of the
     toolchain

   Each tuple of these three components defines a *unique* toolchain name
   inside the platform. For a grasp, you may browse `publicly available
   toolchains`_.
The *Toolchains* tab