Commit e969f824 authored by Zohreh MOSTAANI
[web][doc] remove more extra infor from platform user guide

parent ccf0ac54
1 merge request: !265 merge new documentation to master
......@@ -27,478 +27,9 @@
Algorithms
============
Graphically represented, :ref:`toolchains` look like a set of interconnected
blocks. As illustrated in the figure below, each block can accommodate one
*Algorithm*, along with the necessary input and output interfaces. We also
refer to the inputs and outputs collectively as *endpoints*.
.. image:: img/block.*
Typically, an algorithm will process data units received at the input
endpoints, and push the relevant results to the output endpoint. Each algorithm
must have at least one input and, except for analyzers (described below), at
least one output. The links in a toolchain connect the output of one block to
the input of another, effectively connecting algorithms together and thus
determining the information flow through the toolchain.
Blocks at the beginning of the toolchain are typically connected to datasets,
and blocks at the end of a toolchain are connected to analyzers (special
algorithms with no output). The |project| platform is responsible for
delivering inputs from the desired datasets into the toolchain and through your
algorithms. This drives the synchronization of information-flow through the
toolchain. Flow synchronization is determined by data units produced from a
dataset and injected into the toolchain.
Code for algorithms may be implemented in any programming language supported by
|project|. At present, only two backends have been integrated, supporting Python
and C++; algorithms are therefore expected to be implemented in one of those
languages. (In future, other backends will be added to |project|.) Python code
implementing an algorithm can be created using our web-based
:ref:`algorithm editor`. C++ based algorithms must be compiled using a provided
docker container and uploaded to the platform (see :ref:`binary algorithms`).
|project| treats algorithms as objects that are derived from the class
``Algorithm`` (in Python) or ``IAlgorithm`` (in C++). To define a new algorithm,
at least one method must be implemented:
* ``process()``: the method that actually processes input and produces
outputs.
The code example below illustrates the implementation of an algorithm (in
Python):

.. code-block:: python
   :linenos:

   class Algorithm:

       def process(self, inputs, outputs):
           # here, you read inputs, process and write results to outputs
           # return True to signal that no error occurred
           return True

Here is the equivalent example in C++:

.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithm
   {
   public:
     virtual bool process(const InputList& inputs, const OutputList& outputs)
     {
       // here, you read inputs, process and write results to outputs
       // return true to signal that no error occurred
       return true;
     }
   };

One particularity of the |project| platform is how the data flow through a
given toolchain is synchronized. The platform is responsible for extracting
data units (images, speech segments, videos, etc.) from the database and
presenting them to the input endpoints of certain blocks, as specified in the
toolchain. Each presentation of a new data unit to the input of a block can
be thought of as an individual time unit. The algorithm implemented in a block
is responsible for the synchronization between its inputs and its output. In
other words, every time a data unit is produced by a dataset in an experiment,
the ``process()`` method of your algorithm is called to act upon it.
An algorithm may have one of two kinds of synchronicity: one-to-one and
many-to-one. These are discussed in detail in separate sections below.
One-to-one synchronization
--------------------------
Here, the algorithm generates one output for every input entity (e.g., image,
video, speech-file). For example, an image-based feature-extraction algorithm
would typically output one set of features every time it is called with a new
input image. A schematic diagram of one-to-one synchronization for an algorithm
is shown in the figure below:
.. image:: img/case-study-1.*
In the configuration shown in this figure, the algorithm block has two
endpoints: one input and one output. The input, the output and the block are
synchronized together (notice the color information). Each red box represents
one input unit (e.g., an image, or a video), that is fed to the input interface
of the block. Corresponding to each input received, the block produces one
output unit, shown as a blue box in the figure.
Example code showing how to implement an algorithm in this configuration is shown below:

.. code-block:: python
   :linenos:

   class Algorithm:

       def process(self, inputs, outputs):

           # to read the field "value" on the "in" input, use "data"
           # a possible declaration of "user/format/1" would be:
           # {
           #   "value": ...
           # }
           value = inputs['in'].data.value

           # do your processing and create the "output" value
           output = magical_processing(value)

           # to write "output" into the relevant endpoint use "write"
           # a possible declaration of "user/other/1" would be:
           # {
           #   "value": ...
           # }
           outputs['out'].write({'value': output})

           # No error occurred, so return True
           return True


.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithm
   {
   public:
     virtual bool process(const InputList& inputs, const OutputList& outputs)
     {
       // to read the field "value" on the "in" input, use "data"
       // a possible declaration of "user/format/1" would be:
       // {
       //   "value": ...
       // }
       auto value = inputs["in"]->data<user::format_1>()->value;

       // do your processing and create the "output" value
       auto output = magical_processing(value);

       // to write "output" into the relevant endpoint use "write"
       // a possible declaration of "user/other/1" would be:
       // {
       //   "value": ...
       // }
       auto result = new user::other_1();
       result->value = output;
       outputs["out"]->write(result);

       // No error occurred, so return true
       return true;
     }
   };

In this example, the platform will call the user algorithm every time a new
input unit with the format ``user/format/1`` is available at the input. Notice
that no ``for`` loops are necessary in the user code. The platform controls the
looping for you.
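
To make the "no ``for`` loop in user code" point concrete, the following is a minimal sketch of a driver that plays the role of the platform. It is not the platform's implementation: ``FakeInput``, ``FakeOutput``, ``Data`` and ``run_one_to_one`` are hypothetical stand-ins introduced only for illustration, and the doubling step replaces ``magical_processing``.

```python
class Data:
    """Hypothetical data unit exposing a ``value`` field, like ``data.value``."""

    def __init__(self, value):
        self.value = value


class FakeInput:
    """Hypothetical stand-in for an input endpoint."""

    def __init__(self):
        self.data = None


class FakeOutput:
    """Hypothetical stand-in for an output endpoint; it records every write."""

    def __init__(self):
        self.written = []

    def write(self, value):
        self.written.append(value)


class Algorithm:
    """User code: handles exactly one data unit per call, with no loop."""

    def process(self, inputs, outputs):
        value = inputs['in'].data.value
        outputs['out'].write({'value': value * 2})  # placeholder processing
        return True


def run_one_to_one(algorithm, units):
    """Assumed driver: calls process() once for each incoming data unit."""
    inputs = {'in': FakeInput()}
    outputs = {'out': FakeOutput()}
    for unit in units:
        inputs['in'].data = Data(unit)
        if not algorithm.process(inputs, outputs):
            raise RuntimeError('process() reported an error')
    return outputs['out'].written


results = run_one_to_one(Algorithm(), [1, 2, 3])
```

The loop lives entirely in the driver, so the algorithm stays the same however many units the dataset produces.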
A more complex case of one-to-one synchronization is shown in the following figure:
.. image:: img/case-study-2.*
In such a configuration, the platform will ensure that each input unit at the
input-endpoint ``in`` is associated with the correct input unit at the
input-endpoint ``in2``. For example, referring to the figure above, the items
at the input ``in`` could be images, the items at the input ``in2`` could be
labels, and the configuration depicted indicates that the first two input
images have the same label, say, ``l1``, whereas the next two input images have
the same label, say, ``l2``. The algorithm produces one output item at the
endpoint ``out``, for each input object presented at endpoint ``in``.
Example code implementing an algorithm processing data in this scenario is
shown below:

.. code-block:: python
   :linenos:

   class Algorithm:

       def process(self, inputs, outputs):
           # both inputs are synchronized, so each call sees a matching pair
           i1 = inputs['in'].data.value
           i2 = inputs['in2'].data.value

           out = magical_processing(i1, i2)

           outputs['out'].write({'value': out})

           return True


.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithm
   {
   public:
     virtual bool process(const InputList& inputs, const OutputList& outputs)
     {
       // both inputs are synchronized, so each call sees a matching pair
       auto i1 = inputs["in"]->data<user::format_1>()->value;
       auto i2 = inputs["in2"]->data<user::format_1>()->value;

       auto out = magical_processing(i1, i2);

       auto result = new user::other_1();
       result->value = out;

       outputs["out"]->write(result);

       return true;
     }
   };

You should notice that we still don't require any sort of ``for`` loops! The
platform *synchronizes* the inputs ``in`` and ``in2`` so they are available to
your program as the dataset implementor defined.
Many-to-one synchronization
---------------------------
Here, the algorithm produces a single output after processing a batch of
inputs. For example, the algorithm may produce a model for a *dog* after
processing all input images for the *dog* class. A block diagram illustrating
many-to-one synchronization is shown below:
.. image:: img/case-study-3.*
Here the synchronization is driven by the endpoint ``in2``. For each data unit
received at the input ``in2``, the algorithm generates one output unit. Note
that, here, multiple units received at the input ``in`` are accumulated and
associated with a single unit received at ``in2``. The user does not have to
handle the internal indexing. Producing output data at the right moment is
enough for the platform to understand that the output is synchronized with ``in2``.
The example below illustrates how such an algorithm could be implemented:

.. code-block:: python
   :linenos:

   class Algorithm:

       def __init__(self):
           self.objs = []

       def process(self, inputs, outputs):
           self.objs.append(inputs['in'].data.value)  # accumulates

           if inputs['in2'].isDataUnitDone():
               out = magical_processing(self.objs)
               outputs['out'].write({'value': out})
               self.objs = []  # reset accumulator for next label

           return True


.. code-block:: c++
   :linenos:

   #include <vector>

   class Algorithm: public IAlgorithm
   {
   public:
     virtual bool process(const InputList& inputs, const OutputList& outputs)
     {
       objs.push_back(inputs["in"]->data<user::format_1>()->value); // accumulates

       if (inputs["in2"]->isDataUnitDone())
       {
         auto out = magical_processing(objs);

         auto result = new user::other_1();
         result->value = out;

         outputs["out"]->write(result);

         objs.clear(); // reset accumulator for next label
       }

       return true;
     }

   public:
     std::vector<float> objs;
   };

Here, the units received at the endpoint ``in`` are accumulated as long as the
``isDataUnitDone()`` method attached to the input ``in2`` returns ``False``.
When ``isDataUnitDone()`` returns ``True``, the corresponding label is read
from ``in2``, and a result is produced at the endpoint ``out``. After an output
unit has been produced, the internal accumulator for ``in`` is cleared, and the
algorithm starts accumulating a new set of objects for the next label.
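
The accumulate-then-emit pattern can be tried outside the platform with a small simulation. This is only a sketch under stated assumptions: ``FakeInput``, ``FakeOutput``, ``Data`` and ``run_many_to_one`` are hypothetical helpers, ``sum`` stands in for ``magical_processing``, and the driver assumes that ``isDataUnitDone()`` on ``in2`` becomes true on the last sample that shares a given label.

```python
class Data:
    """Hypothetical data unit exposing a ``value`` field."""

    def __init__(self, value):
        self.value = value


class FakeInput:
    """Hypothetical input endpoint with a settable "unit done" flag."""

    def __init__(self):
        self.data = None
        self.done = False

    def isDataUnitDone(self):
        return self.done


class FakeOutput:
    """Hypothetical output endpoint; it records every write."""

    def __init__(self):
        self.written = []

    def write(self, value):
        self.written.append(value)


class Algorithm:
    def __init__(self):
        self.objs = []

    def process(self, inputs, outputs):
        self.objs.append(inputs['in'].data.value)  # accumulate
        if inputs['in2'].isDataUnitDone():
            outputs['out'].write({'value': sum(self.objs)})  # placeholder processing
            self.objs = []  # reset accumulator for the next label
        return True


def run_many_to_one(algorithm, samples):
    """Assumed driver: ``samples`` is a list of (value, label) pairs, grouped by label."""
    inputs = {'in': FakeInput(), 'in2': FakeInput()}
    outputs = {'out': FakeOutput()}
    for index, (value, label) in enumerate(samples):
        inputs['in'].data = Data(value)
        inputs['in2'].data = Data(label)
        is_last = index + 1 == len(samples)
        # the label unit is "done" when the next sample carries another label
        inputs['in2'].done = is_last or samples[index + 1][1] != label
        algorithm.process(inputs, outputs)
    return outputs['out'].written


results = run_many_to_one(
    Algorithm(), [(1, 'l1'), (2, 'l1'), (10, 'l2'), (20, 'l2')])
```

With four samples spread over two labels, the algorithm is called four times but writes only twice, once per label.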
Unsynchronized Operation
------------------------
Not all inputs for a block need to be synchronized together. In the diagram
shown below, the block is synchronized with the inputs ``in`` and ``in2`` (as indicated by
the green circle, which matches the colour of the input lines connecting ``in`` and ``in2``).
The output ``out`` is synchronized with the block (as one can see by looking at the code
below, an output is produced after every ``in`` input). The input ``in3`` is not
synchronized with the endpoints ``in`` and ``in2``, nor with the block. A processing block
that receives a previously calculated model and must score test samples is a
good example of this condition. In this case, the user is responsible for
reading out the contents of ``in3`` explicitly.
.. image:: img/case-study-4.*
In this case the algorithm will include an explicit loop to read the
unsynchronized input (``in3``).

.. code-block:: python
   :linenos:

   class Algorithm:

       def __init__(self):
           self.models = None

       def process(self, inputs, outputs):
           # N.B.: this will be called for every unit in "in"

           # Loads the "model" data at the beginning, once
           if self.models is None:
               self.models = []
               while inputs['in3'].hasMoreData():
                   inputs['in3'].next()
                   self.models.append(inputs['in3'].data.value)

           # Processes the current input in "in" and "in2", apply the
           # model/models
           out = magical_processing(inputs['in'].data.value,
                                    inputs['in2'].data.value,
                                    self.models)

           # Writes the output to the "out" endpoint
           outputs['out'].write({'value': out})

           return True


.. code-block:: c++
   :linenos:

   #include <vector>

   class Algorithm: public IAlgorithm
   {
   public:
     virtual bool process(const InputList& inputs, const OutputList& outputs)
     {
       // N.B.: this will be called for every unit in "in"

       // Loads the "model" data at the beginning, once
       if (models.empty())
       {
         while (inputs["in3"]->hasMoreData())
         {
           inputs["in3"]->next();
           auto model = inputs["in3"]->data<user::model_1>();
           models.push_back(*model);
         }
       }

       // Processes the current input in "in" and "in2", apply the model/models
       auto out = magical_processing(inputs["in"]->data<user::format_1>()->value,
                                     inputs["in2"]->data<user::format_1>()->value,
                                     models);

       // Writes the output
       auto result = new user::other_1();
       result->value = out;

       outputs["out"]->write(result);

       return true;
     }

   public:
     std::vector<user::model_1> models;
   };

It may happen that you have several inputs which are synchronized together, but
unsynchronized with the block you're writing your algorithm for. In this case,
it is safer to treat inputs using their *group*. For example:

.. code-block:: python
   :linenos:

   class Algorithm:

       def __init__(self):
           self.models = None

       def process(self, inputs, outputs):
           # N.B.: this will be called for every unit in "in"

           # Loads the "model" data at the beginning, once
           if self.models is None:
               self.models = []
               group = inputs.groupOf('in3')
               while group.hasMoreData():
                   group.next()  # synchronously advances the data
                   self.models.append(group['in3'].data.value)

           # Processes the current input in "in" and "in2", apply the model/models
           out = magical_processing(inputs['in'].data.value,
                                    inputs['in2'].data.value,
                                    self.models)

           # Writes the output to the "out" endpoint
           outputs['out'].write({'value': out})

           return True


.. code-block:: c++
   :linenos:

   #include <vector>

   class Algorithm: public IAlgorithm
   {
   public:
     virtual bool process(const InputList& inputs, const OutputList& outputs)
     {
       // N.B.: this will be called for every unit in "in"

       // Loads the "model" data at the beginning, once
       if (models.empty())
       {
         // "inputs" is a reference, so it is accessed with "." rather than "->"
         auto group = inputs.groupOf("in3");
         while (group->hasMoreData())
         {
           group->next(); // synchronously advances the data
           // "group" is used as a pointer, so dereference it before indexing
           auto model = (*group)["in3"]->data<user::model_1>();
           models.push_back(*model);
         }
       }

       // Processes the current input in "in" and "in2", apply the model/models
       auto out = magical_processing(inputs["in"]->data<user::format_1>()->value,
                                     inputs["in2"]->data<user::format_1>()->value,
                                     models);

       // Writes the output
       auto result = new user::other_1();
       result->value = out;

       outputs["out"]->write(result);

       return true;
     }

   public:
     std::vector<user::model_1> models;
   };

In practice, encoding your algorithms using *groups* instead of looping over
individual inputs makes the code more robust to changes.
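
As a rough illustration of why iterating over a group is more robust, the sketch below models a group whose inputs always advance together. ``FakeGroup``, ``FakeInput``, ``Data`` and ``load_models`` are hypothetical names used only for this example, and the extra input ``in4`` is invented to show a second member of the group; the real group object may behave differently.

```python
class Data:
    """Hypothetical data unit exposing a ``value`` field."""

    def __init__(self, value):
        self.value = value


class FakeInput:
    """Hypothetical input endpoint."""

    def __init__(self):
        self.data = None


class FakeGroup:
    """Hypothetical group of inputs that are synchronized with each other."""

    def __init__(self, columns):
        # columns maps an input name to its list of values (equal lengths)
        self._columns = columns
        self._length = len(next(iter(columns.values())))
        self._position = -1
        self._inputs = {name: FakeInput() for name in columns}

    def hasMoreData(self):
        return self._position + 1 < self._length

    def next(self):
        # advancing the group moves every member input at the same time
        self._position += 1
        for name, values in self._columns.items():
            self._inputs[name].data = Data(values[self._position])

    def __getitem__(self, name):
        return self._inputs[name]


def load_models(group, name):
    """Read every unit of one input by advancing the whole group."""
    models = []
    while group.hasMoreData():
        group.next()
        models.append(group[name].data.value)
    return models


group = FakeGroup({'in3': ['m1', 'm2'], 'in4': ['a', 'b']})
models = load_models(group, 'in3')
```

Because ``next()`` is called on the group rather than on ``in3`` alone, the other inputs in the group stay aligned with it even if more of them are added later.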
Algorithms are user-defined pieces of software that run within the blocks of a
toolchain. An algorithm can read data on the input(s) of the block and write
processed data on its output(s). For detailed information see :ref:`beat-system-algorithms`.
.. _Algorithm Editor:
......@@ -515,20 +46,6 @@ the following:
.. image:: img/SS_algorithms_info.*
.. note:: **Naming Convention**
Algorithms are named using three values joined by a ``/`` (slash) operator:
* **username**: indicates the author of the algorithm
* **name**: indicates the name of the algorithm
* **version**: indicates the version (integer starting from ``1``) of the
algorithm
Each tuple of these three components defines a *unique* algorithm name
inside the platform. To get a feel for the convention, you may browse
`publicly available algorithms`_.
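
The naming convention can be checked mechanically. The helper below is a minimal sketch, not part of the platform: ``AssetName`` and ``parse_asset_name`` are hypothetical names, and the only rules enforced are the ones stated above (three components joined by ``/``, with an integer version starting from ``1``).

```python
from typing import NamedTuple


class AssetName(NamedTuple):
    """The three components of a ``username/name/version`` identifier."""

    username: str
    name: str
    version: int


def parse_asset_name(full_name: str) -> AssetName:
    """Split a full name into its components and validate the version."""
    parts = full_name.split('/')
    if len(parts) != 3 or not all(parts):
        raise ValueError(f'expected "username/name/version", got {full_name!r}')
    username, name, version_text = parts
    if not (version_text.isascii() and version_text.isdigit()):
        raise ValueError(f'version must be an integer, got {version_text!r}')
    version = int(version_text)
    if version < 1:
        raise ValueError('version numbers start from 1')
    return AssetName(username, name, version)


parsed = parse_asset_name('user/my_algorithm/2')
```

Two identifiers refer to the same algorithm only when all three components match, which is why the version is kept as part of the parsed tuple.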
Note the search-box and the privacy-filter above the list of algorithms
displayed on the page. You can use these to limit your search. For example,
entering "anjos" in the search-box will allow you to list only those algorithms
......@@ -538,13 +55,15 @@ image below.
.. image:: img/SS_algorithms_anjos_search.*
There are two types of algorithm in the editor: Analyzer, and Splittable.
Analyzer algorithms are special algorithms where the purpose is to generate
There are several options when defining algorithms. They can be *Analyzer* or *Splittable*.
*Analyzer* algorithms are special algorithms where the purpose is to generate
statistics about the processing results (graphs, means, variances, etc.).
Usually, biometric data processing algorithms are of type Splittable, indicating
Usually, biometric data processing algorithms are *Splittable*, indicating
to the platform that these algorithms can be executed in a distributed fashion,
depending on the available computing resources.
There are also two types of algorithms depending on the way they handle data samples that are fed to them. They can be *Sequential* or *Autonomous*. For more information see :ref:`beat-system-algorithms-types`.
There are two basic ways to create an algorithm at the |project| platform. You
may either start from scratch or fork a new copy of an existing algorithm and edit that.
......@@ -560,7 +79,8 @@ You should see a web-page similar to what is displayed below:
.. image:: img/algorithm_new.*
For instructions on how to create an algorithm from scratch, please refer to the Section of `algorithm editor`_.
For instructions on how to create an algorithm from scratch, please refer to the `algorithm editor`_ section, and see :ref:`beat-system-algorithms-definition-code` to understand how to write the source code for new algorithms.
Edit an existing algorithm
......@@ -594,12 +114,14 @@ Editor
To create an algorithm, there are seven sections which are:
* Name: the name of algorithm.
* Algorithm type: Analyzer or Splittable.
* Algorithm option: Analyzer or Splittable.
* Algorithm type: Sequential or Autonomous.
* Language: The language used to implement the algorithm (Python or C++).
* Documentation: This is used to describe your algorithm.
* Inputs / Outputs: Define the properties of the Input and Output endpoints for this algorithm.
* Endpoints: Define the properties of the Input and Output endpoints for this algorithm.
* Parameters: Define the configuration-parameters for the algorithm.
When you have saved the algorithm you can add documentation that describes that algorithm as well.
For Python-based algorithms only:
* Libraries: If there are functions in a library, you can add them for the algorithm to use.
......
......@@ -36,7 +36,7 @@ configuration page, the declaration of this experiment transmitted to the
scheduler, that now must run the experiment until it finishes, you press the
``stop`` button, or an error condition is produced.
As it is described in the :ref:`toolchains` section, the scheduler first breaks
As it is described in the :ref:`beat-system-toolchains` section, the scheduler first breaks
the toolchain into a sequence of executable blocks with dependencies. For
example: block ``B`` must be run after block ``A``. Each block is then
scheduled for execution depending on current resource availability. If no more
......
......@@ -38,8 +38,7 @@ communicate in an orderly manner. For more detailed information see :ref:`beat-s
operator. The first value is the **username**.
The ``system`` user provides a number of pre-defined formats such as
integers, booleans, floats and arrays
<https://www.beat-eu.org/platform/dataformats/system/>`_. You may also
integers, booleans, floats and arrays (see `here <https://www.beat-eu.org/platform/dataformats/system/>`_). You may also
browse `publicly available data formats`_ to see all available data formats
from the ``system`` and other users.
......
......@@ -31,23 +31,11 @@ Libraries
functions. Instead of re-implementing every function from scratch, you can
reuse functions already implemented by other users and published in the form of
|project| libraries. Similarly, you can create and publish your own libraries
of functions that you consider may be useful to other users.
of functions that you consider may be useful to other users. For more information see :ref:`beat-system-libraries`.
Usage of libraries is encouraged in the |project| platform. Besides saving you
time and effort, this also promotes reproducibility in research.
.. note:: **Naming Convention**
Libraries are named using three values joined by a ``/`` (slash) operator:
* **username**: indicates the author of the library
* **name**: indicates the name of the library
* **version**: indicates the version (integer starting from ``1``) of the
library
Each tuple of these three components defines a *unique* name inside the
platform. To get a feel for the convention, you may browse `publicly available libraries`_.
You can access the Libraries section from your home-page on |project| by
clicking the ``User Resources`` tab and selecting ``Libraries`` from the
drop-down list. You should see a page similar to that shown below:
......@@ -98,8 +86,7 @@ To create a library you will need to provide the following information:
Of course, functions implemented in a new library may also call functions from
other shared libraries in the |project| platform. You can indicate the
dependencies on other libraries via the ``External library usage`` section (to
open this section, click on the ``v`` symbol on the right).
dependencies on other libraries via the ``External library usage`` section.
To save your work, click on the green ``Save`` button (in the top-right region
of the page). After you have saved your library, you will be able to use
......
......@@ -58,7 +58,7 @@ This is a panel with two buttons. The green button which says ``Show``, makes a
pop-up window appear showing your current API token. You may use this token
(64-byte character string) in outside programs that communicate with the
platform programmatically. For example, our command-line interface requires a
token to be able to pull/push contributions for the user.
token to be able to pull/push contributions for the user (see :ref:`beat-cmdline-configuration`).
If your token is compromised, you may change it by clicking on the ``Modify``
button. A pop-up window will appear confirming the modification. You may cancel
......