Algorithm API v2

Merged Samuel GAIST requested to merge algorithm_api_v2 into new
@@ -41,11 +41,39 @@ defined at a higher-level in the platform. It is expected that the
implementation of the algorithm respects whatever was declared on the
platform.
By default, the algorithm is **data-driven**; algorithm is typically provided
.. _beat-system-algorithms-types:
Algorithm types
---------------
Previous versions of the beat.core package implemented only one type of
algorithm (referred to as v1 algorithms). The current version of beat.core
defines two types of algorithms:
- Sequential
- Autonomous
The sequential algorithm type is the direct successor of the v1 algorithm. For
migration information, see :ref:`beat-system-algorithms-api-migration`.
Sequential
..........
The sequential algorithm is **data-driven**; it is typically provided
one data sample at a time and must immediately produce some output data.
Autonomous
..........
The autonomous algorithm, as its name suggests, is responsible for loading the
data samples it needs in order to do its work. It is also responsible for
writing the appropriate amount of data on its outputs.
Furthermore, the way the algorithm handles the data is highly configurable and
covers a wide range of possible scenarios.
:numref:`beat-system-overview-block` displays the relationship between a
processing block and its algorithm.
@@ -84,7 +112,10 @@ probabilistic component analysis (PCA):
.. code-block:: javascript
{
"schema_version": 2,
"language": "python",
"api_version": 2,
"type": "sequential",
"splittable": false,
"groups": [
{
@@ -109,23 +140,27 @@ probabilistic component analysis (PCA):
"description": "Principal Component Analysis (PCA)"
}
The field `language` specifies the language in which the algorithm is
implemented. The field `splittable` indicates, whether the algorithm can be
parallelized into chunks or not. The field `parameters` lists the parameters
of the algorithm, describing both default values and their types. The field
`groups` gives information about the inputs and outputs of the algorithm.
They are provided into a list of dictionary, each element in this list being
associated to a database `channel`. The group, which contains outputs, is
the **synchronization channel**. By default, a loop is automatically performs
The field `schema_version` specifies which schema version must be used to
validate the file content. The field `api_version` specifies the version of the
API implemented by the algorithm. The field `type` specifies the type of the
algorithm; depending on it, the execution model will change. The field
`language` specifies the language in which the algorithm is implemented. The
field `splittable` indicates whether the algorithm can be parallelized into
chunks or not. The field `parameters` lists the parameters of the algorithm,
describing both their default values and their types. The field `groups` gives
information about the inputs and outputs of the algorithm. They are provided
as a list of dictionaries, each element in this list being associated with a
database `channel`. The group that contains the outputs is the
**synchronization channel**. By default, a loop is automatically performed
by the platform on the synchronization channel, and user code must not loop
on this group. In contrast, it is the responsibility of the user to load data
from the other groups. This is described in more detail in the following
subsections. Finally, the field `description` is optional and gives a short
description of the algorithm.
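Note that the names declared for the inputs and outputs inside `groups` are the
keys used to access them from the algorithm code. As a minimal sketch (the input
name ``measurements`` and the output name ``scores`` are hypothetical; they must
match whatever the declaration actually defines):

.. code-block:: python

    class Algorithm:

        def process(self, inputs, data_loaders, outputs):
            # 'measurements' and 'scores' are hypothetical names; they must
            # match the inputs and outputs declared in the 'groups' field
            value = inputs['measurements'].data

            # ... compute some result from 'value' ...
            result = value

            outputs['scores'].write(result)
            return True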
The graphical interface of BEAT provides editor for algorithm,
which simplifies its `JSON`_ declaration definition. It also includes a simple
Python code editor.
The graphical interface of BEAT provides editors for the main components of the
system (for example: algorithms, data formats, etc.), which simplify the
definition of their `JSON`_ declarations.
.. _beat-system-algorithms-definition-analyzer:
@@ -138,10 +173,10 @@ kind of algorithm, which does not yield any output, but in contrast so called
`results`. These algorithms are called **analyzers**.
`Results` of an experiment are reported back to the user. Data privacy is very
important in the BEAT system and therefore only a limited number of data formats can be
employed as results in an analyzer, such as boolean, integers, floating point
values, strings (of limited size), as well as plots (such as scatter or bar
plots).
important in the BEAT framework and therefore only a limited number of data
formats can be employed as results in an analyzer, such as booleans, integers,
floating point values, strings (of limited size), as well as plots (such as
scatter or bar plots).
For example, the following declaration is the one of a simple analyzer, which
generates an ROC curve as well as a few other metrics.
@@ -213,10 +248,10 @@ conventions. In the following, examples of such classes are provided.
Examples
--------
.. _beat-system-algorithms-examples-simple:
.. _beat-system-algorithms-examples-simple-sequential:
Simple algorithm (no parametrization)
.....................................
Simple sequential algorithm (no parametrization)
................................................
At the very minimum, an algorithm class must look like this:
@@ -224,7 +259,7 @@ At the very minimum, an algorithm class must look like this:
class Algorithm:
def process(self, inputs, outputs):
def process(self, inputs, data_loaders, outputs):
# Read data from inputs, compute something, and write the result
# of the computation on outputs
...
@@ -232,21 +267,51 @@ At the very minimum, an algorithm class must look like this:
The class must be called ``Algorithm`` and must have a method called
``process()``, that takes as parameters a list of inputs (see section
:ref:`beat-system-algorithms-input-inputlist`) and a list of outputs (see
section :ref:`beat-system-algorithms-output-outputlist`). This method must
:ref:`beat-system-algorithms-input-inputlist`), a list of data loaders (see
section :ref:`beat-system-algorithms-dataloaders-dataloaderlist`) and a list of
outputs (see section :ref:`beat-system-algorithms-output-outputlist`). This method must
return ``True`` if everything went correctly, and ``False`` if an error
occurred.
The platform will call this method once per block of data available on the
`synchronized` inputs of the block.
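For illustration, a minimal sketch of such a sequential ``process()`` method,
assuming one input named ``in`` and one output named ``out`` (both names are
hypothetical):

.. code-block:: python

    class Algorithm:

        def process(self, inputs, data_loaders, outputs):
            # Called once per synchronized data block: read the current
            # sample and immediately write the corresponding output
            value = inputs['in'].data.value
            outputs['out'].write({
                'value': value,
            })
            return True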
.. _beat-system-algorithms-examples-simple-autonomous:
.. _beat-system-algorithms-examples-parametrizable:
Simple autonomous algorithm (no parametrization)
................................................
At the very minimum, an algorithm class must look like this:
.. code-block:: python
class Algorithm:
def process(self, data_loaders, outputs):
# Read data from data_loaders, compute something, and write the
# result of the computation on outputs
...
return True
The class must be called ``Algorithm`` and must have a method called
``process()``, that takes as parameters a list of data loaders (see section
:ref:`beat-system-algorithms-dataloaders`) and a list of outputs (see
section :ref:`beat-system-algorithms-output-outputlist`). This method must
return ``True`` if everything went correctly, and ``False`` if an error
occurred.
The platform will call this method only once, as the algorithm itself is
responsible for loading the appropriate amount of data and processing it.
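For illustration, a minimal sketch of such an autonomous ``process()`` method,
assuming a single input named ``in`` reachable through its data loader and an
output named ``out`` (both names are hypothetical); the access pattern follows
the migration example shown at the end of this section:

.. code-block:: python

    class Algorithm:

        def process(self, data_loaders, outputs):
            # Called only once: the algorithm walks through the data itself
            data_loader = data_loaders.loaderOf('in')

            for i in range(data_loader.count('in')):
                view = data_loader.view('in', i)
                # The last entry of the view holds the current data of all
                # inputs of the group, plus the indices it covers
                (data, start, end) = view[view.count() - 1]
                outputs['out'].write({
                    'value': data['in'].value,
                }, end)

            return True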
.. _beat-system-algorithms-examples-parameterizable:
Parameterizable algorithm
.........................
The following is valid for all types of algorithms.
To implement a parameterizable algorithm, two things must be added to the class:
(1) a field in the JSON declaration of the algorithm containing the default
values as well as the types of the parameters, and (2) a method called
``setup()``, that takes one argument, a map containing the parameters of the
@@ -274,7 +339,7 @@ algorithm.
self.threshold = parameters['threshold']
return True
def process(self, inputs, outputs):
def process(self, inputs, data_loaders, outputs):
# Read data from inputs, compute something, and write the result
# of the computation on outputs
...
@@ -284,6 +349,36 @@ When retrieving the value of the parameters, one must not assume that a value
was provided for each parameter. This is why we may use a *try: ... except: ...*
construct in the ``setup()`` method.
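For example, a defensive ``setup()`` method falling back to a default when a
parameter was not provided could look like the following sketch (the parameter
name ``threshold`` follows the example above; the default value is arbitrary):

.. code-block:: python

    class Algorithm:

        def setup(self, parameters):
            # A value is not guaranteed to be present for every parameter,
            # hence the try/except construct
            try:
                self.threshold = parameters['threshold']
            except KeyError:
                self.threshold = 0.5  # arbitrary fallback for this sketch
            return True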
.. _beat-system-algorithms-preparation:
Preparation of an algorithm
...........................
The following is valid for all types of algorithms.
Often, algorithms need to compute some values or retrieve some data prior to
applying their mathematical logic.
This is possible using the ``prepare()`` method.
.. code-block:: python
class Algorithm:
def prepare(self, data_loaders):
data_loader = data_loaders.loaderOf('in2')
(data, _, _) = data_loader[0]
self.offset = data['in2'].value
return True
def process(self, inputs, data_loaders, outputs):
# Read data from inputs, compute something, and write the result
# of the computation on outputs
...
return True
.. _beat-system-algorithms-input:
Handling input data
@@ -340,7 +435,7 @@ Each input provides the following informations:
.. py:attribute:: data_index
*(integer)* Index of the last block of data received on the input (See section
:ref:`beat-core-algorithms-input-synchronization`)
:ref:`beat-system-algorithms-input-synchronization`)
.. py:attribute:: data
not in the same order. The two blocks use different algorithms, which both
refer to their inputs and outputs using names of their choice.
Nevertheless, Joe can choose to use Bill's algorithm instead of his own.
When the algorithm to use is changed on the graphical interface, the system will
When the algorithm to use is changed on the web interface, the platform will
attempt to match each input with the names (and types) declared by the
algorithm. In case of ambiguity, the user will be asked to manually resolve it.
@@ -462,7 +557,7 @@ its inputs, the algorithm would do:
class Algorithm:
def process(self, inputs, outputs):
def process(self, inputs, data_loaders, outputs):
# Iterate over all the unsynchronized data
while inputs.hasMoreData():
@@ -523,7 +618,7 @@ keeps the others synchronized and iterate over all their data:
class Algorithm:
def process(self, inputs, outputs):
def process(self, inputs, data_loaders, outputs):
# Desynchronize the third input. From now on, inputs['desynchronized'].data
# and inputs['desynchronized'].data_index won't change
@@ -561,6 +656,51 @@ This will be addressed in a later version.
Feedback loop
.. _beat-system-algorithms-dataloaders:
Data loaders
------------
.. _beat-system-algorithms-dataloaders-dataloaderlist:
DataLoader list
...............
An algorithm is given access to the **list of data loaders** of the processing
block. This list can be used to access each data loader individually, either by
channel name (see :ref:`beat-system-algorithms-dataloaders-name`), by index, or
by iterating over the list:
.. code-block:: python
# 'data_loaders' is the list of data loaders of the processing block
# Retrieve a data loader by name
data_loader = data_loaders['labels']
# Retrieve a data loader by index
for index in range(0, len(data_loaders)):
data_loader = data_loaders[index]
# Iteration over all data loaders
for data_loader in data_loaders:
...
# Retrieve the data loader an input belongs to, by input name
data_loader = data_loaders.loaderOf('label')
.. _beat-system-algorithms-dataloaders-dataloader:
DataLoader
..........
Provides access to data from a group of inputs synchronized together.
See :py:class:`DataLoader`.
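As a sketch of the typical access pattern (the input name ``label`` is
hypothetical, reusing the name from the listing above), a data loader can be
iterated to retrieve blocks of data together with the data indices they cover;
the pattern mirrors the migration example at the end of this section:

.. code-block:: python

    # 'data_loader' provides the data of a group of synchronized inputs
    for i in range(data_loader.count('label')):
        view = data_loader.view('label', i)
        # Each entry of the view is a tuple: the data of all inputs of the
        # group, plus the start and end indices covered by this block
        (data, start, end) = view[view.count() - 1]
        value = data['label'].value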
.. _beat-system-algorithms-output:
Handling output data
@@ -608,19 +748,14 @@ Each output provides the following informations:
*(string)* Format of the data written on the output
And the following methods:
.. py:method:: createData()
Retrieve an initialized block of data corresponding to the data format of
the output
And the following method:
.. py:method:: write(data, end_data_index=None)
Write a block of data on the output
We'll look at the usage of those methods through some examples in the following
We'll look at the usage of this method through some examples in the following
sections.
@@ -703,12 +838,9 @@ block for each image received on its inputs. This is the simplest case.
def process(self, inputs, outputs):
# Ask the output to create a data object according to its data format
data = outputs['features'].createData()
# Compute something from inputs['images'].data and inputs['labels'].data
# and store the result in 'data'
...
data = ...
# Write our data block on the output
outputs['features'].write(data)
@@ -765,17 +897,14 @@ available.
class Algorithm:
def process(self, inputs, outputs):
def process(self, inputs, data_loaders, outputs):
# Use a criterion on the image to determine if we can perform our
# computation on it or not
if can_compute(inputs['images'].data):
# Ask the output to create a data object according to its data format
data = outputs['features'].createData()
# Compute something from inputs['images'].data and inputs['labels'].data
# and store the result in 'data'
...
data = ...
# Write our data block on the output
outputs['features'].write(data)
@@ -852,7 +981,7 @@ the data block on the output.
self.previous_data_index = None # Data index of the input list during the
# processing of the previous image
def process(self, inputs, outputs):
def process(self, inputs, data_loaders, outputs):
# Determine if we already processed some image(s)
if self.data is not None:
# Determine if the label has changed since the last image we processed
@@ -866,8 +995,7 @@ the data block on the output.
# Create a new block of data if necessary
if self.data is None:
# Ask the output to create a data object according to its data format
self.data = outputs['features'].createData()
self.data = ...
# Remember the label we are currently processing
self.current_label = inputs['labels'].data.name
@@ -884,4 +1012,90 @@ the data block on the output.
return True
.. _beat-system-algorithms-api-migration:
Migrating from API v1 to API v2
-------------------------------
Algorithms that have been written using BEAT's algorithm API v1 can still be run
under the v2 execution model. They are now considered legacy algorithms and
should be ported to API v2 as soon as possible.
API v2 provides two different types of algorithms:
- Sequential
- Autonomous
The Sequential type follows the same code execution model as the v1 API, meaning
that the ``process()`` method is called once for each input item.
The Autonomous type allows the developer to load the input data at will;
therefore, the ``process()`` method will only be called once. This makes it
possible, for example, to optimize the loading of data into GPU memory for
faster execution.
The straightforward migration path from v1 to v2 is to write a Sequential
algorithm, which requires only a few changes to the code.
API V1:
.. code-block:: python
class Algorithm:
def setup(self, parameters):
self.sync = parameters['sync']
return True
def process(self, inputs, outputs):
if inputs[self.sync].isDataUnitDone():
outputs['out'].write({
'value': inputs['in1'].data.value + inputs['in2'].data.value,
})
return True
API V2 sequential:
.. code-block:: python
class Algorithm:
def setup(self, parameters):
self.sync = parameters['sync']
return True
def process(self, inputs, data_loaders, outputs):
if inputs[self.sync].isDataUnitDone():
outputs['out'].write({
'value': inputs['in1'].data.value + inputs['in2'].data.value,
})
return True
API V2 autonomous:
.. code-block:: python
class Algorithm:
def setup(self, parameters):
self.sync = parameters['sync']
return True
def process(self, data_loaders, outputs):
data_loader = data_loaders.loaderOf('in1')
for i in range(data_loader.count(self.sync)):
view = data_loader.view(self.sync, i)
(data, start, end) = view[view.count() - 1]
outputs['out'].write({
'value': data['in1'].value + data['in2'].value,
},
end
)
return True
.. include:: links.rst