Algorithm API v2

Merged Samuel GAIST requested to merge algorithm_api_v2 into new
@@ -41,11 +41,39 @@ defined at a higher-level in the platform. It is expected that the
implementation of the algorithm respects whatever was declared on the
platform.
By default, the algorithm is **data-driven**; algorithm is typically provided
.. _beat-system-algorithms-types:
Algorithm types
---------------
Previous versions of the beat.core package implemented only one type of
algorithm (referred to as v1 algorithms). The current version of beat.core
defines two types of algorithms:
- Sequential
- Autonomous
The sequential algorithm type is the direct successor of the v1 algorithm. For
migration information, see :ref:`beat-system-algorithms-api-migration`.
Sequential
..........
The sequential algorithm is **data-driven**; it is typically provided
one data sample at a time and must immediately produce some output data.
Autonomous
..........
The autonomous algorithm, as its name suggests, is responsible for loading the
data samples it needs in order to do its work. It is also responsible for
writing the appropriate amount of data on its outputs.
Furthermore, the way the algorithm handles the data is highly configurable and
covers a wide range of possible scenarios.
:numref:`beat-system-overview-block` displays the relationship between a
processing block and its algorithm.
@@ -84,7 +112,10 @@ probabilistic component analysis (PCA):
.. code-block:: javascript
{
"schema_version": 2,
"language": "python",
"api_version": 2,
"type": "sequential",
"splittable": false,
"groups": [
{
@@ -109,23 +140,27 @@ probabilistic component analysis (PCA):
"description": "Principal Component Analysis (PCA)"
}
The field `language` specifies the language in which the algorithm is
implemented. The field `splittable` indicates, whether the algorithm can be
parallelized into chunks or not. The field `parameters` lists the parameters
of the algorithm, describing both default values and their types. The field
`groups` gives information about the inputs and outputs of the algorithm.
They are provided into a list of dictionary, each element in this list being
associated to a database `channel`. The group, which contains outputs, is
the **synchronization channel**. By default, a loop is automatically performs
The field `schema_version` specifies which schema version must be used to
validate the file content. The field `api_version` specifies the version of the
API implemented by the algorithm. The field `type` specifies the type of the
algorithm; depending on it, the execution model will change. The field
`language` specifies the language in which the algorithm is implemented. The
field `splittable` indicates whether the algorithm can be parallelized into
chunks or not. The field `parameters` lists the parameters of the algorithm,
describing both their default values and their types. The field `groups` gives
information about the inputs and outputs of the algorithm. They are provided
as a list of dictionaries, each element in this list being associated with a
database `channel`. The group that contains the outputs is the
**synchronization channel**. By default, a loop is automatically performed
by the platform on the synchronization channel, and user code must not loop
on this group. In contrast, it is the responsibility of the user to load data
from the other groups. This is described in more detail in the following
subsections. Finally, the field `description` is optional and gives a short
description of the algorithm.
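Note that the names declared for the inputs and outputs inside `groups` are the
keys used to access them from the algorithm code. As a minimal sketch (the input
name ``measurements`` and the output name ``scores`` are hypothetical; they must
match whatever the declaration actually defines):

.. code-block:: python

    class Algorithm:

        def process(self, inputs, data_loaders, outputs):
            # 'measurements' and 'scores' are hypothetical names; they must
            # match the inputs and outputs declared in the 'groups' field
            value = inputs['measurements'].data

            # ... compute some result from 'value' ...
            result = value

            outputs['scores'].write(result)
            return True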
The graphical interface of BEAT provides editor for algorithm,
which simplifies its `JSON`_ declaration definition. It also includes a simple
Python code editor.
The graphical interface of BEAT provides editors for the main components of the
system (for example: algorithms, data formats, etc.), which simplify the
definition of their `JSON`_ declarations.
.. _beat-system-algorithms-definition-analyzer:
@@ -138,10 +173,10 @@ kind of algorithm, which does not yield any output, but in contrast so called
`results`. These algorithms are called **analyzers**.
`Results` of an experiment are reported back to the user. Data privacy is very
important in the BEAT system and therefore only a limited number of data formats can be
employed as results in an analyzer, such as boolean, integers, floating point
values, strings (of limited size), as well as plots (such as scatter or bar
plots).
important in the BEAT framework and therefore only a limited number of data
formats can be employed as results in an analyzer, such as booleans, integers,
floating point values, strings (of limited size), as well as plots (such as
scatter or bar plots).
For example, the following declaration is the one of a simple analyzer, which
generates an ROC curve as well as a few other metrics.
@@ -213,10 +248,10 @@ conventions. In the following, examples of such classes are provided.
Examples
--------
.. _beat-system-algorithms-examples-simple:
.. _beat-system-algorithms-examples-simple-sequential:
Simple algorithm (no parametrization)
.....................................
Simple sequential algorithm (no parametrization)
................................................
At the very minimum, an algorithm class must look like this:
@@ -224,7 +259,7 @@ At the very minimum, an algorithm class must look like this:
class Algorithm:
def process(self, inputs, outputs):
def process(self, inputs, data_loaders, outputs):
# Read data from inputs, compute something, and write the result
# of the computation on outputs
...
@@ -232,21 +267,51 @@ At the very minimum, an algorithm class must look like this:
The class must be called ``Algorithm`` and must have a method called
``process()``, that takes as parameters a list of inputs (see section
:ref:`beat-system-algorithms-input-inputlist`) and a list of outputs (see
section :ref:`beat-system-algorithms-output-outputlist`). This method must
:ref:`beat-system-algorithms-input-inputlist`), a list of data loaders (see
section :ref:`beat-system-algorithms-dataloaders-dataloaderlist`) and a list of
outputs (see section :ref:`beat-system-algorithms-output-outputlist`). This method must
return ``True`` if everything went correctly, and ``False`` if an error
occurred.
The platform will call this method once per block of data available on the
`synchronized` inputs of the block.
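For illustration, a minimal sketch of such a sequential ``process()`` method,
assuming one input named ``in`` and one output named ``out`` (both names are
hypothetical):

.. code-block:: python

    class Algorithm:

        def process(self, inputs, data_loaders, outputs):
            # Called once per synchronized data block: read the current
            # sample and immediately write the corresponding output
            value = inputs['in'].data.value
            outputs['out'].write({
                'value': value,
            })
            return True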
.. _beat-system-algorithms-examples-simple-autonomous:
.. _beat-system-algorithms-examples-parametrizable:
Simple autonomous algorithm (no parametrization)
................................................
At the very minimum, an algorithm class must look like this:
.. code-block:: python
class Algorithm:
def process(self, data_loaders, outputs):
# Read data from data_loaders, compute something, and write the
# result of the computation on outputs
...
return True
The class must be called ``Algorithm`` and must have a method called
``process()``, that takes as parameters a list of data loaders (see section
:ref:`beat-system-algorithms-dataloaders`) and a list of outputs (see
section :ref:`beat-system-algorithms-output-outputlist`). This method must
return ``True`` if everything went correctly, and ``False`` if an error
occurred.
The platform will call this method only once, as the algorithm itself is
responsible for loading the appropriate amount of data and processing it.
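For illustration, a minimal sketch of such an autonomous ``process()`` method,
assuming a single input named ``in`` reachable through its data loader and an
output named ``out`` (both names are hypothetical); the access pattern follows
the migration example shown at the end of this section:

.. code-block:: python

    class Algorithm:

        def process(self, data_loaders, outputs):
            # Called only once: the algorithm walks through the data itself
            data_loader = data_loaders.loaderOf('in')

            for i in range(data_loader.count('in')):
                view = data_loader.view('in', i)
                # The last entry of the view holds the current data of all
                # inputs of the group, plus the indices it covers
                (data, start, end) = view[view.count() - 1]
                outputs['out'].write({
                    'value': data['in'].value,
                }, end)

            return True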
.. _beat-system-algorithms-examples-parameterizable:
Parameterizable algorithm
.........................
The following is valid for all types of algorithms.
To implement a parameterizable algorithm, two things must be added to the class:
(1) a field in the JSON declaration of the algorithm containing the default
values as well as the types of the parameters, and (2) a method called
``setup()``, that takes one argument, a map containing the parameters of the
@@ -274,7 +339,7 @@ algorithm.
self.threshold = parameters['threshold']
return True
def process(self, inputs, outputs):
def process(self, inputs, data_loaders, outputs):
# Read data from inputs, compute something, and write the result
# of the computation on outputs
...
@@ -284,6 +349,36 @@ When retrieving the value of the parameters, one must not assume that a value
was provided for each parameter. This is why we may use a *try: ... except: ...*
construct in the ``setup()`` method.
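For example, a defensive ``setup()`` method falling back to a default when a
parameter was not provided could look like the following sketch (the parameter
name ``threshold`` follows the example above; the default value is arbitrary):

.. code-block:: python

    class Algorithm:

        def setup(self, parameters):
            # A value is not guaranteed to be present for every parameter,
            # hence the try/except construct
            try:
                self.threshold = parameters['threshold']
            except KeyError:
                self.threshold = 0.5  # arbitrary fallback for this sketch
            return True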
.. _beat-system-algorithms-preparation:
Preparation of an algorithm
...........................
The following is valid for all types of algorithms.
Often, algorithms need to compute some values or retrieve some data prior to
applying their mathematical logic.
This is possible using the ``prepare()`` method.
.. code-block:: python
class Algorithm:
def prepare(self, data_loaders):
data_loader = data_loaders.loaderOf('in2')
(data, _, _) = data_loader[0]
self.offset = data['in2'].value
return True
def process(self, inputs, data_loaders, outputs):
# Read data from inputs, compute something, and write the result
# of the computation on outputs
...
return True
.. _beat-system-algorithms-input:
Handling input data
@@ -340,7 +435,7 @@ Each input provides the following informations:
.. py:attribute:: data_index
*(integer)* Index of the last block of data received on the input (See section
:ref:`beat-core-algorithms-input-synchronization`)
:ref:`beat-system-algorithms-input-synchronization`)
.. py:attribute:: data
not in the same order. The two blocks use different algorithms, which both
refer to their inputs and outputs using names of their choice.
Nevertheless, Joe can choose to use Bill's algorithm instead of his own.
When the algorithm to use is changed on the graphical interface, the system will
When the algorithm to use is changed on the web interface, the platform will
attempt to match each input with the names (and types) declared by the
algorithm. In case of ambiguity, the user will be asked to manually resolve it.
@@ -462,7 +557,7 @@ its inputs, the algorithm would do:
class Algorithm:
def process(self, inputs, outputs):
def process(self, inputs, data_loaders, outputs):
# Iterate over all the unsynchronized data
while inputs.hasMoreData():
@@ -523,7 +618,7 @@ keeps the others synchronized and iterate over all their data:
class Algorithm:
def process(self, inputs, outputs):
def process(self, inputs, data_loaders, outputs):
# Desynchronize the third input. From now on, inputs['desynchronized'].data
# and inputs['desynchronized'].data_index won't change
@@ -561,6 +656,51 @@ This will be addressed in a later version.
Feedback loop
.. _beat-system-algorithms-dataloaders:
Data loaders
------------
.. _beat-system-algorithms-dataloaders-dataloaderlist:
DataLoader list
...............
An algorithm is given access to the **list of data loaders** of the processing
block. This list can be used to access each data loader individually, either by
channel name (see :ref:`beat-system-algorithms-dataloaders-name`), by index, or
by iterating over the list:
.. code-block:: python
# 'data_loaders' is the list of data loaders of the processing block
# Retrieve a data loader by name
data_loader = data_loaders['labels']
# Retrieve a data loader by index
for index in range(0, len(data_loaders)):
data_loader = data_loaders[index]
# Iteration over all data loaders
for data_loader in data_loaders:
...
# Retrieve the data loader an input belongs to, by input name
data_loader = data_loaders.loaderOf('label')
.. _beat-system-algorithms-dataloaders-dataloader:
DataLoader
..........
Provides access to data from a group of inputs synchronized together.
See :py:class:`DataLoader`.
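As a sketch of the typical access pattern (the input name ``label`` is
hypothetical, reusing the name from the listing above), a data loader can be
iterated to retrieve blocks of data together with the data indices they cover;
the pattern mirrors the migration example at the end of this section:

.. code-block:: python

    # 'data_loader' provides the data of a group of synchronized inputs
    for i in range(data_loader.count('label')):
        view = data_loader.view('label', i)
        # Each entry of the view is a tuple: the data of all inputs of the
        # group, plus the start and end indices covered by this block
        (data, start, end) = view[view.count() - 1]
        value = data['label'].value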
.. _beat-system-algorithms-output:
Handling output data
@@ -608,19 +748,14 @@ Each output provides the following informations:
*(string)* Format of the data written on the output
And the following methods:
.. py:method:: createData()
Retrieve an initialized block of data corresponding to the data format of
the output
And the following method:
.. py:method:: write(data, end_data_index=None)
Write a block of data on the output
We'll look at the usage of those methods through some examples in the following
We'll look at the usage of this method through some examples in the following
sections.
@@ -703,12 +838,9 @@ block for each image received on its inputs. This is the simplest case.
def process(self, inputs, outputs):
# Ask the output to create a data object according to its data format
data = outputs['features'].createData()
# Compute something from inputs['images'].data and inputs['labels'].data
# and store the result in 'data'
...
data = ...
# Write our data block on the output
outputs['features'].write(data)
@@ -765,17 +897,14 @@ available.
class Algorithm:
def process(self, inputs, outputs):
def process(self, inputs, data_loaders, outputs):
# Use a criterion on the image to determine if we can perform our
# computation on it or not
if can_compute(inputs['images'].data):
# Ask the output to create a data object according to its data format
data = outputs['features'].createData()
# Compute something from inputs['images'].data and inputs['labels'].data
# and store the result in 'data'
...
data = ...
# Write our data block on the output
outputs['features'].write(data)
@@ -852,7 +981,7 @@ the data block on the output.
self.previous_data_index = None # Data index of the input list during the
# processing of the previous image
def process(self, inputs, outputs):
def process(self, inputs, data_loaders, outputs):
# Determine if we already processed some image(s)
if self.data is not None:
# Determine if the label has changed since the last image we processed
@@ -866,8 +995,7 @@ the data block on the output.
# Create a new block of data if necessary
if self.data is None:
# Ask the output to create a data object according to its data format
self.data = outputs['features'].createData()
self.data = ...
# Remember the label we are currently processing
self.current_label = inputs['labels'].data.name
@@ -884,4 +1012,90 @@ the data block on the output.
return True
.. _beat-system-algorithms-api-migration:
Migrating from API v1 to API v2
-------------------------------
Algorithms that have been written using BEAT's algorithm API v1 can still be run
under the v2 execution model. They are now considered legacy algorithms and
should be ported to API v2 as soon as possible.
API v2 provides two different types of algorithms:
- Sequential
- Autonomous
The Sequential type follows the same code execution model as the v1 API, meaning
that the ``process()`` method is called once for each input item.
The Autonomous type allows the developer to load the input data at will;
therefore, the ``process()`` method will only be called once. This makes it
possible, for example, to optimize the loading of data into GPU memory for
faster execution.
The straightforward migration path from v1 to v2 is to write a Sequential
algorithm, which requires only a few changes to the code.
API V1:
.. code-block:: python
class Algorithm:
def setup(self, parameters):
self.sync = parameters['sync']
return True
def process(self, inputs, outputs):
if inputs[self.sync].isDataUnitDone():
outputs['out'].write({
'value': inputs['in1'].data.value + inputs['in2'].data.value,
})
return True
API V2 sequential:
.. code-block:: python
class Algorithm:
def setup(self, parameters):
self.sync = parameters['sync']
return True
def process(self, inputs, data_loaders, outputs):
if inputs[self.sync].isDataUnitDone():
outputs['out'].write({
'value': inputs['in1'].data.value + inputs['in2'].data.value,
})
return True
API V2 autonomous:
.. code-block:: python
class Algorithm:
def setup(self, parameters):
self.sync = parameters['sync']
return True
def process(self, data_loaders, outputs):
data_loader = data_loaders.loaderOf('in1')
for i in range(data_loader.count(self.sync)):
view = data_loader.view(self.sync, i)
(data, start, end) = view[view.count() - 1]
outputs['out'].write({
'value': data['in1'].value + data['in2'].value,
},
end
)
return True
.. include:: links.rst