.. vim: set fileencoding=utf-8 :

.. Copyright (c) 2016 Idiap Research Institute, http://www.idiap.ch/          ..
.. Contact: beat.support@idiap.ch                                             ..
..                                                                            ..
.. This file is part of the beat.docs module of the BEAT platform.            ..
..                                                                            ..
.. Commercial License Usage                                                   ..
.. Licensees holding valid commercial BEAT licenses may use this file in      ..
.. accordance with the terms contained in a written agreement between you     ..
.. and Idiap. For further information contact tto@idiap.ch                    ..
..                                                                            ..
.. Alternatively, this file may be used under the terms of the GNU Affero     ..
.. Public License version 3 as published by the Free Software and appearing   ..
.. in the file LICENSE.AGPL included in the packaging of this file.           ..
.. The BEAT platform is distributed in the hope that it will be useful, but   ..
.. WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY ..
.. or FITNESS FOR A PARTICULAR PURPOSE.                                       ..
..                                                                            ..
.. You should have received a copy of the GNU Affero Public License along     ..
.. with the BEAT platform. If not, see http://www.gnu.org/licenses/.          ..


.. _beat-system-algorithms:

============
 Algorithms
============

Algorithms are user-defined pieces of software that run within the blocks of a
toolchain. An algorithm can read data on the input(s) of the block and write
processed data on its output(s). We refer to the inputs and outputs
collectively as *endpoints*. Algorithms are hence key components of
scientific experiments, since they formally describe how to transform raw
data into higher-level concepts such as classes.


An algorithm lies at the core of each processing block and may be subject to
parametrization. Inputs and outputs of an algorithm have well-defined data
formats. The format of the data on each input and output of the block is
defined at a higher level in the BEAT framework. The implementation of the
algorithm is expected to respect the previously declared format of each
endpoint.

:numref:`beat-core-overview-block` displays the relationship between a
processing block and its algorithm.

.. _beat-core-overview-block:
.. figure:: ./img/block.*

   Relationship between a processing block and its algorithm

Typically, an algorithm will process data units received at the input
endpoints, and push the relevant results to the output endpoint. Each algorithm
must have at least one input and, except for analyzers (described below), at
least one output. The links in a toolchain connect the output of one block to
the input of another, effectively connecting algorithms together and thus
determining the information flow through the toolchain.

Blocks at the beginning of the toolchain are typically connected to datasets,
and blocks at the end of a toolchain are connected to analyzers (special
algorithms with no output). BEAT is responsible for
delivering inputs from the desired datasets into the toolchain and through your
algorithms. This drives the synchronization of information-flow through the
toolchain. Flow synchronization is determined by data units produced from a
dataset and injected into the toolchain.

.. note:: **Naming Convention**

   Algorithms are named using three values joined by a ``/`` (slash) operator:

     * **username**: indicates the author of the algorithm
     * **name**: indicates the name of the algorithm
     * **version**: indicates the version (integer starting from ``1``) of the
       algorithm

   Each tuple of these three components defines a *unique* algorithm name
   inside the BEAT ecosystem.
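
As an illustration of this naming convention, the following minimal sketch
splits a full algorithm name into its three components. The ``AlgorithmName``
type and the ``parse_algorithm_name()`` helper are hypothetical names introduced
for this example and are not part of the BEAT API; the name ``tutorial/pca/1``
is likewise only illustrative.

.. code-block:: python

    from typing import NamedTuple


    class AlgorithmName(NamedTuple):
        """Hypothetical container for the three components of a name."""

        username: str
        name: str
        version: int


    def parse_algorithm_name(full_name: str) -> AlgorithmName:
        """Split ``username/name/version`` and validate the version."""
        parts = full_name.split("/")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected 'username/name/version', got {full_name!r}")
        username, name, version_text = parts
        # Versions are integers starting from 1.
        if not version_text.isdecimal() or int(version_text) < 1:
            raise ValueError(f"invalid version in {full_name!r}")
        return AlgorithmName(username, name, int(version_text))


    parsed = parse_algorithm_name("tutorial/pca/1")
    print(parsed.username, parsed.name, parsed.version)

Because the three components together identify an algorithm, the resulting
tuple can be used directly as a dictionary key or compared for equality.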



.. _beat-system-algorithms-types:

Algorithm types
===============

The current version of the BEAT framework has two algorithm types, which differ
in the way they handle data samples:

- Sequential
- Autonomous

93
In the previous versions of BEAT only one type of
94
algorithm (referred to as v1 algorithm) was implemented.
95
The sequential algorithm type is the direct successor of the v1 algorithm. For
96
migration information, see :ref:`beat-system-algorithms-api-migration`.
97

98 99 100
The platform now also provides the concept of soft loop. The soft loop allows
the implementation of supervised processing within a macro block.

Sequential
----------

The sequential algorithm is **data-driven**: the algorithm is typically provided
one data sample at a time and must immediately produce some output data.

Autonomous
----------

The autonomous algorithm, as its name suggests, is responsible for loading the
data samples it needs in order to do its work. It is also responsible for
writing the appropriate amount of data on its outputs.

Furthermore, the way the algorithm handles the data is highly configurable and
covers a wide range of possible scenarios.

Loop
----

A loop is composed of three elements:

- A processor algorithm
- An evaluator algorithm
- A LoopChannel

The two algorithms work as a pair and use the LoopChannel to communicate. The
processor algorithm is responsible for applying some transformation or analysis
to a set of data and then sending the result to the evaluator for validation.
The role of the evaluator is to provide feedback to the processor, which will
either continue processing the same block of data or go on with the next one
until all data is exhausted. The output writing of the evaluator is synchronized
with the output writing of the processor.

In the sequential versions, reading is also synchronized so that the evaluator
can read data at the same pace as the processor.

The two algorithms are available in both sequential and autonomous form.
However, there are only three valid combinations:

========== ==========
 Processor Evaluator
========== ==========
Autonomous Autonomous
Sequential Sequential
Sequential Autonomous
========== ==========
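
The table above can be encoded directly. The following minimal sketch checks
whether a processor/evaluator pairing is one of the three valid combinations.
``VALID_LOOP_COMBINATIONS`` and ``is_valid_loop()`` are hypothetical names used
only for illustration; they are not provided by BEAT.

.. code-block:: python

    # (processor type, evaluator type) pairs listed in the table above.
    VALID_LOOP_COMBINATIONS = {
        ("autonomous", "autonomous"),
        ("sequential", "sequential"),
        ("sequential", "autonomous"),
    }


    def is_valid_loop(processor_type, evaluator_type):
        """Return True if the pairing appears in the table of valid loops."""
        return (processor_type, evaluator_type) in VALID_LOOP_COMBINATIONS


    print(is_valid_loop("sequential", "autonomous"))
    print(is_valid_loop("autonomous", "sequential"))

Note that the only pairing missing from the table is an autonomous processor
combined with a sequential evaluator.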

.. _beat-system-algorithms-definition:

Definition of an Algorithm
==========================

An algorithm is defined by two distinct components:

* a `JSON`_ object with several fields, specifying the inputs, the outputs,
  the parameters and additional information such as the language in which it
  is implemented.
* source code (and/or [later] binary code) describing how to transform the input
  data.


.. _beat-system-algorithms-definition-json:

JSON Declaration
----------------

A `JSON`_ declaration of an algorithm consists of several fields. For example,
the following declaration describes an algorithm implementing principal
component analysis (PCA):

.. code-block:: javascript

    {
        "schema_version": 2,
        "language": "python",
        "api_version": 2,
        "type": "sequential",
        "splittable": false,
        "groups": [
            {
                "inputs": {
                    "image": {
                        "type": "system/array_2d_uint8/1"
                    }
                },
                "outputs": {
                    "subspace": {
                        "type": "tutorial/linear_machine/1"
                    }
                }
            }
        ],
        "parameters": {
            "number-of-components": {
                "default": 5,
                "type": "uint32"
            }
        },
        "description": "Principal Component Analysis (PCA)"
    }

Here is a description of each field in the example above:

*   **schema_version:** specifies which schema version must be used to validate
    the file content.

*   **api_version:** specifies the version of the API implemented by the
    algorithm.

*   **type:** specifies the type of the algorithm. The execution model depends
    on it.

*   **language:** specifies the language in which the algorithm is implemented.

*   **splittable:** indicates whether the algorithm can be parallelized into
    chunks.

*   **parameters:** lists the parameters of the algorithm, describing both
    their default values and their types.

*   **groups:** gives information about the inputs and outputs of the
    algorithm. They are provided as a list of dictionaries, each element of
    this list being associated with a database *channel*. The group that
    contains outputs is the *synchronization channel*. By default, the BEAT
    framework automatically loops over the synchronization channel, and user
    code must not loop on this group. In contrast, it is the responsibility of
    the user to load data from the other groups. This is described in more
    detail in the following subsections.

*   **description:** is optional and gives a short description of the
    algorithm.

.. note::

   The graphical interface of BEAT provides user-friendly editors to configure
   the main components of the system (for example, algorithms and data
   formats), which simplifies writing their `JSON`_ declarations. An algorithm
   only needs to be declared by hand using the described specifications when not
   using this graphical interface.
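
As a rough illustration of how these fields fit together, the sketch below
parses a shortened copy of the PCA declaration with Python's standard ``json``
module and locates the synchronization channel, that is, the group that
declares outputs. The helper ``find_synchronization_group()`` is a hypothetical
name, not a function provided by BEAT, and the check that exactly one group
declares outputs is an assumption made for this example.

.. code-block:: python

    import json

    # Shortened copy of the PCA declaration shown earlier in this section.
    DECLARATION = """
    {
        "schema_version": 2,
        "language": "python",
        "api_version": 2,
        "type": "sequential",
        "splittable": false,
        "groups": [
            {
                "inputs": {"image": {"type": "system/array_2d_uint8/1"}},
                "outputs": {"subspace": {"type": "tutorial/linear_machine/1"}}
            }
        ],
        "parameters": {
            "number-of-components": {"default": 5, "type": "uint32"}
        }
    }
    """


    def find_synchronization_group(declaration):
        """Return the group that declares outputs (hypothetical helper)."""
        candidates = [g for g in declaration["groups"] if "outputs" in g]
        if len(candidates) != 1:
            raise ValueError("expected exactly one group with outputs")
        return candidates[0]


    declaration = json.loads(DECLARATION)
    sync_group = find_synchronization_group(declaration)

    # Collect the default value of every declared parameter.
    defaults = {
        name: spec["default"] for name, spec in declaration["parameters"].items()
    }

    print(declaration["type"])
    print(sorted(sync_group["inputs"]))
    print(defaults)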


.. _beat-system-algorithms-definition-analyzer:

Analyzer
........

At the end of the processing workflow of an experiment, there is a special
kind of algorithm, which does not yield any *output* but instead produces
*results*. These algorithms are called **analyzers**.

*Results* of an experiment are reported back to the user. Data privacy is very
important in the BEAT framework and therefore only a limited number of data
formats can be employed as results in an analyzer, such as booleans, integers,
floating-point values, strings (of limited size), as well as plots (such as
scatter or bar plots).

For example, the following declaration describes a simple analyzer, which
generates a ROC curve as well as a few other metrics.

.. code-block:: javascript

    {
      "language": "python",
      "groups": [
        {
          "inputs": {
            "scores": {
              "type": "tutorial/probe_scores/1"
            }
          }
        }
      ],
      "results": {
        "far": {
          "type": "float32",
          "display": true
        },
        "roc": {
          "type": "plot/scatter/1",
          "display": false
        },
        "number_of_positives": {
          "type": "int32",
          "display": false
        },
        "frr": {
          "type": "float32",
          "display": true
        },
        "eer": {
          "type": "float32",
          "display": true
        },
        "threshold": {
          "type": "float32",
          "display": false
        },
        "number_of_negatives": {
          "type": "int32",
          "display": false
        }
      }
    }
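
A declaration like the one above can be inspected programmatically. The
following sketch uses a shortened copy of that declaration, checks that it has
the shape of an analyzer (a ``results`` field and no group with outputs), and
lists the results whose ``display`` field is true. The helpers ``is_analyzer()``
and ``displayed_results()`` are hypothetical names introduced for this example.

.. code-block:: python

    import json

    # Shortened copy of the analyzer declaration shown above.
    ANALYZER_DECLARATION = """
    {
      "language": "python",
      "groups": [
        {"inputs": {"scores": {"type": "tutorial/probe_scores/1"}}}
      ],
      "results": {
        "far": {"type": "float32", "display": true},
        "roc": {"type": "plot/scatter/1", "display": false},
        "frr": {"type": "float32", "display": true},
        "eer": {"type": "float32", "display": true},
        "threshold": {"type": "float32", "display": false}
      }
    }
    """


    def is_analyzer(declaration):
        """An analyzer declares results and none of its groups has outputs."""
        has_results = "results" in declaration
        has_outputs = any("outputs" in group for group in declaration["groups"])
        return has_results and not has_outputs


    def displayed_results(declaration):
        """Return the sorted names of the results flagged with display=true."""
        return sorted(
            name
            for name, spec in declaration["results"].items()
            if spec["display"]
        )


    declaration = json.loads(ANALYZER_DECLARATION)
    print(is_analyzer(declaration))
    print(displayed_results(declaration))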


.. _beat-system-algorithms-definition-code:

Source code
-----------

The BEAT framework has been designed to support algorithms written in different
programming languages. However, for each language, a corresponding back-end
needs to be implemented, which is in charge of connecting the inputs and
outputs to the algorithm and running its code as expected. In this section,
we describe the implementation of algorithms in the Python and C++ programming
languages.

|project| treats algorithms as objects. In Python, an algorithm is a class
called ``Algorithm``; in C++, it is a class derived from ``IAlgorithmLegacy``,
``IAlgorithmSequential``, or ``IAlgorithmAutonomous``, depending on the
algorithm type. To define a new algorithm, at least one method must be
implemented:

  * ``process()``: the method that actually processes input and produces
    outputs.

The code example below illustrates the implementation of an algorithm (in
Python):

.. code-block:: python
   :linenos:

   class Algorithm:

       def process(self, inputs, data_loaders, outputs):
           # here, you read inputs, process and write results to outputs
           return True


Here is the equivalent example for a sequential algorithm in C++:

.. code-block:: c++
   :linenos:

    class Algorithm: public IAlgorithmSequential
    {
    public:
        bool process(const InputList& inputs, const DataloaderList& data_load_list, const OutputList& outputs) override
        {
            // here, you read inputs, process and write results to outputs
            return true;
        }
    };

.. _beat-system-algorithms-examples:

Examples
........

To implement a new algorithm, one must write a class following a few
conventions. In the following, examples of such classes are provided.

.. _beat-system-algorithms-examples-simple-sequential:

Simple sequential algorithm (no parametrization)
................................................

At the very minimum, an algorithm class must look like this:

.. code-block:: python

    class Algorithm:

        def process(self, inputs, data_loaders, outputs):
            # Read data from inputs, compute something, and write the result
            # of the computation on outputs
            ...
            return True

The class must be called ``Algorithm`` and must have a method called
``process()`` that takes as parameters a list of inputs (see section
:ref:`beat-system-algorithms-input-inputlist`), a list of data loaders (see
section :ref:`beat-system-algorithms-dataloaders-dataloaderlist`) and a list of
outputs (see section :ref:`beat-system-algorithms-output-outputlist`). This
method must return ``True`` if everything went correctly, and ``False`` if an
error occurred.

The platform will call this method once per block of data available on the
`synchronized` inputs of the block.
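
To make this calling convention concrete, the following self-contained sketch
mimics how the platform might drive a sequential algorithm. The ``FakeOutput``
class and the ``run_sequential()`` driver are hypothetical stand-ins for the
platform and do not reflect its actual implementation; only the ``process()``
signature and the ``inputs['in'].data.value`` and ``outputs['out'].write()``
access patterns are taken from the examples in this document.

.. code-block:: python

    from types import SimpleNamespace


    class FakeOutput:
        """Hypothetical stand-in for an output endpoint; records writes."""

        def __init__(self):
            self.written = []

        def write(self, data):
            self.written.append(data)


    class Algorithm:

        def process(self, inputs, data_loaders, outputs):
            # Read the current data unit, double it, and write the result.
            value = inputs["in"].data.value
            outputs["out"].write({"value": value * 2})
            return True


    def run_sequential(algorithm, values):
        """Hypothetical driver: calls process() once per data unit."""
        outputs = {"out": FakeOutput()}
        for value in values:
            inputs = {"in": SimpleNamespace(data=SimpleNamespace(value=value))}
            if not algorithm.process(inputs, None, outputs):
                raise RuntimeError("process() reported an error")
        return outputs["out"].written


    results = run_sequential(Algorithm(), [1, 2, 3])
    print(results)

The loop lives in the driver, not in the algorithm: ``process()`` only ever sees
one data unit per call.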

.. _beat-system-algorithms-examples-simple-autonomous:

Simple autonomous algorithm (no parametrization)
................................................

At the very minimum, an algorithm class must look like this:

.. code-block:: python

    class Algorithm:

        def process(self, data_loaders, outputs):
            # Read data from data_loaders, compute something, and write the
            # result of the computation on outputs
            ...
            return True

The class must be called ``Algorithm`` and must have a method called
``process()``, that takes as parameters a list of data loaders (see section
:ref:`beat-system-algorithms-dataloaders`) and a list of outputs (see
section :ref:`beat-system-algorithms-output-outputlist`). This method must
return ``True`` if everything went correctly, and ``False`` if an error
occurred.

The platform will call this method only once, as it is the algorithm's responsibility to load
the appropriate amount of data and process it.
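
The sketch below illustrates this single-call model with in-memory stand-ins.
``FakeDataLoader``, ``FakeDataLoaderList`` and ``FakeOutput`` are hypothetical
classes written for this example; they only imitate the ``loaderOf()``,
``count()``, indexing and ``write()`` calls that appear in other examples of
this document. Treating the second and third elements returned by indexing as
start and end indices is an assumption.

.. code-block:: python

    from types import SimpleNamespace


    class FakeDataLoader:
        """Hypothetical stand-in exposing count() and integer indexing."""

        def __init__(self, name, values):
            self._name = name
            self._values = list(values)

        def count(self):
            return len(self._values)

        def __getitem__(self, index):
            data = {self._name: SimpleNamespace(value=self._values[index])}
            # Assumed layout: (data, start index, end index).
            return (data, index, index)


    class FakeDataLoaderList:
        """Hypothetical stand-in exposing loaderOf()."""

        def __init__(self, loaders):
            self._loaders = loaders

        def loaderOf(self, name):
            return self._loaders[name]


    class FakeOutput:
        def __init__(self):
            self.written = []

        def write(self, data, end_data_index=None):
            self.written.append((data, end_data_index))


    class Algorithm:

        def process(self, data_loaders, outputs):
            # The algorithm itself iterates over all the data it needs.
            loader = data_loaders.loaderOf("in")
            for i in range(loader.count()):
                (data, _, end) = loader[i]
                outputs["out"].write({"value": data["in"].value + 1}, end)
            return True


    data_loaders = FakeDataLoaderList({"in": FakeDataLoader("in", [10, 20, 30])})
    outputs = {"out": FakeOutput()}

    # A single call processes every data unit.
    succeeded = Algorithm().process(data_loaders, outputs)
    print(succeeded, outputs["out"].written)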


.. _beat-system-algorithms-examples-simple-processor:

Simple autonomous processor algorithm (no parametrization)
..........................................................

At the very minimum, a processor algorithm class must look like this:

.. code-block:: python

    import numpy as np


    class Algorithm:

        def process(self, data_loaders, outputs, loop_channel):
            # Read data from data_loaders, compute something, and validate the
            # hypothesis
            ...
            is_valid, feedback = loop_channel.validate({"value": np.float64(some_value)})
            # check is_valid, continue appropriately, and write the result
            # of the computation on outputs
            ...
            return True


The class must be called ``Algorithm`` and must have a method called
``process()`` that, for the autonomous processor shown above, takes as
parameters a list of data loaders (see section
:ref:`beat-system-algorithms-dataloaders-dataloaderlist`), a list of outputs
(see section :ref:`beat-system-algorithms-output-outputlist`) and a loop channel
(see section :ref:`beat-system-algorithms-loop-channel`). This method must
return ``True`` if everything went correctly, and ``False`` if an error
occurred.

The platform will call this method once per block of data available on the
`synchronized` inputs of the block.


.. _beat-system-algorithms-examples-simple-evaluator:

Simple autonomous evaluator algorithm (no parametrization)
..........................................................

At the very minimum, an evaluator algorithm class must look like this:

.. code-block:: python

    import numpy as np


    class Algorithm:

        def validate(self, hypothesis):
            # compute whether the hypothesis makes sense and return a tuple
            # with a boolean value and some feedback
            return (result, {"value": np.float32(delta)})

        def write(self, outputs, processor_output_name, end_data_index):
            # write something on the output; this method is called in sync
            # with the write of the processor algorithm
            outputs["out"].write({"value": np.int32(self.output)}, end_data_index)



The class must be called ``Algorithm`` and must have a method called
``validate()``, that takes as parameter a dataformat that will contain the
hypothesis that needs validation. The function must return a tuple made of a
boolean value and feedback value that will be used by the processor to determine
whether it should continue processing the current data or move further.
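
The interplay between a processor and an evaluator can be sketched with plain
Python objects. In the example below, ``FakeLoopChannel`` is a hypothetical
stand-in that simply forwards each hypothesis to the evaluator, the outputs are
reduced to a plain list, and plain floats replace the ``numpy`` types to keep
the example free of dependencies. The halving strategy used by the processor is
an arbitrary choice made for illustration.

.. code-block:: python

    class Evaluator:
        """Evaluator side: judges each hypothesis sent by the processor."""

        def __init__(self, target, tolerance):
            self.target = target
            self.tolerance = tolerance

        def validate(self, hypothesis):
            # Feedback is the signed distance between target and hypothesis.
            delta = self.target - hypothesis["value"]
            return (abs(delta) < self.tolerance, {"value": delta})


    class FakeLoopChannel:
        """Hypothetical stand-in that forwards hypotheses to the evaluator."""

        def __init__(self, evaluator):
            self._evaluator = evaluator

        def validate(self, hypothesis):
            return self._evaluator.validate(hypothesis)


    class Processor:
        """Processor side: refines its guess until the evaluator accepts it."""

        def __init__(self):
            self.attempts = 0
            self.guess = 0.0

        def process(self, data_loaders, outputs, loop_channel):
            is_valid = False
            while not is_valid:
                self.attempts += 1
                is_valid, feedback = loop_channel.validate({"value": self.guess})
                if not is_valid:
                    # Move half of the reported distance towards the target.
                    self.guess += feedback["value"] / 2
            outputs.append({"value": self.guess})
            return True


    outputs = []
    processor = Processor()
    channel = FakeLoopChannel(Evaluator(target=4.0, tolerance=0.5))
    processor.process(None, outputs, channel)
    print(processor.attempts, outputs)

The processor keeps working on the same data until the evaluator reports a
valid hypothesis, which mirrors the feedback loop described above.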


.. _beat-system-algorithms-examples-parameterizable:

Parameterizable algorithm
.........................

The following is valid for all types of algorithms.

To implement a parameterizable algorithm, two things are needed: (1) a field in
the JSON declaration of the algorithm containing the default values as well as
the types of the parameters, and (2) a method called ``setup()`` that takes one
argument, a map containing the parameters of the algorithm.

.. code-block:: javascript

    {
        ...
        "parameters": {
            "threshold": {
                "default": 0.5,
                "type": "float32"
            }
        },
        ...
    }

.. code-block:: python

    class Algorithm:

        def setup(self, parameters):
            # Retrieve the value of the parameters
            self.threshold = parameters['threshold']
            return True

        def process(self, inputs, data_loaders, outputs):
            # Read data from inputs, compute something, and write the result
            # of the computation on outputs
            ...
            return True

When retrieving the value of the parameters, one must not assume that a value
was provided for each parameter. This is why we may use a *try: ... except: ...*
construct in the ``setup()`` method.
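
A minimal sketch of such a defensive ``setup()`` method is given below. It
assumes that the ``parameters`` map behaves like a Python dictionary and raises
``KeyError`` for a missing entry; the ``DEFAULT_THRESHOLD`` attribute is a
hypothetical name that mirrors the default of the JSON declaration above.

.. code-block:: python

    class Algorithm:

        # Hypothetical fallback, mirroring the "default" of the declaration.
        DEFAULT_THRESHOLD = 0.5

        def setup(self, parameters):
            try:
                self.threshold = parameters["threshold"]
            except KeyError:
                # No value was provided: fall back to the default.
                self.threshold = self.DEFAULT_THRESHOLD
            return True


    with_value = Algorithm()
    with_value.setup({"threshold": 0.9})

    without_value = Algorithm()
    without_value.setup({})

    print(with_value.threshold, without_value.threshold)

Catching only ``KeyError`` rather than using a bare ``except:`` keeps unrelated
errors visible.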

.. _beat-system-algorithms-preparation:

Preparation of an algorithm
...........................

The following is valid for all types of algorithms.

Algorithms often need to compute some values or retrieve some data prior to
applying their mathematical logic. This is possible using the ``prepare()``
method.

.. code-block:: python

    class Algorithm:

        def prepare(self, data_loaders):
            # Retrieve and prepare some data.
            data_loader = data_loaders.loaderOf('in2')
            (data, _, _) = data_loader[0]
            self.offset = data['in2'].value
            return True

        def process(self, inputs, data_loaders, outputs):
            # Read data from inputs, compute something, and write the result
            # of the computation on outputs
            ...
            return True




Data Synchronization in Sequential Algorithms
=============================================

One particularity of the |project| framework is how the data flow through a
given toolchain is synchronized. The framework is responsible for extracting
data units (images, speech segments, videos, etc.) from the database and
presenting them to the input endpoints of certain blocks, as specified in the
toolchain. Each presentation of a new data unit to the input of a block can be
thought of as an individual time unit. The algorithm implemented in a block
is responsible for the synchronization between its inputs and its output. In
other words, every time a data unit is produced by a dataset in an experiment,
the ``process()`` method of your algorithm is called to act upon it.

An algorithm may have one of two kinds of synchronicity: one-to-one and
many-to-one. These are discussed in detail in separate sections below.


One-to-one synchronization
--------------------------

Here, the algorithm generates one output for every input entity (e.g., image,
video, speech file). For example, an image-based feature-extraction algorithm
would typically output one set of features every time it is called with a new
input image. A schematic diagram of one-to-one synchronization for an algorithm
is shown in the figure below:

.. image:: img/case-study-1.*

In the configuration shown in this figure, the algorithm block has two
endpoints: one input and one output. The input, the output and the block are
synchronized together (notice the color information). Each red box represents
one input unit (e.g., an image or a video) that is fed to the input interface
of the block. For each input received, the block produces one output unit,
shown as a blue box in the figure.

Example code showing how to implement an algorithm in this configuration is
shown below:

.. code-block:: python
   :linenos:

    class Algorithm:

        def process(self, inputs, data_loaders, outputs):

            # to read the field "value" on the "in" input, use "data"
            # a possible declaration of "user/format/1" would be:
            # {
            #   "value": ...
            # }
            value = inputs['in'].data.value

            # do your processing and create the "output" value
            output = magical_processing(value)

            # to write "output" into the relevant endpoint use "write"
            # a possible declaration of "user/other/1" would be:
            # {
            #   "value": ...
            # }
            outputs['out'].write({'value': output})

            # No error occurred, so return True
            return True


.. code-block:: c++
   :linenos:

    class Algorithm: public IAlgorithmSequential
    {
    public:
        bool process(const InputList& inputs, const DataloaderList& data_load_list, const OutputList& outputs) override
        {
            // to read the field "value" on the "in" input, use "data"
            // a possible declaration of "user/format/1" would be:
            // {
            //   "value": ...
            // }
            auto value = inputs["in"]->data<user::format_1>()->value;

            // do your processing and create the "output" value
            auto output = magical_processing(value);

            // to write "output" into the relevant endpoint use "write"
            // a possible declaration of "user/other/1" would be:
            // {
            //   "value": ...
            // }
            user::other_1 result;
            result.value = output;

            outputs["out"]->write(&result);

            // No error occurred, so return true
            return true;
        }
    };


In this example, the platform will call the user algorithm every time a new
input block with the format ``user/format/1`` is available at the input. Notice
that no ``for`` loop is necessary in the user code. The platform controls the
looping for you.


A more complex case of one-to-one synchronization is shown in the following figure:

.. image:: img/case-study-2.*

In such a configuration, the platform will ensure that each input unit at the
input endpoint ``in`` is associated with the correct input unit at the
input endpoint ``in2``. For example, referring to the figure above, the items
at the input ``in`` could be images, and the items at the input ``in2`` could
be labels. The configuration depicted indicates that the first two input
images have the same label, say, ``l1``, whereas the next two input images have
the same label, say, ``l2``. The algorithm produces one output item at the
endpoint ``out`` for each input object presented at the endpoint ``in``.

Example code implementing an algorithm processing data in this scenario is
shown below:

.. code-block:: python
   :linenos:

    class Algorithm:

        def process(self, inputs, data_loaders, outputs):

            i1 = inputs['in'].data.value
            i2 = inputs['in2'].data.value

            out = magical_processing(i1, i2)

            outputs['out'].write({'value': out})

            return True


.. code-block:: c++
   :linenos:

    class Algorithm: public IAlgorithmSequential
    {
    public:
        bool process(const InputList& inputs, const DataloaderList& data_load_list, const OutputList& outputs) override
        {
            auto i1 = inputs["in"]->data<user::format_1>()->value;
            auto i2 = inputs["in2"]->data<user::format_1>()->value;

            auto out = magical_processing(i1, i2);

            user::other_1 result;
            result.value = out;

            outputs["out"]->write(&result);

            return true;
        }
    };


Notice that we still do not require any ``for`` loop. BEAT *synchronizes* the
inputs ``in`` and ``in2`` so that they are available to your program as the
dataset implementor defined.


Many-to-one synchronization
---------------------------

Here, the algorithm produces a single output after processing a batch of
inputs.  For example, the algorithm may produce a model for a *dog* after
processing all input images for the *dog* class. A block diagram illustrating
many-to-one synchronization is shown below:

.. image:: img/case-study-3.*


Here the synchronization is driven by the endpoint ``in2``. For each data unit
received at the input ``in2``, the algorithm generates one output unit. Note
that, here, multiple units received at the input ``in`` are accumulated and
associated with a single unit received at ``in2``. The user does not have to
handle the internal indexing. Producing output data at the right moment is
enough for BEAT to understand that the output is synchronized with ``in2``.

The example below illustrates how such an algorithm could be implemented:

.. code-block:: python
   :linenos:

    class Algorithm:

        def __init__(self):
            self.objs = []

        def process(self, inputs, data_loaders, outputs):
            self.objs.append(inputs['in'].data.value)  # accumulates

            if not inputs['in2'].hasMoreData():
                out = magical_processing(self.objs)
                outputs['out'].write({'value': out})
                self.objs = []  # reset accumulator for next label

            return True


.. code-block:: c++
   :linenos:

    class Algorithm: public IAlgorithmSequential
    {
    public:
        bool process(const InputList& inputs, const DataloaderList& data_load_list, const OutputList& outputs) override
        {
            objs.push_back(inputs["in"]->data<user::format_1>()->value); // accumulates

            if (!inputs["in2"]->hasMoreData())
            {
                auto out = magical_processing(objs);

                user::other_1 result;
                result.value = out;

                outputs["out"]->write(&result);

                objs.clear();   // reset accumulator for next label
            }

            return true;
        }

    public:
        std::vector<float> objs;
    };


Here, the units received at the endpoint ``in`` are accumulated as long as the
``hasMoreData()`` method attached to the input ``in2`` returns ``True``.
When ``hasMoreData()`` returns ``False``, the corresponding label is read
from ``in2``, and a result is produced at the endpoint ``out``. After an output
unit has been produced, the internal accumulator for ``in`` is cleared, and the
algorithm starts accumulating a new set of objects for the next label.
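
Independently of the platform API, the many-to-one pattern amounts to grouping
consecutive units of ``in`` by the unit of ``in2`` they belong to. The sketch
below is plain Python and only an illustration: the sample data and the
``magical_processing`` stand-in (here a simple average) are hypothetical.

.. code-block:: python

    from itertools import groupby


    def magical_processing(values):
        # Hypothetical stand-in: averages the accumulated values
        return sum(values) / len(values)


    # Each tuple is (value on `in`, label on `in2`). Consecutive units that
    # share a label belong to the same unit of `in2`.
    units = [(1.0, 'dog'), (2.0, 'dog'), (3.0, 'dog'), (10.0, 'cat'), (20.0, 'cat')]

    results = []
    for label, group in groupby(units, key=lambda unit: unit[1]):
        objs = [value for value, _ in group]  # accumulates
        results.append((label, magical_processing(objs)))  # one output per label

    print(results)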


Unsynchronized Operation
------------------------

Not all inputs for a block need to be synchronized together. In the diagram
shown below, the block is synchronized with the inputs ``in`` and ``in2`` (as
indicated by the green circle, which matches the colour of the input lines
connecting ``in`` and ``in2``). The output ``out`` is synchronized with the
block (and, as the code below shows, an output is produced for every unit
received on ``in``). The input ``in3`` is synchronized neither with the
endpoints ``in`` and ``in2`` nor with the block. A processing block which
receives a previously calculated model and must score test samples is a good
example of this condition. In this case, the user is responsible for reading
out the contents of ``in3`` explicitly.

.. image:: img/case-study-4.*


In this case the algorithm will include an explicit loop to read the
unsynchronized input (``in3``).

.. code-block:: python
   :linenos:

    class Algorithm:

        def __init__(self):
            self.models = []

        def prepare(self, data_loaders):

            # Loads the "model" data at the beginning
            loader = data_loaders.loaderOf('in3')
            for i in range(loader.count()):
                view = loader.view('in3', i)
                data, _, _ = view[0]
                self.models.append(data['in3'].value)

            return True


        def process(self, inputs, data_loaders, outputs):
            # N.B.: this will be called for every unit in 'in'

            # Processes the current input in 'in' and 'in2', apply the
            # model/models
            out = magical_processing(inputs['in'].data.value,
                                     inputs['in2'].data.value,
                                     self.models)

            # Writes the output
            outputs['out'].write({'value': out})

            return True


.. code-block:: c++
   :linenos:

    class Algorithm: public IAlgorithmSequential
    {
    public:
        bool prepare(const beat::backend::cxx::DataLoaderList& data_load_list) override
        {
            auto loader = data_load_list["in3"];
            for (int i = 0 ; i < loader->count() ; ++i) {
                auto view = loader->view("in3", i);
                std::map<std::string, beat::backend::cxx::Data *> data;
                std::tie(data, std::ignore, std::ignore) = (*view)[0];
                auto model = static_cast<user::model_1*>(data["in3"]);
                models.push_back(*model);
            }

            return true;
        }

        bool process(const InputList& inputs, const DataloaderList& data_load_list, const OutputList& outputs) override
        {
            // N.B.: this will be called for every unit in `in'

            // Processes the current input in `in' and `in2', apply the model/models
            auto out = magical_processing(inputs["in"]->data<user::format_1>()->value,
                                          inputs["in2"]->data<user::format_1>()->value,
                                          models);

            // Writes the output
            user::other_1 result;
            result.value = out;

            outputs["out"]->write(&result);

            return true;
        }

    public:
        std::vector<user::model_1> models;
    };


In the example above, the input ``in3`` is unsynchronized with the block you
are writing your algorithm for. It may also happen that several inputs are
synchronized together but not with the block. In this case, using a *group*
for each set of inputs makes the code easier to read.

.. it is safer to treat inputs using their *group*. For example:

.. .. code-block python
..    :linenos:

..     class Algorithm:

..         def __init__(self):
..             self.models = None

..         def prepare(self, data_loaders):

..             #??? Is the concept of groups any use when we have dataloaders assuming this scenario???

..             # Loads the "model" data at the beginning
..             loader = data_loaders.loaderOf('in3')
..             for i in range(loader.count()):
..                 view = loader.view('in3', i)
..                 data, _, _ = view[0]
..                 self.models.append(data['in3'].value)

..         def process(self, inputs, data_loaders, outputs):
..             # N.B.: this will be called for every unit in `in'

..             # Loads the "model" data at the beginning, once
..             if self.models is None:
..                 self.models = []
..                 group = inputs.groupOf('in3')
..                 while group.hasMoreData():
..                     group.next() #synchronously advances the data
..                     self.models.append(group['in3'].data.value)

..             # Processes the current input in `in' and `in2', apply the model/models
..             out = magical_processing(inputs['in'].data.value,
..                                      inputs['in2'].data.value,
..                                      self.models)

..             # Writes the output
..             outputs.write({'value': out})

..             return True


.. code-block c++
..    :linenos:

..     class Algorithm: public IAlgorithmSequential
..     {
..     public:
..         bool prepare(const beat::backend::cxx::DataLoaderList& data_load_list) override
..         {
..             auto loader = data_load_list["in3"];
..             for (int i = 0 ; i < loader->count() ; ++i) {
..                 auto view = loader->view("in3", i);
..                 std::map<std::string, beat::backend::cxx::Data *> data;
..                 std::tie(data, std::ignore, std::ignore) = (*view)[0];
..                 auto model = static_cast<user::model*>(data["in3"]);
..                 models.append(*model);
..             }

..             return true;
..         }

..         bool process(const InputList& inputs, const DataloaderList& data_load_list, const OutputList& outputs) override
..         {
..             // N.B.: this will be called for every unit in `in'

..             // Processes the current input in `in' and `in2', apply the model/models
..             auto out = magical_processing(inputs["in"]->data<user::format_1>()->value,
..                                           inputs["in2"]->data<user::format_1>()->value,
..                                           models);

..             // Writes the output
..             user::other_1 result;
..             result.value = out;

..             outputs["out"]->write(&result);

..             return true;
..         }

..     public:
..         std::vector<user::model_1> models;
..     };


.. In practice, encoding your algorithms using *groups* instead of looping over
.. individual inputs makes the code more robust to changes.



.. _beat-system-algorithms-input:

Handling input data
-------------------

.. _beat-system-algorithms-input-inputlist:

Input list
..........

An algorithm is given access to the **list of the inputs** of the processing
block. This list can be used to access each input individually, either by
its name (see section :ref:`beat-system-algorithms-input-name`), by its index,
or by iterating over the list:

.. code-block:: python

    # 'inputs' is the list of inputs of the processing block

    print(inputs['labels'].data_format)

    for index in range(0, inputs.length):
        print(inputs[index].data_format)

    for input in inputs:
        print(input.data_format)

    for input in inputs[0:2]:
        print(input.data_format)

Additionally, the following method is usable on a **list of inputs**:

.. py:method:: hasMoreData()

    Indicates if there is (at least) another block of data to process on some of
    the inputs


.. _beat-system-algorithms-input-input:

Input
.....

Each input provides the following information:

.. py:attribute:: name

    *(string)* Name of the input

.. py:attribute:: data_format

    *(string)* Data format accepted by the input

.. py:attribute:: data_index

    *(integer)* Index of the last block of data received on the input (See section
    :ref:`beat-system-algorithms-input-synchronization`)

.. py:attribute:: data

    *(object)* The last block of data received on the input

The structure of the ``data`` object depends on the data format assigned to
the input. Note that ``data`` can be *None*.
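
Since ``data`` can be *None*, code that reads an input may want to check for
it first. A minimal sketch, using a hypothetical ``FakeInput`` stand-in that
only carries the four attributes listed above (it is not the platform's input
class, and the ``-1`` index is an arbitrary placeholder):

.. code-block:: python

    from collections import namedtuple

    # Hypothetical stand-in exposing the attributes documented above
    FakeInput = namedtuple('FakeInput',
                           ['name', 'data_format', 'data_index', 'data'])


    def describe(input_):
        """Returns a short description of the input, coping with missing data"""
        if input_.data is None:
            return '%s (%s): no data received yet' % (input_.name,
                                                      input_.data_format)
        return '%s (%s): block #%d = %r' % (input_.name, input_.data_format,
                                            input_.data_index, input_.data)


    empty = FakeInput('labels', 'label', -1, None)  # -1: placeholder index
    filled = FakeInput('labels', 'label', 0, 'dog')

    print(describe(empty))
    print(describe(filled))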

.. _beat-system-algorithms-input-name:

Input naming
............

Each algorithm assigns a name of its choice to each input (and output, see
section :ref:`beat-system-algorithms-output-name`). This mechanism ensures that
algorithms are easily shareable between users.

For instance, in :numref:`beat-system-algorithms-input-naming`, two different
users (Joe and Bill) are using two different toolchains. Both toolchains have
one block with two inputs and one output, with a similar set of data formats
(*image/rgb* and *label* on the inputs, *array/float* on the output), although
not in the same order. The two blocks use different algorithms, which both
refer to their inputs and outputs using names of their choice.

Nevertheless, Joe can choose to use Bill's algorithm instead of his own one.
When the algorithm to use is changed, BEAT will
attempt to match each input with the names (and types) declared by the
algorithm. In case of ambiguity, the user will be asked to manually resolve it.

In other words: the way the block is connected in the toolchain doesn't force a
naming scheme or a specific order of inputs to the algorithms used in that
block. As long as the set of data types (on the inputs and outputs) is
compatible for both the block and the algorithm, the algorithm can be used in
the block.
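
To make the idea concrete, here is a rough sketch of how inputs could be
matched by data format when an algorithm is swapped. The function
``match_inputs``, its type-only matching rule and the input names used below
are assumptions made for illustration; they do not describe the platform's
actual implementation.

.. code-block:: python

    def match_inputs(block_inputs, algorithm_inputs):
        """Maps block input names to algorithm input names by data format.

        Both arguments are dictionaries {name: data_format}. Returns the
        mapping and the list of block inputs without a unique match.
        """
        mapping, ambiguous = {}, []
        for block_name, data_format in block_inputs.items():
            candidates = [name for name, fmt in algorithm_inputs.items()
                          if fmt == data_format]
            if len(candidates) == 1:
                mapping[block_name] = candidates[0]
            else:
                ambiguous.append(block_name)  # the user would resolve this one
        return mapping, ambiguous


    # Joe's block and Bill's algorithm: different names, different order
    joe_block = {'pictures': 'image/rgb', 'classes': 'label'}
    bill_algorithm = {'labels': 'label', 'images': 'image/rgb'}

    mapping, ambiguous = match_inputs(joe_block, bill_algorithm)
    print(mapping)
    print(ambiguous)

If two inputs of the algorithm shared the same data format, the corresponding
block input would end up in ``ambiguous``, which is the situation where the
user is asked to decide.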

.. _beat-system-algorithms-input-naming:
.. figure:: ./img/inputs-naming.*

   Different toolchains, but interchangeable algorithms

The names of the inputs are assigned in the JSON declaration of the algorithm,
such as:

.. code-block:: javascript

    {
        ...
        "groups": [
            {
                "inputs": {
                    "name1": {
                        "type": "data_format_1"
                    },
                    "name2": {
                        "type": "data_format_2"
                    }
                }
            }
        ],
        ...
    }


.. _beat-system-algorithms-input-synchronization:

Inputs synchronization
......................

The data available on the different inputs from the synchronized channels
are (of course) synchronized. Let's consider the example toolchain in
:numref:`beat-system-algorithms-input-synchronization-example`, where:

* The image database provides two kinds of data: some *images* and their
  associated *labels*
* The *block A* receives both data via its inputs
* The *block B* only receives the *labels*
* Both algorithms are *data-driven*

The system will ask the *block A* to process 6 images, one by one. On the
second input, the algorithm will find the correct label for the current image.
The *block B* will only be asked to process 2 labels.

The algorithm can retrieve the index of the current block of data of each of
its inputs by looking at their ``data_index`` attribute. For simplicity, the
list of inputs has two attributes (``current_data_index`` and
``current_end_data_index``) that indicate the data indexes currently used by
the synchronization mechanism of the platform.
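
One way to picture this: every block of data covers a range of indexes, and
blocks on different inputs are synchronized when their ranges overlap. The
sketch below is an illustration only; the tuples, the 3/3 split between the
two labels and the ``block_at`` helper are assumptions, not platform code.

.. code-block:: python

    # Each block of data is (first index, last index, value). The 3/3 split
    # between the two labels is an assumption made for this example.
    images = [(i, i, 'image_%d' % i) for i in range(6)]
    labels = [(0, 2, 'dog'), (3, 5, 'cat')]


    def block_at(blocks, index):
        """Returns the block whose index range covers `index`"""
        for block in blocks:
            first, last, _ = block
            if first <= index <= last:
                return block
        raise IndexError(index)


    # What a data-driven block A would see, one image at a time
    seen = []
    for first, last, image in images:
        _, _, label = block_at(labels, first)
        seen.append((image, label))

    # Block B only receives the labels, so it is asked to process two blocks
    print(len(seen), len(labels))
    print(seen)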

.. _beat-system-algorithms-input-synchronization-example:
.. figure:: ./img/inputs-synchronization.*
   :width: 80%

   Synchronization example


.. _beat-system-algorithms-input-unsynchronized:

Additional input methods for unsynchronized channels
....................................................

Unsynchronized input channels of algorithms can be accessed at will, and
algorithms can use them any way they want. To be able to perform their job,
they have access to additional methods.

The following method is usable on a **list of inputs**:

.. py:method:: next()

    Retrieve the next block of data on all the inputs **in a synchronized
    manner**


Let's come back to the example toolchain in
:numref:`beat-system-algorithms-input-synchronization-example`, and assume
that *block A* uses an autonomous algorithm. To iterate over all the data on
its inputs, the algorithm would do:

.. code-block:: python

    class Algorithm:

        def process(self, inputs, data_loaders, outputs):

            # Iterate over all the unsynchronized data
            while inputs.hasMoreData():
                inputs.next()

                # Do something with inputs['images'].data and inputs['labels'].data
                ...

            # At this point, there is no more data available on inputs['images'] and
            # inputs['labels']

            return True
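
The loop above can be exercised outside the platform with a small stand-in for
the list of inputs. ``FakeInput`` and ``FakeInputList`` below are hypothetical
helpers written for this sketch (they are not platform classes) and only
implement the two methods discussed here.

.. code-block:: python

    class FakeInput:
        """Hypothetical stand-in for one input: only holds the current data"""

        def __init__(self):
            self.data = None
            self.data_index = -1


    class FakeInputList:
        """Hypothetical stand-in for the list of inputs of a block"""

        def __init__(self, columns):
            # columns: {input name: list of data blocks}, all of equal length
            self._columns = columns
            self._inputs = {name: FakeInput() for name in columns}
            self._length = len(next(iter(columns.values())))
            self._position = -1

        def __getitem__(self, name):
            return self._inputs[name]

        def hasMoreData(self):
            return self._position + 1 < self._length

        def next(self):
            # Advances every input at once, keeping them synchronized
            self._position += 1
            for name, blocks in self._columns.items():
                self._inputs[name].data = blocks[self._position]
                self._inputs[name].data_index = self._position


    class Algorithm:

        def process(self, inputs, data_loaders, outputs):
            self.pairs = []
            while inputs.hasMoreData():
                inputs.next()
                self.pairs.append((inputs['images'].data,
                                   inputs['labels'].data))
            return True


    inputs = FakeInputList({'images': ['img0', 'img1', 'img2'],
                            'labels': ['dog', 'dog', 'cat']})
    algorithm = Algorithm()
    algorithm.process(inputs, None, None)
    print(algorithm.pairs)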


The following methods are usable on an ``input``, in cases where the algorithm
doesn't care about the synchronization of some of its inputs:

.. py:method:: hasMoreData()

    Indicates if there is (at least) another block of data available on the input

.. py:method:: next()

    Retrieve the next block of data

    .. warning::

       Once this method has been called by an algorithm, the input is no longer
       automatically synchronized with the other inputs of the block.

In the following example, the algorithm desynchronizes one of its inputs but
keeps the others synchronized and iterates over all their data:

.. code-block:: javascript

    {
        ...
        "groups": [
            {
                "inputs": {
                    "images": {
                        "type": "image/rgb"
                    },
                    "labels": {
                        "type": "label"
                    },
                    "desynchronized": {
                        "type": "number"
                    }
                }
            }
        ],
        ...
    }


.. code-block:: python

    class Algorithm:

        def process(self, inputs, data_loaders, outputs):

            # Desynchronize the third input. From now on, inputs['desynchronized'].data
            # and inputs['desynchronized'].data_index won't change
            inputs['desynchronized'].next()

            # Iterate over all the data on the inputs still synchronized
            while inputs.hasMoreData():
                inputs.next()

                # Do something with inputs['images'].data and inputs['labels'].data
                ...

            # At this point, there is no more data available on inputs['images'] and
            # inputs['labels'], but there might be more on inputs['desynchronized']

            return True


.. _beat-system-algorithms-input-feedbackloop:

Feedback inputs
...............

:numref:`beat-system-algorithms-input-feedbackloop-example` shows a toolchain
containing a feedback loop. A special kind of input is needed in this scenario:
a *feedback input*, which isn't synchronized with the other inputs and can be
freely used by the algorithm.

Those feedback inputs aren't yet implemented in the prototype of the platform.
This will be addressed in a later version.

.. _beat-system-algorithms-input-feedbackloop-example:
.. figure:: ./img/feedback-loop.*

    Feedback loop



.. _beat-system-algorithms-dataloaders:

Data loaders
------------

.. _beat-system-algorithms-dataloaders-dataloaderlist:

DataLoader list
...............

An algorithm is given access to the **list of data loaders** of the processing
block. This list can be used to access each data loader individually, either by
its channel name (see :ref:`beat-system-algorithms-input-name`), by its
index, or by iterating over the list:


.. code-block:: python

    # 'data_loaders' is the list of data loaders of the processing block

    # Retrieve a data loader by name
    data_loader = data_loaders['labels']

    # Retrieve a data loader by index
    for index in range(0, len(data_loaders)):
        data_loader = data_loaders[index]

    # Iteration over all data loaders
    for data_loader in data_loaders:
        ...

    # Retrieve the data loader an input belongs to, by input name
    data_loader = data_loaders.loaderOf('labels')


.. _beat-system-algorithms-dataloaders-dataloader:

DataLoader
..........

Provides access to data from a group of inputs synchronized together.
See :py:class:`DataLoader`.
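
To experiment with the access pattern used by the ``prepare()`` method of the
*Unsynchronized Operation* section without running the platform, a small
stand-in can be used. ``FakeLoader`` below is hypothetical: it only mimics the
three calls used in that example (``count()``, ``view()`` and indexing), and
the indexes it returns are relative to the view.

.. code-block:: python

    from types import SimpleNamespace


    class FakeLoader:
        """Hypothetical stand-in mimicking the calls used in prepare()"""

        def __init__(self, name, values):
            self._name = name
            self._values = values

        def count(self):
            return len(self._values)

        def view(self, name, index):
            # A view restricted to one unit of the given input
            return FakeLoader(name, self._values[index:index + 1])

        def __getitem__(self, index):
            # (data, first index, last index), as unpacked in prepare()
            data = {self._name: SimpleNamespace(value=self._values[index])}
            return data, index, index


    loader = FakeLoader('in3', [0.1, 0.2, 0.3])

    models = []
    for i in range(loader.count()):
        view = loader.view('in3', i)
        data, _, _ = view[0]
        models.append(data['in3'].value)

    print(models)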

.. _beat-system-algorithms-output:

Handling output data
--------------------

.. _beat-system-algorithms-output-outputlist:

Output list
...........

An algorithm is given access to the **list of the outputs** of the processing
block. This list can be used to access each output individually, either by
its name (see section :ref:`beat-system-algorithms-output-name`), by its index,
or by iterating over the list:

.. code-block:: python

    # 'outputs' is the list of outputs of the processing block

    print(outputs['features'].data_format)

    for index in range(0, outputs.length):
        outputs[index].write(...)

    for output in outputs:
        output.write(...)

    for output in outputs[0:2]:
        output.write(...)


.. _beat-system-algorithms-output-output:

Output
......

Each output provides the following information:

.. py:attribute:: name

    *(string)* Name of the output

.. py:attribute:: data_format

    *(string)* Format of the data written on the output


And the following method:

.. py:method:: write(data, end_data_index=None)

    Write a block of data on the output


We'll look at the usage of this method through some examples in the following
sections.


.. _beat-system-algorithms-output-name:

Output naming
.............

As with its inputs, each algorithm assigns a name of its choice to each output
(see section :ref:`beat-system-algorithms-input-name` for more details) by
including it in the JSON declaration of the algorithm.


.. code-block:: javascript

    {
        ...
        "groups": [
            {
                "inputs": {
                    ...
                },
                "outputs": {
                    "name1": {
                        "type": "data_format1"
                    },
                    "name2": {
                        "type": "data_format2"
                    }
                }
            }
        ],
        ...
    }


.. _beat-system-algorithms-output-example1:

Example 1: Write one block of data for each received block of data
..................................................................

.. _beat-system-algorithms-output-example1-figure:
.. figure:: ./img/outputs-example1.*

   Example 1: 6 images as input, 6 blocks of data produced

Consider the example toolchain in
:numref:`beat-system-algorithms-output-example1-figure`. We will implement a
*data-driven* algorithm that will write one block of data on the output of the
block for each image received on its inputs. This is the simplest case.

.. code-block:: javascript

    {
        ...
        "groups": [
            {
                "inputs": {
                    "images": {
                        "type": "image/rgb"
                    },
                    "labels": {
                        "type": "label"
                    }
                },
                "outputs": {
                    "features": {
                        "type": "array/float"
                    }
                }
            }
        ],
        ...
    }


.. code-block:: python

    class Algorithm:

        def process(self, inputs, data_loaders, outputs):

            # Compute something from inputs['images'].data and inputs['labels'].data
            # and store the result in 'data'
            data = ...

            # Write our data block on the output
            outputs['features'].write(data)

            return True


The structure of the ``data`` object depends on the data format assigned
to the output.
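
The one-block-in, one-block-out behaviour can be checked with a small driver
loop. In the sketch below, ``FakeInput`` and ``FakeOutput`` are hypothetical
stand-ins (not platform classes), the ``{'value': ...}`` structure is an
assumed data format, and the "feature" is simply the length of the image name.

.. code-block:: python

    class FakeInput:
        """Hypothetical stand-in holding the current block of data"""

        def __init__(self):
            self.data = None


    class FakeOutput:
        """Hypothetical stand-in recording what is written on an output"""

        def __init__(self):
            self.written = []

        def write(self, data, end_data_index=None):
            self.written.append(data)


    class Algorithm:

        def process(self, inputs, data_loaders, outputs):
            # Hypothetical computation: the length of the image name
            data = {'value': float(len(inputs['images'].data))}
            outputs['features'].write(data)
            return True


    inputs = {'images': FakeInput(), 'labels': FakeInput()}
    outputs = {'features': FakeOutput()}
    algorithm = Algorithm()

    # Imitates a data-driven execution: one call to process() per image
    for image, label in [('a', 'dog'), ('bb', 'dog'), ('ccc', 'cat')]:
        inputs['images'].data = image
        inputs['labels'].data = label
        algorithm.process(inputs, None, outputs)

    print(outputs['features'].written)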


.. _beat-system-algorithms-output-example2:

Example 2: Skip some blocks of data
...................................

.. _beat-system-algorithms-output-example2-figure:
.. figure:: ./img/outputs-example2.*

   Example 2: 6 images as input, 4 blocks of data produced, 2 blocks of data
   skipped

Consider the example toolchain in
:numref:`beat-system-algorithms-output-example2-figure`. This time, our
algorithm will use a criterion to decide if it can perform its computation on
an image or not, and tell the platform that, for a particular data index, no
data is available.

.. code-block:: javascript

    {
        ...
        "groups": [
            {
                "inputs": {
                    "images": {
                        "type": "image/rgb"
                    },
                    "labels": {
                        "type": "label"
                    }
                },
                "outputs": {
                    "features": {
                        "type": "array/float"
                    }
                }
            }
        ],
        ...
    }

.. code-block:: python

    class Algorithm:

        def process(self, inputs, data_loaders, outputs):

            # Use a criterion on the image to determine if we can perform our
            # computation on it or not
            if self.can_compute(inputs['images'].data):
                # Compute something from inputs['images'].data and inputs['labels'].data
                # and store the result in 'data'
                data = ...

                # Write our data block on the output
                outputs['features'].write(data)
            else:
                # Tell the platform that no data is available for this image
                outputs['features'].write(None)

            return True

        def can_compute(self, image):
            # Implementation of our criterion
            ...
            return True # or False


.. _beat-system-algorithms-output-example3:

Example 3: Write one block of data related to several received blocks of data
.............................................................................

.. _beat-system-algorithms-output-example3-figure:
.. figure:: ./img/outputs-example3.*

   Example 3: 6 images as input, 2 blocks of data produced

Consider the example toolchain in
:numref:`beat-system-algorithms-output-example3-figure`. This time, our
algorithm will compute something using all the images with the same label (all
the dogs, all the cats) and write only one block of data related to all those
images.

The key here is the correct usage of the **current end data index** of the
input list to specify the indexes of the blocks of data we write on the output.
This ensures that the data will be synchronized everywhere in the toolchain:
the platform can now tell, for each of our data blocks, which image and label
it relates to (see section :ref:`beat-system-algorithms-input-synchronization`).

Additionally, since we can't know in advance if the image currently processed
is the last one with the current label, we need to memorize the current data
index of the input list to correctly assign it later when we actually write
the data block on the output.

.. code-block:: javascript

    {
        ...
        "groups": [
            {
                "inputs": {
                    "images": {
                        "type": "image/rgb"
                    },
                    "labels": {
                        "type": "label"
                    }
                },
                "outputs": {
                    "features": {
                        "type": "array/float"
                    }
                }
            }
        ],
        ...
    }

.. code-block:: python

    class Algorithm:

        def __init__(self):
            self.data = None                # Block of data updated each time we
                                            # receive a new image
            self.current_label = None       # Label of the images currently processed
            self.previous_data_index = None # Data index of the input list during the
                                            # processing of the previous image

        def process(self, inputs, data_loaders, outputs):
            # Determine if we already processed some image(s)
            if self.data is not None:
                # Determine if the label has changed since the last image we processed
                if inputs['labels'].data.name != self.current_label:
                    # Write the block of data on the output
                    outputs['features'].write(self.data, self.previous_data_index)
                    self.data = None

            # Memorize the current data index of the input list
            self.previous_data_index = inputs.current_end_data_index

            # Create a new block of data if necessary
            if self.data is None:
                self.data = ...

                # Remember the label we are currently processing
                self.current_label = inputs['labels'].data.name

            # Compute something from inputs['images'].data and inputs['labels'].data
            # and update the content of 'self.data'
            ...

            # Determine if this was the last block of data or not
            if not inputs.hasMoreData():
                # Write the block of data on the output
                outputs['features'].write(self.data, inputs.current_end_data_index)

            return True


.. _beat-system-algorithms-loop-channel:

Soft loop communication
-----------------------

The processor and evaluator algorithm components of the soft loop macro block
communicate with each other using a ``LoopChannel`` object. This object defines
the two data formats used for the request and for the answer that transit
through the loop channel. This class is only meant to be used by the algorithm
implementer.
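
The request/answer exchange described above can be pictured with a small,
self-contained sketch. Everything in it is an assumption made for illustration:
``ToyLoopChannel``, its ``send`` method, and the field names are hypothetical
stand-ins and do not reproduce BEAT's actual ``LoopChannel`` interface. The
sketch only shows the idea of a channel that checks both directions of the
exchange against a declared format.

.. code-block:: python

    class ToyLoopChannel:
        """Hypothetical stand-in for illustration; not BEAT's LoopChannel."""

        def __init__(self, request_fields, answer_fields, evaluator):
            # The two "formats" are reduced here to sets of field names
            self.request_fields = set(request_fields)
            self.answer_fields = set(answer_fields)
            self.evaluator = evaluator

        def send(self, request):
            # Check the request against the declared request format
            if set(request) != self.request_fields:
                raise ValueError("request does not match the request format")

            answer = self.evaluator(request)

            # Check the answer against the declared answer format
            if set(answer) != self.answer_fields:
                raise ValueError("answer does not match the answer format")

            return answer


    def toy_evaluator(request):
        # Hypothetical evaluator: accept values at or above a fixed threshold
        return {"accepted": request["value"] >= 0.5}


    channel = ToyLoopChannel({"value"}, {"accepted"}, toy_evaluator)
    answers = [channel.send({"value": v})["accepted"] for v in (0.2, 0.7)]

In this toy version the processor side would call ``send`` and act on the
answer, while the evaluator side only sees requests that already match the
declared format.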

.. _beat-system-algorithms-api-migration:

Migrating from API v1 to API v2
-------------------------------

Algorithms written for BEAT's algorithm API v1 can still be run under the v2
execution model. They are now considered legacy algorithms and should be ported
to API v2 as soon as possible.

API v2 provides two different types of algorithms:

- Sequential
- Autonomous

The Sequential type follows the same code execution model as the v1 API, meaning
that the process function is called once for each input item.

The Autonomous type allows the developer to load the input data at will;
therefore, the process method is called only once. This makes it possible, for
example, to optimize the loading of data into GPU memory for faster execution.
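
The difference between the two execution models can be sketched with a toy
harness. This is only an illustration under stated assumptions: the classes
``SequentialSketch`` and ``AutonomousSketch`` and the functions
``run_sequential`` and ``run_autonomous`` are hypothetical and are not part of
the BEAT API; plain Python lists stand in for inputs, data loaders and outputs.

.. code-block:: python

    # Toy input items; the field names are chosen for illustration only
    items = [{"in1": 1, "in2": 10}, {"in1": 2, "in2": 20}, {"in1": 3, "in2": 30}]


    class SequentialSketch:
        """The (hypothetical) runner calls process() once per input item."""

        def __init__(self):
            self.calls = 0

        def process(self, item, results):
            self.calls += 1
            results.append(item["in1"] + item["in2"])
            return True


    class AutonomousSketch:
        """The (hypothetical) runner calls process() once; the algorithm
        iterates over the data itself."""

        def __init__(self):
            self.calls = 0

        def process(self, all_items, results):
            self.calls += 1
            for item in all_items:
                results.append(item["in1"] + item["in2"])
            return True


    def run_sequential(algorithm, data):
        # The runner drives the iteration
        results = []
        for item in data:
            algorithm.process(item, results)
        return results


    def run_autonomous(algorithm, data):
        # The algorithm drives the iteration
        results = []
        algorithm.process(data, results)
        return results


    sequential = SequentialSketch()
    autonomous = AutonomousSketch()
    sequential_results = run_sequential(sequential, items)
    autonomous_results = run_autonomous(autonomous, items)

Both variants produce the same results; only the number of ``process`` calls
differs (one per item in the first case, a single call in the second).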

The most straightforward migration path from v1 to v2 is to write a Sequential
algorithm, which requires only a few changes to the code.

API V1:

.. code-block:: python

    class Algorithm:

        def setup(self, parameters):
            self.sync = parameters['sync']
            return True


        def process(self, inputs, outputs):
            if inputs[self.sync].isDataUnitDone():
                outputs['out'].write({
                    'value': inputs['in1'].data.value + inputs['in2'].data.value,
                })

            return True


API V2 sequential:

.. code-block:: python

    class Algorithm:

        def setup(self, parameters):
            self.sync = parameters['sync']
            return True


        def process(self, inputs, data_loaders, outputs):
            if inputs[self.sync].isDataUnitDone():
                outputs['out'].write({
                    'value': inputs['in1'].data.value + inputs['in2'].data.value,
                })

            return True


API V2 autonomous:

.. code-block:: python

    class Algorithm:

        def setup(self, parameters):
            self.sync = parameters['sync']
            return True


        def process(self, data_loaders, outputs):
            data_loader = data_loaders.loaderOf('in1')

            for i in range(data_loader.count(self.sync)):
                view = data_loader.view(self.sync, i)

                (data, start, end) = view[view.count() - 1]

                outputs['out'].write({
                        'value': data['in1'].value + data['in2'].value,
                    },
                    end
                )

            return True

.. include:: links.rst