[doc] Add explanations about the development of C++ based algorithms

cec9c92a · Philip ABBET · 2825180c · cec9c92a
Commit cec9c92a authored 8 years ago by Philip ABBET
--- a/doc/user/algorithms/guide.rst
+++ b/doc/user/algorithms/guide.rst
@@ -51,14 +51,15 @@ toolchain. Flow synchronization is determined by data units produced from a
 dataset and injected into the toolchain.

 Code for algorithms may be implemented in any programming language supported by
-|project|. At present, only a Python backend has been integrated and,
-therefore, algorithms are expected to be implemented in this language.  (In
-future, other backends will be added to |project|.) Python code implementing a
-certain algorithm can be created using our web-based :ref:`algorithm editor`.
+|project|. At present, two backends have been integrated, supporting Python
+and C++; algorithms are therefore expected to be implemented in one of those
+languages.  (In future, other backends will be added to |project|.) Python code
+implementing a certain algorithm can be created using our web-based
+:ref:`algorithm editor`. C++ based algorithms must be compiled using a provided
+Docker container, and uploaded to the platform (see :ref:`binary algorithms`).

 |project| treats algorithms as objects that are derived from the class
-``Algorithm``. To define a new algorithm, at least one method must be
-implemented:
+``Algorithm`` (in Python) or ``IAlgorithm`` (in C++). To define a new algorithm,
+at least one method must be implemented:

  * ``process()``: the method that actually processes input and produces
    outputs.
@@ -71,10 +72,25 @@ Python):

   class Algorithm:

-       def process(self, inputs, outputs):
+        def process(self, inputs, outputs):
           # here, you read inputs, process and write results to outputs


+Here is the equivalent example in C++:
+
+.. code-block:: c++
+   :linenos:
+
+    class Algorithm: public IAlgorithm
+    {
+    public:
+        virtual bool process(const InputList& inputs, const OutputList& outputs)
+        {
+            // here, you read inputs, process and write results to outputs
+            return true;
+        }
+    };
+
+
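To make this calling convention more concrete, the sketch below drives a Python algorithm by hand. It is only an illustration: the ``Data``, ``FakeInput`` and ``FakeOutput`` classes and the ``run()`` loop are hypothetical stand-ins written for this example and are not part of the |project| API. They mimic only the ``data`` attribute and the ``write()`` method used throughout this guide, and the doubling step stands in for real processing.

```python
class Data:
    """Hypothetical stand-in for a decoded data unit with a ``value`` field."""

    def __init__(self, value):
        self.value = value


class FakeInput:
    """Hypothetical input endpoint exposing the current unit as ``data``."""

    def __init__(self):
        self.data = None


class FakeOutput:
    """Hypothetical output endpoint that records everything written to it."""

    def __init__(self):
        self.written = []

    def write(self, payload):
        self.written.append(payload)


class Algorithm:

    def process(self, inputs, outputs):
        # read the current unit, process it and write exactly one result
        value = inputs['in'].data.value
        outputs['out'].write({'value': value * 2})
        return True


def run(algorithm, values):
    """Calls process() once per data unit (assumed driver, for illustration)."""
    inputs = {'in': FakeInput()}
    outputs = {'out': FakeOutput()}
    for value in values:
        inputs['in'].data = Data(value)
        if not algorithm.process(inputs, outputs):
            raise RuntimeError('process() reported an error')
    return outputs['out'].written


results = run(Algorithm(), [1, 2, 3])
print(results)
```

The loop lives in the driver, not in the algorithm: ``process()`` only ever sees the current data unit.
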
 One particularity of the |project| platform is how the data-flow through a
 given toolchain is synchronized. The platform is responsible for extracting
 data units (images, speech-segments, videos, etc.) from the database and
@@ -112,26 +128,63 @@ An example code showing how to implement an algorithm in this configuration is s
 .. code-block:: python
   :linenos:

-   class Algorithm:
+    class Algorithm:

-       def process(self, inputs, outputs):
+        def process(self, inputs, outputs):

-           # to read the field "value" on the "in" input, use "data"
-           # a possible declaration of "user/format/1" would be:
-           # {
-           #   "value": ...
-           # }
-           value = inputs['in'].data.value
+            # to read the field "value" on the "in" input, use "data"
+            # a possible declaration of "user/format/1" would be:
+            # {
+            #   "value": ...
+            # }
+            value = inputs['in'].data.value

-           # do your processing and create the "output" value
-           output = magical_processing(value)
+            # do your processing and create the "output" value
+            output = magical_processing(value)
+
+            # to write "output" into the relevant endpoint use "write"
+            # a possible declaration of "user/other/1" would be:
+            # {
+            #   "value": ...
+            # }
+            outputs['out'].write({'value': output})
+
+            # No error occurred, so return True
+            return True
+
+
+.. code-block:: c++
+   :linenos:

-           # to write "output" into the relevant endpoint use "write"
-           # a possible declaration of "user/other/1" would be:
-           # {
-           #   "value": ...
-           # }
-           outputs['out'].write({'value': output})
+    class Algorithm: public IAlgorithm
+    {
+    public:
+        virtual bool process(const InputList& inputs, const OutputList& outputs)
+        {
+            // to read the field "value" on the "in" input, use "data"
+            // a possible declaration of "user/format/1" would be:
+            // {
+            //   "value": ...
+            // }
+            auto value = inputs["in"]->data<user::format_1>()->value;
+
+            // do your processing and create the "output" value
+            auto output = magical_processing(value);
+
+            // to write "output" into the relevant endpoint use "write"
+            // a possible declaration of "user/other/1" would be:
+            // {
+            //   "value": ...
+            // }
+            auto result = new user::other_1();
+            result->value = output;
+
+            outputs["out"]->write(result);
+
+            // No error occurred, so return true
+            return true;
+        }
+    };


 In this example, the platform will call the user algorithm every time a new
@@ -159,18 +212,41 @@ shown below:
 .. code-block:: python
   :linenos:

-   class Algorithm:
+    class Algorithm:
+
+        def process(self, inputs, outputs):
+
+            i1 = inputs['in'].data.value
+            i2 = inputs['in2'].data.value
+
+            out = magical_processing(i1, i2)
+
+            outputs['out'].write({'value': out})

-       def process(self, inputs, outputs):
+            return True

-           i1 = inputs['in'].data.value
-           i2 = inputs['in2'].data.value

-           out = magical_processing(i1, i2)
+.. code-block:: c++
+   :linenos:
+
+    class Algorithm: public IAlgorithm
+    {
+    public:
+        virtual bool process(const InputList& inputs, const OutputList& outputs)
+        {
+            auto i1 = inputs["in"]->data<user::format_1>()->value;
+            auto i2 = inputs["in2"]->data<user::format_1>()->value;
+
+            auto out = magical_processing(i1, i2);

-           outputs['out'].write({'value': out})
+            auto result = new user::other_1();
+            result->value = out;

-           return True
+            outputs["out"]->write(result);
+
+            return true;
+        }
+    };


 You should notice that we still don't require any sort of ``for`` loops! The
@@ -201,20 +277,50 @@ The example below illustrates how such an algorithm could be implemented:
 .. code-block:: python
   :linenos:

-   class Algorithm:
+    class Algorithm:

-       def __init__(self):
-           self.objs = []
+        def __init__(self):
+            self.objs = []

-       def process(self, inputs, outputs):
-           self.objs.append(inputs['in'].data.value) #accumulates
+        def process(self, inputs, outputs):
+            self.objs.append(inputs['in'].data.value) # accumulates

-           if inputs['in2'].isDataUnitDone():
-               out = magical_processing(self.objs)
-               outputs['out'].write({'value': out})
-               self.images = [] #reset accumulator for next label
+            if inputs['in2'].isDataUnitDone():
+                out = magical_processing(self.objs)
+                outputs['out'].write({'value': out})
+                self.objs = []  # reset accumulator for next label

-           return True
+            return True
+
+
+.. code-block:: c++
+   :linenos:
+
+    class Algorithm: public IAlgorithm
+    {
+    public:
+        virtual bool process(const InputList& inputs, const OutputList& outputs)
+        {
+            objs.push_back(inputs["in"]->data<user::format_1>()->value); // accumulates
+
+            if (inputs["in2"]->isDataUnitDone())
+            {
+                auto out = magical_processing(objs);
+
+                auto result = new user::other_1();
+                result->value = out;
+
+                outputs["out"]->write(result);
+
+                objs.clear();   // reset accumulator for next label
+            }
+
+            return true;
+        }
+
+    public:
+        std::vector<float> objs;
+    };


 Here, the units received at the endpoint ``in`` are accumulated as long as the
@@ -246,31 +352,66 @@ unsynchronized input (``in2``).
 .. code-block:: python
   :linenos:

-   class Algorithm:
+    class Algorithm:
+
+        def __init__(self):
+            self.models = None
+
+        def process(self, inputs, outputs):
+            # N.B.: this will be called for every unit in `in'
+
+            # Loads the "model" data at the beginning, once
+            if self.models is None:
+                self.models = []
+                while inputs['in2'].hasMoreData():
+                    inputs['in2'].next()
+                    self.models.append(inputs['in2'].data.value)
+
+            # Processes the current input in `in', apply the model/models
+            out = magical_processing(inputs['in'].data.value, self.models)

+            # Writes the output
+            outputs['out'].write({'value': out})

-       def __init__(self):
+            return True

-           self.models = None

+.. code-block:: c++
+   :linenos:
+
+    class Algorithm: public IAlgorithm
+    {
+    public:
+        virtual bool process(const InputList& inputs, const OutputList& outputs)
+        {
+            // N.B.: this will be called for every unit in `in'
+
+            // Loads the "model" data at the beginning, once
+            if (models.empty())
+            {
+                while (inputs["in2"]->hasMoreData())
+                {
+                    inputs["in2"]->next();
+                    auto model = inputs["in2"]->data<user::model_1>();
+                    models.push_back(*model);
+                }
+            }

-       def process(self, inputs, outputs):
-           # N.B.: this will be called for every unit in `in'
+            // Processes the current input in `in', apply the model/models
+            auto out = magical_processing(inputs["in"]->data<user::format_1>()->value, models);

-           # Loads the "model" data at the beginning, once
-           if self.models is None:
-               self.models = []
-               while inputs['in2'].hasMoreData():
-                   inputs['in2'].next()
-                   self.models.append(inputs['in2'].data.value)
+            // Writes the output
+            auto result = new user::other_1();
+            result->value = out;

-           # Processes the current input in `in', apply the model/models
-           out = magical_processing(inputs['in'].data.value, self.models)
+            outputs["out"]->write(result);

-           # Writes the output
-           outputs.write({'value': out})
+            return true;
+        }

-           return True
+    public:
+        std::vector<user::model_1> models;
+    };
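
The "load once, then process every unit" pattern can be tried outside the platform with a few stand-in classes. This is a minimal sketch under stated assumptions: ``Data``, ``FakeInput`` and ``FakeOutput`` are hypothetical helpers written for this example (only ``hasMoreData()``, ``next()``, ``data`` and ``write()`` mirror names used in this guide), and the sum stands in for ``magical_processing``.

```python
class Data:
    """Hypothetical stand-in for a decoded data unit with a ``value`` field."""

    def __init__(self, value):
        self.value = value


class FakeInput:
    """Hypothetical input that steps through a fixed list of values."""

    def __init__(self, values):
        self._values = list(values)
        self._index = -1
        self.data = None

    def hasMoreData(self):
        return self._index + 1 < len(self._values)

    def next(self):
        self._index += 1
        self.data = Data(self._values[self._index])


class FakeOutput:
    """Hypothetical output endpoint that records everything written to it."""

    def __init__(self):
        self.written = []

    def write(self, payload):
        self.written.append(payload)


class Algorithm:

    def __init__(self):
        self.models = None

    def process(self, inputs, outputs):
        # Loads every unit of the unsynchronized input on the first call only
        if self.models is None:
            self.models = []
            while inputs['in2'].hasMoreData():
                inputs['in2'].next()
                self.models.append(inputs['in2'].data.value)

        # Placeholder processing: current value plus the sum of all models
        out = inputs['in'].data.value + sum(self.models)
        outputs['out'].write({'value': out})
        return True


inputs = {'in': FakeInput([1, 2, 3]), 'in2': FakeInput([10, 20])}
outputs = {'out': FakeOutput()}
algorithm = Algorithm()

# Assumed driver loop: one call to process() per unit of the input "in"
while inputs['in'].hasMoreData():
    inputs['in'].next()
    algorithm.process(inputs, outputs)

results = outputs['out'].written
print(results)
```

The models are read during the first call and reused afterwards, so the second and third calls skip the loading loop entirely.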


 It may happen that you have several inputs which are synchronized together, but
@@ -280,32 +421,69 @@ it is safer to treat inputs using their *group*. For example:
 .. code-block:: python
   :linenos:

-   class Algorithm:
+    class Algorithm:

+        def __init__(self):
+            self.models = None

-       def __init__(self):

-           self.models = None
+        def process(self, inputs, outputs):
+            # N.B.: this will be called for every unit in `in'

+            # Loads the "model" data at the beginning, once
+            if self.models is None:
+                self.models = []
+                group = inputs.groupOf('in2')
+                while group.hasMoreData():
+                    group.next() #synchronously advances the data
+                    self.models.append(group['in2'].data.value)

-       def process(self, inputs, outputs):
-           # N.B.: this will be called for every unit in `in'
+            # Processes the current input in `in', apply the model/models
+            out = magical_processing(inputs['in'].data.value, self.models)

-           # Loads the "model" data at the beginning, once
-           if self.models is None:
-               self.models = []
-               group = inputs.groupOf('in2')
-               while group.hasMoreData():
-                   group.next() #synchronously advances the data
-                   self.models.append(group['in2'].data.value)
+            # Writes the output
+            outputs['out'].write({'value': out})

-           # Processes the current input in `in', apply the model/models
-           out = magical_processing(inputs['in'].data.value, self.models)
+            return True

-           # Writes the output
-           outputs.write({'value': out})

-           return True
+.. code-block:: c++
+   :linenos:
+
+    class Algorithm: public IAlgorithm
+    {
+    public:
+        virtual bool process(const InputList& inputs, const OutputList& outputs)
+        {
+            // N.B.: this will be called for every unit in `in'
+
+            // Loads the "model" data at the beginning, once
+            if (models.empty())
+            {
+                auto group = inputs.groupOf("in2");
+                while (group->hasMoreData())
+                {
+                    group->next(); // synchronously advances the data
+                    auto model = (*group)["in2"]->data<user::model_1>();
+                    models.push_back(*model);
+                }
+            }
+
+            // Processes the current input in `in', apply the model/models
+            auto out = magical_processing(inputs["in"]->data<user::format_1>()->value, models);
+
+            // Writes the output
+            auto result = new user::other_1();
+            result->value = out;
+
+            outputs["out"]->write(result);
+
+            return true;
+        }
+
+    public:
+        std::vector<user::model_1> models;
+    };
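
The benefit of advancing a whole group can be illustrated with a toy model. The sketch below is not the |project| implementation: ``Data``, ``FakeInput`` and ``FakeGroup`` are hypothetical classes written for this example, and the second input name ``in3`` is invented for illustration. It assumes only that a group advances all of its inputs together, as described in this guide.

```python
class Data:
    """Hypothetical stand-in for a decoded data unit with a ``value`` field."""

    def __init__(self, value):
        self.value = value


class FakeInput:
    """Hypothetical input that steps through a fixed list of values."""

    def __init__(self, values):
        self._values = list(values)
        self._index = -1
        self.data = None

    def hasMoreData(self):
        return self._index + 1 < len(self._values)

    def next(self):
        self._index += 1
        self.data = Data(self._values[self._index])


class FakeGroup:
    """Hypothetical group that advances all of its inputs together."""

    def __init__(self, inputs):
        self._inputs = inputs

    def __getitem__(self, name):
        return self._inputs[name]

    def hasMoreData(self):
        return all(i.hasMoreData() for i in self._inputs.values())

    def next(self):
        for i in self._inputs.values():
            i.next()


def make_group():
    return FakeGroup({
        'in2': FakeInput(['model-a', 'model-b']),
        'in3': FakeInput(['label-a', 'label-b']),
    })


# Advancing through the group keeps both inputs on the same unit
group = make_group()
pairs = []
while group.hasMoreData():
    group.next()
    pairs.append((group['in2'].data.value, group['in3'].data.value))

# Advancing a single input leaves its sibling behind (never advanced here)
group = make_group()
while group['in2'].hasMoreData():
    group['in2'].next()
sibling_data = group['in3'].data

print(pairs)
print(sibling_data)
```

In this toy model, the second loop exhausts ``in2`` while ``in3`` has not moved at all, which is the kind of mismatch that looping over the group avoids.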


 In practice, encoding your algorithms using *groups* instead of looping over
@@ -353,7 +531,7 @@ image below.
 There are two types of algorithm in the editor: Analyzer, and Splittable.
 Analyzer algorithms are special algorithms where the purpose is to generate
 statistics about the processing results (graphs, means, variances, etc.).
-Usual, biometric data processing algorithms are of type Splittable, indicating
+Usually, biometric data processing algorithms are of type Splittable, indicating
 to the platform that these algorithms can be executed in a distributed fashion,
 depending on the available computing resources.

@@ -372,7 +550,7 @@ You should see a web-page similar to what is displayed below:
 .. image:: img/algorithm_new.*


-For instructions on how to create an algorithm from scratch, please refer to the Section of `algorithm editor`_.??????
+For instructions on how to create an algorithm from scratch, please refer to the `algorithm editor`_ section.


 Edit an existing algorithm
@@ -403,15 +581,20 @@ Please refer to the Section of `algorithm editor`_ for creating an algorithm.
 Editor
 ------

-To create an algorithm,  there are six sections which are:
+To create an algorithm, fill in the following sections of the editor:

  * Name:  the name of algorithm.
  * Algorithm type: Analyzer or Splittable.
+ * Language: The language used to implement the algorithm (Python or C++).
  * Documentation: This is used to describe your algorithm.
-  * Source code: The (Python) code implementing the algorithm.
  * Inputs / Outputs: Define the properties of the Input and Output endpoints for this algorithm.
  * Parameters: Define the configuration-parameters for the algorithm.
+
+For Python-based algorithms only:
+
+ * Libraries: If there are functions in a library, you can add them for the algorithm to use.
+ * Source code: The (Python) code implementing the algorithm.
+

 You should see a webpage similar to what is displayed below:

@@ -440,6 +623,89 @@ your algorithm code, to help with your debugging.
   very last 4 kilobytes of these streams is kept for user inspection.


+.. _binary algorithms:
+
+Implementing an algorithm in C++
+--------------------------------
+
+Prerequisite: Configure your command-line client
+================================================
+
+To ensure that your compiled algorithm will work on the |project| platform,
+you must compile it using our Docker image called *beats/client*. Once it is downloaded,
+you'll need to configure the command-line tool to access your account on the |project|
+platform:
+
+.. code-block:: bash
+
+    $ docker run -ti beats/client:0.1.5 bash
+    /# cd home
+    /home# beat config set user <your_user_name>
+    /home# beat config set token "<your_token>"
+    /home# beat config save
+
+Here, ``<your_user_name>`` is your username on the |project| platform, and
+``<your_token>`` can be retrieved from your settings page. Note that we use the
+``/home`` folder to save everything, but feel free to use the one you want.
+
+
+Algorithm compilation
+=====================
+
+To implement an algorithm in C++, follow these steps:
+
+  1. Create the algorithm on the |project| platform, by selecting the C++ language.
+     Declare all the needed inputs, outputs and parameters.
+
+  2. Using the ``beat`` command-line tool, download the algorithm declaration from the
+     |project| platform (note that all the necessary data formats will be downloaded too):
+
+.. code-block:: bash
+
+    /home# beat algorithms pull <your_user_name>/<algorithm_name>/<version>
+
+At this point, the folder ``/home/algorithms/<your_user_name>/<algorithm_name>/``
+will contain the declaration of your algorithm in JSON format, and ``/home/dataformats/``
+will contain the declaration files of the data formats used by the algorithm.
+
+  3. Generate the C++ files corresponding to the algorithm declaration:
+
+.. code-block:: bash
+
+    /home# generate_cxx.py . <your_user_name>/<algorithm_name>/<version>
+
+At this point, the folder ``/home/algorithms/<your_user_name>/<algorithm_name>/``
+will contain a few new C++ files:
+
+    * one header/source file for each needed data format
+    * ``beat_setup.h`` and ``beat_setup.cpp``: used by the platform to learn everything
+      it needs to know about your algorithm
+    * ``algorithm.h`` and ``algorithm.cpp``: you will implement your algorithm in those
+      files
+
+Feel free to add as many other files as you need for your implementation.
+
+  4. Implement your algorithm in ``algorithm.h`` and ``algorithm.cpp``.
+
+  5. Compile your code as a shared library (an example CMake file was generated; you can
+     either modify it to add your own files or use another build system if you want). Note
+     that the |project| platform expects you to upload one and only one *shared library*, so
+     if your algorithm has any dependencies, you must link them statically inside the shared
+     library:
+
+.. code-block:: bash
+
+    /home# cd algorithms/<your_user_name>/<algorithm_name>/
+    /home/algorithms/<your_user_name>/<algorithm_name># mkdir build
+    /home/algorithms/<your_user_name>/<algorithm_name># cd build
+    /home/algorithms/<your_user_name>/<algorithm_name>/build# cmake ..
+    /home/algorithms/<your_user_name>/<algorithm_name>/build# make
+
+This will produce a file called ``<version>.so`` in the ``/home/algorithms/<your_user_name>/<algorithm_name>/`` folder.
+
+  6. Upload the shared library to the |project| platform, from the algorithm page.
+
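The folder layout described in the steps above can be summarized with a small helper. This is only a sketch: ``algorithm_layout()`` is a hypothetical function written for this example (it is not part of the ``beat`` command-line tool), the ``/home`` prefix follows the convention used in this section, and the user and algorithm names are invented.

```python
import posixpath


def algorithm_layout(user, name, version, prefix='/home'):
    """Returns the paths mentioned in the compilation steps (illustration only)."""
    folder = posixpath.join(prefix, 'algorithms', user, name)
    generated = ('beat_setup.h', 'beat_setup.cpp', 'algorithm.h', 'algorithm.cpp')
    return {
        # JSON declaration pulled from the platform, plus generated C++ files
        'algorithm_folder': folder,
        # declarations of the data formats used by the algorithm
        'dataformats_folder': posixpath.join(prefix, 'dataformats'),
        'generated_files': [posixpath.join(folder, f) for f in generated],
        # folder in which cmake and make are run
        'build_folder': posixpath.join(folder, 'build'),
        # the single shared library to upload
        'shared_library': posixpath.join(folder, version + '.so'),
    }


layout = algorithm_layout('jdoe', 'my_algorithm', '1')
print(layout['shared_library'])
```

Only the file listed under ``shared_library`` is uploaded; everything else stays in the container.
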
+
 Sharing an Algorithm
 ---------------------