Commit cec9c92a authored by Philip ABBET

[doc] Add explanations about the development of C++ based algorithms

parent 2825180c
......@@ -51,14 +51,15 @@ toolchain. Flow synchronization is determined by data units produced from a
dataset and injected into the toolchain.
Code for algorithms may be implemented in any programming language supported by
|project|. At present, only a Python backend has been integrated and,
therefore, algorithms are expected to be implemented in this language. (In
future, other backends will be added to |project|.) Python code implementing a
certain algorithm can be created using our web-based :ref:`algorithm editor`.
|project|. At present, two backends have been integrated, supporting Python
and C++; algorithms are therefore expected to be implemented in one of those
languages. (In future, other backends will be added to |project|.) Python code
implementing a certain algorithm can be created using our web-based
:ref:`algorithm editor`. C++ based algorithms must be compiled using a provided
Docker container and uploaded to the platform (see :ref:`binary algorithms`).
|project| treats algorithms as objects that are derived from the class
``Algorithm``. To define a new algorithm, at least one method must be
implemented:
``Algorithm`` (in Python) or ``IAlgorithm`` (in C++). To define a new algorithm,
at least one method must be implemented:
* ``process()``: the method that actually processes input and produces
outputs.
......@@ -71,10 +72,25 @@ Python):
   class Algorithm:

       def process(self, inputs, outputs):
           # here, you read inputs, process and write results to outputs
           return True

Here is the equivalent example in C++:

.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithm
   {
   public:
       virtual bool process(const InputList& inputs, const OutputList& outputs)
       {
           // here, you read inputs, process and write results to outputs
           return true;
       }
   };

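To make the calling convention concrete, the sketch below drives a Python ``Algorithm`` outside the platform. ``FakeInput``, ``FakeOutput`` and ``run_once_per_unit`` are hypothetical helpers written only for this illustration; they are not part of the |project| API and merely mimic the ``inputs[...].data.value`` and ``outputs[...].write()`` usage found in the examples of this guide. Doubling the value is an arbitrary placeholder for real processing.

```python
from types import SimpleNamespace


class FakeInput:
    """Hypothetical stand-in for a platform input (not the real API)."""

    def __init__(self):
        self.data = None


class FakeOutput:
    """Hypothetical stand-in for a platform output; records what is written."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class Algorithm:

    def process(self, inputs, outputs):
        # read the current unit, process it, write the result
        value = inputs['in'].data.value
        outputs['out'].write({'value': value * 2})  # placeholder processing
        return True


def run_once_per_unit(algorithm, values):
    """Calls process() once per data unit, standing in for the platform."""
    inputs = {'in': FakeInput()}
    outputs = {'out': FakeOutput()}
    for v in values:
        inputs['in'].data = SimpleNamespace(value=v)
        if not algorithm.process(inputs, outputs):
            raise RuntimeError('process() reported an error')
    return outputs['out'].written


results = run_once_per_unit(Algorithm(), [1, 2, 3])
print(results)
```

The algorithm itself contains no loop: the (simulated) platform decides when ``process()`` is called and which unit is current.
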
One particularity of the |project| platform is how the data-flow through a
given toolchain is synchronized. The platform is responsible for extracting
data units (images, speech-segments, videos, etc.) from the database and
......@@ -112,26 +128,63 @@ An example code showing how to implement an algorithm in this configuration is s
.. code-block:: python
   :linenos:

   class Algorithm:

       def process(self, inputs, outputs):

           # to read the field "value" on the "in" input, use "data"
           # a possible declaration of "user/format/1" would be:
           # {
           #   "value": ...
           # }
           value = inputs['in'].data.value

           # do your processing and create the "output" value
           output = magical_processing(value)

           # to write "output" into the relevant endpoint use "write"
           # a possible declaration of "user/other/1" would be:
           # {
           #   "value": ...
           # }
           outputs['out'].write({'value': output})

           # No error occurred, so return True
           return True

.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithm
   {
   public:
       virtual bool process(const InputList& inputs, const OutputList& outputs)
       {
           // to read the field "value" on the "in" input, use "data"
           // a possible declaration of "user/format/1" would be:
           // {
           //   "value": ...
           // }
           auto value = inputs["in"]->data<user::format_1>()->value;

           // do your processing and create the "output" value
           auto output = magical_processing(value);

           // to write "output" into the relevant endpoint use "write"
           // a possible declaration of "user/other/1" would be:
           // {
           //   "value": ...
           // }
           auto result = new user::other_1();
           result->value = output;
           outputs["out"]->write(result);

           // No error occurred, so return true
           return true;
       }
   };

In this example, the platform will call the user algorithm every time a new
......@@ -159,18 +212,41 @@ shown below:
.. code-block:: python
   :linenos:

   class Algorithm:

       def process(self, inputs, outputs):
           i1 = inputs['in'].data.value
           i2 = inputs['in2'].data.value

           out = magical_processing(i1, i2)

           outputs['out'].write({'value': out})

           return True

.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithm
   {
   public:
       virtual bool process(const InputList& inputs, const OutputList& outputs)
       {
           auto i1 = inputs["in"]->data<user::format_1>()->value;
           auto i2 = inputs["in2"]->data<user::format_1>()->value;

           auto out = magical_processing(i1, i2);

           auto result = new user::other_1();
           result->value = out;
           outputs["out"]->write(result);

           return true;
       }
   };

You should notice that we still don't require any sort of ``for`` loops! The
......@@ -201,20 +277,50 @@ The example below illustrates how such an algorithm could be implemented:
.. code-block:: python
   :linenos:

   class Algorithm:

       def __init__(self):
           self.objs = []

       def process(self, inputs, outputs):
           self.objs.append(inputs['in'].data.value)  # accumulates

           if inputs['in2'].isDataUnitDone():
               out = magical_processing(self.objs)
               outputs['out'].write({'value': out})
               self.objs = []  # reset accumulator for next label

           return True

.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithm
   {
   public:
       virtual bool process(const InputList& inputs, const OutputList& outputs)
       {
           objs.push_back(inputs["in"]->data<user::format_1>()->value); // accumulates

           if (inputs["in2"]->isDataUnitDone())
           {
               auto out = magical_processing(objs);

               auto result = new user::other_1();
               result->value = out;
               outputs["out"]->write(result);

               objs.clear(); // reset accumulator for next label
           }

           return true;
       }

   public:
       std::vector<float> objs;
   };

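The accumulation pattern can be tried out with a small simulation. This is only a sketch under stated assumptions: ``FakeInput`` and ``FakeOutput`` are hypothetical stand-ins (not the |project| API), ``unit_done`` is an invented attribute used to emulate ``isDataUnitDone()``, and ``sum`` replaces ``magical_processing``.

```python
from types import SimpleNamespace


class FakeInput:
    """Hypothetical stand-in for a platform input (not the real API)."""

    def __init__(self):
        self.data = None
        self.unit_done = False  # invented flag, set by the driver loop below

    def isDataUnitDone(self):
        return self.unit_done


class FakeOutput:
    """Hypothetical stand-in for a platform output; records what is written."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class Algorithm:

    def __init__(self):
        self.objs = []

    def process(self, inputs, outputs):
        self.objs.append(inputs['in'].data.value)  # accumulates

        if inputs['in2'].isDataUnitDone():
            # sum() is a placeholder for magical_processing()
            outputs['out'].write({'value': sum(self.objs)})
            self.objs = []  # reset accumulator for next label

        return True


# Each tuple is (label on "in2", values on "in" that share that label).
dataset = [('a', [1, 2, 3]), ('b', [10, 20])]

inputs = {'in': FakeInput(), 'in2': FakeInput()}
outputs = {'out': FakeOutput()}
algorithm = Algorithm()

for label, values in dataset:
    inputs['in2'].data = SimpleNamespace(value=label)
    for position, value in enumerate(values):
        inputs['in'].data = SimpleNamespace(value=value)
        # the "in2" unit is done once the last "in" unit it covers is reached
        inputs['in2'].unit_done = (position == len(values) - 1)
        algorithm.process(inputs, outputs)

print(outputs['out'].written)
```

One result is written per label, even though ``process()`` is called once per unit on ``in``.
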
Here, the units received at the endpoint ``in`` are accumulated as long as the
......@@ -246,31 +352,66 @@ unsynchronized input (``in2``).
.. code-block:: python
   :linenos:

   class Algorithm:

       def __init__(self):
           self.models = None

       def process(self, inputs, outputs):
           # N.B.: this will be called for every unit in `in'

           # Loads the "model" data at the beginning, once
           if self.models is None:
               self.models = []
               while inputs['in2'].hasMoreData():
                   inputs['in2'].next()
                   self.models.append(inputs['in2'].data.value)

           # Processes the current input in `in', apply the model/models
           out = magical_processing(inputs['in'].data.value, self.models)

           # Writes the output
           outputs['out'].write({'value': out})

           return True

.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithm
   {
   public:
       virtual bool process(const InputList& inputs, const OutputList& outputs)
       {
           // N.B.: this will be called for every unit in `in'

           // Loads the "model" data at the beginning, once
           if (models.empty())
           {
               while (inputs["in2"]->hasMoreData())
               {
                   inputs["in2"]->next();
                   auto model = inputs["in2"]->data<user::model_1>();
                   models.push_back(*model);
               }
           }

           // Processes the current input in `in', apply the model/models
           auto out = magical_processing(inputs["in"]->data<user::format_1>()->value, models);

           // Writes the output
           auto result = new user::other_1();
           result->value = out;
           outputs["out"]->write(result);

           return true;
       }

   public:
       std::vector<user::model_1> models;
   };

It may happen that you have several inputs which are synchronized together, but
......@@ -280,32 +421,69 @@ it is safer to treat inputs using their *group*. For example:
.. code-block:: python
   :linenos:

   class Algorithm:

       def __init__(self):
           self.models = None

       def process(self, inputs, outputs):
           # N.B.: this will be called for every unit in `in'

           # Loads the "model" data at the beginning, once
           if self.models is None:
               self.models = []
               group = inputs.groupOf('in2')
               while group.hasMoreData():
                   group.next()  # synchronously advances the data
                   self.models.append(group['in2'].data.value)

           # Processes the current input in `in', apply the model/models
           out = magical_processing(inputs['in'].data.value, self.models)

           # Writes the output
           outputs['out'].write({'value': out})

           return True

.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithm
   {
   public:
       virtual bool process(const InputList& inputs, const OutputList& outputs)
       {
           // N.B.: this will be called for every unit in `in'

           // Loads the "model" data at the beginning, once
           if (models.empty())
           {
               auto group = inputs.groupOf("in2");
               while (group->hasMoreData())
               {
                   group->next(); // synchronously advances the data
                   auto model = (*group)["in2"]->data<user::model_1>();
                   models.push_back(*model);
               }
           }

           // Processes the current input in `in', apply the model/models
           auto out = magical_processing(inputs["in"]->data<user::format_1>()->value, models);

           // Writes the output
           auto result = new user::other_1();
           result->value = out;
           outputs["out"]->write(result);

           return true;
       }

   public:
       std::vector<user::model_1> models;
   };

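The group-based loading can be illustrated with a small simulation. This is a sketch under stated assumptions, not the |project| API: ``FakeInput``, ``FakeGroup``, ``FakeInputList`` and ``FakeOutput`` are hypothetical stand-ins, ``in3`` is an invented second input synchronized with ``in2``, and a simple addition replaces ``magical_processing``.

```python
from types import SimpleNamespace


class FakeInput:
    """Hypothetical stand-in for a platform input (not the real API)."""

    def __init__(self):
        self.data = None


class FakeGroup:
    """Hypothetical group whose inputs always advance together."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._position = 0
        self._inputs = {name: FakeInput() for name in self._rows[0]}

    def __contains__(self, name):
        return name in self._inputs

    def __getitem__(self, name):
        return self._inputs[name]

    def hasMoreData(self):
        return self._position < len(self._rows)

    def next(self):
        row = self._rows[self._position]
        self._position += 1
        for name, value in row.items():
            self._inputs[name].data = SimpleNamespace(value=value)


class FakeInputList:
    """Hypothetical input list: looks inputs up through their group."""

    def __init__(self, groups):
        self._groups = groups

    def groupOf(self, name):
        return next(group for group in self._groups if name in group)

    def __getitem__(self, name):
        return self.groupOf(name)[name]


class FakeOutput:
    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class Algorithm:

    def __init__(self):
        self.models = None

    def process(self, inputs, outputs):
        # Loads the "model" data at the beginning, once
        if self.models is None:
            self.models = []
            group = inputs.groupOf('in2')
            while group.hasMoreData():
                group.next()  # advances "in2" and "in3" together
                self.models.append((group['in2'].data.value,
                                    group['in3'].data.value))

        # placeholder for magical_processing(): add all model weights
        out = inputs['in'].data.value + sum(w for w, _ in self.models)
        outputs['out'].write({'value': out})
        return True


main_group = FakeGroup([{'in': 1}, {'in': 2}])
model_group = FakeGroup([{'in2': 10, 'in3': 'x'}, {'in2': 20, 'in3': 'y'}])
inputs = FakeInputList([main_group, model_group])
outputs = {'out': FakeOutput()}
algorithm = Algorithm()

while main_group.hasMoreData():
    main_group.next()  # the simulated platform advances the synchronized group
    algorithm.process(inputs, outputs)

print(algorithm.models)
print(outputs['out'].written)
```

Because a single ``group.next()`` call moves every input of the group, the values read from ``in2`` and ``in3`` always belong to the same unit.
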
In practice, encoding your algorithms using *groups* instead of looping over
......@@ -353,7 +531,7 @@ image below.
There are two types of algorithm in the editor: Analyzer, and Splittable.
Analyzer algorithms are special algorithms where the purpose is to generate
statistics about the processing results (graphs, means, variances, etc.).
Usual, biometric data processing algorithms are of type Splittable, indicating
Usually, biometric data processing algorithms are of type Splittable, indicating
to the platform that these algorithms can be executed in a distributed fashion,
depending on the available computing resources.
......@@ -372,7 +550,7 @@ You should see a web-page similar to what is displayed below:
.. image:: img/algorithm_new.*
For instructions on how to create an algorithm from scratch, please refer to the Section of `algorithm editor`_.??????
For instructions on how to create an algorithm from scratch, please refer to the `algorithm editor`_ section.
Edit an existing algorithm
......@@ -403,15 +581,20 @@ Please refer to the Section of `algorithm editor`_ for creating an algorithm.
Editor
------
To create an algorithm, there are six sections which are:
To create an algorithm, there are seven sections which are:
* Name: the name of the algorithm.
* Algorithm type: Analyzer or Splittable.
* Language: The language used to implement the algorithm (Python or C++).
* Documentation: This is used to describe your algorithm.
* Source code: The (Python) code implementing the algorithm.
* Inputs / Outputs: Define the properties of the Input and Output endpoints for this algorithm.
* Parameters: Define the configuration-parameters for the algorithm.
For Python-based algorithms only:
* Libraries: If there are functions in a library, you can add them for the algorithm to use.
* Source code: The (Python) code implementing the algorithm.
You should see a webpage similar to what is displayed below:
......@@ -440,6 +623,89 @@ your algorithm code, to help with your debugging.
very last 4 kilobytes of these streams is kept for user inspection.
.. _binary algorithms:


Implementing an algorithm in C++
--------------------------------

Prerequisite: Configure your command-line client
================================================

To ensure that your compiled algorithm will work on the |project| platform,
you must compile it using our Docker image called *beats/client*. Once it is
downloaded, you'll need to configure the command-line tool to access your
account on the |project| platform:

.. code-block:: bash

   $ docker run -ti beats/client:0.1.5 bash
   /# cd home
   /home# beat config set user <your_user_name>
   /home# beat config set token "<your_token>"
   /home# beat config save

Here, ``<your_user_name>`` is your username on the |project| platform, and
``<your_token>`` can be retrieved from your settings page. Note that we use the
``/home`` folder to save everything, but you can use any folder you like.

Algorithm compilation
=====================

To implement an algorithm in C++, follow these steps:

1. Create the algorithm on the |project| platform, selecting C++ as the language.
   Declare all the needed inputs, outputs and parameters.

2. Using the ``beat`` command-line tool, download the algorithm declaration from the
   |project| platform (note that all the necessary data formats will be downloaded too):

   .. code-block:: bash

      /home# beat algorithms pull <your_user_name>/<algorithm_name>/<version>

   At this point, the folder ``/home/algorithms/<your_user_name>/<algorithm_name>/``
   will contain the declaration of your algorithm in JSON format, and ``/home/dataformats/``
   will contain the declaration files of the data formats used by the algorithm.

3. Generate the C++ files corresponding to the algorithm declaration:

   .. code-block:: bash

      /home# generate_cxx.py . <your_user_name>/<algorithm_name>/<version>

   At this point, the folder ``/home/algorithms/<your_user_name>/<algorithm_name>/``
   will contain a few new C++ files:

   * one header/source file for each needed data format
   * ``beat_setup.h`` and ``beat_setup.cpp``: used by the platform to learn everything
     it needs to know about your algorithm
   * ``algorithm.h`` and ``algorithm.cpp``: you will implement your algorithm in those
     files

   Feel free to add as many other files as you need for your implementation.

4. Implement your algorithm in ``algorithm.h`` and ``algorithm.cpp``.

5. Compile your code as a shared library (an example CMake file was generated; you can
   either modify it to add your own files or use another build system if you prefer). Note
   that the |project| platform expects you to upload one and only one *shared library*, so
   if your algorithm has any dependencies, you must link them statically into the shared
   library:

   .. code-block:: bash

      /home# cd algorithms/<your_user_name>/<algorithm_name>/
      /home/algorithms/<your_user_name>/<algorithm_name># mkdir build
      /home/algorithms/<your_user_name>/<algorithm_name># cd build
      /home/algorithms/<your_user_name>/<algorithm_name>/build# cmake ..
      /home/algorithms/<your_user_name>/<algorithm_name>/build# make

   This will produce a file called ``<version>.so`` in the
   ``/home/algorithms/<your_user_name>/<algorithm_name>/`` folder.

6. Upload the shared library to the |project| platform, from the algorithm page.

Sharing an Algorithm
---------------------