Commit 4097e8e4 authored by Zohreh MOSTAANI

[web][doc] removing extra info from and modifying experiments, toolchains and dataformats

parent d1a4f35d
......@@ -30,33 +30,12 @@
Data formats specify the transmitted data between the blocks of a toolchain.
They describe the format of the data blocks that circulate between algorithms
and formalize the interaction between algorithms and data sets, so they can
communicate in an orderly manner. Inputs and outputs of the algorithms and
datasets **must** be formally declared. Two algorithms that communicate
directly must produce and consume the **same** type of data objects.
communicate in an orderly manner. For more detailed information see :ref:`beat-system-dataformats`.
A data format specifies a list of typed fields. An algorithm or data set
generating a block of data (via one of its outputs) must fill all the fields
declared in that data format. An algorithm consuming a block of data (via one
of its inputs) must not expect the presence of any other field than the ones
defined by the data format.
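The field rule above can be illustrated with a minimal sketch. The helper ``check_fields`` and the flat dictionary representation are assumptions made for illustration only; they are not part of the platform API, and only top-level field names are compared.

```python
def check_fields(declared, produced):
    """Return (missing, unexpected) field names for a produced data block.

    `declared` maps field names to type names, as in a data format
    declaration; `produced` is the dictionary an output would write.
    Hypothetical helper: only top-level field names are compared.
    """
    declared_names = set(declared)
    produced_names = set(produced)
    missing = sorted(declared_names - produced_names)
    unexpected = sorted(produced_names - declared_names)
    return missing, unexpected


# A format declaring two fields.
point_format = {"x": "int32", "y": "int32"}

# All declared fields are filled and nothing else is present.
complete = check_fields(point_format, {"x": 4, "y": 6})

# "y" is not filled, and "z" is not declared by the format.
incomplete = check_fields(point_format, {"x": 4, "z": 0})
```

A producer would be expected to yield two empty lists, as ``complete`` does; a consumer relying on ``z`` would be relying on a field the format does not define.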
The |project| platform provides a number of pre-defined formats to facilitate
experiments. They are implemented in an extensible way. This allows users to
define their own formats, based on existing ones, while keeping some level of
compatibility with other existing algorithms.
.. note:: **Naming Convention**
.. note::
Data formats are named using three values joined by a ``/`` (slash)
operator:
* **username**: indicates the author of the dataformat
* **name**: an identifier for the object
* **version**: an integer (starting from one), indicating the version of
the object
Each tuple of these three components defines a *unique* data format name
inside the platform. For example, ``system/float/1``.
operator. The first value is the **username**.
The ``system`` user provides a number of `pre-defined formats such as
integers, booleans, floats and arrays
......@@ -74,218 +53,6 @@ to that data format, like shown on the image below:
.. image:: img/system-defined-info.*
A data format is declared as a JSON_ object with several fields. For example,
the following declaration could represent the coordinates of a bounding box in
a video frame:
.. code-block:: javascript
{
"value": [
0,
{
"frame_id": "uint64",
"height": "int32",
"width": "int32",
"top-left-y": "int32",
"top-left-x": "int32"
}
]
}
The special field ``#description`` can be used to store a short description of
the declared data format. It is ignored in practice and only used for
informational purposes. Each field in a declaration has a well-defined type, as
explained next.
Simple type (primitive object)
==============================
The |project| platform supports the following *primitive* types.
* signed integers: ``int8``, ``int16``, ``int32``, ``int64``
* unsigned integers: ``uint8``, ``uint16``, ``uint32``, ``uint64``
* floating-point numbers: ``float32``, ``float64``
* complex numbers: ``complex64``, ``complex128``
* a boolean value: ``bool``
* a string: ``string``
Aggregation
===========
A data format can be composed of complex objects formed by aggregating other
*declared* types. For example, we could define the positions of the eyes of a
face in an image like this:
.. code-block:: javascript
{
"left": "system/coordinates/1",
"right": "system/coordinates/1"
}
Arrays
======
A field can be a multi-dimensional array of any other type. Here ``array1`` is
declared as a one-dimensional array of 10 32-bit signed integers (``int32``)
and ``array2`` as a two-dimensional array with 10 rows and 5 columns of
booleans:
.. code-block:: javascript
{
"array1": [10, "int32"],
"array2": [10, 5, "bool"]
}
An array can have up to 32 dimensions. It can also contain objects (either
declared inline, or using another data format). It is also possible to declare
an array without specifying the number of elements in some of its dimensions,
by using a size of 0 (zero). For example, here is a two-dimensional grayscale
image of unspecified size:
.. code-block:: javascript
{
"value": [0, 0, "uint8"]
}
You may also fix the extent of some of the dimensions. For example, here is a
possible representation for a three-dimensional RGB image of unspecified size
(width and height):
.. code-block:: javascript
{
"value": [3, 0, 0, "uint8"]
}
In this representation, the image must have 3 color planes (no more, no less).
The width and the height are unspecified.
.. note:: **Unspecified Dimensions**
Because of the way the |project| platform stores data, not all combinations
of unspecified extents will work for arrays. As a rule of thumb, only the
last dimensions may remain unspecified. These are valid:
.. code-block:: javascript
{
"value1": [0, "float64"],
"value2": [3, 0, "float64"],
"value3": [3, 2, 0, "float64"],
"value4": [3, 0, 0, "float64"],
"value5": [0, 0, 0, "float64"]
}
Whereas these would be invalid declarations for arrays:
.. code-block:: javascript
{
"value1": [0, 3, "float64"],
"value2": [4, 0, 3, "float64"]
}
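The rule of thumb above can be sketched as a small check. The function names ``parse_array`` and ``has_valid_extents`` are hypothetical, and the check encodes only the "unspecified extents must come last" rule stated in the note; the platform may apply further validation that is not modelled here.

```python
def parse_array(declaration):
    """Split an array declaration such as [10, 5, "bool"] into (shape, type)."""
    *shape, element_type = declaration
    return tuple(shape), element_type


def has_valid_extents(declaration):
    """Check that unspecified (0) extents only appear in the last dimensions."""
    shape, _ = parse_array(declaration)
    seen_unspecified = False
    for extent in shape:
        if extent == 0:
            seen_unspecified = True
        elif seen_unspecified:
            # A fixed extent after an unspecified one breaks the rule.
            return False
    return True


# Declarations taken from the two listings above.
valid = [
    [0, "float64"],
    [3, 0, "float64"],
    [3, 2, 0, "float64"],
    [3, 0, 0, "float64"],
    [0, 0, 0, "float64"],
]
invalid = [
    [0, 3, "float64"],
    [4, 0, 3, "float64"],
]

valid_results = [has_valid_extents(d) for d in valid]
invalid_results = [has_valid_extents(d) for d in invalid]
```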
Object Representation
---------------------
As you'll read in our :ref:`Algorithms` section, data is available via our
backend API to the user algorithms. In Python, the |project| platform uses
NumPy_ data types to pass data to and from algorithms. For example, suppose
the algorithm reads data for which the format is defined like:
.. code-block:: javascript
{
"value": "float64"
}
The field ``value`` of an instance named ``object`` of this format is
accessible as ``object.value`` and will have the type ``numpy.float64``. If the
format were, instead:
.. code-block:: javascript
{
"value": [0, 0, "float64"]
}
It would be accessed in the same way (i.e., via ``object.value``), except that
the type would be ``numpy.ndarray`` and ``object.value.dtype`` would be
``numpy.float64``. Naturally, objects which are instances of a format like
this:
.. code-block:: javascript
{
"x": "int32",
"y": "int32"
}
Could be accessed like ``object.x``, for the ``x`` value and ``object.y``, for
the ``y`` value. The type of ``object.x`` and ``object.y`` would be
``numpy.int32``.
Conversely, if you *write* output data in an algorithm, the types of the output
objects are checked for compatibility with the types declared in the
format. For example, this would be a valid use of the format above, in Python:
.. code-block:: python

   import numpy

   class Algorithm:

       def process(self, inputs, outputs):
           # read data
           # prepares object to be written
           myobj = {"x": numpy.int32(4), "y": numpy.int32(6)}
           # write it
           outputs["point"].write(myobj)  # OK!
If you try to write a ``float64`` object into a field that is supposed to be of
type ``int32``, an exception will be raised. For example:
.. code-block:: python

   import numpy

   class Algorithm:

       def process(self, inputs, outputs):
           # read data
           # prepares object to be written
           myobj = {"x": numpy.int32(4), "y": numpy.float64(3.14)}
           # write it
           outputs["point"].write(myobj)  # Error: cannot downcast!
The bottom line is: **all type casting in the platform must be explicit**. The
platform will not automatically downcast or upcast objects for you, so as to
avoid unexpected precision loss leading to errors.
Editing Operations
......
......@@ -178,7 +178,7 @@ toolchain:
.. note::
As it was mentioned in :ref:`beat-system-experiments-blocks`, BEAT checks that connected datasets, algorithms and
As mentioned in :ref:`beat-system-experiments-blocks`, BEAT checks that connected datasets, algorithms and
analyzers produce or consume data in the right format. It only presents
options which are *compatible* with adjacent blocks.
......
......@@ -28,7 +28,7 @@
============
Toolchains are the backbone of experiments within the |project| platform. They
determine the data flow for experiments in the |project| platform.
determine the data flow for experiments in the |project| platform. For more information about toolchains see :ref:`beat-system-toolchains`.
You can see an example toolchain for a toy-`eigenface`_ system on the image
below.
......@@ -67,56 +67,6 @@ an experiment with this workflow:
the top-right of each block. For example, the block called ``scoring`` is
said to be *synchronized with* the ``probes`` channel.
When a block is synchronized with a channel, it means the platform will
iterate over that channel's contents when calling the user algorithm on that
block. For example, the block ``linear_machine_training``, on the top
left of the image, following the data set block ``train``, is synchronized
with that dataset block. Therefore, it will be executed as many times as
the dataset block outputs objects through its ``image`` output. I.e., the
``linear_machine_training`` block *loops* or *iterates* over the ``train``
data.
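The looping behaviour described above can be pictured with a simplified sketch. Everything here is an assumption made for illustration: ``CountingAlgorithm`` and ``run_synchronized_block`` are hypothetical names, the channel is modelled as a plain list, and inputs and outputs are plain Python containers rather than the platform's real input and output objects.

```python
class CountingAlgorithm:
    """Stand-in for a user algorithm; it only counts its invocations."""

    def __init__(self):
        self.calls = 0

    def process(self, inputs, outputs):
        # Called once per object available on the synchronized channel.
        self.calls += 1
        outputs.append(inputs["image"])


def run_synchronized_block(algorithm, channel):
    """Call the algorithm once for each object produced on the channel."""
    outputs = []
    for item in channel:
        algorithm.process({"image": item}, outputs)
    return outputs


# A hypothetical "train" dataset block producing three objects.
train_channel = ["image-0", "image-1", "image-2"]

algorithm = CountingAlgorithm()
results = run_synchronized_block(algorithm, train_channel)
```

With three objects on the channel, ``process`` runs three times, which is the sense in which the block *loops* over the ``train`` data.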
Notice that the toolchain does not define what an ``image`` will be. That is
defined by the concrete dataset implementation chosen by the user when an
experiment is constructed. The block ``linear_machine_training`` also does not
define which type of images it can input. That is defined by the algorithm
chosen by the user when an experiment is constructed. For example, if the user
chooses a dataset that outputs objects with the data format
``system/array_2d_uint8/1``, then an algorithm that can input those
types of objects must be chosen for the block following that dataset. Don't
worry! The |project| platform experiment configuration will check that for you!
The order of execution can also be inferred from this diagram. We sketched
it for you in this overlay:
.. image:: img/eigenfaces-ordered.*
The backend processing farm will first "push" the data out of the datasets. It
will then run the code on the block ``linear_machine_training`` (numbered 2).
The blocks ``template_builder`` and ``probe_builder`` are then ready to run.
The platform may choose to run them at the *same time* if enough computing
resources are available. The ``scoring`` block runs fourth. The last block
to be executed is the ``analysis`` block. The figure above also marks the
channel over which each block *loops*. When you read
about :ref:`Algorithms`, you'll understand, concretely, how synchronization is
handled in algorithm code.
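The execution order described above amounts to a topological ordering of the blocks. The sketch below groups blocks into stages using Python's standard ``graphlib`` module. The dependency table is an assumption reconstructed from the description (the figure is the reference), and the dataset name ``templates`` is hypothetical. This is not how the backend is implemented; it only illustrates the ordering.

```python
from graphlib import TopologicalSorter

# Assumed connections: each block maps to the blocks it reads from.
dependencies = {
    "linear_machine_training": {"train"},
    "template_builder": {"templates", "linear_machine_training"},
    "probe_builder": {"probes", "linear_machine_training"},
    "scoring": {"template_builder", "probe_builder"},
    "analysis": {"scoring"},
}


def execution_stages(graph):
    """Group blocks into stages; blocks in one stage could run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        # All blocks whose inputs are complete are ready at the same time.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(dependencies)
for number, blocks in enumerate(stages, start=1):
    print(number, ", ".join(blocks))
```

Under these assumed connections, the datasets form the first stage, ``linear_machine_training`` the second, the two builders share the third, ``scoring`` is fourth and ``analysis`` is last.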
.. note:: **Naming Convention**
Toolchains are named using three values joined by a ``/`` (slash) operator:
* **username**: indicates the author of the toolchain
* **name**: indicates the name of the toolchain
* **version**: indicates the version (integer starting from ``1``) of the
toolchain
Each tuple of these three components defines a *unique* toolchain name
inside the platform. To get a feel for them, you may browse `publicly available
toolchains`_.
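The three-part naming convention can be sketched as a small parser. ``AssetName`` and ``parse_name`` are hypothetical names introduced for illustration; the platform's own validation rules (for example, which characters are allowed) are not modelled. The example reuses the data format name ``system/array_2d_uint8/1`` mentioned earlier, which follows the same ``username/name/version`` pattern.

```python
from typing import NamedTuple


class AssetName(NamedTuple):
    """The three components of a platform object name."""

    username: str
    name: str
    version: int


def parse_name(full_name):
    """Split 'username/name/version' into its three components."""
    parts = full_name.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'username/name/version', got {full_name!r}")
    username, name, version = parts
    # Versions are integers starting from one.
    if not version.isdecimal() or int(version) < 1:
        raise ValueError(f"invalid version in {full_name!r}")
    return AssetName(username, name, int(version))


parsed = parse_name("system/array_2d_uint8/1")
```

Because the parsed value is a tuple, two names compare equal exactly when all three components match, which mirrors the uniqueness rule stated in the note.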
The *Toolchains* tab
--------------------
......