Commit 826288fc authored by Jaden

rewrite exp docs and add section on executors

parent 505d7d66
@@ -47,60 +47,47 @@ The commands available for experiments are:
How to run an experiment?
.........................
The ``run_toolchain.py`` script can be used to perform the experiment defined
in a toolchain. It is the ideal way to debug an algorithm, since this script
doesn't try any of the advanced tricks the Scheduler does (multi-processing,
optimizations, sandboxing, etc.).
For example, we execute a simple toolchain with two processing blocks (found in
``src/beat.core/beat/core/test/toolchains/integers_addition2.json``):
.. code-block:: sh
$ ./bin/run_toolchain.py --prefix=src/beat.core/beat/core/test/ integers_addition2
Processing block 'addition1'...
Algorithm: sum
Inputs:
- a (single_integer): beat/src/beat.core/beat/core/test/databases/integers/output1.data
- b (single_integer): beat/src/beat.core/beat/core/test/databases/integers/output2.data
Outputs:
- sum (single_integer): beat/src/beat.core/beat/core/test/cache/addition1/sum.data
Processing block 'addition2'...
Algorithm: sum
Inputs:
- a (single_integer): beat/src/beat.core/beat/core/test/cache/addition1/sum.data
- b (single_integer): beat/src/beat.core/beat/core/test/databases/integers/output3.data
Outputs:
- sum (single_integer): beat/src/beat.core/beat/core/test/cache/addition2/sum.data
DONE
Results available at:
- addition2.sum: beat/src/beat.core/beat/core/test/cache/addition2/sum.data
The command ``beat experiments run <name>`` can be used to run the experiment
defined in an experiment definition file. It is the ideal way to debug an
experiment, since by default ``beat`` uses the local executor, which provides a
simple environment with PDB support but without advanced features
(multi-processing, optimizations, sandboxing, multiple environments, etc.).
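For instance, an invocation might look like the following (a sketch:
``<experiment name>`` is a placeholder, and passing ``--prefix`` as a global
option is an assumption; adapt it to your setup):

.. code-block:: sh

   # Run an experiment with the default (local) executor. The prefix points at
   # the directory holding data formats, algorithms, toolchains and experiments.
   $ beat --prefix=src/beat.core/beat/core/test experiments run <experiment name>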
In the invocations above, the ``--prefix`` option is used to tell the scripts
where all our data formats, toolchains and algorithms are located, and
``integers_addition2`` is the name of the toolchain we want to check (note that
we don't add the ``.json`` extension, as this is the name of the toolchain, not
the filename!). The prefix can also be set in your configuration file (see
``beat config``).
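To avoid typing the prefix on every invocation, it can be stored in the
configuration file; the available configuration commands are best discovered
from the built-in help (shown here only as a sketch, since the exact
subcommands are not covered in this section):

.. code-block:: sh

   # List the configuration commands and options; the prefix can be stored
   # there instead of being passed on the command line each time.
   $ beat config --help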
Both ``run_toolchain.py`` and ``beat experiments run`` display, for each block,
the files containing the data used as input and the files generated by the
outputs of the block.
By default, files are generated in binary format, but you can force them to be
in a more readable JSON format with the ``--json`` flag:
.. code-block:: sh
$ ./bin/run_toolchain.py --prefix=src/beat.core/beat/core/test/ --json integers_addition2
The default behavior is to not regenerate data files already present in the
cache. You can force the script to ignore the content of the cache with the
``--force`` flag:

.. code-block:: sh

   $ ./bin/run_toolchain.py --prefix=src/beat.core/beat/core/test/ --force integers_addition2
Executors
=========
"Executors" are modules that execute each block in an experiment. On the BEAT
platform, there is only the one executor, which executes the experiment using
Docker containers with advanced scheduling and security features. When
developing using ``beat.cmdline``, however, you have the option of using either
the BEAT platform's executor, behind the ``--docker`` flag, or the "local"
executor, provided in this project. The local executor, as explained above, is
much simpler, aimed at providing a smooth development experience. However,
there are two important tradeoffs:
- Lower performance for non-trivial experiments, as it runs everything
synchronously in one process on the CPU.
- No multiple environments, as the Python environment that built
``beat.cmdline`` is used. This means that many BEAT experiments that
rely on different/multiple environments will not work.
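As a sketch of the choice described above (the placement of the ``--docker``
flag is an assumption; check ``beat experiments run --help``):

.. code-block:: sh

   # Local executor (default): simple, synchronous, single process, PDB-friendly.
   $ beat experiments run <experiment name>

   # Docker-based executor, as used by the platform (requires a working Docker
   # installation and the corresponding environment images).
   $ beat experiments run --docker <experiment name>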
If you want to use the local executor, pay attention to the Python environment
used to call ``buildout`` in your copy of ``beat.cmdline``. The suggested way to
use Bob libraries while developing with the local executor is to install
``zc.buildout`` in a Python 2.7 conda environment that has Bob installed.
Running the ``buildout`` command from that environment makes the entire
environment available to ``beat.cmdline``, even when the environment is not
active.
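In practice, such an environment could be prepared along the following lines
(a sketch only: the environment name, and the conda channel and package names
for Bob, are assumptions to adapt to your own installation):

.. code-block:: sh

   # Create a Python 2.7 conda environment and install Bob into it
   # (channel/package names are assumptions; follow Bob's own install guide).
   $ conda create --name beat-local python=2.7
   $ source activate beat-local
   (beat-local) $ conda install bob

   # Install zc.buildout in the same environment and run buildout from it, so
   # that the whole environment becomes available to beat.cmdline.
   (beat-local) $ pip install zc.buildout
   (beat-local) $ cd beat.cmdline
   (beat-local) $ buildout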
.. _beat-core-experiments-displaydata:
@@ -108,259 +95,7 @@ account with the ``--force`` flag:
How to examine the content of a data file?
..........................................
The ``display_data.py`` script can be used to examine the content of a data
file generated by the execution of a toolchain.
For example, we look at the content of one of the data files used by the tests
of beat.core (found in
``src/beat.core/beat/core/test/data/single_integer_delayed.data``):
.. code-block:: sh

   $ ./bin/display_data.py --prefix=src/beat.core/beat/core/test data/single_integer_delayed.data
Data format: single_integer
----------------------------------------------
Indexes: 0-1
{
"value": 0
}
----------------------------------------------
Indexes: 2-3
{
"value": 1
}
----------------------------------------------
Indexes: 4-5
{
"value": 2
}
----------------------------------------------
Indexes: 6-7
{
"value": 3
}
----------------------------------------------
Indexes: 8-9
{
"value": 4
}
----------------------------------------------
Indexes: 10-11
{
"value": 5
}
----------------------------------------------
Indexes: 12-13
{
"value": 6
}
----------------------------------------------
Indexes: 14-15
{
"value": 7
}
----------------------------------------------
Indexes: 16-17
{
"value": 8
}
----------------------------------------------
Indexes: 18-19
{
"value": 9
}
The script tells us that the data corresponds to the data format
``single_integer``, and displays each entry (with the indexes it corresponds
to) in a JSON representation.
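Assuming the ``integers_addition2`` toolchain from the beginning of this
section has been executed, so that its cache files exist, the same script can
also be pointed at one of the cached outputs, for example:

.. code-block:: sh

   # Inspect the cached output of the last block of the integers_addition2 run.
   $ ./bin/display_data.py --prefix=src/beat.core/beat/core/test cache/addition2/sum.data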
.. _beat-core-experiments-example:
Putting it all together: a complete example
...........................................
.. _beat-core-experiments-example-figure:
.. figure:: img/toolchain-example.*
A complete toolchain that trains and tests a face detector
The following example describes the toolchain shown in :num:`figure
#beat-core-experiments-example-figure`, a complete toolchain that:
#. trains a face detector on one set of images (*beat_face_dataset_train*)
#. validates it on another set of images (*beat_face_dataset_validation*)
#. tests it on a third set of images (*beat_face_dataset_test*)
.. note::
   This toolchain is not yet considered executable by the platform, since it
   does not mention the algorithms that must be used in each processing block.
.. code-block:: json
{
"databases": [ {
"name": "beat_face_dataset_train",
"outputs": {
"images": "image/rgb",
"faces": "coordinates_list"
}
},
{
"name": "beat_face_dataset_validation",
"outputs": {
"images": "image/rgb",
"faces": "coordinates_list"
}
},
{
"name": "beat_face_dataset_test",
"outputs": {
"images": "image/rgb",
"faces": "coordinates_list"
}
}
],
"blocks": [{
"name": "features_extractor_train",
"inputs": {
"images": "images/rgb"
},
"outputs": {
"features": "array/float"
}
},
{
"name": "face_model_builder",
"inputs": {
"features": "array/float",
"faces": "coordinates_list"
},
"outputs": {
"model": "face_model"
}
},
{
"name": "features_extractor_validation",
"inputs": {
"images": "images/rgb"
},
"outputs": {
"features": "array/float"
}
},
{
"name": "face_detector_validation",
"inputs": {
"model": "face_model",
"features": "array/float"
},
"outputs": {
"faces": "coordinates_list"
}
},
{
"name": "thresholder",
"inputs": {
"detected_faces": "coordinates_list",
"labelled_faces": "coordinates_list"
},
"outputs": {
"threshold": "float"
}
},
{
"name": "features_extractor_test",
"inputs": {
"images": "images/rgb"
},
"outputs": {
"features": "array/float"
}
},
{
"name": "face_detector_test",
"inputs": {
"model": "face_model",
"features": "array/float"
},
"outputs": {
"faces": "coordinates_list"
}
},
{
"name": "evaluator",
"inputs": {
"threshold": "float",
"detected_faces": "coordinates_list",
"labelled_faces": "coordinates_list"
},
"outputs": {
"score": "float"
}
}
],
"connections": [{
"from": "beat_face_dataset_train.images",
"to": "features_extractor_train.images"
},
{
"from": "features_extractor_train.features",
"to": "face_model_builder.features"
},
{
"from": "beat_face_dataset_train.faces",
"to": "face_model_builder.faces"
},
{
"from": "beat_face_dataset_validation.images",
"to": "features_extractor_validation.images"
},
{
"from": "face_model_builder.model",
"to": "face_detector_validation.model"
},
{
"from": "features_extractor_validation.features",
"to": "face_detector_validation.features"
},
{
"from": "face_detector_validation.faces",
"to": "thresholder.detected_faces"
},
{
"from": "beat_face_dataset_validation.faces",
"to": "thresholder.labelled_faces"
},
{
"from": "beat_face_dataset_test.images",
"to": "features_extractor_test.images"
},
{
"from": "features_extractor_test.features",
"to": "face_detector_test.features"
},
{
"from": "face_model_builder.model",
"to": "face_detector_test.model"
},
{
"from": "thresholder.threshold",
"to": "evaluator.threshold"
},
{
"from": "face_detector_test.faces",
"to": "evaluator.detected_faces"
},
{
"from": "beat_face_dataset_test.faces",
"to": "evaluator.labelled_faces"
}
],
"results": [
"thresholder.threshold",
"evaluator.score"
]
}
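A declaration like the one above can be checked locally before being used in an
experiment; with ``beat.cmdline``, something along these lines might do (the
``check`` subcommand and the name handling are assumptions; see
``beat toolchains --help`` for the actual interface):

.. code-block:: sh

   # Validate the toolchain declaration found under the prefix
   # (subcommand name assumed; consult the built-in help).
   $ beat --prefix=<path to your prefix> toolchains check <toolchain name>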
The ``beat cache`` collection of commands interacts with the cache:

.. command-output:: ./bin/beat cache --help
   :cwd: ..