Skip to content
Snippets Groups Projects
Commit 884288e6 authored by André Anjos's avatar André Anjos :speech_balloon:
Browse files

Merge branch '70_zmq_architecture_documentation' into 'master'

ZMQ architecture documentation

Closes #70

See merge request !60
parents 774d012f b81b7934
No related branches found
No related tags found
1 merge request!60ZMQ architecture documentation
Pipeline #28141 passed
......@@ -267,170 +267,59 @@ In the remainder of this section, we describe the various commands, which are
supported by this communication protocol.
Command: has-more-data (hmd)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Command: information (ifo)
~~~~~~~~~~~~~~~~~~~~~~~~~~
(User Process -> Infrastructure)
This command asks the infrastructure, whether there is more input data in a
given input or not. The format of this command is:
This command asks the infrastructure to return information about the remote
data sources queried. The format of this command is:
.. code-block:: text
"hmd channel [name]"
where ``name`` refers to the input name and is optional.
The BEAT infrastructure will answer by writing the following into the input
pipe.
.. code-block:: text
"boolean"
where `boolean` is a boolean indicating whether there is more data to process
or not. The values may be ``tru``, for True and ``fal``, for False.
Command: is-dataunit-done (idd)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(User Process -> Infrastructure)
This command asks the infrastructure, whether the current chunk of data is
going to change or not, next time the current data index is increased. The
format of this command is:
.. code-block:: text
"idd channel name"
"ifo name\n"
The infrastructure will answer by writing the following into the input pipe.
.. code-block:: text
"boolean"
where `boolean` is a boolean indicating whether there is more data to process
or not. The values may be ``tru``, for True and ``fal``, for False.
Command: next (nxt)
~~~~~~~~~~~~~~~~~~~
(User Process -> Infrastructure)
This command asks the infrastructure to provide the next data on a given
channel and/or input. If no input ``name`` is provided, then the infrastructure
will return the data on all inputs of the given ``channel``. The format of this
command is:
.. code-block:: text
"nxt channel [name]"
where ``name`` refers to the input name and is optional.
If an input ``name`` is provided, the infrastructure will answer by writing the
following into the input pipe:
.. code-block:: text
"dat N name1 <bin1> .. nameN-1 <binN-1>"
where ``N`` refers to the number of data chunks in the reply, ``name`` is the
name of a given channel, and ``<bin>`` corresponds to the binary representation
of the data contents. The binary data format uses the same format for disk
storage used by the infrastructure so as to optimize I/O performance. Our
reference backend implementation at `beat.backend.python`_ contains
implementation details about the binary data format. As a backend developer,
you must ensure your backend is fully capable of **correctly** interpreting the
contents of the binary stream given the data formats associated with each input
and output of the algorithm.
Command: write (wrt)
~~~~~~~~~~~~~~~~~~~~
(User Process -> Infrastructure)
This command asks the infrastructure to write data on a given output. The
format of this command is:
.. code-block:: text
"wrt name <bin>"
The ``name`` identifies the output on which to write the data. The message
`<bin>` is the raw data to write on the output, pre-encoded in the same binary
data format as the one used for the input. The contents of the binary stream
sent to the infrastructure will be checked again before being written to disk.
In case of problems, a system error will be issued and the processing will
stop. It is the task of the backend developer to insure conformity in order to
avoid such errors.
The infrastructure acknowledges with:
.. code-block:: text
"ack"
"X"
"Start0"
"End0"
...
"StartX-1"
"EndX-1"
In case all is OK or with the keyword ``err``, if there was an error
interpreting the binary data back. In such cases, the UP is expected to stop
processing and issue a ``done`` command, waiting to be gracefully stopped by
the infrastructure. If the UP does not issue the ``done`` command next, the UP
is forcibly terminated with a operating system level signal.
where `X` is the length of the data source and the `StartX` and `EndX` are the
start and end indexes available through that data source.
Command: is-data-missing (idm)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Command: get data (get)
~~~~~~~~~~~~~~~~~~~~~~~
(User Process -> Infrastructure)
This command asks the infrastructure whether there is missing data on a given
output, by looking at the synchronization information. The format of this
command is:
.. code-block:: text
"idm name"
The infrastructure will answer by writing the following into the input pipe.
This command asks the infrastructure to return the data at the given index.
The format of this command is:
.. code-block:: text
"boolean"
where `boolean` is a boolean indicating whether there is more data to process
or not. The values may be ``tru``, for True and ``fal``, for False.
Command: output-is-connected (oic)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(User Process -> Infrastructure)
This command asks the infrastructure, whether a given output is connected or
not. The format of this command is:
.. code-block:: text
"get X"
"ict name\n"
where X is the index of the data in the data source.
The infrastructure will answer by writing the following into the input pipe.
.. code-block:: text
"boolean"
"StartX"
"EndX"
"data"
where `boolean` is a boolean indicating whether there is more data to process
or not. The values may be ``tru``, for True and ``fal``, for False.
where `StartX` and `EndX` and the start and end indexes in the data sources and
`data` is the packed data that the data sources provides.
Command: done (don)
......
......@@ -32,8 +32,15 @@
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. #
# #
###################################################################################
import time
import os
import pkg_resources
import sphinx_rtd_theme
# For inter-documentation mapping:
from bob.extension.utils import link_documentation, load_requirements
# -- General configuration -----------------------------------------------------
......@@ -54,7 +61,6 @@ extensions = [
"sphinx.ext.napoleon",
"sphinx.ext.viewcode",
"sphinx.ext.mathjax",
#'matplotlib.sphinxext.plot_directive'
]
# Be picky about warnings
......@@ -104,7 +110,6 @@ master_doc = "index"
# General information about the project.
project = u"beat.core"
import time
copyright = u"%s, Idiap Research Institute" % time.strftime("%Y")
......@@ -164,7 +169,6 @@ owner = [u"Idiap Research Institute"]
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
import sphinx_rtd_theme
html_theme = "sphinx_rtd_theme"
......@@ -258,16 +262,13 @@ autoclass_content = "class"
autodoc_member_order = "bysource"
autodoc_default_flags = ["members", "undoc-members", "show-inheritance"]
if not "BOB_DOCUMENTATION_SERVER" in os.environ:
if "BOB_DOCUMENTATION_SERVER" not in os.environ:
# notice we need to overwrite this for BEAT projects - defaults from Bob are
# not OK
os.environ[
"BOB_DOCUMENTATION_SERVER"
] = "https://www.idiap.ch/software/beat/docs/beat/%(name)s/%(version)s/|https://www.idiap.ch/software/beat/docs/beat/%(name)s/master/"
# For inter-documentation mapping:
from bob.extension.utils import link_documentation, load_requirements
sphinx_requirements = "extra-intersphinx.txt"
if os.path.exists(sphinx_requirements):
intersphinx_mapping = link_documentation(
......
......@@ -46,6 +46,7 @@ This package provides the core components of BEAT ecosystem. These core componen
backend_api
develop
api
zmq_architecture
Indices and tables
......
......@@ -43,6 +43,8 @@
.. _python 2.7: http://www.python.org
.. _zero message queue: http://zeromq.org
.. _zmq: http://zeromq.org
.. _ZeroMQ book: http://shop.oreilly.com/product/0636920026136.do
.. _Majordomo Protocol: https://rfc.zeromq.org/spec:18/MDP/
.. _language bindings: http://zeromq.org/bindings:_start
.. _python bindings: http://zeromq.org/bindings:python
.. _markdown: http://daringfireball.net/projects/markdown/
......
.. vim: set fileencoding=utf-8 :
.. Copyright (c) 2019 Idiap Research Institute, http://www.idiap.ch/ ..
.. Contact: beat.support@idiap.ch ..
.. ..
.. This file is part of the beat.backend.python module of the BEAT platform. ..
.. ..
.. Redistribution and use in source and binary forms, with or without
.. modification, are permitted provided that the following conditions are met:
.. 1. Redistributions of source code must retain the above copyright notice, this
.. list of conditions and the following disclaimer.
.. 2. Redistributions in binary form must reproduce the above copyright notice,
.. this list of conditions and the following disclaimer in the documentation
.. and/or other materials provided with the distribution.
.. 3. Neither the name of the copyright holder nor the names of its contributors
.. may be used to endorse or promote products derived from this software without
.. specific prior written permission.
.. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
.. ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
.. WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
.. DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
.. FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.. DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
.. SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
.. CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.. OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
.. OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.. _zmq_architecture:
==================================
ZMQ Architecture for task handling
==================================
Introduction
------------
The ZMQ architecture implemented in beat.core is based on the `Majordomo
Protocol`_ as described in the `ZeroMQ book`_.
There are however some subtle differences:
- We have one client: the scheduler
- We have unique workers
- We currently don't have "system commands"
Due to these differences and their implementation, the protocol has been
renamed: "BEAT Computation Protocol" or BCP for short.
The system is based on these three components:
- The client
- The broker
- The worker(s)
In BEAT, the client will be the scheduler which will send the tasks to the
broker which will be responsible for forwarding them to the appropriate worker
requested by the scheduler.
Once the task has been completed, the worker will send back a message to the
scheduler through the broker.
The whole messaging system is asynchronous except when starting an actual task.
The worker will send back a confirmation as soon as the runner was properly
started.
Why this design ?
-----------------
The original design was a bit simpler:
- One scheduler
- Many workers
The scheduler was responsible for both task scheduling and worker communication
handling. One issue that arose from time to time was that with very low volume
of network activity, the connection between one or more workers and the
scheduler would get cut and nobody would notice. The result was that new tasks
would be sent but silently dropped and thus experiment would stay in a running
state while not doing anything. And if canceled, the state would stay in
canceling as again the command would be silently dropped.
Thus the rational behind choosing this new design was to avoid these connection
loss and therefore platform paralysis.
Now, the broker and the workers implement a bidirectional heartbeat. This has a
twofold benefit:
- The heartbeat itself should generate enough network activity to avoid the
connection to be cut.
- If a worker goes missing, it will be detected by the broker that will act as
configured to.
BCP Schema
----------
The figure below shows how the system is working.
::
--------------
| |
| client |
| |
--------------
|
--------------
| |
| broker |
| |
--------------
| | | |
| | | |
/----------------------/ | | \-----------------------\
| | | |
| /-------/ \-------\ |
| | | |
--------------- --------------- --------------- ---------------
| | | | | | | |
| worker1 | | worker2 | | worker3 | | worker4 |
| | | | | | | |
--------------- --------------- --------------- ---------------
.. include:: links.rst
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment