Skip to content
GitLab
Explore
Sign in
Primary navigation
Search or go to…
Project
beat.core
Manage
Activity
Members
Labels
Plan
Issues
Issue boards
Milestones
Code
Merge requests
Repository
Branches
Commits
Tags
Repository graph
Compare revisions
Build
Pipelines
Jobs
Pipeline schedules
Artifacts
Deploy
Releases
Model registry
Operate
Environments
Monitor
Incidents
Analyze
Value stream analytics
Contributor analytics
CI/CD analytics
Repository analytics
Model experiments
Help
Help
Support
GitLab documentation
Compare GitLab plans
Community forum
Contribute to GitLab
Provide feedback
Keyboard shortcuts
?
Snippets
Groups
Projects
Show more breadcrumbs
beat
beat.core
Commits
884288e6
Commit
884288e6
authored
6 years ago
by
André Anjos
Browse files
Options
Downloads
Plain Diff
Merge branch '70_zmq_architecture_documentation' into 'master'
ZMQ architecture documentation Closes
#70
See merge request
!60
parents
774d012f
b81b7934
No related branches found
No related tags found
1 merge request
!60
ZMQ architecture documentation
Pipeline
#28141
passed
6 years ago
Stage: build
Stage: deploy
Changes
5
Pipelines
1
Hide whitespace changes
Inline
Side-by-side
Showing
5 changed files
doc/backend_api.rst
+24
-135
24 additions, 135 deletions
doc/backend_api.rst
doc/conf.py
+8
-7
8 additions, 7 deletions
doc/conf.py
doc/index.rst
+1
-0
1 addition, 0 deletions
doc/index.rst
doc/links.rst
+2
-0
2 additions, 0 deletions
doc/links.rst
doc/zmq_architecture.rst
+129
-0
129 additions, 0 deletions
doc/zmq_architecture.rst
with
164 additions
and
142 deletions
doc/backend_api.rst
+
24
−
135
View file @
884288e6
...
...
@@ -267,170 +267,59 @@ In the remainder of this section, we describe the various commands, which are
supported by this communication protocol.
Command:
has-more-data (hmd
)
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~
Command:
information (ifo
)
~~~~~~~~~~~~~~~~~~~~~~~~~~
(User Process -> Infrastructure)
This command asks the infrastructure
, whether there is more input data in a
given input or not
. The format of this command is:
This command asks the infrastructure
to return information about the remote
data sources queried
. The format of this command is:
.. code-block:: text
"hmd channel [name]"
where ``name`` refers to the input name and is optional.
The BEAT infrastructure will answer by writing the following into the input
pipe.
.. code-block:: text
"boolean"
where `boolean` is a boolean indicating whether there is more data to process
or not. The values may be ``tru``, for True and ``fal``, for False.
Command: is-dataunit-done (idd)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(User Process -> Infrastructure)
This command asks the infrastructure, whether the current chunk of data is
going to change or not, next time the current data index is increased. The
format of this command is:
.. code-block:: text
"idd channel name"
"ifo name\n"
The infrastructure will answer by writing the following into the input pipe.
.. code-block:: text
"boolean"
where `boolean` is a boolean indicating whether there is more data to process
or not. The values may be ``tru``, for True and ``fal``, for False.
Command: next (nxt)
~~~~~~~~~~~~~~~~~~~
(User Process -> Infrastructure)
This command asks the infrastructure to provide the next data on a given
channel and/or input. If no input ``name`` is provided, then the infrastructure
will return the data on all inputs of the given ``channel``. The format of this
command is:
.. code-block:: text
"nxt channel [name]"
where ``name`` refers to the input name and is optional.
If an input ``name`` is provided, the infrastructure will answer by writing the
following into the input pipe:
.. code-block:: text
"dat N name1 <bin1> .. nameN-1 <binN-1>"
where ``N`` refers to the number of data chunks in the reply, ``name`` is the
name of a given channel, and ``<bin>`` corresponds to the binary representation
of the data contents. The binary data format uses the same format for disk
storage used by the infrastructure so as to optimize I/O performance. Our
reference backend implementation at `beat.backend.python`_ contains
implementation details about the binary data format. As a backend developer,
you must ensure your backend is fully capable of **correctly** interpreting the
contents of the binary stream given the data formats associated with each input
and output of the algorithm.
Command: write (wrt)
~~~~~~~~~~~~~~~~~~~~
(User Process -> Infrastructure)
This command asks the infrastructure to write data on a given output. The
format of this command is:
.. code-block:: text
"wrt name <bin>"
The ``name`` identifies the output on which to write the data. The message
`<bin>` is the raw data to write on the output, pre-encoded in the same binary
data format as the one used for the input. The contents of the binary stream
sent to the infrastructure will be checked again before being written to disk.
In case of problems, a system error will be issued and the processing will
stop. It is the task of the backend developer to insure conformity in order to
avoid such errors.
The infrastructure acknowledges with:
.. code-block:: text
"ack"
"X"
"Start0"
"End0"
...
"StartX-1"
"EndX-1"
In case all is OK or with the keyword ``err``, if there was an error
interpreting the binary data back. In such cases, the UP is expected to stop
processing and issue a ``done`` command, waiting to be gracefully stopped by
the infrastructure. If the UP does not issue the ``done`` command next, the UP
is forcibly terminated with a operating system level signal.
where `X` is the length of the data source and the `StartX` and `EndX` are the
start and end indexes available through that data source.
Command:
is-data-missing (idm
)
~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~
Command:
get data (get
)
~~~~~~~~~~~~~~~~~~~~~~~
(User Process -> Infrastructure)
This command asks the infrastructure whether there is missing data on a given
output, by looking at the synchronization information. The format of this
command is:
.. code-block:: text
"idm name"
The infrastructure will answer by writing the following into the input pipe.
This command asks the infrastructure to return the data at the given index.
The format of this command is:
.. code-block:: text
"boolean"
where `boolean` is a boolean indicating whether there is more data to process
or not. The values may be ``tru``, for True and ``fal``, for False.
Command: output-is-connected (oic)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(User Process -> Infrastructure)
This command asks the infrastructure, whether a given output is connected or
not. The format of this command is:
.. code-block:: text
"get X"
"ict name\n"
where X is the index of the data in the data source.
The infrastructure will answer by writing the following into the input pipe.
.. code-block:: text
"boolean"
"StartX"
"EndX"
"data"
where `
boolean` is a boolean indicating whether there is more data to process
or not. The values may be ``tru``, for True and ``fal``, for False
.
where `
StartX` and `EndX` and the start and end indexes in the data sources and
`data` is the packed data that the data sources provides
.
Command: done (don)
...
...
This diff is collapsed.
Click to expand it.
doc/conf.py
+
8
−
7
View file @
884288e6
...
...
@@ -32,8 +32,15 @@
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. #
# #
###################################################################################
import
time
import
os
import
pkg_resources
import
sphinx_rtd_theme
# For inter-documentation mapping:
from
bob.extension.utils
import
link_documentation
,
load_requirements
# -- General configuration -----------------------------------------------------
...
...
@@ -54,7 +61,6 @@ extensions = [
"
sphinx.ext.napoleon
"
,
"
sphinx.ext.viewcode
"
,
"
sphinx.ext.mathjax
"
,
#'matplotlib.sphinxext.plot_directive'
]
# Be picky about warnings
...
...
@@ -104,7 +110,6 @@ master_doc = "index"
# General information about the project.
project
=
u
"
beat.core
"
import
time
copyright
=
u
"
%s, Idiap Research Institute
"
%
time
.
strftime
(
"
%Y
"
)
...
...
@@ -164,7 +169,6 @@ owner = [u"Idiap Research Institute"]
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
import
sphinx_rtd_theme
html_theme
=
"
sphinx_rtd_theme
"
...
...
@@ -258,16 +262,13 @@ autoclass_content = "class"
autodoc_member_order
=
"
bysource
"
autodoc_default_flags
=
[
"
members
"
,
"
undoc-members
"
,
"
show-inheritance
"
]
if
not
"
BOB_DOCUMENTATION_SERVER
"
in
os
.
environ
:
if
"
BOB_DOCUMENTATION_SERVER
"
not
in
os
.
environ
:
# notice we need to overwrite this for BEAT projects - defaults from Bob are
# not OK
os
.
environ
[
"
BOB_DOCUMENTATION_SERVER
"
]
=
"
https://www.idiap.ch/software/beat/docs/beat/%(name)s/%(version)s/|https://www.idiap.ch/software/beat/docs/beat/%(name)s/master/
"
# For inter-documentation mapping:
from
bob.extension.utils
import
link_documentation
,
load_requirements
sphinx_requirements
=
"
extra-intersphinx.txt
"
if
os
.
path
.
exists
(
sphinx_requirements
):
intersphinx_mapping
=
link_documentation
(
...
...
This diff is collapsed.
Click to expand it.
doc/index.rst
+
1
−
0
View file @
884288e6
...
...
@@ -46,6 +46,7 @@ This package provides the core components of BEAT ecosystem. These core componen
backend_api
develop
api
zmq_architecture
Indices and tables
...
...
This diff is collapsed.
Click to expand it.
doc/links.rst
+
2
−
0
View file @
884288e6
...
...
@@ -43,6 +43,8 @@
.. _python 2.7: http://www.python.org
.. _zero message queue: http://zeromq.org
.. _zmq: http://zeromq.org
.. _ZeroMQ book: http://shop.oreilly.com/product/0636920026136.do
.. _Majordomo Protocol: https://rfc.zeromq.org/spec:18/MDP/
.. _language bindings: http://zeromq.org/bindings:_start
.. _python bindings: http://zeromq.org/bindings:python
.. _markdown: http://daringfireball.net/projects/markdown/
...
...
This diff is collapsed.
Click to expand it.
doc/zmq_architecture.rst
0 → 100644
+
129
−
0
View file @
884288e6
.. vim: set fileencoding=utf-8 :
.. Copyright (c) 2019 Idiap Research Institute, http://www.idiap.ch/ ..
.. Contact: beat.support@idiap.ch ..
.. ..
.. This file is part of the beat.backend.python module of the BEAT platform. ..
.. ..
.. Redistribution and use in source and binary forms, with or without
.. modification, are permitted provided that the following conditions are met:
.. 1. Redistributions of source code must retain the above copyright notice, this
.. list of conditions and the following disclaimer.
.. 2. Redistributions in binary form must reproduce the above copyright notice,
.. this list of conditions and the following disclaimer in the documentation
.. and/or other materials provided with the distribution.
.. 3. Neither the name of the copyright holder nor the names of its contributors
.. may be used to endorse or promote products derived from this software without
.. specific prior written permission.
.. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
.. ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
.. WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
.. DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
.. FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.. DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
.. SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
.. CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.. OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
.. OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.. _zmq_architecture:
==================================
ZMQ Architecture for task handling
==================================
Introduction
------------
The ZMQ architecture implemented in beat.core is based on the `Majordomo
Protocol`_ as described in the `ZeroMQ book`_.
There are however some subtle differences:
- We have one client: the scheduler
- We have unique workers
- We currently don't have "system commands"
Due to these differences and their implementation, the protocol has been
renamed: "BEAT Computation Protocol" or BCP for short.
The system is based on these three components:
- The client
- The broker
- The worker(s)
In BEAT, the client will be the scheduler which will send the tasks to the
broker which will be responsible for forwarding them to the appropriate worker
requested by the scheduler.
Once the task has been completed, the worker will send back a message to the
scheduler through the broker.
The whole messaging system is asynchronous except when starting an actual task.
The worker will send back a confirmation as soon as the runner was properly
started.
Why this design ?
-----------------
The original design was a bit simpler:
- One scheduler
- Many workers
The scheduler was responsible for both task scheduling and worker communication
handling. One issue that arose from time to time was that with very low volume
of network activity, the connection between one or more workers and the
scheduler would get cut and nobody would notice. The result was that new tasks
would be sent but silently dropped and thus experiment would stay in a running
state while not doing anything. And if canceled, the state would stay in
canceling as again the command would be silently dropped.
Thus the rational behind choosing this new design was to avoid these connection
loss and therefore platform paralysis.
Now, the broker and the workers implement a bidirectional heartbeat. This has a
twofold benefit:
- The heartbeat itself should generate enough network activity to avoid the
connection to be cut.
- If a worker goes missing, it will be detected by the broker that will act as
configured to.
BCP Schema
----------
The figure below shows how the system is working.
::
--------------
| |
| client |
| |
--------------
|
--------------
| |
| broker |
| |
--------------
| | | |
| | | |
/----------------------/ | | \-----------------------\
| | | |
| /-------/ \-------\ |
| | | |
--------------- --------------- --------------- ---------------
| | | | | | | |
| worker1 | | worker2 | | worker3 | | worker4 |
| | | | | | | |
--------------- --------------- --------------- ---------------
.. include:: links.rst
This diff is collapsed.
Click to expand it.
Preview
0%
Loading
Try again
or
attach a new file
.
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Save comment
Cancel
Please
register
or
sign in
to comment