.. vim: set fileencoding=utf-8 :
.. Copyright (c) 2016 Idiap Research Institute, http://www.idiap.ch/ ..
.. Contact: beat.support@idiap.ch ..
.. ..
.. This file is part of the beat.web module of the BEAT platform. ..
.. ..
.. Commercial License Usage ..
.. Licensees holding valid commercial BEAT licenses may use this file in ..
.. accordance with the terms contained in a written agreement between you ..
.. and Idiap. For further information contact tto@idiap.ch ..
.. ..
.. Alternatively, this file may be used under the terms of the GNU Affero ..
.. Public License version 3 as published by the Free Software and appearing ..
.. in the file LICENSE.AGPL included in the packaging of this file. ..
.. The BEAT platform is distributed in the hope that it will be useful, but ..
.. WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY ..
.. or FITNESS FOR A PARTICULAR PURPOSE. ..
.. ..
.. You should have received a copy of the GNU Affero Public License along ..
.. with the BEAT platform. If not, see http://www.gnu.org/licenses/. ..
.. _administratorguide-installation:
==============
Installation
==============
In this section, we provide basic instructions and fundamental ideas required
to deploy the BEAT platform. Depending on the deployment strategy (single
machine or distributed across several machines), the installation instructions
will of course differ. Nevertheless, configuring and installing a simple
platform instance remains reasonably easy.
The BEAT platform is written as a set of Python packages. This package
(``beat.web``), in particular, constitutes the central deployment pillar of a
BEAT platform instance. It uses, as its base development library, a web
framework called Django_. If you are unfamiliar with this framework but wish
to deploy or develop the BEAT platform, it is recommended you familiarize
yourself with it.
To deploy a platform on a single machine, it is hence sufficient to install
``beat.web`` to get the full BEAT software stack. The recipe is as follows::
$ # after downloading and extracting the beat.web package
$ python bootstrap-buildout.py
$ ./bin/buildout
These two commands should download and install all non-installed dependencies
and generate a fully operational test and development environment.
.. note:: cpulimit_ has been superseded by the use of Docker.
.. tip::
If you'd like to **speed up** the installation, it is strongly advised you
prepare a preset virtual environment (see the virtualenv_ package) with all
required dependencies, so that ``./bin/buildout`` does not download and
install all of them every time you clean up. This technique allows you to
quickly clean up and restart your working environment, which is useful during
development.
In order to fetch currently needed dependencies, run::
$ ./bin/buildout #to setup once
$ ./bin/pip freeze > requirements.txt
Examine the file ``requirements.txt`` and remove packages you are either
developing locally (e.g., all that are under ``src``) or that you think you
don't need. The command ``pip freeze`` reports all installed packages, not
only those needed by your project. If the Python interpreter you used for
bootstrapping already had a good set of packages installed, you may see them
listed there.
Once you have a satisfying ``requirements.txt`` file, you may proceed to
recreate a virtualenv_ you'll use for your development. Just call::
$ virtualenv ~/work/beat-env #--system-site-packages
This command creates the virtual environment. By default, the new environment
does not contain system packages. You may override that by specifying
``--system-site-packages`` as suggested above. Then, install the required
packages into your new virtual environment::
$ ~/work/beat-env/bin/pip install -r requirements.txt
After that step is done, your virtual environment is ready for deployment.
You may now start from scratch to develop ``beat.web`` taking as base the
Python interpreter on your virtualenv_::
$ cd beat.web
$ git clean -fdx #full clean-up
$ ~/work/beat-env/bin/python bootstrap-buildout.py
$ ./bin/buildout
You'll notice the buildout step now takes considerably less time, and you may
repeat this last step as often as needed. ``pip`` is a very flexible tool and
you may use it to manage the virtualenv_, installing and removing packages as
needed.
Documentation
-------------
The documentation project is divided into three parts. The user guide is the
only one built automatically as part of the ``buildout`` procedure. The API
and administrator guides need to be compiled manually if required.
To build the API documentation, just do::
$ ./bin/sphinx-apidoc --separate -d 2 --output=doc/api/api beat beat/web/*/migrations beat/web/*/tests
$ ./bin/sphinx-build doc/api html/api
To build the administrator guide, just do::
$ ./bin/sphinx-build doc/admin html/admin
The above commands will build the stated guides, in HTML format, and dump
results into your local directory ``html``. You may then navigate to that
directory and, with your preferred web browser, open the file ``index.html`` to
browse the available documentation.
The basic user guide, which includes information for users of the platform, is
built automatically upon ``buildout``. If you wish to build it and place it
alongside the other guides, you may do so like this::
$ ./bin/sphinx-build doc/user html/user
After installation, it is possible to run a suite of unit tests to check the
sanity of the installation. To do so, use::
$ ./bin/django test --settings=beat.web.settings.test -v 1
You may pass filtering criteria to just launch tests for a particular set of
``beat.web`` applications. For example, to run tests only concerning
``beat.web.toolchains``, run::
$ ./bin/django test --settings=beat.web.settings.test -v 1 beat.web.toolchains.tests
To measure test coverage, wrap the test run with the ``coverage`` tool::
$ ./bin/coverage run --source='./beat/web' ./bin/django test --settings=beat.web.settings.test
$ ./bin/coverage report
Or, to generate an HTML report::
$ ./bin/coverage html
.. tip::
You may significantly speed up your testing by re-using the same test
database from run to run. In order to do this, just specify the flag
``--keepdb`` when you run your tests::
$ ./bin/django test --settings=beat.web.settings.test -v 1 --keepdb
In this case, Django will create and keep a test database called
``test.sql3`` in your current directory. You may delete it when you're done.
End-to-End Testing
------------------
`Protractor <http://www.protractortest.org/#/>`_ is an e2e (end-to-end) testing tool for web apps. Protractor runs tests through Selenium using a real browser, and as such needs a headed environment and a compatible browser installed.
.. warning::
Protractor will open a new browser window in the foreground when it is started.
Setup
=====
There are two system dependencies to run Selenium:
- Java 8 must be available in your PATH
- If you want to run the testing in a GNOME environment, you need `GConf <https://projects.gnome.org/gconf/>`_
Download/update Protractor's dependencies into the local repository (Selenium & more):

.. code:: bash

   ./bin/webdriver-manager update
Running tests with the provided script
======================================
The ``protractor.sh`` script is a one-liner to run Protractor tests. It handles database creation/saving/restoring and manages the required local server processes. However, it assumes several things:
- It is being run in the top-level directory of the ``beat.web`` repository
- ``./bin/buildout`` has already been run successfully with the default development configuration
- Protractor's ``.conf`` file is ``./protractor-conf.js``
- No additional arguments need to be passed to ``webdriver-manager`` or Django ``runserver``
- Django uses ``./django.sql3`` as the database
- If ``./template.django.sql3`` does not exist, the default database generated by ``./bin/django install`` is sufficient for the basic tests. However, some tests will fail, and it is suggested to provide a database with experiments that have run successfully.
Manual test running
===================
If the ``protractor.sh`` script does not work for your setup, you can run the tests manually.
The ``webdriver-manager`` must be running while testing. To run tests using a local BEAT web server, you must have the BEAT web server up as well.
Starting the webdriver server
_____________________________
- Start the webdriver server in a separate shell (or append ``&`` to run it as a background process in the current shell):

  .. code:: bash

     ./bin/webdriver-manager start
.. important::
You may only have one webdriver manager running at a time.
- After the webdriver finishes initialization, you can run the tests:

  .. code:: bash

     ./bin/protractor protractor-conf.js
- If you started your webdriver server as a background process, you can kill all webdriver processes:

  .. code:: bash

     pkill -f webdriver-manager
Understanding the output of Protractor
======================================
By default Protractor prints to ``STDOUT``. If a test passes, nothing is printed about that particular test. If a test fails, Protractor will print more information about the failure, including the specific test, type of failure that occurred, and a stack trace. At the end of testing, Protractor will print a summary of the test run.
Saving test results
___________________
Beyond simply piping Protractor's output to a file, you may enable detailed logging via a specified JSON file. Just uncomment the relevant line in ``protractor-conf.js`` and optionally change the output file location:
.. code:: javascript
//resultJsonOutputFile: './protractor-test-results.json'
Adding your test to Protractor
==============================
The configuration file listing the test files is ``protractor-conf.js``. The ``specs`` field is a comma-separated list of test files; just add your new test file to the list and run Protractor again.
For example, to add the test file ``example-spec.js``:
- Before
.. code:: javascript
specs: [
'./beat/web/reports/static/reports/test/test-spec.js'
],
- After
.. code:: javascript
specs: [
'./beat/web/reports/static/reports/test/test-spec.js',
'example-spec.js'
],
Overriding Protractor's browser choices
=======================================
In ``protractor-conf.js``, add a ``multiCapabilities`` option in the following format:
.. code:: javascript
multiCapabilities: [
{
browserName: '<browser name 1>'
},
{
browserName: '<browser name 2>'
},
...
]
.. note::
You may need to download your browsers' WebDrivers separately - see `the official Selenium docs <https://seleniumhq.github.io/docs/wd.html#quick_reference>`_.
Writing Protractor tests
========================
Protractor uses and expects tests to use the `Jasmine BDD testing framework <https://jasmine.github.io/>`_. For a tutorial on writing Protractor tests, see the `official Protractor tutorial <http://www.protractortest.org/#/tutorial>`_. Protractor also has documentation on their website.
BEAT platform & Protractor's Angular support
____________________________________________
By default, Protractor assumes that the tested website will use Angular in a particular fashion to more intelligently detect a page that has finished rendering. However, the BEAT platform does not use Angular this way, and Protractor will hang forever. To tell Protractor not to assume this compatibility, add the following line at the top of each top-level ``describe`` block in your test files:
.. code:: javascript
browser.ignoreSynchronization = true;
.. _administratorguide-installation-instantiating:
Instantiating and Starting a Development System
-----------------------------------------------
For a simple (development) system, the default settings in
``beat/web/settings/settings.py`` should work out of the box. These settings:

* Instantiate the web service on the local host under port 8000 (the address
will be ``http://127.0.0.1:8000``)
* Use an SQLite3 database named ``django.sql3`` located in the current
working directory
* Run with full debug output
* Set the working BEAT prefix to ``./web_dynamic_data``
* Set up a single user, called ``user``, with administrative powers
If you need to tweak these settings, just edit the file
``beat/web/settings/settings.py``. You may also consult the `Django
documentation`_ for detailed information on other settings.
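For illustration, a few common tweaks using standard Django settings are shown
below. This is only a sketch: BEAT-specific options (such as the prefix
location) have their own variable names in that file, so check the file itself
before changing them:

.. code:: python

   # beat/web/settings/settings.py -- illustrative local tweaks only
   DEBUG = True                    # keep full debug output while developing
   ALLOWED_HOSTS = ['127.0.0.1']   # hosts allowed to reach the server
   TIME_ZONE = 'Europe/Zurich'     # adjust to your local time zone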
Once the Django settings are tweaked to your liking, you can run a single
command to fully populate the development webserver with test databases,
toolchains, algorithms and experiments::
$ ./bin/django install -v1
.. note::
This command only teaches the platform how to **access** the data of the
databases it installs. It does not download the raw data, which you must
procure yourself through the relevant web sites (check the database pages on
the Idiap instance of the BEAT platform for details).
.. note::
If you need to specify your own path to the directories containing the
databases, you could just create a simple JSON file as follows::
{
"atnt/1": "/remote/databases/atnt",
"banca/2": "/remote/databases/banca"
}
Then just call the install command with the option ``--database-root-file``::
$ ./bin/django install -v1 --database-root-file=MYFILE.json
By default, paths to the root of all databases are set to match the Idiap
Research Institute filesystem organisation.
.. note::
For every installed database, you'll need to generate their data indices,
which allows the platform to correctly parallelize algorithms. To do so, for
every combination of database and version you wish to support, run the
following command::
$ ./bin/beat -p prefix db index <name>/<version>
Replace ``<name>`` with the name of the database you wish to dump the indices
for, and ``<version>`` with its version. For example, to dump the indices for
the AT&T database, version 1, do the following::
$ ./bin/beat -p prefix db index atnt/1
Once the contributions and users are in place, you're ready to start the test
server::
$ ./bin/django runserver
At this point, the platform can be accessed by typing the URL
``http://127.0.0.1:8000`` in a web browser on the machine where the server is
running.
.. note::
To use a dedicated database server such as PostgreSQL, it is sufficient to
configure its Django-like settings in ``beat/web/settings/settings.py``,
assuming the database server is operational.
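As an illustration only, a PostgreSQL setup follows the standard Django
``DATABASES`` layout. The names below (database, user, host) are placeholders
you must adapt to your own infrastructure:

.. code:: python

   # beat/web/settings/settings.py -- hypothetical PostgreSQL configuration
   DATABASES = {
       'default': {
           'ENGINE': 'django.db.backends.postgresql_psycopg2',
           'NAME': 'beat',              # database name (placeholder)
           'USER': 'beat',              # database role (placeholder)
           'PASSWORD': 'secret',        # use a real secret in production
           'HOST': 'db.example.com',    # database server (placeholder)
           'PORT': '5432',
       }
   }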
.. _administratorguide-installation-allinone:
All-in-one Platform
===================
The BEAT platform is composed of 3 application types that run in synchrony to
create, store and process your experiments: the web server, the scheduler and
one or more workers. The web server is used by you to create and launch
experiments. The scheduler assigns experiment blocks (actually
:py:class:`beat.web.backend.JobSplit`'s) to run in one of the available
workers, respecting user quotas and worker limitations. The worker runs the
user algorithms installed on each block upon scheduling, notifying the web
server when it's done.
The base software framework and models that allow the 3 applications to run
cooperatively are described in one single place: the Django_ models and the
central database of this package. Effectively, it means this package contains
all information that is required to run the 3 types of applications. The
applications "communicate" between each other using the shared Django_
database, reading and modifying objects as experiments are assigned and
treated. Several deployment scenarios are therefore possible and you must use
the one most suited for your requirements.
In order to start the system, just run::

$ ./bin/django runserver

Once the Django development web server is up and running, open a browser and
navigate to http://127.0.0.1:8000. Login with an account with administrative
rights and click on the scheduler icon, using the omni-bar, on the top of any
page. Use the "Helper panel" available to launch one-off or repetitive
scheduling and/or worker activities. In this case, both the scheduling and
worker activities run in the context of the web server process.
Discrete Platform using Localhost
=================================
It is also possible to run each of the applications as separated processes.
Here is how to do it.
1. Start the web service normally::

$ ./bin/django runserver

2. Start the full scheduling setup::
$ ./bin/django full_scheduling
This will start all elements of the scheduling/working process. Docker can
be used for the worker node by passing the ``--docker`` option.
Each element composing the scheduling can also be started separately:
1. Start the broker node::
$ ./bin/django broker -v 2
2. Start a single scheduling node::
$ ./bin/django scheduler -v 2
3. Start a worker for your current node::
$ ./bin/django worker -v 2
By default, the applications are configured to figure out paths and
configuration options by themselves. You can override some defaults via the
command line. Just check the output of each of those commands running the
``--help`` flag on any of them.
You can mix and match any of the above techniques to run a 4-node system
(all-in-one or discrete) and build a test system suited to your needs. For
example, it is possible to launch the scheduling activities using the web
server and the page reload trick while launching the worker process separately
as per above.
Going full scale
================
In order to transform the development system into a full-scale platform, you
will have to create your own maintenance scripts allowing you to automatically
start/stop, update and secure the BEAT platform applications across your BEAT
web nodes. It is beyond the scope of this documentation to enter into details
concerning these. We provide only some tips which we consider important:

* Don't use the SQLite backend on a production system; it does not work well
with the concurrency you may generate. Prefer a PostgreSQL database.
* The "cache" directory (see the variable ``CACHE_ROOT`` on the Django_
settings file) is shared amongst all applications in the cluster. It is
advisable you use a proper networked filesystem with good synchronisation
primitives to avoid issues concerning the production and consumption of
data caches between workers living in different nodes.
* Don't rely on your memory: script all deployment instructions so that you
can do them routinely whenever newer versions come up or you have an issue.
* Security: You'll be running code uploaded by users on your computer. Make
sure you properly isolate each of the processes and the backend farm to
avoid unpleasant surprises. Some helpers:
* Disk access: two main directories are shared across the applications. The
cache directory stores intermediary block results. The prefix directory
stores user contributions on disk. You may tune the file system access on
a distributed BEAT platform to increase its security:
- The web server only needs read access to the cache directories. It
needs read and write access to the prefix directory in order to store
user contributions.
- The scheduler needs read/write access to the cache directory. It does
not use the prefix directory and does not read or treat user
contributions. The scheduler also needs access to the Django database.
- The workers need read/write access to the cache directory and read
access to the prefix directory. The workers also need access to the
Django database.
- The processes launched by the worker need to have similar permissions
as their worker. The user executable though, should have demoted
permissions to increase security. For example, no need to access the
Django database (or the settings file), the prefix or the cache. All is
done via the parent process. In order to implement this, the easiest is
to make sure the worker process is run by an unprivileged user and a
group with the right access permissions, allowing it to access the
Django database (and the Django settings file), the prefix and the
cache. This will be inherited by the processes launched by the worker,
that will serve data to the processes wrapping the user code. To demote
the user process, just set the group id of the environment executable
to an unprivileged group. This way, the following security chain is
achieved (pseudo user/groups)::
worker -> process -> environment exec(user code)
[nobody:beat] [nobody:beat] [nobody:nogroup]
The BEAT platform requires this process chain to belong to the same user, so
that signals can be used to stop or kill the applications in the chain if
necessary.
If you don't do anything, then the user code will be run in a process
with the same privileges as the worker application.
* E-mail privileges: e-mailing may be configured as part of the Django_
standard logging facilities or used to report experiment completion and
other platform activity. While, by default, all node types have access to
the Django configuration and can potentially send e-mails, it is wiser to
use a Django extension such as Post-office_ to centralize e-mail sending
on one node, avoiding potential spam (see the configuration sketch after
this list).
* User processes: user code is run in isolated processes launched by the
children of worker processes. Because the user code process does not
require disk access to either the prefix or the cache, it should run
without access to those resources in order to improve the platform
security. This may be achieved by running user processes in ``chroot``'ed
environments or making sure user code is launched with a user identity
which has far fewer access permissions than the worker process itself.
Have a look at the ``--help`` output of the ``worker`` application for
more information and examples.
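As an illustration of the e-mail tip above, centralizing delivery with
Post-office_ boils down to a couple of standard Django settings. The snippet
below is a minimal, hypothetical sketch assuming the ``django-post_office``
package is installed; adapt it to the actual layout of
``beat/web/settings/settings.py``:

.. code:: python

   # beat/web/settings/settings.py -- hypothetical e-mail centralization sketch
   INSTALLED_APPS += ('post_office',)          # register the extension
   EMAIL_BACKEND = 'post_office.EmailBackend'  # queue e-mails in the database
                                               # instead of sending them directly

   # A single, designated node then flushes the queue periodically, for
   # example from cron: ./bin/django send_queued_mail

With this backend, every node may queue e-mails, but only the node running the
flushing command actually delivers them.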
You may contact our support_ in case you need advice concerning this topic.
Development Notes
-----------------
.. _administratorguide-installation-localhost-snapshot:
Backup and Restore
==================
The BEAT platform can be backed up and restored easily. These commands allow
not only for safe information keeping, but also for copying the state of a
given deployment to a local development server, where more thorough tests can
be performed while tracking a bug or improving performance.

It is easy to quickly set up a local system for development, taking as base
the current state of a production system. Here are some instructions:
1. Before starting, make sure you have gone, at least once, through the
instructions above. They explain the very basic setup required for a complete
development environment.
2. Dump and back up your current **production** BEAT database::
[production]$ ./bin/django backup
3. [Optional] If you have made important modifications between the contents
available at your production server and your currently checked-out source,
you'll need to run Django migrations on data imported from the production
server. If you need to do this, make sure there are no uncommitted changes in
your local **development** package and reset it to the production tag::
[development]$ git checkout <production-tag>
.. note::
You can figure out the production tag by looking at the footer of the
BEAT website. The corresponding tag name is found by prefixing a ``v``
before the version number. For example, the tag for version ``0.8.2`` of
the platform is ``v0.8.2``.
Also make sure to revert all dependent packages, so as to recreate the state
of the database schema as on the production site.
4. Remove the current local development database so that the restore operation
can start from scratch::
[development]$ rm -rf django.sql3 web_dynamic_data
5. Copy the backup tarball from the production server and restore it locally::
[development]$ scp root@<beatproductionmachine>:backups/<backup-filename>.tar.bz2 .
[development]$ ./bin/django restore <backup-filename>.tar.bz2
At this point, you have recreated a copy of your production system locally,
on your SQLite3 database.
6. Reset queue configuration to allow for local running.
You may, optionally, reset the queue configuration of your installation so
that the environments are compatible with your development machine and you
can immediately run experiments locally. To do so, use the
``qsetup`` Django command::
[development]$ ./bin/django qsetup --reset
7. Re-checkout the tip::
$ git checkout master  # or any other branch
8. Apply migrations::
$ ./bin/django migrate
At this point, you should have a complete development setup with all elements
available on the production system installed locally. This system is fully
capable of running experiments locally using your machine.
Testing Django Migrations
=========================

Django migrations, introduced in Django 1.7, are a useful feature for
automatically migrating your database to new model schemas, if you get them
right. Here is a recipe to make sure your migrations will work on your
production system, allowing for quick and repetitive test/fix cycles.

The key idea is to follow the setup described in
administratorguide-installation-localhost-snapshot_ and then locally back up
the database and prefix, so that the migration test loop can be reproduced
quickly.
1. Make sure you go through the
administratorguide-installation-localhost-snapshot_ instructions above
(**up to step 6 only**).
2. Make a copy of the SQLite3 database::
$ cp -a django.sql3 django.sql3.backup
This backup will allow you to quickly test the migrations without having to
check out the production version again.
Also, create a temporary git repository of ``web_dynamic_data``, so you can
cross-check changes and reset it in case of problems::
$ cd web_dynamic_data
$ git init .
$ git add .
$ git commit -m "Initial commit"
$ cd ..
3. Go back to the HEAD or branch you were developing on before::
$ git checkout HEAD
4. Here is how to test/fix your migrations:
a. Run "django migrate"::
$ ./bin/django migrate
b. Check your database by visually inspecting it on the django web admin or
by manually dumping it.
c. If a problem is detected, fix it and revert the state::
$ cp -af django.sql3.backup django.sql3
$ cd web_dynamic_data && git reset --hard HEAD && git clean -fdx . \
&& cd ..
.. note::

Write the above lines in a shell script so they are easy to repeat.

Go back to step a. and restart.
Javascript Management with Node.js/Bower
========================================
We manage javascript external packages with the help of Bower_. If you'd like
to include more packages that will be statically served with the Django web
app, please consider including them in the appropriate section of
``buildout.cfg``.
If you find problems concerning this package, please post a message to our
`group mailing list`_. Currently open issues can be tracked at `our gitlab
page`_.
.. Place here references to all citations in lower case
.. _django documentation: https://doc.djangoproject.com/en/
.. _django: https://www.djangoproject.com/
.. _cpulimit: https://github.com/opsengine/cpulimit/
.. _pip: http://pypi.python.org/pypi/pip
.. _easy_install: http://pypi.python.org/pypi/setuptools
.. _zc.buildout: http://pypi.python.org/pypi/zc.buildout
.. _virtualenv: http://pypi.python.org/pypi/virtualenv
.. _group mailing list: https://groups.google.com/d/forum/beat-devel
.. _our gitlab page: https://gitlab.idiap.ch/beat/beat.web/issues
.. _bower: http://bower.io
.. _support: https://www.beat-eu.org/platform/contact/

.. _post-office: https://pypi.python.org/pypi/django-post_office