
.. vim: set fileencoding=utf-8 :

.. Copyright (c) 2016 Idiap Research Institute, http://www.idiap.ch/          ..
.. Contact: beat.support@idiap.ch                                             ..
..                                                                            ..
.. This file is part of the beat.web module of the BEAT platform.             ..
..                                                                            ..
.. Commercial License Usage                                                   ..
.. Licensees holding valid commercial BEAT licenses may use this file in      ..
.. accordance with the terms contained in a written agreement between you     ..
.. and Idiap. For further information contact tto@idiap.ch                    ..
..                                                                            ..
.. Alternatively, this file may be used under the terms of the GNU Affero     ..
.. Public License version 3 as published by the Free Software and appearing   ..
.. in the file LICENSE.AGPL included in the packaging of this file.           ..
.. The BEAT platform is distributed in the hope that it will be useful, but   ..
.. WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY ..
.. or FITNESS FOR A PARTICULAR PURPOSE.                                       ..
..                                                                            ..
.. You should have received a copy of the GNU Affero Public License along     ..
.. with the BEAT platform. If not, see http://www.gnu.org/licenses/.          ..


.. _administratorguide-installation:

==============
 Installation
==============

In this section, we provide basic instructions and fundamental ideas required
to deploy the BEAT platform. Depending on the deployment strategy (single
machine or distributed across several machines), the installation instructions
will of course differ. Nevertheless, configuring and installing a simple
platform instance remains reasonably easy.


Installing beat.web
-------------------

The BEAT platform is written as a set of Python packages. This package
(``beat.web``), in particular, constitutes the central deployment pillar of a
BEAT platform instance. It is built on the Django_ web framework. If you are
unfamiliar with this framework but wish to deploy or develop the BEAT platform,
we recommend that you familiarize yourself with it.

To deploy a platform on a single machine, it is therefore sufficient to install
``beat.web`` to get the full BEAT software stack. The recipe is as
follows::

  $ # after downloading and extracting the beat.web package
  $ python bootstrap-buildout.py
  $ ./bin/buildout

These two commands should download and install all missing dependencies and
generate a fully operational test and development environment.


.. note::

   cpulimit_ is no longer required: it has been superseded by the use of Docker.

.. tip::

  If you'd like to **speed up** the installation, we strongly advise you to
  prepare a preset virtual environment (see the virtualenv_ package) with all
  required dependencies, so that ``./bin/buildout`` does not download and
  install all of them every time you clean up. This technique allows you to
  quickly clean up and restart your working environment, which is useful
  during development.

  To list the dependencies currently needed, run::

    $ ./bin/buildout  # run once, to set up the environment
    $ ./bin/pip freeze > requirements.txt

  Examine the file ``requirements.txt`` and remove the packages you are either
  developing locally (e.g., all those under ``src``) or that you think you
  don't need. The command ``pip freeze`` reports all installed packages, not
  only those needed by your project: if the Python interpreter you used for
  bootstrapping already had many packages installed, they will be listed there
  as well.

  Once you have a satisfying ``requirements.txt`` file, you may proceed to
  create the virtualenv_ you'll use for development::

    $ virtualenv ~/work/beat-env  # optionally add --system-site-packages

  This creates the virtual environment. The new environment does not contain
  system packages by default; you may override that by specifying
  ``--system-site-packages``, as suggested above. Then, install the required
  packages in your new virtual environment::

    $ ~/work/beat-env/bin/pip install -r requirements.txt

  After that step is done, your virtual environment is ready for deployment.
  You may now start developing ``beat.web`` from scratch, using the Python
  interpreter of your virtualenv_ as the base::

    $ cd beat.web
    $ git clean -fdx  # full clean-up
    $ ~/work/beat-env/bin/python bootstrap-buildout.py
    $ ./bin/buildout

  You'll notice that the buildout step now takes considerably less time, and
  you may repeat this last step as often as needed. ``pip`` is a very flexible
  tool that you may use to manage the virtualenv_, installing and removing
  packages as needed.
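
The clean-up of ``requirements.txt`` described in the tip above can be partly
automated. The following is a minimal sketch, assuming that the packages you
develop locally appear in the ``pip freeze`` output either as editable (``-e``)
lines or under names you pass explicitly; the ``filter_requirements`` function
is hypothetical and not part of ``beat.web``:

.. code:: bash

   # Hypothetical helper: read "pip freeze" output on stdin and print it back
   # without editable installs (-e lines), blank lines, and the packages whose
   # names are given as arguments (compared case-insensitively).
   filter_requirements() {
     awk -v skip="$*" '
       BEGIN {
         n = split(tolower(skip), names, " ")
         for (i = 1; i <= n; i++) drop[names[i]] = 1
       }
       /^-e/ || /^[ \t]*$/ { next }
       {
         # The package name is everything before the first version specifier
         name = tolower($0)
         sub(/[=<>!~@ ;].*/, "", name)
         if (!(name in drop)) print
       }
     '
   }

For instance, ``./bin/pip freeze | filter_requirements beat.core > requirements.txt``
keeps every pinned dependency except editable installs and ``beat.core``.
Review the result anyway: the filter cannot tell which of the remaining
packages you actually need.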


Documentation
-------------

The documentation project is divided into three parts. The user guide is the
only one automatically built as part of the ``buildout`` procedure. The API and
administrator guides need to be compiled manually if required.

To build the API documentation, just do::

  $ ./bin/sphinx-apidoc --separate -d 2 --output=doc/api/api beat beat/web/*/migrations beat/web/*/tests
  $ ./bin/sphinx-build doc/api html/api


To build the administrator guide, just do::

  $ ./bin/sphinx-build doc/admin html/admin


The above commands build the stated guides in HTML format and write the results
into the local directory ``html``. You may then navigate to that directory and
open the ``index.html`` file of each guide with your preferred web browser to
browse the available documentation.

The basic user guide, which includes information for users of the platform, is
built automatically upon ``buildout``. If you wish to build it and place it
alongside the other guides, you may do so as well::

  $ ./bin/sphinx-build doc/user html/user
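
If you rebuild the guides often, the commands above can be wrapped in a small
shell function. This is a minimal sketch: ``build_docs`` and the
``SPHINX_APIDOC``/``SPHINX_BUILD`` variables are hypothetical names, defaulting
to the buildout-generated scripts used above:

.. code:: bash

   # Hypothetical helper: build the given guides (api, admin and/or user);
   # without arguments, build all three.
   build_docs() {
     local apidoc="${SPHINX_APIDOC:-./bin/sphinx-apidoc}"
     local build="${SPHINX_BUILD:-./bin/sphinx-build}"
     local guide
     if [ "$#" -eq 0 ]; then
       set -- api admin user
     fi
     for guide in "$@"; do
       if [ "$guide" = api ]; then
         # The API guide needs its module stubs regenerated first
         "$apidoc" --separate -d 2 --output=doc/api/api beat \
           beat/web/*/migrations beat/web/*/tests || return 1
       fi
       "$build" "doc/$guide" "html/$guide" || return 1
     done
   }

Run it from the top directory of the package, e.g. ``build_docs admin`` to
rebuild only the administrator guide.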


Unit Testing
------------

After installation, you can run a suite of unit tests to check the sanity of
the installation. To do so, use::

  $ ./bin/django test --settings=beat.web.settings.test -v 1

You may pass filtering criteria to launch only the tests of a particular set of
``beat.web`` applications. For example, to run only the tests concerning
``beat.web.toolchains``, run::

  $ ./bin/django test --settings=beat.web.settings.test -v 1 beat.web.toolchains.tests

To measure test coverage, run the test suite under ``coverage``::

  $ ./bin/coverage run --source='./beat/web' ./bin/django test --settings=beat.web.settings.test
  $ ./bin/coverage report

Or, to generate an HTML report::

  $ ./bin/coverage html

.. tip::

   You may significantly speed up your testing by reusing the same test
   database from run to run. To do this, specify the flag ``--keepdb`` when
   you run your tests::

     $ ./bin/django test --settings=beat.web.settings.test -v 1 --keepdb

   In this case, Django will create and keep a test database called
   ``test.sql3`` in your current directory. You may delete it when you're done.
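
The test commands above can be combined into a small wrapper. The sketch below
is an illustration only: ``run_tests`` and the ``DJANGO``/``COVERAGE``
variables are hypothetical, and it always passes ``--keepdb``, assuming you
want the speed-up described in the tip:

.. code:: bash

   # Hypothetical wrapper: run_tests [--coverage] [app.label ...]
   run_tests() {
     local django="${DJANGO:-./bin/django}"
     local coverage="${COVERAGE:-./bin/coverage}"
     if [ "${1:-}" = "--coverage" ]; then
       shift
       # Same test command, run under coverage and restricted to beat/web
       "$coverage" run --source=./beat/web "$django" test \
         --settings=beat.web.settings.test -v 1 --keepdb "$@" || return 1
       "$coverage" report
     else
       "$django" test --settings=beat.web.settings.test -v 1 --keepdb "$@"
     fi
   }

For example, ``run_tests --coverage beat.web.toolchains.tests`` runs only the
toolchain tests under coverage and prints the report.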


End-to-End Testing
------------------

`Protractor <http://www.protractortest.org/#/>`_ is an end-to-end (e2e) testing tool for web apps. Protractor runs tests through Selenium using a real browser; as such, it needs a graphical (headed) environment and a compatible browser installed.

.. warning::
   Protractor will open a new browser window in the foreground when it is started.

Setup
=====

Running Selenium requires two system dependencies:

- Java 8 must be available on your ``PATH``
- If you want to run the tests in a GNOME environment, you need `GConf <https://projects.gnome.org/gconf/>`_

Download or update Protractor's dependencies (Selenium and more) into the local repository:

.. code:: bash

   ./bin/webdriver-manager update

Running tests with the provided script
======================================

The ``protractor.sh`` script runs the Protractor tests with a single command. It handles database creation, saving and restoring, and manages the required local server processes. However, it makes several assumptions:

- It is run from the top directory of the ``beat.web`` repository
- ``./bin/buildout`` has already run successfully in the repository, with the default development configuration
- Protractor's configuration file is ``./protractor-conf.js``
- No additional arguments need to be passed to ``webdriver-manager`` or to Django's ``runserver``
- Django uses ``./django.sql3`` as the database
- If ``./template.django.sql3`` does not exist, the default database generated by ``./bin/django install`` is used. This is sufficient for the basic tests, but some tests will fail; it is suggested to provide, as ``./template.django.sql3``, a database containing experiments that have run successfully.
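
Before invoking ``protractor.sh``, you may want to check the assumptions above.
The following pre-flight check is a hypothetical sketch: it only tests for the
files and tools mentioned in this section (for Java, only that a ``java``
command is on your ``PATH``, not its version) and is not part of ``beat.web``:

.. code:: bash

   # Hypothetical pre-flight check, to be run from the top directory of the
   # beat.web repository. Returns non-zero if a requirement is missing.
   check_protractor_prereqs() {
     local status=0 f
     for f in ./bin/django ./bin/webdriver-manager ./bin/protractor \
              ./protractor-conf.js; do
       if [ ! -e "$f" ]; then
         echo "missing: $f (has ./bin/buildout run successfully?)" >&2
         status=1
       fi
     done
     if ! command -v java > /dev/null 2>&1; then
       echo "missing: java (Java 8 must be on your PATH)" >&2
       status=1
     fi
     if [ ! -e ./template.django.sql3 ]; then
       # Not fatal: the default database from "./bin/django install" is used
       echo "note: ./template.django.sql3 not found; some tests may fail" >&2
     fi
     return "$status"
   }

A typical use is ``check_protractor_prereqs && ./protractor.sh``.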

Manual test running
===================

If the ``protractor.sh`` script does not work for your setup, you can run the tests manually.

The ``webdriver-manager`` server must be running during the tests. To run tests against a local BEAT web server, that server must be up as well.

Starting the webdriver server
_____________________________

- Start the webdriver server in a separate shell (or append ``&`` to the command to run it as a background process in the current shell)

  .. code:: bash

     ./bin/webdriver-manager start

  .. important::

     Only one webdriver manager may run at a time.

- After the webdriver finishes initialization, you can run tests

  .. code:: bash

     ./bin/protractor protractor-conf.js

- If you started your webdriver server as a background process, you can kill all webdriver processes

  .. code:: bash

     pkill -f webdriver-manager
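
The manual steps above can also be scripted. The hypothetical sketch below
starts the webdriver server in the background, waits until its log contains a
readiness message, runs the tests and then stops only the server it started
(and its direct children), instead of every ``webdriver-manager`` process. The
readiness message, assumed here to contain ``up and running`` as logged by the
Selenium server, and all variable names are assumptions; set
``WEBDRIVER_READY`` if your Selenium version logs something else:

.. code:: bash

   # Hypothetical wrapper around the manual Protractor steps.
   run_protractor_manually() {
     local wdm="${WEBDRIVER_MANAGER:-./bin/webdriver-manager}"
     local protractor="${PROTRACTOR:-./bin/protractor}"
     local log="${WEBDRIVER_LOG:-webdriver.log}"
     local timeout="${WEBDRIVER_TIMEOUT:-60}"
     local ready="${WEBDRIVER_READY:-up and running}"
     local pid i=0 rc=0

     # Start the webdriver server in the background, logging to a file
     "$wdm" start > "$log" 2>&1 &
     pid=$!

     # Poll the log once per second until the server reports it is ready
     while [ "$i" -lt "$timeout" ]; do
       grep -q "$ready" "$log" 2>/dev/null && break
       sleep 1
       i=$((i + 1))
     done

     if grep -q "$ready" "$log" 2>/dev/null; then
       "$protractor" protractor-conf.js || rc=$?
     else
       echo "webdriver server not ready after ${timeout}s; see $log" >&2
       rc=1
     fi

     # Stop the children first (e.g. Java), then the server started above
     pkill -P "$pid" 2>/dev/null || true
     kill "$pid" 2>/dev/null || true
     return "$rc"
   }

If a webdriver process survives anyway, the ``pkill -f webdriver-manager``
command above remains the fallback.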

Understanding the output of Protractor
======================================

By default Protractor prints to ``STDOUT``. If a test passes, nothing is printed about that particular test. If a test fails, Protractor will print more information about the failure, including the specific test, type of failure that occurred, and a stack trace. At the end of testing, Protractor will print a summary of the test run.

Saving test results
___________________

Besides redirecting Protractor's output to a file, you may enable detailed logging to a JSON file: uncomment the relevant line in ``protractor-conf.js`` and, optionally, change the output file location:

.. code:: javascript

	//resultJsonOutputFile: './protractor-test-results.json'

Adding your test to Protractor
==============================

The test files are listed in the configuration file ``protractor-conf.js``: its ``specs`` field is an array of test file paths. Just add your new test file to the array and run Protractor again.

For example, to add the test file ``example-spec.js``:

- Before

  .. code:: javascript

     specs: [
            './beat/web/reports/static/reports/test/test-spec.js'
     ],

- After

  .. code:: javascript

     specs: [
            './beat/web/reports/static/reports/test/test-spec.js',
            'example-spec.js'
     ],

Overriding Protractor's browser choices
=======================================

In ``protractor-conf.js``, add a ``multiCapabilities`` option listing one object per browser. For example, to run the tests in both Chrome and Firefox:

.. code:: javascript

    multiCapabilities: [
        {
            browserName: 'chrome'
        },
        {
            browserName: 'firefox'
        }
    ],

.. note::

   You may need to download your browsers' WebDrivers separately - see `the official Selenium docs <https://seleniumhq.github.io/docs/wd.html#quick_reference>`_.

Writing Protractor tests
========================

By default, Protractor expects tests to be written with the `Jasmine BDD testing framework <https://jasmine.github.io/>`_. For a tutorial on writing Protractor tests, see the `official Protractor tutorial <http://www.protractortest.org/#/tutorial>`_; further documentation is available on the Protractor website.

BEAT platform & Protractor's Angular support
____________________________________________

By default, Protractor assumes that the tested website uses Angular in a particular fashion, which lets it detect more intelligently when a page has finished rendering. However, the BEAT platform does not use Angular this way, and Protractor will hang forever. To tell Protractor not to assume this compatibility, add the following line at the top of each top-level ``describe`` block in your test files:

.. code:: javascript

   browser.ignoreSynchronization = true;
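
Since a forgotten line makes Protractor hang, a quick check can help. The
hypothetical helper below lists the spec files that never set
``browser.ignoreSynchronization`` to ``true``; it is a rough sketch that only
looks for the assignment anywhere in each file, not specifically inside
top-level ``describe`` blocks:

.. code:: bash

   # Hypothetical lint: print the spec files given as arguments that do not
   # contain "browser.ignoreSynchronization = true" anywhere.
   find_unsynchronized_specs() {
     local spec
     for spec in "$@"; do
       if ! grep -q 'browser\.ignoreSynchronization[[:space:]]*=[[:space:]]*true' "$spec"; then
         printf '%s\n' "$spec"
       fi
     done
   }

For example: ``find_unsynchronized_specs ./beat/web/reports/static/reports/test/*-spec.js``.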

.. _administratorguide-installation-instantiating:

Instantiating and Starting a Development System
-----------------------------------------------

For a simple (development) system, the default settings in
``beat/web/settings/settings.py`` should work out of the box. These settings:

  * Instantiate the web service on the local host under port 8000 (the address
    will be ``http://127.0.0.1:8000``)
  * Use an SQLite3 database named ``django.sql3`` located in the current
    working directory
  * Run with full debug output
  * Set the working BEAT prefix to ``./web_dynamic_data``
  * Set up a single user, called ``user``, with administrative privileges

If you need to tweak these settings, just edit the file
``beat/web/settings/settings.py``. You may also consult the `Django
documentation`_ for detailed information on other settings.

Once the Django settings are tweaked to your liking, you can run a single
command to fully populate the development webserver with test databases,
toolchains, algorithms and experiments::

  $ ./bin/django install -v1

.. note::

   Concerning the databases installed by this command, we only tell the
   platform how to **access** their data. The command does not download the
   raw data for these databases: you must obtain it yourself through the
   relevant web sites (check out the database pages on the Idiap instance of
   the BEAT platform for details).

.. note::

  If you need to specify your own paths to the directories containing the
  databases, you can create a simple JSON file mapping each database name and
  version to its root directory, as follows::

    {
      "atnt/1": "/remote/databases/atnt",
      "banca/2": "/remote/databases/banca"
    }

  Then pass it to the previous command with the option ``--database-root-file``::

    $ ./bin/django install -v1 --database-root-file=MYFILE.json

  By default, paths to the root of all databases are set to match the Idiap
  Research Institute filesystem organisation.
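
When many databases are involved, the JSON file can be generated rather than
written by hand. A minimal sketch follows; ``write_database_roots`` is a
hypothetical helper, and since it performs no JSON escaping, it assumes paths
without double quotes or backslashes:

.. code:: bash

   # Hypothetical helper: write_database_roots <file> <name>/<version>=<path> ...
   # Writes one "name/version": "path" entry per argument (no JSON escaping).
   write_database_roots() {
     local out="$1" pair first=1
     shift
     {
       printf '{\n'
       for pair in "$@"; do
         # Separate entries with a comma, except before the first one
         [ "$first" -eq 1 ] || printf ',\n'
         first=0
         printf '  "%s": "%s"' "${pair%%=*}" "${pair#*=}"
       done
       printf '\n}\n'
     } > "$out"
   }

For example, ``write_database_roots MYFILE.json atnt/1=/remote/databases/atnt
banca/2=/remote/databases/banca`` produces the file shown in the note above.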

.. note::

  For every installed database, you'll need to generate its data indices,
  which allow the platform to correctly parallelize algorithms. To do so, run
  the following command for every combination of database and version you
  wish to support::

    $ ./bin/beat -p prefix db index <name>/<version>

  Replace ``<name>`` with the name of the database whose indices you wish to
  generate, and ``<version>`` with its version. For example, to generate the
  indices for the AT&T database, version 1, do the following::

    $ ./bin/beat -p prefix db index atnt/1
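
To generate indices for several databases in one go, the command can be run in
a loop. In this hypothetical sketch, ``index_databases`` and the
``BEAT``/``PREFIX`` variables are illustrative names; ``PREFIX`` defaults to
``prefix``, exactly as in the commands above, so set it to your actual prefix
directory if it differs:

.. code:: bash

   # Hypothetical helper: index_databases <name>/<version> ...
   # Keeps going after a failure and returns non-zero if any database failed.
   index_databases() {
     local beat="${BEAT:-./bin/beat}" prefix="${PREFIX:-prefix}" db failed=0
     for db in "$@"; do
       echo "indexing $db" >&2
       if ! "$beat" -p "$prefix" db index "$db"; then
         echo "failed: $db" >&2
         failed=1
       fi
     done
     return "$failed"
   }

For example: ``index_databases atnt/1 banca/2``.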

Once the contributions and users are in place, you're ready to start the test
server::

  $ ./bin/django runserver

At this point, the platform can be accessed by typing the URL
``http://127.0.0.1:8000`` in a web browser on the machine where the server is
running.

.. note::

   To use a dedicated database server such as PostgreSQL, it is sufficient to
   configure the corresponding Django ``DATABASES`` setting in
   ``beat/web/settings/settings.py``, assuming the database server is
   operational.


.. _administratorguide-installation-allinone:

All-in-one Platform
===================

The BEAT platform is composed of three application types that run in synchrony
to create, store and process your experiments: the web server, the scheduler
and one or more workers. You use the web server to create and launch
experiments. The scheduler assigns experiment blocks (more precisely,
:py:class:`beat.web.backend.JobSplit` objects) to run on one of the available
workers, respecting user quotas and worker limitations. Upon scheduling, the
worker runs the user algorithms installed on each block and notifies the web
server when it is done.

The base software framework and models that allow the three applications to
run cooperatively are described in a single place: the Django_ models and the
central database of this package. Effectively, this package contains all the
information required to run the three types of applications. The applications
"communicate" with each other through the shared Django_ database, reading and
modifying objects as experiments are assigned and processed. Several
deployment scenarios are therefore possible, and you should use the one best
suited to your requirements.

In order to start the system, just run::

  $ ./bin/django runserver

Once the Django development web server is up and running, open a browser and
navigate to http://127.0.0.1:8000. Log in with an account with administrative
rights and click the scheduler icon in the omni-bar at the top of any page.
Use the "Helper panel" to launch one-off or repetitive scheduling and/or
worker activities. In this case, both the scheduling and worker activities run
in the context of the web server process.


Discrete Platform using Localhost
=================================

It is also possible to run each of the applications as a separate process.
Here is how to do it.

  1. Start the web service normally::

        $ ./bin/django runserver

  2. Start the full scheduling setup::

        $ ./bin/django full_scheduling

This starts all elements of the scheduling/working process. Docker can be used
for the worker node by passing the ``--docker`` option.

Each element composing the scheduling can also be started separately:

  1. Start the broker node::

        $ ./bin/django broker -v 2

  2. Start a single scheduling node::

        $ ./bin/django scheduler -v 2

  3. Start a worker for your current node::

        $ ./bin/django worker -v 2

By default, the applications are configured to figure out paths and
configuration options by themselves. You can override some defaults via the
command line: run any of these commands with the ``--help`` flag for details.
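
If you prefer the separate elements over ``full_scheduling``, for example to
keep one log file per element, they can be launched from a single script. This
is a hypothetical sketch: ``start_scheduling`` and the ``DJANGO``/``LOG_DIR``
variables are illustrative names, and it starts the elements in the order
listed above without checking that the broker is ready first:

.. code:: bash

   # Hypothetical launcher: starts the broker, the scheduler and one worker in
   # the background, one log file each, and waits for them. The body runs in a
   # subshell so the signal trap does not leak into the calling shell.
   start_scheduling() (
     django="${DJANGO:-./bin/django}"
     logdir="${LOG_DIR:-logs}"
     pids=""
     mkdir -p "$logdir" || exit 1
     for element in broker scheduler worker; do
       "$django" "$element" -v 2 > "$logdir/$element.log" 2>&1 &
       pids="$pids $!"
     done
     # Background jobs started without job control ignore Ctrl-C (SIGINT),
     # so forward it to them as SIGTERM
     trap "kill $pids 2>/dev/null" INT TERM
     wait $pids
   )

Run ``start_scheduling`` alongside ``./bin/django runserver``, then follow the
logs with ``tail -f logs/*.log``.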


Mixing and matching
===================

You can mix and match any of the above techniques to run a 4-node system
(all-in-one or discrete) to build a test system that suits your needs. For
example, it is possible to launch the scheduling activities using the web
server and the page reload trick while launching the worker process separately
as described above.


Going full scale
================

In order to transform the development system into a full-scale platform, you
will have to create your own maintenance scripts allowing you to automatically
start/stop, update and secure the BEAT platform applications across your BEAT
web nodes. Going into the details of these scripts is beyond the scope of this
documentation; we only provide some tips that we consider important:

  * Don't use the SQLite backend on a production system: it does not cope well
    with the concurrency you may generate. Prefer a PostgreSQL database.

  * The "cache" directory (see the variable ``CACHE_ROOT`` in the Django_
    settings file) is shared amongst all applications in the cluster. It is
    advisable to use a proper networked filesystem with good synchronisation
    primitives to avoid issues concerning the production and consumption of
    data caches between workers living in different nodes.

  * Don't rely on your memory: script all deployment instructions so that you
    can do them routinely whenever newer versions come up or you have an issue.

  * Security: You'll be running code uploaded by users on your computer. Make
    sure you properly isolate each of the processes and the backend farm to
    avoid unpleasant surprises. Some helpers:

    * Disk access: two main directories are shared across the applications. The
      cache directory stores intermediary block results. The prefix directory
      stores user contributions on disk. You may tune the file system access on
      a distributed BEAT platform to increase its security:

      - The web server only needs read access to the cache directories. It
        needs read and write access to the prefix directory in order to store
        user contributions.

      - The scheduler needs read/write access to the cache directory. It does
        not use the prefix directory and does not read or process user
        contributions. The scheduler also needs access to the Django database.

      - The workers need read/write access to the cache directory and read
        access to the prefix directory. The workers also need access to the
        Django database.

      - The processes launched by the worker need similar permissions to
        their worker. The user executable, though, should run with demoted
        permissions to increase security: for example, it has no need to
        access the Django database (or the settings file), the prefix or the
        cache, since all of this is done via the parent process. The easiest
        way to implement this is to make sure the worker process is run by an
        unprivileged user and a group with the right access permissions,
        allowing it to access the Django database (and the Django settings
        file), the prefix and the cache. These permissions are inherited by
        the processes launched by the worker, which serve data to the
        processes wrapping the user code. To demote the user process, just set
        the group id of the environment executable to an unprivileged group.
        This way, the following security chain is achieved (pseudo
        user/groups)::

             worker        ->      process      -> environment exec(user code)
          [nobody:beat]         [nobody:beat]           [nobody:nogroup]

        The BEAT platform requires that all processes in this chain belong to
        the same user, so that signals can be sent to stop or kill the
        applications in the chain if necessary.

        If you don't do anything, then the user code will be run in a process
        with the same privileges as the worker application.

    * E-mail privileges: e-mail may be configured as part of the Django_
      standard logging facilities, or used to report experiment completion and
      other platform activity. While, by default, all node types have access to
      the Django configuration and can potentially send e-mails, it is wiser to
      use a Django extension such as Post-office_ to centralize e-mail sending
      on one node, avoiding potential spam.

    * User processes: user code is run in isolated processes launched by the
      children of worker processes. Because the user code process does not
      require disk access to either the prefix or the cache, it should run
      without access to those resources in order to improve the platform
      security. This may be achieved by running user processes in ``chroot``'ed
      environments or making sure user code is launched with a user identity
      which has far fewer access permissions than the worker process itself.
      Have a look at the ``--help`` output of the ``worker`` application for
      more information and examples.

You may contact our support_ team if you need advice concerning this topic.

Development Notes
-----------------


.. _administratorguide-installation-localhost-snapshot:

Backup and Restore
==================

The BEAT platform can be backed up and restored easily. These commands allow
you to keep your information safe, but also to copy the state of a given
deployment to a local development server, where more thorough tests can be
performed while tracking a bug or improving performance.

It is easy to quickly set up a local system for development, based on the
current state of a production system. Here are some instructions:


1. Before starting, make sure you have gone through the instructions above at
   least once. They explain the basic setup required for a complete
   development environment.


2. Dump and back-up your current **production** BEAT database::

     [production]$ ./bin/django backup


3. [Optional] If you have made important modifications since the version
   running on your production server, you'll need to run Django migrations on
   the data imported from the production server. If you need to do this, make
   sure you have no uncommitted changes in your local **development** package
   and reset it to the production tag::

     [development]$ git checkout <production-tag>

   .. note::

      You can figure out the production tag by looking at the footer of the
      BEAT website. The corresponding tag name is the version number prefixed
      with a ``v``. For example, the tag for version ``0.8.2`` of the platform
      is ``v0.8.2``.


   Also make sure to revert all dependent packages, so as to recreate the state
   of the database schema as on the production site.


4. Remove the current local development database so that the restore operation
   can start from scratch::

     [development]$ rm -rf django.sql3 web_dynamic_data


5. Copy the backup tarball from the production server and restore it locally::

     [development]$ scp root@<beatproductionmachine>:backups/<backup-filename>.tar.bz2 .
     [development]$ ./bin/django restore <backup-filename>.tar.bz2

   At this point, you have recreated a copy of your production system locally,
   in your SQLite3 database.


6. Reset queue configuration to allow for local running.

   You may, optionally, reset the queue configuration of your installation so
   that the environment you have is compatible with your development machine,
   so that you can immediately run experiments locally. To do so, use the
   ``qsetup`` Django command::

     [development]$ ./bin/django qsetup --reset


7. Check out the tip of your development branch again::

     [development]$ git checkout master  # or any other branch


8. Apply migrations::

     [development]$ ./bin/django migrate


At this point, you should have a complete development setup with all elements
available on the production system installed locally. This system is fully
capable of running experiments locally using your machine.
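
Steps 3 to 8 can be collected in a script run on the development machine. The
sketch below is hypothetical: it assumes the backup tarball has already been
copied over (step 5), that dependent packages are reverted by hand (step 3),
and that you always want the queue reset of step 6; the function name and the
``DJANGO``/``GIT`` variables are illustrative:

.. code:: bash

   # Hypothetical helper:
   #   restore_from_production <backup.tar.bz2> [production-tag] [branch]
   restore_from_production() {
     local backup="$1" prod_tag="${2:-}" branch="${3:-master}"
     local django="${DJANGO:-./bin/django}" git="${GIT:-git}"

     [ -f "$backup" ] || { echo "backup not found: $backup" >&2; return 1; }

     # Step 3 (optional): check out the tag matching the production schema
     if [ -n "$prod_tag" ]; then
       "$git" checkout "$prod_tag" || return 1
     fi

     rm -rf django.sql3 web_dynamic_data        # step 4: start from scratch
     "$django" restore "$backup" || return 1    # step 5: restore the tarball
     "$django" qsetup --reset || return 1       # step 6: local queue setup
     "$git" checkout "$branch" || return 1      # step 7: back to your branch
     "$django" migrate                          # step 8: apply migrations
   }

For example, ``restore_from_production <backup-filename>.tar.bz2
<production-tag> master``; pass an empty string as the second argument to skip
step 3.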


Testing Django Migrations
=========================

Django migrations, introduced in version 1.7, are a useful feature for
automatically migrating your database to new model schemas, provided you get
them right. Here is a recipe to make sure your migrations will work on your
production system, allowing for quick and repeatable test/fix cycles.

The key idea is to follow the
:ref:`administratorguide-installation-localhost-snapshot` setup and then back
up the database and prefix locally, so that the migration test loop can be
reproduced quickly.


1. Make sure you go through the
   :ref:`administratorguide-installation-localhost-snapshot` instructions above
   (**up to step 6 only**).


2. Make a copy of the SQLite3 database::

     $ cp -a django.sql3 django.sql3.backup

   This backup will allow you to quickly test the migrations without having to
   check out the production version again.

   Also, create a temporary git repository of ``web_dynamic_data``, so you can
   cross-check changes and reset it in case of problems::

     $ cd web_dynamic_data
     $ git init .
     $ git add .
     $ git commit -m "Initial commit"
     $ cd ..


3. Go back to the branch you were developing on before::

     $ git checkout <your-branch>


4. Here is how to test and fix your migrations:

   a. Run the migrations::

        $ ./bin/django migrate

   b. Check your database by visually inspecting it in the Django web admin or
      by manually dumping it.

   c. If a problem is detected, fix it and revert the state::

        $ cp -af django.sql3.backup django.sql3
        $ cd web_dynamic_data && git reset --hard HEAD && git clean -fdx . \
          && cd ..

      .. tip::

         Write the above lines in a shell script so they are easy to repeat.

      Go back to a. and restart.
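
Following the tip in step 4.c, the revert step can be kept in a shell function.
A minimal sketch, assuming it is run from the directory holding
``django.sql3.backup`` and the ``web_dynamic_data`` repository created in step
2; the function name is hypothetical:

.. code:: bash

   # Hypothetical helper for step 4.c: restore the database saved in step 2
   # and reset the prefix repository to its last commit.
   reset_migration_test() {
     cp -af django.sql3.backup django.sql3 || return 1
     # The subshell keeps the caller's working directory unchanged
     (cd web_dynamic_data && git reset -q --hard HEAD && git clean -q -fdx .)
   }

Each test/fix iteration then becomes ``reset_migration_test && ./bin/django
migrate``.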


Javascript Management with Node.js/Bower
========================================

We manage external JavaScript packages with the help of Bower_. If you'd like
to include more packages to be served statically by the Django web app, please
consider including them in the appropriate section of ``buildout.cfg``.


Issues
------

If you find problems concerning this package, please post a message to our
`group mailing list`_. Currently open issues can be tracked at `our gitlab
page`_.


.. Place here references to all citations in lower case

.. _django documentation: https://doc.djangoproject.com/en/
.. _django: https://www.djangoproject.com/
.. _cpulimit: https://github.com/opsengine/cpulimit/
.. _pip: http://pypi.python.org/pypi/pip
.. _easy_install: http://pypi.python.org/pypi/setuptools
.. _zc.buildout: http://pypi.python.org/pypi/zc.buildout
.. _virtualenv: http://pypi.python.org/pypi/virtualenv
.. _group mailing list: https://groups.google.com/d/forum/beat-devel
.. _our gitlab page: https://gitlab.idiap.ch/beat/beat.web/issues
.. _bower: http://bower.io
.. _support: https://www.beat-eu.org/platform/contact/
.. _post-office: https://pypi.python.org/pypi/django-post_office