Integration with the new scheduler

Here is a task list with topics that require attention for the integration:

Hints on behavior and implementation details:

Use Cases

Simple experiment (xp) submission by a user with no particular rights
An xp submission by a user who has 1000 points of reputation
An xp submission by a user who has priority on a given queue

New Tables on the Django DB:

We must first define an "environment". The environment has a name (e.g. "Python"), a version (e.g. "2.7.3"), an OS (e.g. "Debian Wheezy 7.2 (x86_64)") and a rich description string which defines all properties of that environment, including installed packages.

Each "queue" in the system is defined with a name (e.g. "3 hours/4G on Python"), a memory limit (e.g. "4096 MB"), a time limit (e.g. "3 hours"), an environment (e.g. "Python") and the maximum number of slots the queue can occupy on every machine available in the system.

Each user "library" consists of a set of files that are packaged together and placed in a given directory. Libraries are organised in the same way as algorithms, except that each library is represented by a directory rather than a single file. A "library" is defined by a name (e.g. "lbp"), a version (e.g. "1.0"), a list of compatible environments and a list of other libraries it depends on.

We must also define another table called "GroupQueueRights", which tracks four fields: queue, group, maximum slots and priority. For example: a user belonging to group "default" may be able to use 5 slots on queue "3 hours/4G on Python" with priority 0.
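The four tables described above can be sketched as plain Python dataclasses. This is only an illustration of the fields and their relations, not the actual Django models: all class and field names are assumptions, as are the example values for the description and the per-machine slot count.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Environment:
    name: str         # e.g. "Python"
    version: str      # e.g. "2.7.3"
    os: str           # e.g. "Debian Wheezy 7.2 (x86_64)"
    description: str  # free-form text, including installed packages


@dataclass(frozen=True)
class Queue:
    name: str                   # e.g. "3 hours/4G on Python"
    memory_limit_mb: int        # e.g. 4096
    time_limit_minutes: int     # e.g. 180 for "3 hours"
    environment: Environment
    max_slots_per_machine: int  # upper bound on every machine


@dataclass
class Library:
    name: str     # e.g. "lbp"
    version: str  # e.g. "1.0"
    environments: List[Environment] = field(default_factory=list)
    dependencies: List["Library"] = field(default_factory=list)


@dataclass(frozen=True)
class GroupQueueRights:
    queue: Queue
    group: str
    max_slots: int
    priority: int


# Example values taken from the text; description and slot count are made up.
python_env = Environment("Python", "2.7.3", "Debian Wheezy 7.2 (x86_64)",
                         "Base Python installation")
queue = Queue("3 hours/4G on Python", 4096, 180, python_env,
              max_slots_per_machine=4)
lbp = Library("lbp", "1.0", environments=[python_env])
default_rights = GroupQueueRights(queue, "default", max_slots=5, priority=0)
```

In a Django implementation the object references would presumably become foreign keys and the two lists on the library many-to-many relations.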

A user may belong to several groups. In this case, when submitting the job to the scheduler, the platform should take, for each queue, the maximum number of slots and the maximum priority found across all of the user's groups.

For example, given the following GroupQueueRights table (queue name, group name, slots, priority):

Row 1: Q1 - default - 2 - 0
Row 2: Q1 - special - 1 - 1
Row 3: Q1 - super - 3 - 1

User "A" belongs to group "default": Computed user queue rights are (Q1 - 2 - 0)
User "B" belongs to groups "default" and "super": Computed user queue rights are (Q1 - 3 - 1)
User "C" belongs to groups "default" and "special": Computed user queue rights are (Q1 - 2 - 1)
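The computation in this example can be sketched as follows. The function and variable names are illustrative assumptions; the rows are the three rows of the example table, and slots and priority are maximised independently, as the result for user "C" shows.

```python
from collections import namedtuple

Right = namedtuple("Right", ["queue", "group", "slots", "priority"])

# The three example rows of the GroupQueueRights table.
RIGHTS = [
    Right("Q1", "default", 2, 0),
    Right("Q1", "special", 1, 1),
    Right("Q1", "super", 3, 1),
]


def compute_user_queue_rights(user_groups, rights=RIGHTS):
    """Return {queue: (max slots, max priority)} over the user's groups."""
    computed = {}
    for right in rights:
        if right.group not in user_groups:
            continue
        if right.queue not in computed:
            computed[right.queue] = (right.slots, right.priority)
        else:
            slots, priority = computed[right.queue]
            computed[right.queue] = (max(slots, right.slots),
                                     max(priority, right.priority))
    return computed


print(compute_user_queue_rights({"default"}))             # user "A"
print(compute_user_queue_rights({"default", "super"}))    # user "B"
print(compute_user_queue_rights({"default", "special"}))  # user "C"
```

A queue for which none of the user's groups has a row is simply absent from the result.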

Worker Perspective

The worker installed on each machine knows where each local environment with a given name and version is installed and how to execute user programs using that environment.

The program execution receives as parameters:

the environment
the parameters to call the environment executable with

N.B.: The parameter list API for all environment executables defines our so-called Sandboxing API. It has to be respected for all environment implementations.

The worker is just told what to do: it does not check user rights and has no knowledge of queues, groups or priorities.
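As a rough sketch of the worker side, the lookup from (name, version) to a local executable could be a simple registry. The installation path, the function name and the example parameters are hypothetical; the actual parameter list is whatever the Sandboxing API defines.

```python
# Hypothetical registry: (name, version) -> environment executable
# installed on this machine. The path is made up for illustration.
LOCAL_ENVIRONMENTS = {
    ("Python", "2.7.3"):
        "/opt/environments/python-2.7.3/execute_single_algorithm.py",
}


def build_command(environment, parameters):
    """Return the argument list the worker would run.

    No rights are checked here: the worker only resolves the
    environment and passes the parameters through unchanged.
    """
    try:
        executable = LOCAL_ENVIRONMENTS[environment]
    except KeyError:
        raise LookupError(
            "environment {!r} is not installed on this machine".format(
                environment))
    return [executable] + [str(p) for p in parameters]


# The parameter names below are placeholders, not the real Sandboxing API.
command = build_command(("Python", "2.7.3"), ["--block", "block_1"])
```

The resulting list could then be handed to something like subprocess.Popen, with memory and time limits enforced around it.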

User Perspective Operation

The user selects an overall execution queue for the experiment and may also specify individual queues for individual blocks. The platform only allows the user to select queues in which the user has at least 1 slot for processing; other queues, even if they exist, are hidden from the selection box.

After basic queue selection, every block of the toolchain executes in a single slot on its selected queue. If the algorithm supports multi-slot operation, the platform allows the user to select how many slots to use for a given block, capped at the user's rights.
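The two rules above (only queues with at least 1 slot are selectable, and slot requests are capped at the user's rights) can be sketched like this. The function names and the shape of user_rights, a mapping from queue name to (max slots, priority), are assumptions.

```python
def selectable_queues(user_rights):
    """Queues shown in the selection box: those with at least 1 slot."""
    return sorted(queue for queue, (slots, _priority) in user_rights.items()
                  if slots >= 1)


def slots_for_block(requested, queue, user_rights, multi_slot):
    """Number of slots a block may use on the given queue.

    Single-slot algorithms always get 1 slot; multi-slot algorithms get
    the requested number, capped at the user's rights for that queue.
    """
    max_slots, _priority = user_rights[queue]
    if not multi_slot:
        return 1
    return max(1, min(requested, max_slots))


rights = {"Q1": (3, 1), "Q2": (0, 0)}
print(selectable_queues(rights))                          # ['Q1']
print(slots_for_block(5, "Q1", rights, multi_slot=True))  # 3
```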

Users can submit as many jobs as they want. They will be treated according to queue rights and farm availability.

Web Platform

With all this information in hand, the web platform submits the experiment to the scheduler for execution. Each request for execution contains 4 components:

Toolchain
Configuration, containing the queues, libraries and slots the user wants to deploy
Username or ID
User queue rights - computed from the max of all the groups the user belongs to

NB: The reputation can be implemented as a multiplying factor applied to the user queue rights, capped at the total number of slots for a particular queue. For example, a user with Reputation = 200 gets twice the processing power granted by the established user queue rights.
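One possible reading of this rule, as a sketch: the factor is the reputation divided by a baseline of 100, which is inferred from the "200 gives 2x" example and is an assumption. Never reducing slots for reputations below the baseline is a further assumption not stated in the text.

```python
def effective_slots(base_slots, reputation, queue_total_slots, baseline=100):
    """Slots after applying the reputation factor, capped at the queue total.

    baseline=100 is an assumption derived from the example above.
    """
    factor = max(1.0, reputation / baseline)  # assumed: never below 1x
    return min(queue_total_slots, int(base_slots * factor))


print(effective_slots(2, 200, queue_total_slots=10))   # 4
print(effective_slots(3, 1000, queue_total_slots=10))  # 10 (capped)
```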

Scheduler

The scheduler receives run-xp requests and breaks each experiment down into jobs (one per block), with dependencies between them.
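A minimal sketch of this breakdown, assuming the toolchain can be reduced to a mapping from each block name to the blocks whose output it consumes. That representation, the block names and the function names are all hypothetical.

```python
from collections import namedtuple

Job = namedtuple("Job", ["experiment", "block", "depends_on"])


def split_into_jobs(experiment, toolchain):
    """Create one job per block, remembering which blocks it waits for."""
    return [Job(experiment, block, frozenset(dependencies))
            for block, dependencies in sorted(toolchain.items())]


def runnable(jobs, finished_blocks):
    """Jobs not yet finished whose dependencies have all finished."""
    finished = set(finished_blocks)
    return [job for job in jobs
            if job.block not in finished and job.depends_on <= finished]


# Hypothetical three-block toolchain.
toolchain = {
    "preprocess": [],
    "train": ["preprocess"],
    "score": ["preprocess", "train"],
}
jobs = split_into_jobs("xp-1", toolchain)
print([job.block for job in runnable(jobs, [])])              # ['preprocess']
print([job.block for job in runnable(jobs, ["preprocess"])])  # ['train']
```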

At each iteration of the queue loop, it must decide what to execute based on:

Current slot availability for the different queues/users (user-queue occupation state needs to be stored)
Job queueing time (how old is the job?)
User queue rights/priority

N.B.: If the scheduler receives another job for a user it already knows, the stored queue rights for that user are updated with the new values if they differ.
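The decision criteria and the rights update can be sketched as a small in-memory state object. Everything here is illustrative: the class and field names are made up, and the ordering rule (higher priority first, then oldest job) is an assumption, since the text lists the criteria without ranking them.

```python
from collections import namedtuple

PendingJob = namedtuple("PendingJob", ["user", "queue", "submitted_at", "slots"])


class SchedulerState:
    def __init__(self, queue_capacity):
        self.free = dict(queue_capacity)  # queue -> free slots in the farm
        self.in_use = {}                  # (user, queue) -> occupied slots
        self.rights = {}                  # user -> {queue: (slots, priority)}
        self.pending = []

    def submit(self, job, user_rights):
        # Rights sent with the latest request replace the stored ones.
        self.rights[job.user] = dict(user_rights)
        self.pending.append(job)

    def _rights_for(self, job):
        return self.rights[job.user].get(job.queue, (0, 0))

    def _eligible(self, job):
        max_slots, _priority = self._rights_for(job)
        used = self.in_use.get((job.user, job.queue), 0)
        return (job.slots <= self.free.get(job.queue, 0)
                and used + job.slots <= max_slots)

    def next_job(self):
        """Pick the next job to run, or None if nothing can start."""
        candidates = [job for job in self.pending if self._eligible(job)]
        if not candidates:
            return None
        # Assumed ordering: higher priority first, then oldest submission.
        job = min(candidates,
                  key=lambda j: (-self._rights_for(j)[1], j.submitted_at))
        self.pending.remove(job)
        self.free[job.queue] -= job.slots
        key = (job.user, job.queue)
        self.in_use[key] = self.in_use.get(key, 0) + job.slots
        return job
```

With all priorities set to 0, the same selection degenerates to plain FIFO on submission age. Releasing slots when a job finishes is left out of the sketch.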

Simplifications for first implementation

There is only 1 environment installed, based on Python/"execute_single_algorithm.py".
All queue priorities are set to 0 (i.e. the scheduler can ignore them)
The number of slots and the memory limit go in pairs until we understand all this a bit better
The scheduler implements only a FIFO (first queued/runnable is run) strategy based on the job submission age
Libraries must be written in pure Python
No reputation system is in place just yet