[backend] Smart scheduling policy
This issue was migrated from the old
beat.scheduler package. Original bug report by Laurent El-Shafey.
The scheduling policy currently implemented is naive, which means that it assigns the jobs when it can, in the order of processing.
In particular, this policy is not able to properly address the following problems:
- Fair (and/or controlled) sharing of computing power between users
- Prioritization of jobs. This consists of both:
- User-based prioritization
- Queing duration-based prioritization
- Non-starvation of jobs when resources (computing nodes) are shared between different queues (commonly known as 'resource reservation'). A typical case is when the queue relies on several cores per slot.
Ideally, we would like to address all these three points in a smarter scheduling policy. All these may require the following variables:
- The 'weight' of a user on a given queue (set by the administrator)
- The 'reputation' of the user (set by the administrator)
- The 'relative priority' of a job (set by the user)
- The 'queing duration' of a job (determined via the scheduler)
In addition to these variables, we may define additional parameters (set by the administrator), which will fine tune the behavior of this policy. For instance, we may define the 'relative weight' of a queue wrt. others, to address Point 3.
The main difficulty is to find a 'good' way to schedule the jobs, while keeping a good trade-off between the three (often contradictory) requirements.
Following are aspects that arise, when attempting to address all these three requirements:
- Fair sharing of computing power
- The computing-load caused by a given user can be computed per queue or globally. Which one should be consider? Both? Considering the load on a queue basis, this will penalize users, who just want to use a specific queue. In fact, users may be tempted to 'artificially' use several queues just to get few more computing slots. Considering it globally leads to the problem of how to measure it, since queues may be attached to different 'amount' of computing power (how to consider the number of slots, cores, maximum execution time, etc. when averaging to get the 'global' load caused by a user).
- There are different ways to compute the 'load': instantly or over a fixed period of time. SGE relies on the latter case. What would be the best option for our problem?
- Prioritization of jobs
- Queuing time may be infinite. How should be the scale used to decide, whether a job queued long time ago should run or not. Linear? Square root? logarithmic? etc.
- User-specific prioritization and queueing time prioritization may be in complete contradiction. How to address this problem and find a good trade-off between these two?
- Non-starvation of jobs when resources (computing nodes) are shared between different queues
- A job queued for a long time will have a high priority, since there is a policy-based on the queueing time. However, how could we guarantee, that it is going to run and not starve, because resources keep being used by jobs on other queues? For this purpose, we need to implement a resource reservation mechanism.
Overall, the difficulty is to combine all these requirements together. Could we generate a single priority value for a job, based on all these aspects, using some weighting mechanism?
Besides, each time a job is assigned, many of these values are theoretically updated. Should we recompute them all the time? Several times per scheduling loop? Only once (which may be a real problem regarding the 'fair sharing of computing power' aspect)
More comments from Laurent
I've just implemented few methods for the scheduler to compute instantaneous load information to address 1. In a first iteration, we won't rely on history/cumulative loads.
Wrt. fair shairing of computing power, we still need to address the problem of global load caused by a given user. This may be done directly in the smart scheduling policy.