local-parallel queue is not set up well
The current [local-parallel configuration](https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/d8162ffc4fa072a14a8a4d7ac3b558de464a56ef/bob/pipelines/config/distributed/local_parallel.py#L10) does not work as expected, for several reasons:
- When we set `processes=False`, only the Python threading module is used, which effectively limits CPU usage to around 100% (i.e., one core), no matter how many cores we request. Only with `processes=True` do we get real parallelization.
- Selecting all available CPUs via `cpu_count()` by default does not work well. I have a machine with 128 CPU cores, and setting up all 128 cores takes longer than an experiment. Especially when using `processes=False` as above, I commonly get a timeout error.
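
To make the two points concrete, here is a minimal sketch of a process-based setup with a capped worker count. It assumes the queue is built on `dask.distributed.LocalCluster`, which the `processes` flag suggests. The helper names `capped_worker_count` and `make_local_client` and the default cap of 8 are hypothetical and not taken from bob.pipelines:

```python
import os


def capped_worker_count(max_workers=8):
    """Return the number of workers to start: all CPUs, but at most max_workers."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(available, max_workers))


def make_local_client(max_workers=8):
    """Start a process-based local Dask cluster and return a client for it."""
    # Imported lazily so the helper above stays usable without Dask installed.
    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(
        n_workers=capped_worker_count(max_workers),
        processes=True,  # separate worker processes instead of threads
        threads_per_worker=1,
    )
    return Client(cluster)
```

With `processes=True`, the calling script should probably create the client under an `if __name__ == "__main__":` guard, since worker processes may re-import the main module on platforms that spawn rather than fork.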
Previously, we had configurations such as `local-p4` (4 parallel cores) and similar. I think it would be a good idea to bring several of these back here. Are there any objections?
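
As a rough illustration of the proposal, fixed-size presets could be kept in a small lookup table. Only `local-p4` is a name that existed before; `local-p8`, `local-p16`, `LOCAL_PRESETS`, and `workers_for` are hypothetical names chosen for this sketch:

```python
# Hypothetical preset names mapped to their worker counts.
LOCAL_PRESETS = {f"local-p{n}": n for n in (4, 8, 16)}


def workers_for(preset):
    """Look up the worker count for a preset name such as "local-p4"."""
    try:
        return LOCAL_PRESETS[preset]
    except KeyError:
        choices = ", ".join(sorted(LOCAL_PRESETS))
        raise ValueError(f"unknown preset {preset!r}; choose from: {choices}") from None
```

Each preset would then pass its worker count to the cluster constructor instead of relying on `cpu_count()`.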