Splitting the data one by one instead of chunk by chunk
Currently bob.bio.base (when submitting a job to gridtk) splits the data like this:
    mylist = range(10)
    parallel_jobs = 2
    list1 == range(5)
    list2 == range(5,10)
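This is really cumbersome when you have an unbalanced database. For example, right now I have a video database where the first samples have only 1 frame each and finish processing quickly, but the rest of the samples have 20 frames each and take about 20 times longer to process. With contiguous chunks, the job that gets the beginning of the list finishes early while the other jobs are still working through the long samples.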
I was wondering if it is possible to split the data like the following when running in parallel:
    mylist = range(10)
    parallel_jobs = 2
    list1 == range(0,10,2)
    list2 == range(1,10,2)
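
To make the difference concrete, here is a minimal sketch of both splitting strategies in plain Python. The function names `split_contiguous` and `split_interleaved` are made up for this example and are not part of bob.bio.base or gridtk; `split_contiguous` only mimics the behaviour described above, not the actual implementation. The frame counts are illustrative numbers, not taken from a real database.

```python
def split_contiguous(items, parallel_jobs):
    """Hypothetical helper: consecutive chunks, one per job (chunk by chunk)."""
    items = list(items)
    chunk = -(-len(items) // parallel_jobs)  # ceiling division
    return [items[i * chunk:(i + 1) * chunk] for i in range(parallel_jobs)]


def split_interleaved(items, parallel_jobs):
    """Hypothetical helper: job i takes every parallel_jobs-th item (one by one)."""
    items = list(items)
    return [items[i::parallel_jobs] for i in range(parallel_jobs)]


mylist = range(10)
parallel_jobs = 2

# Chunk-by-chunk split, as in the first example.
chunk1, chunk2 = split_contiguous(mylist, parallel_jobs)
assert chunk1 == list(range(5))
assert chunk2 == list(range(5, 10))

# One-by-one split, as in the second example.
list1, list2 = split_interleaved(mylist, parallel_jobs)
assert list1 == list(range(0, 10, 2))
assert list2 == list(range(1, 10, 2))

# Illustrative workload only: two short samples followed by eight long ones.
frames = [1, 1] + [20] * 8
load_contiguous = [sum(frames[i] for i in job)
                   for job in split_contiguous(mylist, parallel_jobs)]
load_interleaved = [sum(frames[i] for i in job)
                    for job in split_interleaved(mylist, parallel_jobs)]
print(load_contiguous)   # [62, 100]
print(load_interleaved)  # [81, 81]
```

With these made-up frame counts, the interleaved split gives both jobs the same total number of frames, while the contiguous split leaves one job with noticeably more work. The interleaved split only helps when sample cost varies along the list order, as it does in my database.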