WIP: Attempt to implement EM with multiprocessing
Currently, EM training runs sequentially by default, i.e., it uses only a single core of a multicore system. Parallelizing it with multithreading or multiprocessing would be great.
Here, I started implementing a version using `multiprocessing`, where you can choose either a `multiprocessing.Pool` (for several processes) or a `multiprocessing.pool.ThreadPool` (for several threads).
Unfortunately, it is not working yet, but @tiago.pereira might have an idea how to make it work.
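The E-step parallelizes well because its sufficient statistics are sums over samples: each worker can process one chunk of the data, and the partial results simply add up. The sketch below is not the code from this merge request; it uses a toy 1-D Gaussian mixture in pure Python, and the names `_e_step_chunk` and `parallel_e_step` are hypothetical. It relies only on the pool's `map` method, so the same call works with a `multiprocessing.Pool` or a `multiprocessing.pool.ThreadPool`:

```python
import math
from multiprocessing.pool import ThreadPool


def _e_step_chunk(args):
    # Hypothetical worker: accumulate 1-D GMM sufficient statistics for one chunk.
    # Only plain lists and floats are passed in, so a process Pool can pickle them.
    chunk, weights, means, variances = args
    k = len(weights)
    n, sx, sxx = [0.0] * k, [0.0] * k, [0.0] * k
    log_likelihood = 0.0
    for x in chunk:
        # Weighted Gaussian density of x under each component.
        dens = [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
                for w, m, v in zip(weights, means, variances)]
        total = sum(dens)
        log_likelihood += math.log(total)
        for j in range(k):
            r = dens[j] / total  # responsibility of component j for sample x
            n[j] += r            # zeroth-order statistics
            sx[j] += r * x       # first-order statistics
            sxx[j] += r * x * x  # second-order statistics
    return n, sx, sxx, log_likelihood


def parallel_e_step(data, weights, means, variances, pool, n_chunks=4):
    # Map: split the samples into chunks and hand them to any object with .map(),
    # e.g. multiprocessing.Pool or multiprocessing.pool.ThreadPool.
    size = max(1, math.ceil(len(data) / n_chunks))
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    partial = pool.map(_e_step_chunk, [(c, weights, means, variances) for c in chunks])
    # Reduce: the statistics are plain sums, so the partial results just add up.
    k = len(weights)
    n = [sum(p[0][j] for p in partial) for j in range(k)]
    sx = [sum(p[1][j] for p in partial) for j in range(k)]
    sxx = [sum(p[2][j] for p in partial) for j in range(k)]
    return n, sx, sxx, sum(p[3] for p in partial)


if __name__ == "__main__":
    samples = [-2.1, -1.9, -2.0, 2.0, 2.2, 1.8]
    # Swap in multiprocessing.Pool(2) to use processes instead of threads.
    with ThreadPool(2) as pool:
        n, sx, sxx, ll = parallel_e_step(samples, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0], pool)
    print(n, ll)
```

With a process `Pool`, the worker function must live at module level and the pool should be created under `if __name__ == "__main__":` on platforms that start workers with `spawn`. A `ThreadPool` hands objects to its workers in-process without pickling, but pure-Python loops like this toy one are limited by the GIL; threads only pay off when the heavy work runs in code that releases it.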
Not sure. Probably not. At least I now understand why it is not working at the moment: `multiprocessing` requires pickling and unpickling the objects it sends to the workers, and pickling and unpickling the C++ objects causes problems. One solution would be to implement serialization for the C++ objects, but I am not sure I want to go down this path.
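To illustrate the serialization route: `multiprocessing.Pool` pickles every argument it sends to a worker, so an object whose type refuses pickling cannot be passed directly. Below is a minimal pure-Python sketch in which the hypothetical `OpaqueMachine` class only imitates a C++-backed object without pickle support. Registering a reducer with the standard `copyreg` module tells `pickle` how to rebuild the object from plain data, without modifying the class itself:

```python
import copyreg
import pickle


class OpaqueMachine:
    """Hypothetical stand-in for a C++-backed machine that has no pickle support."""

    def __init__(self, weights, input_subtract):
        self.weights = list(weights)
        self.input_subtract = list(input_subtract)

    def __reduce_ex__(self, protocol):
        # Imitates an extension type that refuses to be pickled.
        raise TypeError("cannot pickle 'OpaqueMachine' object")


def _reduce_machine(machine):
    # Reducer: rebuild the object by calling
    # OpaqueMachine(weights, input_subtract) with plain, picklable data.
    return OpaqueMachine, (machine.weights, machine.input_subtract)


def register_serialization():
    # pickle consults copyreg's dispatch table before __reduce_ex__,
    # so the class itself does not need to change.
    copyreg.pickle(OpaqueMachine, _reduce_machine)


def can_pickle(obj):
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


if __name__ == "__main__":
    m = OpaqueMachine([1.0, 2.0], [0.5, 0.5])
    print(can_pickle(m))  # False
    register_serialization()
    print(can_pickle(m))  # True
```

For a real C++-backed machine, the reducer would have to return all of the state needed to rebuild it; which attributes that covers is an assumption this toy does not check.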
Otherwise, we would need to extract the data from the C++ objects before sending them to the processes, and re-create the objects inside the `_parallel_e_step` function. I have successfully done that with:
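```python
import bob.learn.linear


def _project(params):
    # Runs inside a worker: rebuild the C++ machine from plain (picklable) arrays
    data, matrix, mean = params
    machine = bob.learn.linear.Machine(matrix)
    machine.input_subtract = mean
    return machine(data)


def project_all(machine, all_data, pool):
    # Send only the numpy arrays to the workers, never the C++ machine itself
    return pool.map(_project, [(data, machine.weights, machine.input_subtract) for data in all_data])
```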
This works, although the implementation is not very elegant.
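In the per-item tuple approach, the projection parameters are copied to the workers once per data item. A possible tidier variant (a sketch, not what was eventually implemented) uses the `initializer`/`initargs` parameters of `multiprocessing.Pool`, which `multiprocessing.pool.ThreadPool` accepts as well, to rebuild the machine once per worker. `LinearProjector` is a hypothetical pure-Python stand-in for `bob.learn.linear.Machine` covering only the input subtraction and the weight projection; `_init_worker` and `project_all` are illustrative names:

```python
from multiprocessing.pool import ThreadPool


class LinearProjector:
    """Hypothetical stand-in for a linear machine: y = W^T (x - input_subtract)."""

    def __init__(self, weights, input_subtract):
        self.weights = weights                # rows = inputs, columns = outputs
        self.input_subtract = input_subtract

    def __call__(self, x):
        centred = [xi - mi for xi, mi in zip(x, self.input_subtract)]
        n_outputs = len(self.weights[0])
        return [sum(c * row[j] for c, row in zip(centred, self.weights))
                for j in range(n_outputs)]


_worker_machine = None  # set once in every worker by the initializer


def _init_worker(weights, input_subtract):
    # Runs once per worker: rebuild the machine from plain data.
    global _worker_machine
    _worker_machine = LinearProjector(weights, input_subtract)


def _project(data):
    return _worker_machine(data)


def project_all(weights, input_subtract, all_data, pool_class=ThreadPool, processes=2):
    # The parameters travel to each worker once (via initargs), not once per item.
    with pool_class(processes, initializer=_init_worker,
                    initargs=(weights, input_subtract)) as pool:
        return pool.map(_project, all_data)
```

Passing `pool_class=multiprocessing.Pool` switches to processes; the initializer then runs once in each child process, so the parameters are sent once per worker process instead of once per data item.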
I will handle this in the next few weeks (not for the milestone); however, I will go for a Pythonic approach using multiprocessing (it works fine ;-) ).
Is that good enough?
removed milestone
No time so far, although I have everything worked out in my head.