KMeansTrainer python overloading is buggy
Created by: laurentes
There is a bug when overloading C++ methods of the KMeansTrainer from python. This might also affect the GMMTrainer.
The following program highlights the problem.
```python
#!/usr/bin/env python

import bob, numpy
import logging

logger = logging.getLogger("bob.c++")
logger.setLevel(logging.DEBUG)

machine = bob.machine.KMeansMachine(2,1)
data = numpy.array([[-3],[-2],[-1],[0.],[1.],[2.],[3.]])
print data.shape

class MyKMeansTrainer(bob.trainer.overload.KMeansTrainer):
  """Simple example of python trainer: """

  def __init__(self):
    bob.trainer.overload.KMeansTrainer.__init__(self)

  def initialization(self, machine, data):
    bob.trainer.overload.KMeansTrainer.initialization(self, machine, data)
    machine.means = numpy.array([[-0.5], [ 0.5]])
    print machine.means

trainer = MyKMeansTrainer()
trainer.convergence_threshold = 0.0005
max_iterations = 1 # Just do one iteration of the E-step and M-step
trainer.max_iterations = max_iterations

# This does not work as expected
trainer.train(machine, data)
# After the call of initialization() the means are still [0.,0.] (at the
# C++ level, whereas the values printed from python are correct).
# This means that the E-step is started with two identical means, and leads
# to unexpected NaN values.
print machine.means

# This works as expected
trainer.initialization(machine, data)
print machine.means
trainer.e_step(machine, data)
trainer.m_step(machine, data)
print machine.means
```
After some debugging, it seems that in the first case, the addresses of the KMeansMachine objects passed as arguments to the initialization() and e_step() methods differ. In contrast, using the second approach, they are the same, which is the expected behavior. I wonder whether this is due to the fact that we are passing non-const references to the KMeansMachine back and forth between python and C++.
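To make the suspected mechanism concrete, here is a minimal pure-python sketch (no bob dependency; all names are hypothetical stand-ins). It assumes the bug is that train() hands the overridden initialization() a *copy* of the machine rather than the original object, so the means assigned there never reach the object that e_step() later sees:

```python
import copy

class Machine:
    """Stand-in for KMeansMachine: just holds the means."""
    def __init__(self):
        self.means = [0.0, 0.0]

def my_initialization(machine):
    """Plays the role of the python-overridden initialization()."""
    machine.means = [-0.5, 0.5]

def broken_train(machine):
    """Suspected behavior of train(): the override receives a copy,
    so its assignment is lost on the original machine."""
    my_initialization(copy.deepcopy(machine))  # different object!

def working_train(machine):
    """Expected behavior (and what the manual initialization()/e_step()/
    m_step() calls achieve): the override mutates the same object."""
    my_initialization(machine)

m1 = Machine()
broken_train(m1)
print(m1.means)   # still [0.0, 0.0]: the update was lost

m2 = Machine()
working_train(m2)
print(m2.means)   # [-0.5, 0.5]: the update survived
```

This would explain both observations: the means printed inside the python override look correct (the copy was indeed updated), while the C++-side means stay at their identical initial values, which is consistent with the NaNs produced by the E-step.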
It would be nice to fix this issue for the next release (1.1), or otherwise to disable this feature.