KMeansTrainer python overloading is buggy
Created by: laurentes
There is a bug when overloading C++ methods of the KMeansTrainer from Python. This might also affect the GMMTrainer.
The following program highlights the problem.
#!/usr/bin/env python
import bob, numpy
import logging
logger = logging.getLogger("bob.c++")
logger.setLevel(logging.DEBUG)
machine = bob.machine.KMeansMachine(2,1)
data = numpy.array([[-3],[-2],[-1],[0.],[1.],[2.],[3.]])
print data.shape
class MyKMeansTrainer(bob.trainer.overload.KMeansTrainer):
    """Simple example of a python trainer"""
    def __init__(self):
        bob.trainer.overload.KMeansTrainer.__init__(self)
    def initialization(self, machine, data):
        bob.trainer.overload.KMeansTrainer.initialization(self, machine, data)
        machine.means = numpy.array([[-0.5], [0.5]])
        print machine.means
trainer = MyKMeansTrainer()
trainer.convergence_threshold = 0.0005
max_iterations = 1 # Just do one iteration of the E-step and M-step
trainer.max_iterations = max_iterations
# This does not work as expected
trainer.train(machine, data)
# After the call to initialization() the means are still [0.,0.] at the
# C++ level, whereas the values printed from python are correct. This means
# that the E-step starts with two identical means, which leads to
# unexpected NaN values.
print machine.means
# This works as expected
trainer.initialization(machine, data)
print machine.means
trainer.e_step(machine, data)
trainer.m_step(machine, data)
print machine.means
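The NaN values mentioned in the comments can be reproduced without bob at all. The following standalone numpy sketch (a minimal E/M step written for illustration, not bob's actual implementation) shows why: with two identical means, every sample is assigned to the first cluster, the second cluster stays empty, and its mean update divides zero by zero.

```python
import numpy as np

# Same data as in the bug report; both means identical, as after the
# failed initialization.
data = np.array([[-3.], [-2.], [-1.], [0.], [1.], [2.], [3.]])
means = np.array([[0.], [0.]])

# E-step: assign each sample to its closest mean (ties go to the
# lower index, so every sample lands in cluster 0).
dists = ((data[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
labels = dists.argmin(axis=1)

# M-step: recompute each mean; the empty cluster yields 0/0 -> NaN.
counts = np.bincount(labels, minlength=2).astype(float)
sums = np.zeros_like(means)
np.add.at(sums, labels, data)
with np.errstate(invalid="ignore"):
    new_means = sums / counts[:, None]
print(new_means)  # second row is NaN
```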
After some debugging, it seems that in the first case the addresses of the KMeansMachine objects passed to the initialization() and e_step() methods differ. In contrast, with the second approach they are the same, which is the expected behavior. I wonder whether this is caused by passing non-const references to the KMeansMachine back and forth between Python and C++.
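To illustrate the suspected mechanism, here is a pure-Python sketch (the Machine class and the step functions are hypothetical stand-ins, not bob code): if train() hands the overloaded initialization() a temporary copy of the machine rather than the machine itself, the means it writes never reach the object that the E-step later sees.

```python
import copy

class Machine:
    """Hypothetical stand-in for KMeansMachine: just holds the means."""
    def __init__(self):
        self.means = [0.0, 0.0]

def initialization(machine):
    # The overloaded step writes distinct means in place.
    machine.means = [-0.5, 0.5]

def e_step(machine):
    # The E-step reads whatever means the machine currently holds.
    return machine.means

def buggy_train(machine):
    # Suspected behavior: the overload runs on a temporary copy, so the
    # original machine keeps its initial [0.0, 0.0] means.
    initialization(copy.deepcopy(machine))
    return e_step(machine)

def correct_train(machine):
    # Expected behavior: the same object is threaded through both steps.
    initialization(machine)
    return e_step(machine)

print(buggy_train(Machine()))    # [0.0, 0.0]
print(correct_train(Machine()))  # [-0.5, 0.5]
```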
It would be nice to fix this issue for the next release (1.1), or otherwise to disable this feature.