WIP: Resolve "KMeans returns NaNs"
I have implemented a way to ensure that there are no NaNs in `KMeans`. I think the reason is that we do not check if there is any `m_zeroethOrderStats` for the mean. When there exists a cluster with 0 features, the `m_step` will divide a zero-vector by 0, which results in NaN.
In #3 I proposed to compute the point that is farthest from the current means. While I still believe that this would be a better solution, it requires many more computations, which we might not want to perform in the
Instead, whenever an empty mean is found, I simply select one of the training features at random to represent the new mean. I ensure that no feature is selected twice, though I do not check for duplicates (yet).
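The re-seeding strategy described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the C++ code of this merge request: the standalone function `m_step`, its parameters, and the toy data are hypothetical, and the real trainer works on its accumulated statistics rather than on Python lists.

```python
import random


def m_step(data, assignments, n_clusters, rng):
    """Hypothetical M-step: recompute means, re-seeding empty clusters."""
    dim = len(data[0])
    counts = [0] * n_clusters                        # zeroth-order statistics
    sums = [[0.0] * dim for _ in range(n_clusters)]  # first-order statistics
    for feature, k in zip(data, assignments):
        counts[k] += 1
        for d, value in enumerate(feature):
            sums[k][d] += value

    empty = [k for k in range(n_clusters) if counts[k] == 0]
    # Sample indices without replacement so no feature is taken twice.
    # Features with identical values are not detected (no duplicate check).
    # Assumes there are at least as many features as empty clusters.
    replacements = rng.sample(range(len(data)), len(empty))

    means = [None] * n_clusters
    for k in range(n_clusters):
        if counts[k] > 0:
            # Safe division: the count is known to be non-zero here.
            means[k] = [s / counts[k] for s in sums[k]]
    for k, index in zip(empty, replacements):
        # Empty cluster: use a randomly chosen training feature as the mean.
        means[k] = list(data[index])
    return means


# Toy example: cluster 1 has no features assigned to it.
data = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
assignments = [0, 0, 2]
means = m_step(data, assignments, 3, random.Random(0))
```

Sampling indices without replacement is what guarantees that two empty clusters never receive the same training feature; it does not protect against distinct features that happen to have identical values.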
Unfortunately, since I now require the data to be present in the `e_step`, I had to change the C++ interface of the `KMeansTrainer` to accept the data in the `mStep`. I have currently increased the minor version of the package, but please feel free to change it to a major version bump if you think that is required.
This is still WIP since it does not include any updated tests. The implementation itself should be OK.