# bob issues
https://gitlab.idiap.ch/bob/bob/-/issues

# Bob misses a Covariance-based PCA trainer
https://gitlab.idiap.ch/bob/bob/-/issues/130 · updated 2019-07-16 · André Anjos · *Created by: anjos*
This should be relatively easy to implement and, as long as the number of training examples is greater than the number of features in each sample, it should produce faster results than the SVDPCATrainer. Memory-wise, though, it should be less efficient.

Milestone: v1.2

# libsvm 3.15 and 3.16 potential crash on svm_free_model_content()
https://gitlab.idiap.ch/bob/bob/-/issues/109 · updated 2019-07-16 · André Anjos · *Created by: anjos*
The [following bug](https://trac.macports.org/ticket/37862) must be closed before we can properly release bob-1.2.0. The MacPort will be crippled without this.
The patch (to libsvm) is easy:
```patch
--- svm.cpp.orig 2013-01-31 12:03:51.000000000 +0100
+++ svm.cpp 2013-01-31 11:58:02.000000000 +0100
@@ -2747,6 +2747,7 @@
model->probB = NULL;
model->label = NULL;
model->nSV = NULL;
+ model->sv_indices = NULL;
char cmd[81];
while(1)
```
If you have problems using our libsvm bindings under MacPorts, please verify that you are not using one of the two libsvm versions indicated in this bug report. If so, downgrade to 3.14 or re-build libsvm with the fix applied.
Both the author of libsvm and the MacPorts maintainer have been informed of this problem, and both received the same patch instructions.

Milestone: v1.2

# "hides overloaded virtual function [-Woverloaded-virtual]" warnings with llvm/clang
https://gitlab.idiap.ch/bob/bob/-/issues/117 · updated 2019-02-13 · André Anjos · *Created by: laurentes*
I've noticed the following warnings when we use the llvm/clang compiler (only on OS X):
```
/Users/buildbot/work/buildbot/macosx-10.8-x86_64-incremental+master/build/include/bob/trainer/IVectorTrainer.h:122:10: warning: 'bob::trainer::IVectorTrainer::is_similar_to' hides overloaded virtual function [-Woverloaded-virtual]
1 warning generated.
/Users/buildbot/work/buildbot/macosx-10.8-x86_64-incremental+master/build/include/bob/trainer/IVectorTrainer.h:122:10: warning: 'bob::trainer::IVectorTrainer::is_similar_to' hides overloaded virtual function [-Woverloaded-virtual]
1 warning generated.
```
cf. [here](https://www.idiap.ch/software/bob/buildbot/builders/0-macosx-10.8-x86_64-incremental%2Bmaster/builds/465/steps/compile/logs/warnings%20%2819%29)
I don't understand why this is occurring. Does anyone have a clue? The recipe to get rid of the warnings is to change the inline definition in the templated class EMTrainer
```c++
virtual bool is_similar_to(const EMTrainer& b, const double r_epsilon=1e-5, const double a_epsilon=1e-8) const
{
return m_compute_likelihood == b.m_compute_likelihood &&
bob::core::isClose(m_convergence_threshold, b.m_convergence_threshold, r_epsilon, a_epsilon) &&
m_max_iterations == b.m_max_iterations;
}
```
into a single declaration within the EMTrainer class definition
```c++
virtual bool is_similar_to(const EMTrainer& b, const double r_epsilon=1e-5, const double a_epsilon=1e-8) const;
```
and a definition of this method outside the class as follows
```c++
template<class T_machine, class T_sampler>
bool bob::trainer::EMTrainer<T_machine,T_sampler>::is_similar_to(const bob::trainer::EMTrainer<T_machine,T_sampler>& b,
const double r_epsilon, const double a_epsilon) const
{
return m_compute_likelihood == b.m_compute_likelihood &&
bob::core::isClose(m_convergence_threshold, b.m_convergence_threshold, r_epsilon, a_epsilon) &&
m_max_iterations == b.m_max_iterations;
}
```
This only happens for the is_similar_to() method and not for operator==(), which suggests that the default/optional arguments might be the cause of the problem.

Milestone: v1.2

# Wishes for the next major release (1.2.0)
https://gitlab.idiap.ch/bob/bob/-/issues/104 · updated 2019-02-13 · André Anjos · *Created by: laurentes*
I've decided to create a thread to help us converge towards a better Bob for the next major release. I doubt we will have time to deal with everything shortly, but there will at least be a trace of it. Feel free to update this thread with your thoughts/wishes.
## To consolidate what is already there
1. The user guide is already fairly good. I would not say the same about the documentation of the python bindings that show up when using the `help()` function of python. For many functions/classes, this documentation is very limited and unhelpful.
2. The library is becoming larger. We should have stricter rules when defining class methods that refer to the same concept. For instance, we very often have a variable related to the input/feature dimensionality. Depending on the class, it might be obtained using 'getDimD()', 'inputDims()', or the like. This is very annoying from a user perspective. The same goes for the API of the trainers.
3. For many functions, we have two different kinds of python bindings: one that follows the C++ API (e.g. void f(const BA& input, BA& output)), and one that is more 'pythonic' (e.g. BA f(const BA& input)). These two bindings often share the same python function name. I don't think this is a good strategy, as it sometimes leads to impossible cases when the C++ API has many overloaded functions. I think the function name should reflect this difference. OpenCV's strategy was roughly to use two different namespaces (cv for the C-like functions, and cv2 for the pythonic ones). We could do something slightly different, such as appending something like '_c' or '_cc' to the function name to clearly highlight this fact.
4. ~~As previously discussed, the LBP code could be refactored in a single more generic but parametrized class.~~ (Already done by Manuel)
5. Learning from our previous experiences, I would be in favour of making a version number mandatory for any class that provides a 'save_to_hdf5' method.
6. ~~Add a tutorial for the Audio Processing module~~ (Already done by Elie)
7. Provide is_similar_to(const Object& b, const double epsilon=1e-8) functions for all C++ classes that have double members. The default "==" and "!=" operators are largely useless when we want to provide code that runs on several platforms (i.e., 32- and 64-bit machines).
8. ~~Whenever a C++ class uses some random initialization, make sure that you can seed this randomness.~~ (cf. issue #121 )
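Item 7 above is easy to illustrate in Python. The sketch below assumes semantics similar to `numpy.isclose` (a combined absolute plus relative tolerance); the exact rule used by Bob's C++ core may differ, and `GaussianMachine` is a hypothetical class invented for the example.

```python
# Sketch of an epsilon-based closeness test, mirroring the
# is_similar_to(b, r_epsilon, a_epsilon) signature proposed above.
# Tolerance rule: |a - b| <= a_eps + r_eps * |b| (as in numpy.isclose);
# the exact semantics of Bob's C++ isClose may differ.

def is_close(a, b, r_epsilon=1e-5, a_epsilon=1e-8):
    """True if a and b agree up to relative/absolute tolerances."""
    return abs(a - b) <= a_epsilon + r_epsilon * abs(b)

class GaussianMachine:
    """Hypothetical class with double members, for illustration only."""
    def __init__(self, mean, variance):
        self.mean = mean
        self.variance = variance

    def is_similar_to(self, other, r_epsilon=1e-5, a_epsilon=1e-8):
        return (is_close(self.mean, other.mean, r_epsilon, a_epsilon) and
                is_close(self.variance, other.variance, r_epsilon, a_epsilon))
```

On a 32-bit and a 64-bit machine the last bits of a double may differ, so `==` fails while `is_similar_to` still passes.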
## To integrate new features
A. It would be nice if the combination of NumPy/SciPy/Bob roughly provided the same functionality as the Matlab built-in functions. It would also be good to use similar function names, so that it is easy for a user to move across these platforms. A (too exhaustive) list can be found here: http://www.mathworks.ch/ch/help/matlab/functionlist.html
B. More Machine Learning algorithms:
B.1 In particular to add a Hidden Markov Model implementation, as there was one in late Torch 3.
B.2. To add a deformable and parts-based object recognition system (Felzenszwalb-like)
~~B.3 Integration of the i-vector framework~~ (Already done by Laurent)
C. More Audio Processing features:
C.1 Possibility to ~~compute and~~ plot spectrograms (Already done)
C.2 Audio codec to deal with wav and sphere files
C.3 Provide a bridge to HTK
C.4 Boosted Binary Features (cf. Anindya's thesis)
D. Image Processing Tools
~~D.1 GLCM features (Grey-Level Co-occurrence Matrix)~~ (Already done by Ivana)
E. ~~Make compiling C++ bindings on satellite packages easier - we can move most of the functionality currently implemented in those packages, like https://github.com/bioidiap/xbob.optflow.liu, to the core of Bob.~~ (done by André)
F. New metrics: F1-score, precision and recall; cost versus training set size - This should be simple and an excellent coding exercise. If anyone wants to give it a go, please let me know. More info: http://en.wikipedia.org/wiki/F1_score
G. Better use of an optimization library (L-BFGS-B) for the NNet backprop implementation - this is a somewhat larger piece of work that revolves around revisiting the NNet implementation to separate the optimizer from the trainer.
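Item F above can be prototyped in a few lines of NumPy; the function name below is hypothetical and not part of any existing Bob API.

```python
import numpy

def precision_recall_f1(predictions, labels):
    """Compute precision, recall and F1-score for binary labels (0/1).

    Hypothetical helper sketching item F above; not part of Bob's API.
    """
    predictions = numpy.asarray(predictions, dtype=bool)
    labels = numpy.asarray(labels, dtype=bool)
    tp = numpy.sum(predictions & labels)    # true positives
    fp = numpy.sum(predictions & ~labels)   # false positives
    fn = numpy.sum(~predictions & labels)   # false negatives
    precision = tp / float(tp + fp) if tp + fp else 0.0
    recall = tp / float(tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

The F1-score is the harmonic mean of precision and recall, per the Wikipedia reference above.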
## To be more widely supported
1. ~~To make the Windows/Cygwin port functional~~ (cf. issue #82 )
2. To generate RPM like package:
* For Fedora: http://koji.fedoraproject.org/koji/
* For Suse: https://build.opensuse.org/
3. To generate a package for the TinyCore distribution: http://distro.ibiblio.org/tinycorelinux/downloads.html
In particular, it would be nice to automate the process of generating a VirtualBox VDI with Bob installed, potentially based on the tiny Linux distribution TinyCore.

Milestone: v1.2

# Turn within-class and between-class scatter matrices computation into a 'public' feature
https://gitlab.idiap.ch/bob/bob/-/issues/150 · updated 2016-08-04 · André Anjos · *Created by: laurentes*
The computation of the within-class and between-class scatter matrices is currently done in two different classes: FisherLDATrainer and WCCNTrainer. To avoid this code duplication, we should move this feature into the math module of bob.

Milestone: v1.2

# bob-config.cmake
https://gitlab.idiap.ch/bob/bob/-/issues/146 · updated 2016-08-04 · André Anjos · *Created by: neodark*
Hello,
It's a great idea to have bob-config.cmake in <install_folder>/lib/bob/bob-config.cmake
There's just a small bug in the generation of these two lines in the file (I have marked the errors with <This path is not correct>):
get_filename_component(bob_INCLUDE_DIRS "<This path is not correct>" ABSOLUTE)
get_filename_component(bob_LIBRARY_DIRS "<This path is not correct>" ABSOLUTE)
<This path is not correct> should be replaced by <path to installed bob folder>/include on the first line and <path to installed bob folder>/lib on the second line
Cheers,
Flavio

Milestone: v1.2

# BIC Tests failing after PCA fixes on #130
https://gitlab.idiap.ch/bob/bob/-/issues/137 · updated 2016-08-04 · André Anjos · *Created by: anjos*
Manuel, could you please have a look at the BIC unit tests? They have been failing since I patched things up for bug #130. We took that opportunity to fix the number of PCA components generated by the trainer to be `min(#features,#samples)-1`, whereas before it was wrongly set to `min(#features,#samples)`, with the last eigenvalue always being very close to zero. The outcome is that the machines produced by the PCA trainers are now inherently smaller, and that is making the BIC tests fail.

Milestone: v1.2

# Sphinx Autosummary and Bob's documentation
https://gitlab.idiap.ch/bob/bob/-/issues/135 · updated 2016-08-04 · André Anjos · *Created by: anjos*
I'd propose we stop having our manuals list every submodule with all methods and classes in that submodule, and instead follow the manuals for NumPy and SciPy (http://docs.scipy.org/doc/numpy/reference/routines.math.html), in which every method or class gets its own dedicated page, with an upfront summary. This can be done with the sphinx autosummary extension (http://sphinx-doc.org/latest/ext/autosummary.html).
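For reference, the extension is enabled from Sphinx's `conf.py`. The fragment below uses standard `sphinx.ext.autosummary` options; it is illustrative only, not Bob's actual configuration.

```python
# Excerpt of a Sphinx conf.py enabling per-object pages via autosummary.
# These are standard sphinx.ext.autosummary options; the selection of
# extensions here is illustrative, not Bob's actual configuration.
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.autosummary',
]
# Generate one stub page per documented function/class automatically:
autosummary_generate = True
```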
That should make our documentation easier to browse and reference.

Milestone: v1.2

# EMTrainer minimizes or maximizes? What?
https://gitlab.idiap.ch/bob/bob/-/issues/133 · updated 2016-08-04 · André Anjos · *Created by: anjos*
There is a bug in the initialization of the EMTrainer. The printed value is initialized with a very small number whereas it should be initialized with a very big one. Otherwise, it gives the impression that a peak was reached and the training then continues. Example:
```
Bootstrapping Gaussian-Mixture Modelling with K-Means Clustering...
# EMTrainer:
# Iteration 1: -1.79769e+308 -> 1938.63
# Iteration 2: 1938.63 -> 1458.86
# Iteration 3: 1458.86 -> 986.196
```
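The bookkeeping can be sketched as below. This is an illustration only, not the actual C++ EMTrainer code: for a criterion that decreases, seeding the previous value with +Inf (instead of -DBL_MAX, the -1.79769e+308 visible in the log above) keeps the first printed transition from looking like a fall from a peak.

```python
import math

def train(step, max_iterations=50, threshold=1e-5):
    """Iterate step() (which returns the current criterion value) until the
    change falls below threshold. Sketch only, not Bob's EMTrainer."""
    previous = float('inf')  # the fix: +Inf, not -DBL_MAX
    for iteration in range(max_iterations):
        current = step()
        print('# Iteration %d: %g -> %g' % (iteration + 1, previous, current))
        # skip the convergence test on the first pass, where previous is inf
        if math.isfinite(previous) and abs(previous - current) <= threshold:
            break
        previous = current
    return current
```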
To fix this, one would just have to initialize the first estimate with +Inf instead of -Inf.

Milestone: v1.2

# KmeansTrainer never stops if it reaches 0 on the first iteration
https://gitlab.idiap.ch/bob/bob/-/issues/132 · updated 2016-08-04 · André Anjos · *Created by: anjos*
The following code sample (color image segmentation) shows the problem:
```python
import bob
import logging
image = bob.io.load('/idiap/resource/database/banca/english/images/en_video_sc1_1/1001_f_g1_s01_1001_en_1.ppm')
image_flat = image.reshape(3, -1).transpose().copy()
logging.getLogger().setLevel(logging.INFO)
kmeans = bob.machine.KMeansMachine(3, 3)
ktrainer = bob.trainer.KMeansTrainer()
ktrainer.max_iterations = 1000
ktrainer.convergence_threshold = 1e-5
ktrainer.train(kmeans, image_flat)
```
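A plausible explanation, sketched below as an assumption about the trainer's convergence test (this is not the actual C++ code): if convergence is measured as a relative change in the average minimum distance, a value of exactly 0 after the first iteration makes the relative change ill-defined, so the threshold never fires.

```python
def converged(previous, current, threshold=1e-5):
    """Hypothetical relative-change convergence test, guarded against the
    0 -> 0 case that presumably makes KMeansTrainer loop forever."""
    if previous == current:   # covers an exact 0 -> 0 transition
        return True
    if previous == 0.0:       # avoid dividing by zero below
        return False
    return abs(previous - current) / abs(previous) <= threshold
```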
This code will iterate 1000 times instead of stopping at the very first iteration, as expected.

Milestone: v1.2

# Bob misses a naive Fisher LDA implementation
https://gitlab.idiap.ch/bob/bob/-/issues/131 · updated 2016-08-04 · André Anjos · *Created by: anjos*
The current implementation of FisherLDA in Bob uses Lapack's `dsygv`, which is supposed to be more numerically stable than using `dsyevd`, since it does not require the inversion of Sw. It can still fail under certain conditions. Another implementation that would still use `dsyevd` would be possible using the pseudo-inverse instead of the inverse of Sw; that could be more robust - but slower - in certain cases.
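The pseudo-inverse route mentioned above is easy to sketch with NumPy (Bob's implementation would be C++/LAPACK; this is only an illustration of the SVD approach, which NumPy itself already ships as `numpy.linalg.pinv`):

```python
import numpy

def pinv_svd(A, rcond=1e-10):
    """Moore-Penrose pseudo-inverse via SVD: A+ = V diag(1/s) U^T, with
    singular values below rcond * max(s) treated as exactly zero. This is
    the robustness gain over a plain inverse when Sw is (near-)singular."""
    U, s, Vt = numpy.linalg.svd(A, full_matrices=False)
    cutoff = rcond * s.max()
    s_inv = numpy.zeros_like(s)
    mask = s > cutoff
    s_inv[mask] = 1.0 / s[mask]          # invert only significant values
    return Vt.T.dot(numpy.diag(s_inv)).dot(U.T)
```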
Lapack does not provide a pseudo-inverse function, but that should be easily implementable using QR factorization or SVD:
http://icl.cs.utk.edu/lapack-forum/archives/lapack/msg01395.html

Milestone: v1.2

# More flexibility on the MLP Machine Activation
https://gitlab.idiap.ch/bob/bob/-/issues/129 · updated 2016-08-04 · André Anjos · *Created by: anjos*
The current implementation requires users to change the file `bob/machine/Activation.h` to insert a new enumeration type. A new design based on a base class and derived ones, allowing people to instantiate and pass their own activations to the MLP machine, would be more flexible and allow more variants to be quickly implemented from Python.

Milestone: v1.2

# MLP machine revamping for introducing new features
https://gitlab.idiap.ch/bob/bob/-/issues/128 · updated 2016-08-04 · André Anjos · *Created by: laurentes*
We would like to revamp the MLP machine to have the following features:
- ~~Possibility to get the outputs of each layer (Both before and after applying the activation function)~~
- ~~Possibility to set a different activation function for the last/output layer (useful when using MLP for regression)~~
- ~~Possibility to do backward propagation directly using the machine (To avoid code duplication within the trainers)~~
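The first two bullets can be sketched in plain NumPy. This is a hypothetical design for illustration, not Bob's actual MLP API: the forward pass keeps both pre- and post-activation outputs per layer, and a different activation (here the identity, e.g. for regression) applies on the last layer.

```python
import numpy

def mlp_forward(x, weights, biases, hidden_act=numpy.tanh,
                output_act=lambda z: z):
    """Forward pass keeping the outputs of every layer.

    Returns (pre, post): values before and after each layer's activation.
    Hypothetical sketch of the revamp above, not Bob's MLP code."""
    pre, post = [], []
    out = numpy.asarray(x, dtype=float)
    last = len(weights) - 1
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = out.dot(W) + b                              # pre-activation
        out = output_act(z) if i == last else hidden_act(z)
        pre.append(z)
        post.append(out)
    return pre, post
```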
~~In particular, this should simplify the definition of new trainers for this MLP machine, but will (slightly) increase the cost of processing data (saving intermediate outputs from each layer).~~ (fixed by adding a base MLP trainer class)

Milestone: v1.2

# LBP implementation is overcomplicated
https://gitlab.idiap.ch/bob/bob/-/issues/124 · updated 2016-08-04 · André Anjos · *Created by: laurentes*
The current implementation of the LBP is made complicated by the use of the LBP abstract class. This class should likely be refactored and made parametrizable, to avoid the definition of the additional LBP4, LBP8 and LBP16 classes, which bring an extra layer of complexity and make the code much more difficult to maintain.

Milestone: v1.2

# Make use of isEqual and isClose when defining comparison methods
https://gitlab.idiap.ch/bob/bob/-/issues/123 · updated 2016-08-04 · André Anjos · *Created by: laurentes*
Recently, the functions isEqual() and isClose() have been added to the core submodule of bob. They allow comparing blitz++ arrays (BA), vectors of BA and maps of BA, checking that both the dimensionality and the content of the BA are the same (or similar). These functions can be used to implement the operator==() and is_similar_to() methods of many classes. This will make the code shorter and easier to maintain.

Milestone: v1.2

# Random initialization of arrays is inconsistent
https://gitlab.idiap.ch/bob/bob/-/issues/121 · updated 2016-08-04 · André Anjos · *Created by: laurentes*
At the C++ level, there are several options for generating random numbers. Across the library, this is currently not consistent: we sometimes rely on blitz++'s ranlib, sometimes on boost.
In addition, a few classes allow the user to set a boost random number generator, whereas others only allow setting a seed.
We have decided to follow this approach:
- Always use boost at the C++ level
- Classes that make use of random numbers should provide a way to set the boost random number generator.
We still have to discuss whether it is better to handle the boost random number generator through a reference or a boost::shared_ptr.
The goal is to converge to this design. This will involve:
- ~~Make the JFATrainer use boost rather than ranlib~~
- Remove the seed attribute from:
* ~~KMeansTrainer~~
* ~~PLDABaseTrainer~~ (done by @laurentes)
- Check if we keep using a reference (or a boost::shared_ptr) in the following classes
* MLP
* DataShuffler
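The agreed design (pass the generator object, not a bare seed) transposes to Python as below. This is an illustration only; at the C++ level Bob would hold a boost generator, and the class name here is hypothetical.

```python
import numpy

class DataShufflerSketch:
    """Hypothetical Python analogue of the design above: the class receives
    a random number generator object instead of a bare seed, so several
    objects can share (or not share) the same generator state, which a
    per-class seed cannot express."""

    def __init__(self, rng=None):
        # equivalent of storing a reference/shared_ptr to a boost RNG
        self.rng = rng if rng is not None else numpy.random.RandomState()

    def draw(self, n):
        return self.rng.uniform(size=n)
```

Two objects constructed with identically seeded generators reproduce the same stream, which makes results reproducible across runs.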
Please be aware that this will slightly affect results afterwards, as the initial random matrices will be different.

Milestone: v1.2

# Proper definition and usage of the abstract Trainer template class
https://gitlab.idiap.ch/bob/bob/-/issues/119 · updated 2016-08-04 · André Anjos · *Created by: laurentes*
For the major release 1.2.0, I would be in favour of consolidating the abstract Trainer class. There are trainer classes that do not inherit from it. In this case, inheritance might help us uniformise the API.

Milestone: v1.2

# Proper definition of the abstract Machine template class
https://gitlab.idiap.ch/bob/bob/-/issues/118 · updated 2016-08-04 · André Anjos · *Created by: laurentes*
For the major release 1.2.0, I would be in favour of clearly:
1. Defining what a machine is: to my mind, it is something that can be trained (like the term 'machine' in 'machine learning'), and that outputs something given some input.
2. Updating the current abstract class API. To my mind, a machine should have:
- forward methods
- load/save methods
- copy constructor, assignment operator, and comparison operators (==, != and is_similar_to)
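In Python terms, the abstract API proposed above might look like the sketch below. Both classes are hypothetical illustrations of the proposal, not existing Bob classes.

```python
import abc

class Machine(abc.ABC):
    """Sketch of the abstract Machine API proposed above (hypothetical)."""

    @abc.abstractmethod
    def forward(self, data):
        """Produce an output given some input."""

    @abc.abstractmethod
    def save(self, store):
        """Persist the machine's parameters (e.g. to an HDF5 file)."""

    @abc.abstractmethod
    def load(self, store):
        """Restore the machine's parameters."""

    @abc.abstractmethod
    def is_similar_to(self, other, r_epsilon=1e-5, a_epsilon=1e-8):
        """Approximate comparison, robust across platforms."""

class ScalingMachine(Machine):
    """Minimal concrete example: output = scale * input."""
    def __init__(self, scale=1.0):
        self.scale = scale
    def forward(self, data):
        return self.scale * data
    def save(self, store):
        store['scale'] = self.scale
    def load(self, store):
        self.scale = store['scale']
    def is_similar_to(self, other, r_epsilon=1e-5, a_epsilon=1e-8):
        return abs(self.scale - other.scale) <= \
            a_epsilon + r_epsilon * abs(other.scale)
```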
Once done, we should update the 'machine' module accordingly. In addition, for each template, the generic Machine can be bound into python, which will help us have a more consistent API.

Milestone: v1.2

# Logistic Regression does not implement regularization
https://gitlab.idiap.ch/bob/bob/-/issues/115 · updated 2016-08-04 · André Anjos · *Created by: anjos*
This is a must when you start using LLR seriously, so I'd +1 this feature request.
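For concreteness, the sketch below shows what L2 regularization adds to the logistic regression objective: the usual negative log-likelihood gradient gains a `lam * w` penalty term. Names and the plain gradient-descent loop are hypothetical illustrations, not a proposal for Bob's actual trainer.

```python
import numpy

def llr_gradient(w, X, y, lam=0.1):
    """Gradient of the L2-regularized logistic regression loss.

    X: (n, d) data, y: (n,) labels in {0, 1}, w: (d,) weights.
    Sketch of the requested feature; names are hypothetical."""
    p = 1.0 / (1.0 + numpy.exp(-X.dot(w)))    # sigmoid predictions
    return X.T.dot(p - y) / len(y) + lam * w  # data term + L2 penalty

def fit(X, y, lam=0.1, lr=0.5, iterations=500):
    """Plain gradient descent, for illustration only."""
    w = numpy.zeros(X.shape[1])
    for _ in range(iterations):
        w -= lr * llr_gradient(w, X, y, lam)
    return w
```

A larger `lam` shrinks the weight norm, trading training fit for better generalization.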
More info and implementation details in Sebastien's lectures: http://www.idiap.ch/~marcel/lectures/lectures/epfl2013/FSPR_lecture4.pdf

Milestone: v1.2

# IP bug for rgb_to_hsl: returns NaNs
https://gitlab.idiap.ch/bob/bob/-/issues/110 · updated 2016-08-04 · André Anjos · *Created by: csmccool*
The floating point implementations of "rgb_to_hsl" return NaNs if you pass an RGB array of ones (1., 1., 1.); however, the integer-based methods don't seem to have this issue. Below are some examples of the problem using the python interface:
```python
import bob;
import scipy;
bob.ip.rgb_to_hsl(scipy.array([[[1.]],[[1.]],[[1.]]])) # Using a scipy array or numpy array
```
RETURNS
```python
array([[[ nan]],
[[ nan]],
[[ 1.]]])
```
While
```python
bob.ip.rgb_to_hsl_f(1.,1.,1.)
```
RETURNS
```python
(nan, nan, 1.0)
```
The integer-based methods seem to be OK, as can be seen below:
```python
bob.ip.rgb_to_hsl_u8(255, 255, 255)
# RETURNS (0, 0, 255)
bob.ip.rgb_to_hsl_u16(65535, 65535, 65535)
# RETURNS (0, 0, 65535)
```
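The NaN most likely comes from a 0/0 in the saturation formula when max == min (achromatic colors such as pure white, where the standard formula gives delta/(2 - max - min) = 0/0). Below is a sketch of the float conversion with the usual achromatic guard; this is an assumed fix for illustration, not Bob's actual code.

```python
def rgb_to_hsl_safe(r, g, b):
    """Convert float RGB in [0, 1] to HSL, guarding the achromatic case.

    Hypothetical fix sketch for the bug reported above: when max == min
    (e.g. pure white), the saturation formula becomes 0/0 and NaNs leak
    out unless hue and saturation are short-circuited to 0."""
    c_max, c_min = max(r, g, b), min(r, g, b)
    l = (c_max + c_min) / 2.0
    if c_max == c_min:      # achromatic: the formulas below would be 0/0
        return (0.0, 0.0, l)
    delta = c_max - c_min
    s = delta / (c_max + c_min) if l <= 0.5 else delta / (2.0 - c_max - c_min)
    if c_max == r:
        h = ((g - b) / delta) % 6.0
    elif c_max == g:
        h = (b - r) / delta + 2.0
    else:
        h = (r - g) / delta + 4.0
    return (h / 6.0, s, l)  # hue normalized to [0, 1]
```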
Cheers,
Chris.

Milestone: v1.2