WIP: added LDA and MLP
Hey guys (@amohammadi @onikisins @pkorshunov @ageorge @dgeissbuhler @andre.anjos @sbhatta)
I just added two more classifiers in bob.pad.base.algorithm:
- MLP, which relies on bob.learn.mlp
- LDA, which is a derived class of bob.bio.base.algorithm.LDA
I made them as simple as possible ... there is still some stuff missing (e.g. docstrings), but I think they may serve as a nice basis to discuss how algorithms in this package should ideally be implemented.
Let me know what you think !
- bob/pad/base/algorithm/MLP.py 0 → 100644
```python
label_real = numpy.ones((len(training_features[0]), 1), dtype='float64')
label_attack = numpy.zeros((len(training_features[1]), 1), dtype='float64')

real = numpy.array(training_features[0])
attack = numpy.array(training_features[1])
X = numpy.vstack([real, attack])
Y = numpy.vstack([label_real, label_attack])


# The machine
input_dim = real.shape[1]
shape = []
shape.append(input_dim)
for i in range(len(self.hidden_units)):
  shape.append(self.hidden_units[i])
shape.append(1)
```
I normally use two units in the output layer for PAD. It goes nicely with the tensorflow toolkits. I think making this like mine would make it a little easier to share tools. For example, see f947066f
changed this line in version 3 of the diff
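For reference, here is a minimal sketch of what a two-output-unit setup could look like, assuming one-hot targets per class; this is my interpretation of the suggestion, not the code from commit f947066f:

```python
import numpy

def make_one_hot_labels(n_real, n_attack):
    # One-hot targets for a two-output-unit PAD network:
    # [1, 0] for real samples, [0, 1] for attacks.
    label_real = numpy.hstack([numpy.ones((n_real, 1)), numpy.zeros((n_real, 1))])
    label_attack = numpy.hstack([numpy.zeros((n_attack, 1)), numpy.ones((n_attack, 1))])
    return numpy.vstack([label_real, label_attack])

# network shape: input -> hidden layers -> two output units (real / attack)
hidden_units = (10, 10)
input_dim = 64  # illustrative value
shape = [input_dim] + list(hidden_units) + [2]
```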
- bob/pad/base/algorithm/MLP.py 0 → 100644
```python
                       requires_projector_training=True,
                       **kwargs)

  self.hidden_units = hidden_units
  self.max_iter = max_iter
  self.mlp = None


def train_projector(self, training_features, projector_file):
  """
  Trains the MLP

  **Parameters**

  training_features:
  """
```
We have been doing numpy-style docs for a while. They are both easier to write and to read, IMHO. See: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html#example-numpy
Keeping it like this is ok too, but I was hoping we would gradually move to numpy-style docs.
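For illustration, a numpy-style version of that docstring could look roughly like this; the parameter descriptions are my reading of the intended semantics, not taken from the MR:

```python
def train_projector(self, training_features, projector_file):
    """Trains the MLP to discriminate real accesses from attacks.

    Parameters
    ----------
    training_features : list
        Two-element list: features of the real class and features of the
        attack class, each given as a list of 1D numpy arrays.
    projector_file : str
        Path to the file where the trained machine will be saved.
    """
```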
- bob/pad/base/algorithm/MLP.py 0 → 100644
```python
self.mlp.output_activation = bob.learn.activation.Logistic()
self.mlp.randomize()

# The trainer
trainer = bob.learn.mlp.BackProp(batch_size, bob.learn.mlp.CrossEntropyLoss(self.mlp.output_activation), self.mlp, train_biases=True)

n_iter = 0
previous_cost = 0
current_cost = 1
precision = 0.001
while (n_iter < self.max_iter) or (abs(previous_cost - current_cost) < precision):
  previous_cost = current_cost
  trainer.train(self.mlp, X, Y)
  current_cost = trainer.cost(self.mlp, X, Y)
  n_iter += 1
  print("Iteration {} -> cost = {} (previous = {})".format(n_iter, trainer.cost(self.mlp, X, Y), previous_cost))
```
changed this line in version 3 of the diff
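For context, one way the stopping criterion might look after the fix, with precision exposed as a constructor argument; this is a sketch reusing the names from the snippet above, not the actual version-3 diff:

```python
# assumes self.precision is set in __init__ instead of being hard-coded here
n_iter = 0
previous_cost = float('inf')
current_cost = trainer.cost(self.mlp, X, Y)
while n_iter < self.max_iter and abs(previous_cost - current_cost) > self.precision:
    previous_cost = current_cost
    trainer.train(self.mlp, X, Y)
    current_cost = trainer.cost(self.mlp, X, Y)
    n_iter += 1
```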
- bob/pad/base/algorithm/MLP.py 0 → 100644
```python

  """

  def __init__(self, hidden_units=(10, 10), max_iter=1000, **kwargs):

    Algorithm.__init__(self,
                       performs_projection=True,
                       requires_projector_training=True,
                       **kwargs)

    self.hidden_units = hidden_units
    self.max_iter = max_iter
    self.mlp = None


  def train_projector(self, training_features, projector_file):
```
- bob/pad/base/algorithm/PadLDA.py 0 → 100644
```python
#!/usr/bin/env python
# vim: set fileencoding=utf-8 :

import numpy
from bob.bio.base.algorithm import LDA

class PadLDA(LDA):
  """
  This class is a wrapper for bob.bio.base.algorithm.LDA,
  to be used in a PAD context.

  **Parameters**

  """
```
@heusch to fix the CI, please go through our guide on https://gitlab.idiap.ch/bob/bob.admin/tree/master/templates#15-conda-recipe
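To make the intent of the wrapper a bit more concrete, here is a hedged sketch of how the two-class PAD data could be handed to the bio-style LDA, assuming the base class expects one list of samples per class; the method body below is illustrative and not part of this MR:

```python
import numpy
from bob.bio.base.algorithm import LDA

class PadLDA(LDA):
    """LDA wrapper for PAD: real accesses and attacks act as the two classes."""

    def train_projector(self, training_features, projector_file):
        # training_features is assumed to be [real_features, attack_features]
        data = [
            [numpy.asarray(f, dtype='float64') for f in training_features[0]],  # real
            [numpy.asarray(f, dtype='float64') for f in training_features[1]],  # attack
        ]
        super(PadLDA, self).train_projector(data, projector_file)
```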
This looks good, thank you. Although I think you need to add these algorithms to our API summary in the docs and also add tests.
Thanks for the feedback @amohammadi !
I'll address all this when I have some more time. And don't worry: I'm well aware of some of the stuff you mentioned, but as stated, it is Work In Progress ;)
@heusch , thank you for adding another classifier! Here are a couple of comments from me.
- Does the MLP classifier support FrameContainers? It would be very useful in bob.pad.face (see the sketch after this comment).
- This is a matter of personal preference, but I would consider splitting the train_projector method of MLP into functions. A nice citation from my point of view: "Whenever you can clearly separate tasks within a computation, you should do so." It would help us better reuse useful bits of your code and override them if necessary. The splitting I can see: 1. data preparation/management, 2. data normalization, 3. actual training, 4. saving the machine/normalization parameters.
- Minor comment: I would suggest making precision = 0.001 an argument rather than a hard-coded constant.
- Again a matter of preference, but I would suggest using "4 spaces per indentation level", since most people in the group do so, which makes it easier for others to edit.
Hopefully you will find some of this useful. Thank you!
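Regarding the FrameContainer question, a helper along these lines might be enough to bridge the gap; the function name and the (index, data, quality) iteration protocol are assumptions on my side, not something from this MR:

```python
import numpy

def frame_container_to_array(frame_container):
    # Stack the per-frame feature vectors of a bob.bio.video-style
    # FrameContainer into a single 2D numpy array, so that the MLP/LDA
    # code can keep working on plain arrays.
    return numpy.vstack([numpy.asarray(data).flatten()
                         for _, data, _ in frame_container])
```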
Thanks for the feedback @onikisins!
Here are the answers to the points you raised:
- MLP and LDA do not support FrameContainers at the moment; I will add the same mechanism that @pkorshunov used in other algorithms, using helper functions.
- I don't agree on splitting the function into smaller pieces ... Data preparation / normalization is data/task dependent, and therefore should not, in my opinion, be part of the classifier. The way I see it, the classifier should receive data in a format as generic as possible (i.e. numpy.array), train a machine, save it, and that's it. That being said, I think that specific handling - if needed - should be implemented in a derived class (see the sketch after this list). I'm not a design expert, but it sounds more logical to me. Maybe @amohammadi could provide some guidance on this.
- Sure, precision will be an argument in the future.
- I personally prefer 2 spaces ;)
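As an illustration of the derived-class idea: the class name below is hypothetical, and it assumes the MLP class from this MR plus the frame_container_to_array helper sketched earlier.

```python
import numpy

class VideoMLP(MLP):
    """Hypothetical MLP variant accepting FrameContainer-based features."""

    def train_projector(self, training_features, projector_file):
        # data-specific preparation lives in the subclass: every frame of
        # every container becomes one training sample...
        real = numpy.vstack([frame_container_to_array(f) for f in training_features[0]])
        attack = numpy.vstack([frame_container_to_array(f) for f in training_features[1]])
        # ...while the generic numpy-based training stays in the base class.
        super(VideoMLP, self).train_projector([real, attack], projector_file)
```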
@heusch, I think it would be best to look at the other files in this package and use the same number of spaces. It would be inconsistent otherwise.
added 5 commits
- 287abb39 - [algorithm] fixed precision criterion to stop training
- 8b6edd8e - [utils] fixed helper function, to avoid dividing by zero
- f1f528e4 - [algorithm] added two output units to MLP binary classifier
- c9b2ecc3 - [algorithm] fixed the import to convert and prepare feature, and the single score for one sequence
- e85eb10b - [algorithm] added my simple implementation of One Class SVM
mentioned in merge request !50 (merged)
Hi all,
Since this branch was way behind master when I got back to it, I created a new one, and I plan to add algorithms and unit tests in a more principled way (i.e. making sure that everything is working at each stage).
Actually, I worked on this branch in a hurry a while ago, and although there is still useful stuff in it, it needs some more work. Anyway, I'm closing this MR and will eventually delete this branch.
The new branch is here: https://gitlab.idiap.ch/bob/bob.pad.base/tree/add-new-classifiers
and the corresponding MR: !50 (merged)
@amohammadi Don't worry, I took the remarks you made here into account when working on the new branch
@onikisins Same remark applies to you, but I'm still questioning the use of FrameContainers at this stage of the toolchain ... I'll open an issue to discuss that.