.. vim: set fileencoding=utf-8 :
.. Manuel Guenther <manuel.guenther@idiap.ch>
.. Thu Sep  4 10:53:22 CEST 2014

.. image:: https://travis-ci.org/bioidiap/bob.learn.boosting.svg?branch=master
   :target: https://travis-ci.org/bioidiap/bob.learn.boosting
.. image:: http://img.shields.io/badge/docs-latest-orange.png
   :target: https://www.idiap.ch/software/bob/docs/latest/bioidiap/bob.learn.boosting/master/index.html
.. image:: https://coveralls.io/repos/bioidiap/bob.learn.boosting/badge.png
   :target: https://coveralls.io/r/bioidiap/bob.learn.boosting
.. image:: http://img.shields.io/github/tag/bioidiap/bob.learn.boosting.png
   :target: https://github.com/bioidiap/bob.learn.boosting
.. image:: http://img.shields.io/pypi/v/bob.learn.boosting.png
   :target: https://pypi.python.org/pypi/bob.learn.boosting
.. image:: http://img.shields.io/pypi/dm/bob.learn.boosting.png
   :target: https://pypi.python.org/pypi/bob.learn.boosting

==========================================================================================
 Generalized Boosting Framework using Stump and Look Up Table (LUT) based Weak Classifier
==========================================================================================

The package implements a generalized boosting framework, which incorporates different boosting approaches.
The boosting algorithms implemented in this package are:

1) Gradient Boost [Fri00]_ (generalized version of Adaboost [FS99]_) for univariate cases using stump decision classifiers, as in [VJ04]_.
2) TaylorBoost [SMV11]_ for univariate and multivariate cases using Look-Up-Table based classifiers [Ata12]_.

.. [Fri00]      *Jerome H. Friedman*. **Greedy function approximation: a gradient boosting machine**. Annals of Statistics, 29:1189--1232, 2000.
.. [FS99]       *Yoav Freund and Robert E. Schapire*. **A short introduction to boosting**. Journal of Japanese Society for Artificial Intelligence, 14(5):771--780, September 1999.
.. [VJ04]       *Paul Viola and Michael J. Jones*. **Robust real-time face detection**. International Journal of Computer Vision (IJCV), 57(2): 137--154, 2004.
.. [SMV11]      *Mohammad J. Saberian, Hamed Masnadi-Shirazi, Nuno Vasconcelos*. **TaylorBoost: First and second-order boosting algorithms with explicit margin control**. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2929--2934, 2011.
.. [Ata12]      *Cosmin Atanasoaei*. **Multivariate boosting with look-up tables for face processing**. PhD Thesis, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland, 2012.
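
To give a first impression of what such a boosting loop does, below is a minimal, self-contained NumPy sketch of an AdaBoost-style training loop with decision stumps.
It is purely illustrative: all names and the exhaustive stump search are made up for this example and do not reflect this package's API or implementation::

  import numpy as np

  def train_stump(features, targets, weights):
      # Exhaustively pick the (feature, threshold, polarity) with the
      # lowest weighted classification error; targets are in {-1, +1}.
      best = (np.inf, 0, 0.0, 1)
      for f in range(features.shape[1]):
          for threshold in np.unique(features[:, f]):
              for polarity in (1, -1):
                  prediction = polarity * np.where(features[:, f] > threshold, 1.0, -1.0)
                  error = weights[prediction != targets].sum()
                  if error < best[0]:
                      best = (error, f, threshold, polarity)
      return best

  def boost(features, targets, rounds=10):
      # Re-weight the samples after each round, so that later rounds
      # focus on the examples that are still misclassified.
      weights = np.full(len(targets), 1.0 / len(targets))
      strong = []
      for _ in range(rounds):
          error, f, threshold, polarity = train_stump(features, targets, weights)
          alpha = 0.5 * np.log((1.0 - error) / max(error, 1e-10))
          prediction = polarity * np.where(features[:, f] > threshold, 1.0, -1.0)
          weights *= np.exp(-alpha * targets * prediction)
          weights /= weights.sum()
          strong.append((alpha, f, threshold, polarity))
      return strong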

Installation
------------

Bob
...

The boosting framework depends on the open-source signal-processing and machine-learning toolbox Bob_, which you need to download from its web page.
For more information, please read Bob's `installation instructions <https://github.com/idiap/bob/wiki/Packages>`_.

This package
............

The simplest way to download the latest stable version of the package is to use the Download button above and extract the archive into a directory of your choice.
If you want, you can also check out the latest development branch of this package using::

  $ git clone https://github.com/bioidiap/bob.learn.boosting.git

Afterwards, please open a terminal in this directory and call::

  $ python bootstrap.py
  $ ./bin/buildout

These two commands should download and install all dependencies and get you a fully operational test and development environment.

Example
-------

To demonstrate the usage of the boosting algorithms, we perform binary and multi-variate classification of hand-written digits from the MNIST database.

For simplicity, we just use the pixel gray values as (discrete) features to classify the digits.
In each boosting round, a single pixel location is selected.
In the case of the stump classifier, this pixel value is compared to a threshold (which is determined during training), and one of the two classes is assigned.
The LUT weak classifier selects a feature (i.e., a pixel location in the images) and determines the most probable digit for each pixel value.
Finally, the strong classifier combines several weak classifiers by a weighted sum of their predictions.
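
Concretely, and as a hedged sketch only (the function names below are invented for illustration and are not this package's API), the two weak classifier types and their combination could look like::

  import numpy as np

  def stump_score(image, pixel, threshold, polarity):
      # Compare the gray value of the selected pixel with the trained threshold.
      return polarity if image[pixel] > threshold else -polarity

  def lut_score(image, pixel, lut):
      # Look up the score stored for this (discrete) gray value, e.g. 0..255.
      return lut[image[pixel]]

  def strong_score(image, weak_classifiers):
      # Weighted sum of the weak predictions; for binary problems,
      # the sign of this sum decides the class.
      return sum(alpha * weak(image) for alpha, weak in weak_classifiers)

  # Toy usage: one LUT voting for bright pixels, one stump on pixel 100.
  image = np.random.randint(0, 256, size=28 * 28)
  lut = np.where(np.arange(256) > 127, 1.0, -1.0)
  classifiers = [(0.7, lambda im: lut_score(im, 42, lut)),
                 (0.3, lambda im: stump_score(im, 100, 50, 1))]
  print(strong_score(image, classifiers))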

The script ``./bin/boosting_example.py`` is provided to run all of the examples below.
This script has several command-line parameters, which vary the behavior of the training and/or testing procedure.
All parameters have a long form (starting with ``--``) and a shortcut (starting with a single ``-``).
These parameters are (see also ``./bin/boosting_example.py --help``):

To control the type of training, you can select:

* ``--trainer-type``: Select the type of weak classifier. Possible values are ``stump`` and ``lut``.
* ``--loss-type``: Select the loss function. Possible values are ``tan``, ``log`` and ``exp``. By default, a loss function suitable to the trainer type is selected.
* ``--number-of-boosting-rounds``: The number of weak classifiers to select.
* ``--multi-variate`` (only valid for LUT trainer): Perform multi-variate classification, or binary (one-to-one) classification.
* ``--feature-selection-style`` (only valid for multi-variate training): Select the features for each output independently (``independent``), or use the same features for all outputs (``shared``).

To control the experimentation, you can choose:

* ``--digits``: The digits to classify. For multi-variate training, one classifier is trained for all given digits, while for uni-variate training all possible one-to-one classifiers are trained.
* ``--all-digits``: Select all 10 digits.
* ``--classifier-file``: Save the trained classifier(s) into the given file and/or read the classifier(s) from this file.
* ``--force``: Overwrite the given classifier file if it already exists.

For information and debugging purposes, it might be interesting to use:

* ``--verbose`` (can be used several times): Increases the verbosity level from 0 (error) over 1 (warning) and 2 (info) to 3 (debug). Verbosity level 2 (``-vv``) is recommended.
* ``--number-of-elements``: Reduce the number of elements per class (digit) to the given value.

Four different kinds of experiments can be performed:

1. Uni-variate classification using the stump classifier, classifying digits 5 and 6::

    $ ./bin/boosting_example.py -vv --trainer-type stump --digits 5 6

2. Uni-variate classification using the LUT classifier, classifying digits 5 and 6::

    $ ./bin/boosting_example.py -vv --trainer-type lut --digits 5 6

3. Multi-variate classification using the LUT classifier and shared features, classifying all 10 digits::

    $ ./bin/boosting_example.py -vv --trainer-type lut --all-digits --multi-variate --feature-selection-style shared

4. Multi-variate classification using the LUT classifier and independent features, classifying all 10 digits::

    $ ./bin/boosting_example.py -vv --trainer-type lut --all-digits --multi-variate --feature-selection-style independent


.. note::
  During the execution of the experiments, the warning message "L-BFGS returned warning '2': ABNORMAL_TERMINATION_IN_LNSRCH" might appear.
  This warning message is normal and does not influence the results much.

.. note::
  For experiment 1, the training terminates after 75 of 100 rounds since the computed weight for the weak classifier of that round is vanishing.
  Hence, performing more boosting rounds will not change the strong classifier any more.

All experiments should finish within a few minutes of execution time.
The results of the above experiments should be the following (split into the classification accuracy on the training set and on the test set):

+------------+----------+----------+
| Experiment | Training |   Test   |
+============+==========+==========+
|   1        |  91.04 % |  92.05 % |
+------------+----------+----------+
|   2        |  100.0 % |  95.35 % |
+------------+----------+----------+
|   3        |  97.59 % |  83.47 % |
+------------+----------+----------+
|   4        |  99.04 % |  86.25 % |
+------------+----------+----------+

Of course, you can try out different combinations of digits for experiments 1 and 2.
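
For instance, to distinguish the digits 0 and 1 with the LUT classifier (all flags as documented above)::

  $ ./bin/boosting_example.py -vv --trainer-type lut --digits 0 1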


Getting Help
------------

In case you experience problems with the code, or with downloading the required databases and/or software, please contact manuel.guenther@idiap.ch or file a bug report at https://github.com/bioidiap/bob.learn.boosting.

.. _bob: http://www.idiap.ch/software/bob