==========================================================================================
Generalized Boosting Framework using Stump and Look Up Table (LUT) based Weak Classifier
==========================================================================================

This package implements a generalized boosting framework, which incorporates different boosting approaches.
The boosting algorithms implemented in this package are:

1) Gradient Boost (a generalized version of AdaBoost) for univariate cases
2) TaylorBoost for univariate and multivariate cases

The weak classifiers associated with these boosting algorithms are:

1) Stump classifiers
2) LUT based classifiers

Check the following references for details (a toy sketch of the generic boosting idea follows them):

1. Viola, Paul, and Michael J. Jones. "Robust real-time face detection." International Journal of Computer Vision 57.2 (2004): 137-154.

2. Saberian, Mohammad J., Hamed Masnadi-Shirazi, and Nuno Vasconcelos. "Taylorboost: First and second-order boosting algorithms with explicit margin control." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.

3. Cosmin Atanasoaei, "Multivariate Boosting with Look Up Table for face processing", PhD thesis (2012).
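
The generic idea behind all of these variants is the same: a strong classifier is built as a weighted
sum of weak classifiers, and each boosting round adds the weak classifier that most reduces a loss
function. The following toy sketch (plain ``numpy``, not this package's API; all names are
illustrative) shows this for the univariate case with decision stumps and the exponential loss;
TaylorBoost additionally uses second-order (curvature) information of the loss when selecting the
weak classifier::

  import numpy as np

  def boost(X, y, n_rounds=10):
      """Toy univariate boosting with decision stumps; y holds labels in {-1, +1}."""
      n_samples, n_features = X.shape
      scores = np.zeros(n_samples)               # current strong classifier output F(x)
      model = []
      for _ in range(n_rounds):
          # sample weights derived from the exponential loss exp(-y * F(x))
          weights = np.exp(-y * scores)
          weights /= weights.sum()
          # exhaustively pick the stump (feature, threshold, polarity) with lowest weighted error
          best = (np.inf, 0, 0.0, 1)
          for f in range(n_features):
              for t in np.unique(X[:, f]):
                  for p in (+1, -1):
                      pred = np.where(X[:, f] > t, p, -p)
                      err = weights[pred != y].sum()
                      if err < best[0]:
                          best = (err, f, t, p)
          err, f, t, p = best
          err = np.clip(err, 1e-10, 1 - 1e-10)
          alpha = 0.5 * np.log((1 - err) / err)  # weak classifier weight (AdaBoost step size)
          scores += alpha * np.where(X[:, f] > t, p, -p)
          model.append((f, t, p, alpha))
      return model

  X = np.random.randint(0, 256, size=(100, 20))  # 100 samples, 20 discrete features
  y = np.where(X[:, 3] > 127, 1, -1)             # a toy target that depends on feature 3
  model = boost(X, y, n_rounds=5)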

Installation
------------

Once you have downloaded the package, use the following two commands to install it:

  $ python bootstrap.py

  $ ./bin/buildout

These two commands should download and install all missing dependencies and give you a fully operational test and development environment.

Example
-------
To demonstrate the usage of the boosting algorithms, binary and multi-variate classification of hand-written digits from the MNIST database is performed.
For simplicity, we just use the pixel gray values as (discrete) features to classify the digits.
In each boosting round, a single pixel location is selected.
In the case of the stump classifier, this pixel value is compared to a threshold (determined during training), and one of the two classes is assigned.
In the case of the LUT classifier, the most probable digit is determined for each possible value of the pixel.
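
As a conceptual illustration only (hypothetical helper names, not this package's API), the two kinds
of weak classifiers can be pictured as follows::

  import numpy as np

  # Stump: compare the selected pixel's gray value against a learned threshold
  # and assign one of the two classes depending on the side.
  def stump_predict(pixel_value, threshold, polarity=1):
      return polarity if pixel_value > threshold else -polarity

  # LUT: one table entry per possible gray value (0..255); each entry stores the
  # most probable digit for that value, as determined during training.
  lut = np.random.randint(0, 10, size=256)   # stand-in for a trained table

  def lut_predict(pixel_value):
      return int(lut[pixel_value])

  print(stump_predict(200, threshold=127))   # -> +1 (bright pixel)
  print(lut_predict(200))                    # -> whichever digit the table stores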

The script ``./bin/boosting_example.py`` is provided to run all of these examples.
It has several command-line parameters, which change the behavior of the training and/or testing procedure.
All parameters have a long version (starting with ``--``) and a shortcut (starting with a single ``-``).
These parameters are (see also ``./bin/boosting_example.py --help``):

To control the type of training, you can select:

* ``--trainer-type``: Select the type of weak classifier. Possible values are ``stump`` and ``lut``.
* ``--loss-type``: Select the loss function. Possible values are ``tan``, ``log`` and ``exp``. By default, a loss function suitable to the trainer type is selected.
* ``--number-of-boosting-rounds``: The number of weak classifiers to select.
* ``--multi-variate`` (only valid for the LUT trainer): Perform multi-variate classification instead of binary (one-to-one) classification.
* ``--feature-selection-style`` (only valid for multi-variate training): Select the features for each output independently (``independent``) or shared between all outputs (``shared``); see the sketch after these lists.

To control the experimentation, you can choose:

* ``--digits``: The digits to classify. For multi-variate training, one classifier is trained for all given digits, while for uni-variate training all possible one-to-one classifiers are trained.
* ``--all-digits``: Select all 10 digits.
* ``--classifier-file``: Save the trained classifier(s) into the given file and/or read the classifier(s) from this file.
* ``--force``: Overwrite the given classifier file if it already exists.

For information and debugging purposes, it might be interesting to use:

* ``--verbose`` (can be used several times): Increase the verbosity level from 0 (error) through 1 (warning) and 2 (info) to 3 (debug). Verbosity level 2 (``-vv``) is recommended.
* ``--number-of-elements``: Reduce the number of elements per class (digit) to the given value.
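
The difference between the two feature selection styles can be sketched as follows (illustrative
``numpy`` code, not this package's API): with ``shared`` selection a single feature index is chosen
per boosting round and used by all outputs, while with ``independent`` selection each output picks
its own feature index::

  import numpy as np

  def select_features(gain, style="shared"):
      """Pick feature indices for one multi-variate boosting round.

      gain: (n_features, n_outputs) matrix of per-feature, per-output scores,
      where higher means a better fit.  Returns one feature index per output.
      """
      if style == "shared":
          best = np.argmax(gain.sum(axis=1))     # best summed score over all outputs
          return np.full(gain.shape[1], best)
      return np.argmax(gain, axis=0)             # each output picks its own best feature

  gain = np.random.rand(784, 10)                 # e.g. 784 pixel features, 10 digit outputs
  print(select_features(gain, "shared"))         # the same index repeated 10 times
  print(select_features(gain, "independent"))    # possibly 10 different indices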

Four different kinds of experiments can be performed:

1. Uni-variate classification using the stump trainer:

  $ ./bin/boosting_example.py -vv --trainer-type stump --digits 5 6 --classifier-file stump.hdf5

2. Uni-variate classification using the LUT trainer:

  $ ./bin/boosting_example.py -vv --trainer-type lut --digits 5 6 --classifier-file lut_uni.hdf5

3. Multi-variate classification using LUT training and shared features:

  $ ./bin/boosting_example.py -vv --trainer-type lut --all-digits --multi-variate --feature-selection-style shared --classifier-file lut_shared.hdf5

4. Multi-variate classification using LUT training and independent features:

  $ ./bin/boosting_example.py -vv --trainer-type lut --all-digits --multi-variate --feature-selection-style independent --classifier-file lut_independent.hdf5


User Guide
----------

This section explains how to use the package in order to: a) test the MNIST dataset for binary classification,
and b) test the dataset for multi-class classification.

a) The following command will run a single binary test for the digits specified and display the classification
accuracy on the console:

  $ ./bin/mnist_binary_one.py

If you want to see all the options associated with this command, type:

  $ ./bin/mnist_binary_one.py -h

To run the tests for all combinations of the ten digits, use the following command:

  $ ./bin/mnist_binary_all.py

This command tests all possible combinations of two digits, which results in 45 different binary tests (10 choose 2 = 45).
The accuracy of the individual tests and the final average accuracy over all tests are displayed on the console.

b) The following command can be used for the multi-variate digits test:

  $ ./bin/mnist_multi.py

Because of the large number of samples and the multi-variate nature of the problem, this test can take days on a normal system.
Use the ``-h`` option to see the different options available with this command.