The package implements a generalized boosting framework, which incorporates different boosting approaches.
The boosting algorithms implemented in this package are
1) Gradient Boost (generalized version of Adaboost) for univariate cases
2) TaylorBoost for univariate and multivariate cases
The weak classifiers associated with these boosting algorithms are
1) Stump classifiers
2) LUT based classifiers
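In all of these combinations, the resulting strong classifier is a weighted sum of weak classifiers. A minimal sketch of the decision function, with hypothetical weak classifiers ``h`` and weights ``alphas`` (illustrative only, not the API of this package):

  import numpy

  def strong_classify(x, weak_classifiers, alphas):
    # The boosted (strong) decision is the sign of the weighted sum
    # of the weak classifier responses h_m(x).
    return numpy.sign(sum(a * h(x) for h, a in zip(weak_classifiers, alphas)))

  # hypothetical usage with two trivial weak classifiers
  print(strong_classify(3., [lambda x: 1., lambda x: -1.], [0.7, 0.3]))  # -> 1.0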
Check the following references for details:
1. Viola, Paul, and Michael J. Jones. "Robust real-time face detection." International Journal of Computer Vision 57.2 (2004): 137-154.
2. Saberian, Mohammad J., Hamed Masnadi-Shirazi, and Nuno Vasconcelos. "TaylorBoost: First and second-order boosting algorithms with explicit margin control." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.
3. Cosmin Atanasoaei, "Multivariate Boosting with Look Up Table for face processing", PhD thesis (2012).
Testdata:
---------

The tests are performed on the MNIST digits dataset. The tests can mainly be divided into two categories:

1) Univariate Test: This corresponds to the binary classification problem. The digits are tested one-vs-one and one-vs-all. Both boosting algorithms (Gradient Boost and TaylorBoost) can be used in this scenario.

2) Multivariate Test: This is the multi-class classification problem, where the classification of all 10 digits is considered in a single test. Only multivariate TaylorBoost can be used in this scenario.

Installation:
-------------

Once you have downloaded the package, use the following two commands to install it:

  $ python bootstrap.py
  $ ./bin/buildout

These two commands should download and install all non-installed dependencies and get you a fully operational test and development environment.
Example
-------
To show an exemplary usage of the boosting algorithm, the binary and multi-variate classification of hand-written digits from the MNIST database is performed.
For simplicity, we just use the pixel gray values as (discrete) features to classify the digits.
In each boosting round, a single pixel location is selected.
In case of the stump classifier, this pixel value is compared to a threshold (which is determined during training), and one of the two classes is assigned.
In case of the LUT, for each value of the pixel the most probable digit is determined.
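A minimal sketch of these two decision rules (illustrative only; the functions and parameters below are hypothetical, not the API of this package):

  import numpy

  def stump_predict(pixel, threshold, polarity):
    # A decision stump compares the selected pixel value against a learned
    # threshold and assigns one of the two classes, encoded as -1 and +1.
    return polarity if pixel > threshold else -polarity

  def lut_predict(pixel, lut):
    # A look-up table stores one class label for each possible pixel value
    # (256 entries for 8-bit gray values); prediction is a table look-up.
    return lut[pixel]

  # hypothetical usage: predict with a stump and with an (untrained) LUT
  print(stump_predict(pixel=42, threshold=127, polarity=+1))   # -> -1
  print(lut_predict(42, numpy.zeros(256, dtype=int)))          # -> 0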
The script ``./bin/boosting_example.py`` is provided to perform all different examples.
This script has several command line parameters, which vary the behavior of the training and/or testing procedure.
All parameters have a long value (starting with ``--``) and a shortcut (starting with a single ``-``).
These parameters are (see also ``./bin/boosting_example.py --help``):
To control the type of training, you can select:
* ``--trainer-type``: Select the type of weak classifier. Possible values are ``stump`` and ``lut``
* ``--loss-type``: Select the loss function. Possible values are ``tan``, ``log`` and ``exp`` (see the sketch after this list). By default, a loss function suitable to the trainer type is selected.
* ``--number-of-boosting-rounds``: The number of weak classifiers to select.
* ``--multi-variate`` (only valid for LUT trainer): Perform multi-variate classification, or binary (one-to-one) classification.
* ``--feature-selection-style`` (only valid for multi-variate training): Select the features for each output either ``independent``ly or ``shared`` between the outputs.
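As referenced above, a minimal sketch of the three loss functions, assuming targets ``y`` in {-1, +1} and real-valued classifier scores ``f`` (the function names are illustrative and not part of this package):

  import numpy

  def exp_loss(y, f):
    # Exponential loss, as used in AdaBoost: exp(-y * f)
    return numpy.exp(-y * f)

  def log_loss(y, f):
    # Logistic loss: log(1 + exp(-y * f))
    return numpy.log(1. + numpy.exp(-y * f))

  def tan_loss(y, f):
    # Tangent loss, as commonly defined for TangentBoost: (2 * arctan(y * f) - 1)^2
    return (2. * numpy.arctan(y * f) - 1.) ** 2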
To control the experimentation, you can choose:
* ``--digits``: The digits to classify. For multi-variate training, one classifier is trained for all given digits, while for uni-variate training all possible one-to-one classifiers are trained.
* ``--all``: Select all 10 digits.
* ``--classifier-file``: Save the trained classifier(s) into the given file and/or read the classifier(s) from this file.
* ``--force``: Overwrite the given classifier file if it already exists.
For information and debugging purposes, it might be interesting to use:
* ``--verbose`` (can be used several times): Increases the verbosity level from 0 (error) through 1 (warning) and 2 (info) to 3 (debug). Verbosity level 2 (``-vv``) is recommended.
* ``--number-of-elements``: Reduce the number of elements per class (digit) to the given value.
Four different kinds of experiments can be performed:

1. Uni-variate classification using the stump trainer:

  $ ./bin/boosting_example.py -vv --trainer-type stump

2. Uni-variate classification using the LUT trainer:

  $ ./bin/boosting_example.py -vv --trainer-type lut

3. Multi-variate classification using the LUT trainer with independently selected features:

  $ ./bin/boosting_example.py -vv --trainer-type lut --multi-variate --feature-selection-style independent

4. Multi-variate classification using the LUT trainer with shared features:

  $ ./bin/boosting_example.py -vv --trainer-type lut --multi-variate --feature-selection-style shared
For reference, these options are defined in ``./bin/boosting_example.py`` along the following lines (excerpt):

  parser.add_argument('-t', '--trainer-type', default='stump', choices=TRAINER.keys(), help="The type of weak trainer used for boosting.")
  parser.add_argument('-l', '--loss-type', choices=LOSS.keys(), help="The type of loss function used in boosting to compute the weights for the weak classifiers.")
  parser.add_argument('-r', '--number-of-boosting-rounds', type=int, default=100, help="The number of boosting rounds, i.e., the number of weak classifiers.")
  parser.add_argument('-s', '--feature-selection-style', default='independent', choices={'independent', 'shared'}, help="The feature selection style (only for multi-variate classification with the LUT trainer).")
  parser.add_argument('-d', '--digits', type=int, nargs="+", choices=range(10), default=[5, 6], help="Select the digits you want to compare.")
  parser.add_argument('-a', '--all-digits', action='store_true', help="Use all digits.")
  parser.add_argument('-n', '--number-of-elements', type=int, help="For testing purposes: limit the number of training and test examples for each class.")
  parser.add_argument('-c', '--classifier-file', help="If selected, the strong classifier will be stored in this file (or loaded from it if it already exists).")
  parser.add_argument('-F', '--force', action='store_true', help="Re-train the strong classifier, even if the --classifier-file already exists.")
  parser.add_argument('-v', '--verbose', action='count', default=0, help="Increase the verbosity level (up to three times).")

  return parser.parse_args()
"""The test script to perform the binary classification on the digits from the MNIST dataset.
The MNIST data is exported using the xbob.db.mnist module which provide the train and test
The MNIST data is exported using the xbob.db.mnist module which provide the train and test
partitions for the digits. Pixel values of grey scale images are used as features and the
available algorithms for classification are Lut based Boosting and Stump based Boosting.
The script test all the possible combination of the two digits which results in 45 different
binary classfication tests.
The script test all the possible combination of the two digits which results in 45 different
binary classfication tests.
"""
...
  parser = argparse.ArgumentParser(description="The arguments for the boosting.")
  parser.add_argument('-t', default='StumpTrainer', dest="trainer_type", type=str, choices={'StumpTrainer', 'LutTrainer'}, help="This is the type of trainer used for the boosting.")
  parser.add_argument('-r', default=20, dest="num_rnds", type=int, help="The number of rounds for the boosting.")
  parser.add_argument('-l', default='exp', dest="loss_type", type=str, choices={'log', 'exp'}, help="The type of the loss function. Logit and exponential functions are the available options.")
  parser.add_argument('-s', default='indep', dest="selection_type", choices={'indep', 'shared'}, type=str, help="The feature selection type for the LUT based trainer. For the multivariate case the features can be selected by sharing or independently.")
  parser.add_argument('-n', default=256, dest="num_entries", type=int, help="The number of entries in the look-up table. It is the range of the feature values, e.g., if LBP features are used this value is 256.")
...
  boost_trainer = booster.Boost(args.trainer_type)

  # Set the parameters for the boosting
  boost_trainer.num_rnds = args.num_rnds
  boost_trainer.loss_type = args.loss_type
  boost_trainer.selection_type = args.selection_type
  boost_trainer.num_entries = args.num_entries

  # Perform boosting on the training feature set (fea_train)
  model = boost_trainer.train(fea_train, label_train)

  # Classify the test samples (testsamp) using the boosted classifier generated above
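  # Hypothetical completion -- assuming the trained model exposes a
  # ``classify`` method (the actual API of the package may differ):
  predictions = model.classify(testsamp)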