Commit 20c614bb authored by Rakesh MEHTA

changes in test and commenting the files

parent 3be49ab2
Example buildout environment
============================
=============================================================================
Generalized Boosting Framework using Stump and Look Up Table (LUT) based Weak Classifier
=============================================================================
The package implements a generalized boosting framework which incorporates different
boosting approaches. The boosting algorithms implemented in this package are:
This simple example demonstrates how to wrap Bob-based scripts on buildout
environments. This may be useful for homework assignments, tests or as a way to
distribute code to reproduce your publication. In summary, if you need to give
out code to others, we recommend you do it following this template so your code
can be tested, documented and run in an orderly fashion.
1) Gradient Boost (generalized version of Adaboost) for univariate cases
2) TaylorBoost for univariate and multivariate cases
Installation
------------
The weak classifiers associated with these boosting algorithms are:
.. note::
1) Stump classifiers
2) LUT based classifiers
To follow these instructions locally you will need a local copy of this
package. For that, you can use the github tarball API to download the package::
Check the following references for details:
$ wget --no-check-certificate https://github.com/idiap/bob.project.example/tarball/master -O- | tar xz
$ mv idiap-bob.project* bob.project.example
1) Viola, Paul, and Michael J. Jones. "Robust real-time face detection."
International journal of computer vision 57.2 (2004): 137-154.
2) Saberian, Mohammad J., Hamed Masnadi-Shirazi, and Nuno Vasconcelos. "Taylorboost:
First and second-order boosting algorithms with explicit margin control." Computer
Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.
3) Cosmin Atanasoaei, "Multivariate Boosting with Look Up Table for face processing",
PhD thesis (2012).
Testdata:
The tests are performed on the MNIST digits dataset. They fall into two categories:
1) Univariate test: a binary classification problem. The digits are tested
one-vs-one and one-vs-all. Both boosting algorithms (Gradient Boost and TaylorBoost)
can be used in this scenario.
2) Multivariate test: a multi-class classification problem. All 10 digits are classified
in a single test. Only multivariate TaylorBoost can be used in this scenario.
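For the multivariate test, the digit labels have to be expanded into one target column per class, with +1 for the sample's own digit and -1 elsewhere. The following is a minimal sketch of that encoding, mirroring what the multi-class script in this commit does with NumPy arrays. `encode_targets` is a hypothetical helper name, and plain lists are used only to keep the sketch free of dependencies.

```python
def encode_targets(labels, num_classes):
    """Return one row per sample: +1 in the column of its class, -1 elsewhere."""
    targets = []
    for label in labels:
        row = [-1] * num_classes
        row[label] = 1
        targets.append(row)
    return targets


labels = [0, 2, 1]
targets = encode_targets(labels, num_classes=3)
print(targets)  # [[1, -1, -1], [-1, -1, 1], [-1, 1, -1]]
```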
Installation:
Once you have downloaded the package use the following two commands to install it:
$ python bootstrap.py
$ ./bin/buildout
These 2 commands should download and install all non-installed dependencies and
get you a fully operational test and development environment.
User Guide
----------
This section explains how to use the package in order to: a) test the MNIST dataset for binary classification
b) test the dataset for multi-class classification.
a) The following command runs a single binary test for the specified digits and displays the classification
accuracy on the console:
$ ./bin/mnist_binary_one.py
To see all the options associated with the command, type:
$ ./bin/mnist_binary_one.py -h
To run the tests for all combinations of the ten digits, use the following command:
$ ./bin/mnist_binary_all.py
This command tests all possible pairs of digits, which results in 45 different binary tests. The
accuracy of each individual test and the final average accuracy over all tests are displayed on the console.
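The figure of 45 is the number of unordered pairs that can be drawn from ten digits. A short sketch of how such pairs can be enumerated is given below; whether `mnist_binary_all.py` builds its pairs this way is an assumption, and the variable names are illustrative only.

```python
from itertools import combinations

digits = range(10)

# Every unordered pair of distinct digits gives one one-vs-one test.
pairs = list(combinations(digits, 2))

print(len(pairs))  # 45
print(pairs[:3])   # [(0, 1), (0, 2), (0, 3)]
```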
b) The following command can be used for the multivariate digits test:
$ ./bin/mnist_multi.py
Because of the large number of samples and the multivariate nature of the problem, this test takes days to run on a normal system. Use the -h
option to see the options available with this command.
Documentation and Further Information
-------------------------------------
Please refer to the latest Bob user guide, accessing from the `Bob website
<http://idiap.github.com/bob/>`_ for how to create your own packages based on
this example. In particular, the Section entitled `Organize Your Work in
Satellite Packages <http://www.idiap.ch/software/bob/docs/releases/last/sphinx/html/OrganizeYourCode.html>`_
contains details on how to setup, build and roll out your code.
......@@ -91,6 +91,7 @@ setup(
'mnist_binary_one.py = xbob.boosting.scripts.mnist_binary_one:main',
'mnist_multi.py = xbob.boosting.scripts.mnist_multi:main',
'mnist_lbp.py = xbob.boosting.scripts.mnist_lbp:main',
'mnist_multi_lbp.py = xbob.boosting.scripts.mnist_multi_lbp:main',
],
# tests that are _exported_ (that can be executed by other packages) can
......
......@@ -145,8 +145,8 @@ class Boost:
loss_ = loss_class()
# For lut trainer the features should be integers
if(self.weak_trainer_type == 'LutTrainer'):
fset = fset.astype(int)
#if(self.weak_trainer_type == 'LutTrainer'):
# fset = fset.astype(int)
# Start boosting iterations for num_rnds rounds
for r in range(self.num_rnds):
......
......@@ -12,8 +12,6 @@
import numpy
import math
from scipy import optimize
class StumpTrainer():
......@@ -33,7 +31,9 @@ class StumpTrainer():
def compute_weak_trainer(self, fea, loss_grad):
"""The function computes the weak stump trainer. It is called at each boosting round.
""" The function to compute weak Stump trainer.
The function computes the weak stump trainer. It is called at each boosting round.
The best weak stump trainer is chosen to maximize the dot product of the outputs
and the weights (gain). The weights in the Adaboost are the negative of the loss gradient
for exponential loss.
......@@ -69,7 +69,9 @@ class StumpTrainer():
def compute_thresh(self, fea ,loss_grad):
""" Function to compute the threshold for a single feature. The threshold is computed for
""" Function computes the stump classifier (threshold) for a single feature
Function to compute the threshold for a single feature. The threshold is computed for
the given feature values using the weak learner algorithm described in the Viola-Jones robust face detection paper
Inputs:
......@@ -80,7 +82,7 @@ class StumpTrainer():
Return: weak stump classifier for given feature
threshold: threshold which minimizes the error
threshold: threshold that minimizes the error
polarity: the polarity or the direction used for stump classification
gain: gain of the classifier"""
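The docstrings above describe the stump search only in words. The sketch below shows one way to compute threshold, polarity and gain for a single feature, assuming the weights are the negative loss gradient and the gain is the absolute dot product of weights and stump outputs. It is not the code of `compute_thresh`: the function names, the midpoint thresholds and the sign convention of the polarity are assumptions, and the input is assumed to be non-empty.

```python
def fit_stump(values, loss_grad):
    """Return (threshold, polarity, gain) for one feature.

    The stump outputs `polarity` for values >= threshold and `-polarity` below it.
    """
    weights = [-g for g in loss_grad]      # weights = negative loss gradient
    pairs = sorted(zip(values, weights))   # samples ordered by feature value
    total = sum(weights)

    # First candidate: threshold at the smallest value, nothing falls below it.
    best_threshold, best_score = pairs[0][0], total
    left = 0.0                             # sum of weights below the threshold
    for k in range(len(pairs) - 1):
        left += pairs[k][1]
        if pairs[k][0] == pairs[k + 1][0]:
            continue                       # cannot split identical values
        score = total - 2.0 * left         # dot product for polarity +1
        if abs(score) > abs(best_score):
            best_threshold = 0.5 * (pairs[k][0] + pairs[k + 1][0])
            best_score = score
    polarity = 1 if best_score >= 0 else -1
    return best_threshold, polarity, abs(best_score)


def stump_predict(values, threshold, polarity):
    """Classify each value as +1 or -1 with the given stump."""
    return [polarity if v >= threshold else -polarity for v in values]


values = [4.1, 3.8, 5.0, -4.2, -3.9, -5.1]
labels = [1, 1, 1, -1, -1, -1]
# Exponential loss gradient at zero scores is simply -label.
loss_grad = [-y for y in labels]

threshold, polarity, gain = fit_stump(values, loss_grad)
print(polarity, gain)
print(stump_predict(values, threshold, polarity) == labels)
```

On this separable toy data the best split falls between -3.9 and 3.8, and the stump reproduces the labels exactly.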
......@@ -104,7 +106,7 @@ class StumpTrainer():
gain_max = numpy.absolute(gain[opt_id])
# Find the corresponding threshold value
th = 0.0
threshold = 0.0
if(opt_id == num_samp-1):
threshold = fea[opt_id]
else:
......@@ -122,7 +124,9 @@ class StumpTrainer():
def get_weak_scores(self,test_features):
""" The function computes the classification scores for the test features using
""" The function to perform classification using a weak stump classifier.
The function computes the classification scores for the test features using
a weak stump trainer. Since we use the stump classifier the classification
scores are either +1 or -1.
Input: self: a weak stump trainer
......@@ -154,9 +158,27 @@ class LutTrainer():
def __init__(self, num_entries, selection_type, num_op):
""" Function to initialize the weak LutTrainer. Each weak LutTrainer is specified with a
""" Function to initialize the parameters.
Function to initialize the weak LutTrainer. Each weak LutTrainer is specified with a
LookUp Table and the feature index which corresponds to the feature on which the
current classifier has to applied. """
current classifier has to be applied.
Inputs:
self:
num_entries: The number of entries for the LUT
type: int
selection_type: The feature selection can be either independent or shared. For independent
case the loss function is separately considered for each of the output. For
shared selection type the sum of the loss function is taken over the outputs
and a single feature is used for all the outputs. See Cosmin's thesis for more details.
Type: string {'indep', 'shared'}
num_op: The number of outputs for the classification task.
type: integer
"""
self.num_entries = num_entries
self.luts = numpy.ones((num_entries, num_op), dtype = numpy.int)
self.selection_type = selection_type
......@@ -167,7 +189,23 @@ class LutTrainer():
def compute_weak_trainer(self, fea, loss_grad):
""" The function to learn the weak LutTrainer. """
""" The function to learn the weak LutTrainer.
The function searches for the feature index that minimizes the sum of the loss gradient and computes
the LUT corresponding to that feature index.
Inputs:
self: empty trainer object to be trained
fea: The training features samples
type: integer numpy array (#number of samples x number of features)
loss_grad: The loss gradient values for the training samples
type: numpy array (#number of samples)
Return:
self: a trained LUT trainer
"""
# Initializations
num_op = loss_grad.shape[1]
......@@ -215,13 +253,18 @@ class LutTrainer():
def compute_fgrad(self, loss_grad, fea):
""" The function computes the loss for whole set of features. The loss refers to the sum of the loss gradient
""" The function to compute the loss gradient for all the features.
The function computes the loss for whole set of features. The loss refers to the sum of the loss gradient
of the features which have the same values.
Inputs: loss_grad: The loss gradient for the features. No. of samples x No. of outputs
fea: set of features. No. of samples x No. of features
Output: sum_loss: the loss values for all features. No. of samples x No. of outputs"""
Inputs:
loss_grad: The loss gradient for the features. No. of samples x No. of outputs
Type: float numpy array
fea: set of features. No. of samples x No. of features
Output:
sum_loss: the loss values for all features. No. of samples x No. of outputs"""
# initialize values
num_fea = len(fea[0])
......@@ -243,7 +286,11 @@ class LutTrainer():
def compute_hgrad(self, loss_grado,fval):
""" The function computes the loss for a single feature
""" The function computes the loss for a single feature.
The function sums the loss gradients of the samples that share the same feature value.
Input: loss_grado: loss gradient for a single output values. No of Samples x 1
fval: single feature selected for all samples. No. of samples x 1
......@@ -262,9 +309,14 @@ class LutTrainer():
def get_weak_scores(self, fset):
""" Function computes classification results according to current weak classifier
Input: fset: The set test features. No. of test samples x No. of total features
return: The classification scores of the features based on current weak classifier"""
Function classifies the features based on a single weak classifier.
Input:
fset: The set of test features. No. of test samples x No. of total features
return:
weak_scores: The classification scores of the features based on current weak classifier"""
num_samp = len(fset)
num_op = len(self.luts[0])
weak_scores = numpy.zeros([num_samp,num_op])
......
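The LutTrainer docstrings describe two steps: summing the loss gradient over samples with the same feature value, and choosing the feature index from those sums. The sketch below illustrates both for a single output. The selection rule (largest total absolute gradient sum) and the sign rule for the LUT entries are assumptions in the spirit of the cited thesis, not a copy of the package code, and `grad_histogram` and `fit_lut` are hypothetical names.

```python
def grad_histogram(feature_values, loss_grad, num_entries):
    """Sum the loss gradient over samples that share the same feature value."""
    hist = [0.0] * num_entries
    for value, grad in zip(feature_values, loss_grad):
        hist[value] += grad
    return hist


def fit_lut(features, loss_grad, num_entries):
    """Select one feature column and build its look-up table.

    features: list of samples, each a list of integers in [0, num_entries).
    Returns (feature_index, lut) where every lut entry is -1 or +1.
    """
    num_features = len(features[0])
    best_index, best_gain, best_hist = 0, -1.0, None
    for j in range(num_features):
        column = [sample[j] for sample in features]
        hist = grad_histogram(column, loss_grad, num_entries)
        gain = sum(abs(h) for h in hist)
        if gain > best_gain:
            best_index, best_gain, best_hist = j, gain, hist
    # Step against the gradient; entries with no evidence stay at +1.
    lut = [-1 if h > 0 else 1 for h in best_hist]
    return best_index, lut


features = [[0, 1], [1, 1], [0, 2], [1, 2]]
labels = [1, -1, 1, -1]
loss_grad = [-y for y in labels]  # exponential loss gradient at zero scores

index, lut = fit_lut(features, loss_grad, num_entries=3)
scores = [lut[sample[index]] for sample in features]
print(index, lut)        # 0 [1, -1, 1]
print(scores == labels)  # True
```

Because the table is indexed directly by the feature value, the features must be non-negative integers smaller than `num_entries`, which is why the LUT trainer is documented as taking integer features.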
......@@ -31,16 +31,17 @@ class lbp_feature():
sy = isy +1
blk_int = int_img[sy+1:,sx+1:] + int_img[0:-(sy+1),0:-(sx+1)] - int_img[sy+1:,0:-(sx+1)] - int_img[0:-(sy+1),sx+1:]
blk_int = int_img[sy:,sx:] + int_img[0:-sy,0:-sx] - int_img[sy:,0:-sx] - int_img[0:-sy,sx:]
fmap_dimy, fmap_dimx = blk_int.shape -2
fmap_dimy = blk_int.shape[0] -2
fmap_dimx = blk_int.shape[1] -2
coord = [[0,0],[0,1],[0,2],[1,2],[2,2],[2,1],[2,0],[1,0]]
if(self.ftype == 'lbp'):
fmap = self.lbp(coord, fmap_dimx, fmap_dimy, blk_int)
else(self.ftype == 'tlbp'):
elif(self.ftype == 'tlbp'):
fmap = self.tlbp(coord, fmap_dimx, fmap_dimy, blk_int)
else(self.ftype == 'dlbp'):
elif(self.ftype == 'dlbp'):
fmap = self.dlbp(coord, fmap_dimx, fmap_dimy, blk_int)
else(self.ftype == 'mlbp'):
elif(self.ftype == 'mlbp'):
fmap = self.mlbp(coord, fmap_dimx, fmap_dimy, blk_int)
vec = np.reshape(fmap,fmap.shape[0]*fmap.shape[1],1)
......@@ -50,6 +51,7 @@ class lbp_feature():
def lbp(self, coord, fmap_dimx, fmap_dimy, blk_int):
num_neighbours = 8
blk_center = blk_int[1:1+fmap_dimy,1:1+fmap_dimx]
fmap = np.zeros([fmap_dimy, fmap_dimx])
for ind in range(num_neighbours):
fmap = fmap + (2**ind)*(blk_int[coord[ind][0]:coord[ind][0] + fmap_dimy,coord[ind][1]:coord[ind][1] + fmap_dimx]>= blk_center)
return fmap
......@@ -57,6 +59,7 @@ class lbp_feature():
def tlbp(self, coord, fmap_dimx, fmap_dimy, blk_int):
fmap = np.zeros([fmap_dimy, fmap_dimx])
num_neighbours = 8
for ind in range(num_neighbours):
......@@ -92,9 +95,9 @@ class lbp_feature():
return fmap
def get_feature_number(self, dimy, dimx, scale_y, scale_x)
def get_feature_number(self, dimy, dimx, scale_y, scale_x):
img = np.zeros([dimy, dimx])
feature_vector = self.get_features(img, scale_y, scale_x)
return feature_vector.shape
return feature_vector.shape[0]
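The `lbp` method above builds the code map for a whole array at once. To make the bit layout easier to follow, here is a sketch of the same computation for a single 3x3 patch, using the neighbour order from `coord` and the same `>=` comparison against the centre. `lbp_code` is a hypothetical helper, and it works on raw values rather than on the block sums the class derives from the integral image.

```python
# Neighbour offsets (row, col), in the order of `coord` in the diff above.
COORD = [[0, 0], [0, 1], [0, 2], [1, 2], [2, 2], [2, 1], [2, 0], [1, 0]]


def lbp_code(patch):
    """Return the 8-bit LBP code of a 3x3 patch given as three rows."""
    center = patch[1][1]
    code = 0
    for ind, (row, col) in enumerate(COORD):
        if patch[row][col] >= center:
            code += 2 ** ind  # neighbour `ind` sets bit `ind`
    return code


patch = [[9, 1, 9],
         [1, 5, 1],
         [9, 1, 9]]
print(lbp_code(patch))  # corners are bits 0, 2, 4, 6: 1 + 4 + 16 + 64 = 85
```

With eight neighbours the code ranges from 0 to 255, which is consistent with the 256 LUT entries suggested for LBP features in the script's `-n` option.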
......@@ -33,6 +33,7 @@ def main():
# Initializations
accu = 0
test_num = 0
img_size = 28
# download the dataset
db_object = xbob.db.mnist.Database()
......@@ -61,17 +62,15 @@ def main():
feature_extractor = local_feature.lbp_feature('lbp')
scale_y = 4
scale_x = 4
num_fea = feature_extractor.get_feature_number(28,28,scale_y, scale_x)
print num_fea
print img_train.shape[0]
num_fea = feature_extractor.get_feature_number(img_size,img_size,scale_y, scale_x)
train_fea = numpy.zeros([img_train.shape[0], num_fea],dtype = 'uint8')
test_fea = numpy.zeros([img_test.shape[0], num_fea], dtype = 'uint8')
for img_num in range(img_train.shape[0]):
img = img_train[img_num,:].reshape([28,28])
img = img_train[img_num,:].reshape([img_size,img_size])
train_fea[img_num,:] = feature_extractor.get_features(img, scale_y, scale_x)
for img_num in range(img_test.shape[0]):
img = img_test[img_num,:].reshape([28,28])
img = img_test[img_num,:].reshape([img_size,img_size])
test_fea[img_num,:] = feature_extractor.get_features(img, scale_y, scale_x)
......
#!/usr/bin/env python
"""The test script to perform multi-class classification on the digits from the MNIST dataset.
The MNIST data is exported using the xbob.db.mnist module, which provides the train and test
partitions for the digits. LBP features extracted from the grey scale images are used as features, and the
classification is performed with LUT based boosting.
The script classifies all ten digits in a single multivariate test.
"""
import xbob.db.mnist
import numpy
import sys, getopt
import argparse
import string
from ..features import local_feature
from ..core import boosting
import matplotlib.pyplot as mpl
def main():
parser = argparse.ArgumentParser(description = " The arguments for the boosting. ")
parser.add_argument('-r', default = 20, dest = "num_rnds", type = int, help = "The number of rounds for the boosting")
parser.add_argument('-l', default = 'exp', dest = "loss_type", type= str, choices = {'log','exp'}, help = "The type of the loss function. Logit and Exponential functions are the available options")
parser.add_argument('-s', default = 'indep', dest = "selection_type", choices = {'indep', 'shared'}, type = str, help = "The feature selection type for the LUT based trainer. For the multivariate case the features can be selected by sharing or independently ")
parser.add_argument('-n', default = 256, dest = "num_entries", type = int, help = "The number of entries in the LookUp table. It is the range of the feature values, e.g. if LBP features are used this value is 256.")
args = parser.parse_args()
# download the dataset
db_object = xbob.db.mnist.Database()
# Hardcode the number of digits and the image size
num_digits = 10
img_size = 28
# get the data (features and labels) for the selected digits from the xbob_db_mnist class functions
img_train, label_train = db_object.data('train',labels = range(num_digits))
img_test, label_test = db_object.data('test', labels = range(num_digits))
# Format the label data into int and change the class labels to -1 and +1
label_train = label_train.astype(int)
label_test = label_test.astype(int)
# initialize the label data for multivariate case
train_targets = -numpy.ones([img_train.shape[0],num_digits])
test_targets = -numpy.ones([img_test.shape[0],num_digits])
for i in range(num_digits):
train_targets[label_train == i,i] = 1
test_targets[label_test == i,i] = 1
# Extract the local features from the images
feature_extractor = local_feature.lbp_feature('lbp')
scale_y = 4
scale_x = 4
num_fea = feature_extractor.get_feature_number(img_size,img_size,scale_y, scale_x)
train_fea = numpy.zeros([img_train.shape[0], num_fea],dtype = 'uint8')
test_fea = numpy.zeros([img_test.shape[0], num_fea], dtype = 'uint8')
for img_num in range(img_train.shape[0]):
img = img_train[img_num,:].reshape([img_size,img_size])
train_fea[img_num,:] = feature_extractor.get_features(img, scale_y, scale_x)
for img_num in range(img_test.shape[0]):
img = img_test[img_num,:].reshape([img_size,img_size])
test_fea[img_num,:] = feature_extractor.get_features(img, scale_y, scale_x)
# Initialize the trainer with LutTrainer
boost_trainer = boosting.Boost('LutTrainer')
# Set the parameters for the boosting
boost_trainer.num_rnds = args.num_rnds
boost_trainer.loss_type = args.loss_type
boost_trainer.selection_type = args.selection_type
boost_trainer.num_entries = args.num_entries
# Perform boosting of the feature set samp
machine = boost_trainer.train(train_fea, train_targets)
# Classify the test samples (test_fea) using the boosted classifier generated above
prediction_labels = machine.classify(test_fea)
# Calculate the values for the confusion matrix
score = numpy.zeros([num_digits, num_digits])
for i in range(num_digits):
prediction_i = prediction_labels[test_targets[:,i] == 1,:]
print prediction_i.shape
for j in range(num_digits):
score[i,j] = sum(prediction_i[:,j] == 1)
numpy.savetxt('conf_mat.out', score, delimiter=',')
# Normalize each row (real class) by its own total
cm = score / numpy.sum(score, 1)[:, numpy.newaxis]
res = mpl.imshow(cm, cmap=mpl.cm.summer, interpolation='nearest')
for x in numpy.arange(cm.shape[0]):
for y in numpy.arange(cm.shape[1]):
col = 'white'
if cm[x,y] > 0.5: col = 'black'
mpl.annotate('%.2f' % (100*cm[x,y],), xy=(y,x), color=col,
fontsize=8, horizontalalignment='center', verticalalignment='center')
classes = [str(k) for k in range(10)]
mpl.xticks(numpy.arange(10), classes)
mpl.yticks(numpy.arange(10), classes, rotation=90)
mpl.ylabel("(Real class)")
mpl.xlabel("(Your prediction)")
mpl.title("Confusion Matrix (%s set) - in %%" % 'test')
mpl.show()
# Calculate the accuracy in percentage for the current classification test
accuracy = 100*float(sum(numpy.sum(prediction_labels == test_targets,1) == num_digits))/float(prediction_labels.shape[0])
print "The average accuracy of classification is %f " % (accuracy)
if __name__ == "__main__":
main()
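The script above reports a confusion matrix in percent, with one row per real class. A dependency-free sketch of that bookkeeping is shown below, assuming a single predicted label per sample (the script itself works on one +1/-1 output per class). Both function names are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """counts[i][j] = number of samples of real class i predicted as class j."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    return counts


def normalize_rows(counts):
    """Divide every row by its own total so each real class sums to 1."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / float(total) if total else 0.0 for c in row])
    return normalized


true_labels = [0, 0, 1, 1, 1, 2]
predicted = [0, 1, 1, 1, 0, 2]

counts = confusion_matrix(true_labels, predicted, num_classes=3)
rates = normalize_rows(counts)
print(counts)  # [[1, 1, 0], [1, 2, 0], [0, 0, 1]]
print(rates[0])  # [0.5, 0.5, 0.0]
```

Normalizing per row rather than over the whole matrix keeps the diagonal readable as a per-class recognition rate even when the classes have different numbers of test samples.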
......@@ -32,7 +32,53 @@ class TestLossFunctions(unittest.TestCase):
val3 = sum(numpy.exp(-target * curr_scores))
self.assertEqual(val3, l3)
# Check the gradient sum
weak_scores = numpy.random.rand(10)
prev_scores = numpy.random.rand(10)
x = numpy.random.rand(1)
curr_scores = prev_scores + x*weak_scores
l4 = exp_.loss_grad_sum(x, target, prev_scores, weak_scores)
temp = numpy.exp(-target * curr_scores)
grad = -target * temp
val4 = numpy.sum(grad * weak_scores,0)
self.assertEqual(val4, l4)
def test_log_loss(self):
exp_ = xbob.boosting.core.losses.LogLossFunction()
target = 1
score = numpy.random.rand()
# check the loss values
l1 = exp_.update_loss(target, score)
val1 = numpy.log(1 + numpy.exp(- target * score))
self.assertEqual(l1,val1)
# Check loss gradient
l2 = exp_.update_loss_grad( target, score)
temp = numpy.exp(-target * score)
val2 = -(target * temp* (1/(1 + temp)) )
self.assertEqual(l2,val2)
# Check loss sum
weak_scores = numpy.random.rand(10)
prev_scores = numpy.random.rand(10)
x = numpy.random.rand(1)
curr_scores = prev_scores + x*weak_scores
l3 = exp_.loss_sum(x, target, prev_scores, weak_scores)
val3 = sum(numpy.log(1 + numpy.exp(-target * curr_scores)))
self.assertEqual(val3, l3)
# Check the gradient sum
weak_scores = numpy.random.rand(10)
prev_scores = numpy.random.rand(10)
x = numpy.random.rand(1)
curr_scores = prev_scores + x*weak_scores
l3 = exp_.loss_grad_sum(x, target, prev_scores, weak_scores)
temp = numpy.exp(-target * curr_scores)
grad = -target * temp *(1/ (1 + temp))
val3 = numpy.sum(grad * weak_scores)
self.assertEqual(val3, l3)
......
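The tests above compare the loss classes against closed-form expressions for the exponential and logit losses. The sketch below restates those expressions as plain functions and checks each gradient against a central finite difference. The function names are illustrative and are not the package's `update_loss` / `update_loss_grad` methods.

```python
import math


def exp_loss(target, score):
    return math.exp(-target * score)


def exp_loss_grad(target, score):
    return -target * math.exp(-target * score)


def log_loss(target, score):
    return math.log(1.0 + math.exp(-target * score))


def log_loss_grad(target, score):
    e = math.exp(-target * score)
    return -target * e / (1.0 + e)


def numeric_grad(loss, target, score, eps=1e-6):
    """Central finite difference of the loss with respect to the score."""
    return (loss(target, score + eps) - loss(target, score - eps)) / (2.0 * eps)


target, score = 1, 0.3
exp_ok = abs(exp_loss_grad(target, score) - numeric_grad(exp_loss, target, score)) < 1e-6
log_ok = abs(log_loss_grad(target, score) - numeric_grad(log_loss, target, score)) < 1e-6
print(exp_ok, log_ok)  # True True
```

A tolerance-based comparison like this is also a more forgiving alternative to exact equality when checking floating-point results.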
import unittest
import random
import xbob.boosting
import numpy
class TestStumpTrainer(unittest.TestCase):
"""Perform tests on the stump trainer """
def test_stump_trainer(self):
# test the stump trainer for a basic linearly separable case and check the conditions on the stump parameters
trainer = xbob.boosting.core.trainers.StumpTrainer()
n_samples = 100
dim = 5
x_train1 = numpy.random.randn(n_samples, dim) + 4
x_train2 = numpy.random.randn(n_samples, dim) - 4
x_train = numpy.vstack((x_train1, x_train2))
y_train = numpy.hstack((numpy.ones(n_samples),-numpy.ones(n_samples)))
scores = numpy.zeros(2*n_samples)
t = y_train*scores
loss = -y_train*(numpy.exp(y_train*scores))
stump = trainer.compute_weak_trainer(x_train,loss)
self.assertTrue(stump.threshold <= numpy.max(x_train))
self.assertTrue(stump.threshold >= numpy.min(x_train))
self.assertTrue(stump.selected_indices >= 0)
self.assertTrue(stump.selected_indices < dim)
x_test1 = numpy.random.randn(n_samples, dim) + 4
x_test2 = numpy.random.randn(n_samples, dim) - 4
x_test = numpy.vstack((x_test1, x_test2))
y_test = numpy.hstack((numpy.ones(n_samples),-numpy.ones(n_samples)))
prediction = trainer.get_weak_scores(x_test) # return negative labels
self.assertTrue(numpy.all(prediction.T * y_test < 0) )