.. vim: set fileencoding=utf-8 :
.. consolidated by Andre Anjos <andre.anjos@idiap.ch>
.. Wed 26 Mar 2014 10:38:10 CET
..
.. Copyright (C) 2011-2014 Idiap Research Institute, Martigny, Switzerland

.. testsetup::

  import os
  import numpy
  import bob.learn.libsvm

  def F(m, f):
    from pkg_resources import resource_filename
    return resource_filename(m, os.path.join('data', f))

  heart_model = F('bob.learn.libsvm', 'heart.svmmodel')

  svm = bob.learn.libsvm.Machine(heart_model)

  heart_data = F('bob.learn.libsvm', 'heart.svmdata')

  f = bob.learn.libsvm.File(heart_data)

======================================
 Support Vector Machines and Trainers
======================================

A **Support vector machine** (SVM) [1]_ is a very popular `supervised` learning
technique. |project| provides a bridge to `LIBSVM`_ which allows you to `train`
such a `machine` and use it for classification. This section contains a
tutorial on how to use |project|'s Pythonic bindings to LIBSVM. It starts by
introducing the support vector :py:class:`bob.learn.libsvm.Machine` followed
by the trainer usage.

Machines
--------

The functionality of this bridge includes loading and saving SVM data files and
machine models, which you can produce or download following the instructions
found on `LIBSVM`_'s home page. |project| bindings to `LIBSVM`_ **do not**
allow you to explicitly set the machine's internal values. You must use a
trainer to create a machine first, as explained further down. Once you have
followed those instructions, you can come back to this point and follow the
remaining examples here.

.. note::

  Our current ``svm`` object was trained with the file called ``heart_scale``,
  distributed with `LIBSVM`_ and `available here
  <http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/heart_scale>`_.
  This dataset proposes a binary classification problem (i.e., 2 classes of
  features to be discriminated). The number of features is 13.

Our extensions to `LIBSVM`_ also allow you to feed data through a
:py:class:`bob.learn.libsvm.Machine` using :py:class:`numpy.ndarray` objects
and collect results in that format. For the following lines, we assume you have
available a :py:class:`bob.learn.libsvm.Machine` named ``svm``. (For this
example, the variable ``svm`` was generated from the ``heart_scale`` dataset
using the application ``svm-train`` with default parameters.) The ``shape``
attribute indicates how many features a machine from this module can input and
how many it outputs (typically, just 1):

.. doctest::

  >>> svm.shape
  (13, 1)

To run a single example through the SVM, just use the ``()`` operator. Several
examples can be processed at once if you stack them as rows of a 2D array:

.. doctest::

  >>> svm(numpy.ones((13,), 'float64')) # doctest: +SKIP
  1
  >>> svm(numpy.ones((10,13), 'float64')) # doctest: +SKIP
  (1, 1, 1, 1, 1, 1, 1, 1, 1, 1)

Visit the documentation for :py:class:`bob.learn.libsvm.Machine` to find more
information about these bindings and methods you can call on such a machine.
Visit the documentation for :py:class:`bob.learn.libsvm.File` for information
on loading `LIBSVM`_ data files directly into Python and producing
:py:class:`numpy.ndarray` objects.
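
For orientation, `LIBSVM`_ data files are plain text: each line holds one
sample, written as a label followed by sparse ``index:value`` pairs whose
indices start at 1. The sketch below only illustrates that layout with NumPy.
It is not how :py:class:`bob.learn.libsvm.File` is implemented, and both the
helper ``parse_libsvm_line`` and the sample line are hypothetical.

.. code-block:: python

   import numpy

   def parse_libsvm_line(line, n_features):
       """Converts one text line into a (label, dense feature vector) pair."""
       fields = line.split()
       label = int(float(fields[0]))  # classification labels such as +1 or -1
       values = numpy.zeros((n_features,), 'float64')  # absent entries stay 0
       for field in fields[1:]:
           index, value = field.split(':')
           values[int(index) - 1] = float(value)  # file indices are 1-based
       return label, values

   # hypothetical line with 3 of the 13 features explicitly set
   label, values = parse_libsvm_line('+1 1:0.5 3:-1 13:0.25', 13)

Features that do not appear on a line are taken to be zero, which is why the
dense vector is initialised with zeros before the pairs are filled in.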

As a quick example, suppose the variable ``f`` contains an object of
type :py:class:`bob.learn.libsvm.File`. You can then read the data (and labels)
from the file like this:

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> labels, data = f.read_all()
   >>> data = numpy.vstack(data) #creates a single 2D array

You can then feed the data to the ``svm`` machine introduced earlier like
this:

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> predicted_labels = svm(data)
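
Since ``labels`` holds the expected class of every sample, one way to check the
predictions is to compare the two sequences element by element. The following
is a minimal NumPy sketch that uses small stand-in arrays instead of real
machine output; the helper ``accuracy`` is hypothetical and not part of the
bindings.

.. code-block:: python

   import numpy

   def accuracy(predicted, expected):
       """Fraction of samples whose predicted label equals the expected one."""
       predicted = numpy.asarray(predicted).ravel()
       expected = numpy.asarray(expected).ravel()
       return float(numpy.mean(predicted == expected))

   # stand-in values, used here in place of svm(data) and the file labels
   predicted_labels = numpy.array([1, -1, 1, 1])
   labels = numpy.array([1, -1, -1, 1])
   print(accuracy(predicted_labels, labels))  # 0.75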

Training
--------

The training set for SVMs consists of a list of 2D `NumPy` arrays, one for
each class. The first dimension of each 2D `NumPy` array is the number of
training samples for the given class and the second dimension is the
dimensionality of the feature vectors. For instance, consider the following
training set for a two-class problem:

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> pos = numpy.array([[1,-1,1], [0.5,-0.5,0.5], [0.75,-0.75,0.8]], 'float64')
   >>> neg = numpy.array([[-1,1,-0.75], [-0.25,0.5,-0.8]], 'float64')
   >>> data = [pos,neg]
   >>> print(data) # doctest: +SKIP
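
When samples and labels come as one array each (for instance, after reading a
data file), they have to be regrouped into this one-array-per-class layout. The
following is a minimal NumPy sketch of that step. The helper ``split_by_label``
is hypothetical and not part of the bindings, and returning the classes in
ascending label order is only a choice made for this illustration.

.. code-block:: python

   import numpy

   def split_by_label(samples, labels):
       """Groups the rows of a 2D array into one 2D array per distinct label."""
       samples = numpy.asarray(samples, dtype='float64')
       labels = numpy.asarray(labels)
       # numpy.unique returns the distinct labels sorted in ascending order
       return [samples[labels == k] for k in numpy.unique(labels)]

   samples = numpy.array([[1, -1, 1], [-1, 1, -0.75], [0.5, -0.5, 0.5]])
   labels = numpy.array([+1, -1, +1])
   per_class = split_by_label(samples, labels)  # [rows of -1, rows of +1]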

.. note::

   Note that the ``pos`` and ``neg`` arrays above are pre-scaled so that every
   feature lies between -1 and +1. `LIBSVM`_ recommends scaling all features
   in this way. Our bindings to `LIBSVM`_ do not include scaling, so if you
   need it generically, you have to implement it yourself.
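
Such a scaling takes only a few lines of NumPy. The sketch below maps every
feature linearly to the range between -1 and +1 using the minimum and maximum
observed over all classes. The helpers ``fit_scaling`` and ``apply_scaling``
are hypothetical names and not part of the bindings.

.. code-block:: python

   import numpy

   def fit_scaling(data):
       """Returns the per-feature minimum and maximum over all classes."""
       stacked = numpy.vstack(data)
       return stacked.min(axis=0), stacked.max(axis=0)

   def apply_scaling(array, minimum, maximum):
       """Linearly maps each feature from [minimum, maximum] to [-1, +1]."""
       # constant features get a span of 1 to avoid a division by zero;
       # they end up at -1
       span = numpy.where(maximum > minimum, maximum - minimum, 1.0)
       return 2.0 * (array - minimum) / span - 1.0

   raw = [numpy.array([[0., 10.], [5., 20.]]), numpy.array([[10., 30.]])]
   minimum, maximum = fit_scaling(raw)
   scaled = [apply_scaling(k, minimum, maximum) for k in raw]

The same ``minimum`` and ``maximum`` should be reused for any data classified
later, so that training and test samples share one scale.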

Then, an SVM [1]_ can be trained easily using the
:py:class:`bob.learn.libsvm.Trainer` class.

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> trainer = bob.learn.libsvm.Trainer()
   >>> machine = trainer.train(data) #ordering only affects labels

This returns a :py:class:`bob.learn.libsvm.Machine` which can later be used
for classification, as explained before.

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> predicted_label = machine(numpy.array([1.,-1.,1.]))
   >>> print(predicted_label)
   [1]

The `training` procedure allows setting several different options. For
instance, the default `kernel` is an `RBF`. If we would like a `linear SVM`
instead, this can be set before calling the
:py:meth:`bob.learn.libsvm.Trainer.train` method.

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> trainer.kernel_type = 'LINEAR'
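
To make the difference between the two kernels concrete, the sketch below
evaluates their textbook definitions with NumPy: the linear kernel is a plain
dot product, whereas the RBF kernel is ``exp(-gamma * ||x - y||^2)``. The
function names and the value chosen for ``gamma`` are assumptions made for
this illustration and are not part of the bindings.

.. code-block:: python

   import numpy

   def linear_kernel(x, y):
       """Linear kernel: the dot product of the two feature vectors."""
       return float(numpy.dot(x, y))

   def rbf_kernel(x, y, gamma):
       """RBF kernel: decays with the squared distance between the vectors."""
       difference = numpy.asarray(x) - numpy.asarray(y)
       return float(numpy.exp(-gamma * numpy.dot(difference, difference)))

   x = numpy.array([1., -1., 1.])
   y = numpy.array([-1., 1., -0.75])
   gamma = 1. / len(x)  # assumed value, chosen only for this illustration

   print(linear_kernel(x, y))  # -2.75
   print(rbf_kernel(x, x, gamma))  # 1.0, identical vectors give the maximum
   print(rbf_kernel(x, y, gamma))  # between 0 and 1, smaller for distant vectors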

One Class SVM
=============

The package also allows you to train a one-class support vector machine. In
that case the training set contains a single 2D array, as in the following
example:

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> pos = 0.4 * numpy.random.randn(100, 2).astype(numpy.float64)
   >>> data = [pos]
   >>> print(data) # doctest: +SKIP


As in the example above, an SVM [1]_ for a one-class problem can be trained
easily using the :py:class:`bob.learn.libsvm.Trainer` class by selecting the
appropriate ``machine_type`` (``ONE_CLASS``).

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> trainer = bob.learn.libsvm.Trainer(machine_type='ONE_CLASS')
   >>> machine = trainer.train(data)

Then, as explained before, the resulting :py:class:`bob.learn.libsvm.Machine`
can be used to classify new entries.

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> test = 0.4 * numpy.random.randn(20, 2).astype(numpy.float64)
   >>> outliers = numpy.random.uniform(low=-4, high=4, size=(20, 2)).astype(numpy.float64)
   >>> predicted_label_test = machine(test)
   >>> predicted_label_outliers = machine(outliers)
   >>> print(predicted_label_test) # doctest: +SKIP
   >>> print(predicted_label_outliers) # doctest: +SKIP
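
One way to summarise such predictions is to count how many samples were
accepted and how many were rejected. The sketch below assumes the machine
follows the usual `LIBSVM`_ convention for one-class models, with +1 for
samples that resemble the training data and -1 for outliers. It works on a
stand-in array rather than on real machine output.

.. code-block:: python

   import numpy

   # stand-in for the output of a one-class machine on five samples
   predictions = numpy.array([1, 1, -1, 1, -1])

   n_accepted = int(numpy.sum(predictions == 1))
   outlier_fraction = float(numpy.mean(predictions == -1))
   print(n_accepted, outlier_fraction)  # 3 0.4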


Acknowledgements
----------------

As a final note, if you decide to use our `LIBSVM`_ bindings for your
publication, be sure to also cite:

.. code-block:: latex

  @article{CC01a,
   author  = {Chang, Chih-Chung and Lin, Chih-Jen},
   title   = {{LIBSVM}: A library for support vector machines},
   journal = {ACM Transactions on Intelligent Systems and Technology},
   volume  = {2},
   issue   = {3},
   year    = {2011},
   pages   = {27:1--27:27},
   note    = {Software available at \url{http://www.csie.ntu.edu.tw/~cjlin/libsvm}}
  }

.. Place here your external references
.. include:: links.rst
.. [1] http://en.wikipedia.org/wiki/Support_vector_machine