Commit 5ecc69bf authored by Amir MOHAMMADI

Move documentation of Bob to bob/docs

parent 5bc984e3
.. vim: set fileencoding=utf-8 :
.. _bob.iris_example:
A Complete Application: Analysis of the Fisher Iris Dataset
The `Iris flower data set <>`_ or Fisher's Iris data set is a multivariate data set introduced by Sir Ronald Aylmer Fisher (1936) as an example of discriminant analysis.
It is sometimes called Anderson's Iris data set because Edgar Anderson collected the data to quantify the morphologic variation of Iris flowers of three related species.
The dataset consists of 50 samples from each of three species of Iris flowers (Iris setosa, Iris virginica and Iris versicolor).
Four features were measured from each sample: the length and the width of the sepal and the petal, in centimeters.
Based on the combination of the four features, Fisher developed a linear discriminant model to distinguish the species from each other.
In this example, we collect bits and pieces of the previous tutorials and build a complete example that discriminates Iris species based on Bob.

.. note::

   This example considers all 3 classes for the LDA training.
   This is **not** what Fisher did in his paper [Fisher1936]_.
   In that work, Fisher did the *right* thing only for the first 2-class problem (setosa *versus* versicolor).
   You can reproduce the 2-class LDA using Bob's LDA training system without problems.
   When inserting the virginica class, Fisher switches to a different metric (:math:`4vi + ve - 5se`) and solves for the matrices in the last row of Table VIII.
   This is OK, but it does not generalize the method proposed at the beginning of his paper.
   Be aware that results achieved by the generalized LDA method [Duda1973]_ will not match Fisher's results in that last table.
   That being said, the final histogram presented in that paper looks quite similar to the one produced by this script, showing that Fisher's solution was a good approximation of the generalized LDA implementation available in Bob.

.. [Fisher1936] **R. A. FISHER**, *The Use of Multiple Measurements in Taxonomic Problems*, Annals of Eugenics, pp. 179-188, 1936
.. [Duda1973] **R.O. Duda and P.E. Hart**, *Pattern Classification and Scene Analysis*, (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. 1973 (See page 218).

.. testsetup:: iris

   import bob
   import numpy
   import matplotlib
   if not hasattr(matplotlib, 'backends'):
       matplotlib.use('pdf')  # non-interactive backend avoids an exception when no display is available

Training a :py:class:`bob.learn.linear.Machine` with LDA
Creating a :py:class:`bob.learn.linear.Machine` to perform Linear Discriminant Analysis on the Iris dataset involves using the :py:class:`bob.learn.linear.FisherLDATrainer`:
.. doctest:: iris
>>> import bob.db.iris
>>> import bob.learn.linear
>>> trainer = bob.learn.linear.FisherLDATrainer()
>>> data =
>>> machine, unused_eigen_values = trainer.train(data.values())
>>> machine.shape
(4, 2)
That is it! The returned :py:class:`bob.learn.linear.Machine` is now set up to perform LDA on the Iris data set.
A few things should be noted:
1. The returned :py:class:`bob.learn.linear.Machine` represents the linear projection of the input features to a new 2D space which maximizes the between-class scatter and minimizes the within-class scatter.
   In other words, the internal matrix :math:`\mathbf{W}` is 4-by-2.
   The projections are calculated internally using `Singular Value Decomposition <>`_ (SVD).
   The first projection (first column of :math:`\mathbf{W}`) corresponds to the highest eigenvalue resulting from the decomposition, the second to the second highest, and so on;

2. The trainer also returns the eigenvalues generated after the SVD for our LDA implementation, in case you would like to use them.
   For this example, we just discard this information.
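To make the idea of between-class and within-class scatter concrete, here is a minimal NumPy sketch of multi-class LDA. It is not Bob's implementation: it uses a plain eigendecomposition instead of the SVD mentioned above, the function name ``fisher_lda`` is hypothetical, and the data is a synthetic stand-in for the Iris measurements.

```python
import numpy as np


def fisher_lda(classes):
    """Return (W, eigenvalues) for a list of (n_i, d) arrays, one per class."""
    all_data = np.vstack(classes)
    grand_mean = all_data.mean(axis=0)
    d = all_data.shape[1]
    s_within = np.zeros((d, d))
    s_between = np.zeros((d, d))
    for x in classes:
        mu = x.mean(axis=0)
        centered = x - mu
        s_within += centered.T @ centered
        diff = (mu - grand_mean)[:, None]
        s_between += len(x) * (diff @ diff.T)
    # Directions maximizing between-class over within-class scatter are the
    # leading eigenvectors of inv(S_w) S_b; at most (classes - 1) are useful.
    evals, evecs = np.linalg.eig(np.linalg.solve(s_within, s_between))
    evals, evecs = evals.real, evecs.real
    order = np.argsort(evals)[::-1][: len(classes) - 1]
    return evecs[:, order], evals[order]


# Synthetic stand-in for the Iris data: 3 classes, 50 samples, 4 features.
rng = np.random.default_rng(0)
means = [[0, 0, 0, 0], [3, 0, 1, 0], [0, 3, 0, 1]]
toy = [rng.normal(loc=m, scale=1.0, size=(50, 4)) for m in means]
W, eigenvalues = fisher_lda(toy)
projected = toy[0] @ W  # one class projected onto the 2 LDA components
```

With 3 classes and 4 features, ``W`` comes out as a 4-by-2 matrix whose columns are ordered by decreasing eigenvalue, which mirrors the shape reported by the machine above.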
Looking at the first LDA component
To reproduce Fisher's results, we must pass the data through the created machine:

.. doctest:: iris

   >>> output = {}
   >>> for key in data:
   ...     output[key] = machine.forward(data[key])

At this point the variable ``output`` contains the LDA-projected information as 2D :py:class:`numpy.ndarray` objects.
The only step missing is the visualization of the results.
Fisher proposed the use of a histogram showing the separation achieved by looking at the first LDA component only.
Let's reproduce it.

.. doctest:: iris

   >>> from matplotlib import pyplot
   >>> pyplot.hist(output['setosa'][:,0], bins=8, color='green', label='Setosa', alpha=0.5) # doctest: +SKIP
   >>> pyplot.hist(output['versicolor'][:,0], bins=8, color='blue', label='Versicolor', alpha=0.5) # doctest: +SKIP
   >>> pyplot.hist(output['virginica'][:,0], bins=8, color='red', label='Virginica', alpha=0.5) # doctest: +SKIP

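If you want to inspect the bin counts without opening a plot window, ``numpy.histogram`` computes the same kind of binning. The sketch below uses synthetic values in place of a real first LDA component; the variable name ``first_component`` and its distribution are assumptions made only for illustration.

```python
import numpy as np

# Synthetic stand-in for one class's first LDA component (50 samples).
rng = np.random.default_rng(0)
first_component = rng.normal(loc=-1.5, scale=0.3, size=50)

# 8 equal-width bins between the smallest and largest value.
counts, edges = np.histogram(first_component, bins=8)
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:+.2f} to {right:+.2f}: {count}")
```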
We can certainly throw in more decoration:

.. doctest:: iris

   >>> pyplot.legend() # doctest: +SKIP
   >>> pyplot.grid(True) # doctest: +SKIP
   >>> pyplot.axis([-3,+3,0,20]) # doctest: +SKIP
   >>> pyplot.title("Iris Plants / 1st. LDA component") # doctest: +SKIP
   >>> pyplot.xlabel("LDA[0]") # doctest: +SKIP
   >>> pyplot.ylabel("Count") # doctest: +SKIP

Finally, to display the plot, do:
.. code-block:: python
You should see an image like this:
.. plot:: plot/
Measuring performance
You can measure the performance of the system on classifying, say, *Iris Virginica* as compared to the other two variants.
We can use the functions in :ref:`bob.measure <bob.measure>` for that purpose.
Let's first find a threshold that separates this variant from the others.
We choose to find the threshold at the point where the relative error rate considering both *Versicolor* and *Setosa* variants is the same as for the *Virginica* one.

.. doctest:: iris

   >>> import bob.measure
   >>> negatives = numpy.vstack([output['setosa'], output['versicolor']])[:,0]
   >>> positives = output['virginica'][:,0]
   >>> threshold = bob.measure.eer_threshold(negatives, positives)

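The idea behind an equal-error-rate threshold can be sketched in plain Python. This is not how ``bob.measure`` computes it; the helper names are hypothetical, the scores are made up, and the sketch assumes that a score equal to the threshold counts as accepted.

```python
def error_rates(negatives, positives, threshold):
    """Return (false-accept rate, false-reject rate) at `threshold`."""
    far = sum(1 for s in negatives if s >= threshold) / len(negatives)
    frr = sum(1 for s in positives if s < threshold) / len(positives)
    return far, frr


def eer_threshold_sketch(negatives, positives):
    """Pick the observed score where the two error rates are closest."""
    candidates = sorted(set(negatives) | set(positives))

    def gap(t):
        far, frr = error_rates(negatives, positives, t)
        return abs(far - frr)

    return min(candidates, key=gap)


# Made-up scores: one negative and one positive overlap.
toy_negatives = [-2.0, -1.5, -1.0, -0.5, 0.4]
toy_positives = [0.2, 0.8, 1.0, 1.5, 2.0]
toy_threshold = eer_threshold_sketch(toy_negatives, toy_positives)
toy_far, toy_frr = error_rates(toy_negatives, toy_positives, toy_threshold)
```

On these toy scores, both error rates are equal at the chosen threshold, which is the property the equal-error-rate point is defined by.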
With the threshold at hand, we can estimate the number of correctly classified *negatives* (or true-rejections) and *positives* (or true-accepts).
In other words, the true-rejections are plants from the *Versicolor* and *Setosa* variants (the *negatives*) whose first LDA component is smaller than the threshold, and the true-accepts are plants from the *Virginica* variant (the *positives*) whose first LDA component is greater than the threshold.
To calculate the rates, we just use :ref:`bob.measure <bob.measure>` again:

.. doctest:: iris

   >>> true_rejects = bob.measure.correctly_classified_negatives(negatives, threshold)
   >>> true_accepts = bob.measure.correctly_classified_positives(positives, threshold)

From that you can calculate, for example, the number of correctly classified samples at the defined ``threshold`` (subtracting each count from its class size gives the number of misses):
.. doctest:: iris
>>> sum(true_rejects)
>>> sum(true_accepts)
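The counting step can be illustrated without Bob. The sketch below builds per-sample boolean lists by hand, on made-up scores and a made-up threshold, and assumes that a score equal to the threshold counts as accepted; all names are for illustration only.

```python
# Made-up scores and threshold.
toy_negatives = [-2.0, -1.5, -1.0, -0.5, 0.4]
toy_positives = [0.2, 0.8, 1.0, 1.5, 2.0]
toy_threshold = 0.4

# One boolean per sample: True when the sample is on the correct side.
toy_true_rejects = [score < toy_threshold for score in toy_negatives]
toy_true_accepts = [score >= toy_threshold for score in toy_positives]

# Summing booleans counts the True entries.
n_true_rejects = sum(toy_true_rejects)
n_true_accepts = sum(toy_true_accepts)

# Misses are whatever is left over in each class.
n_false_accepts = len(toy_negatives) - n_true_rejects
n_false_rejects = len(toy_positives) - n_true_accepts
print(n_true_rejects, n_true_accepts, n_false_accepts, n_false_rejects)
```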
You can also plot an ROC curve.
Here is the full code that will lead you to the following plot:
.. plot:: plot/
:include-source: True
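As a rough idea of what an ROC curve contains, the sketch below sweeps a threshold over every observed score and records the two error rates at each step. It is a plain-Python illustration on made-up scores, not the ``bob.measure`` routine used for the plot; the function name ``roc_points`` is hypothetical.

```python
def roc_points(negatives, positives):
    """Return a list of (false-accept rate, false-reject rate) pairs."""
    points = []
    for threshold in sorted(set(negatives) | set(positives)):
        far = sum(s >= threshold for s in negatives) / len(negatives)
        frr = sum(s < threshold for s in positives) / len(positives)
        points.append((far, frr))
    return points


# Made-up scores with a small overlap between the two groups.
curve = roc_points([-1.0, 0.0, 1.0], [0.5, 1.5, 2.5])
for far, frr in curve:
    print(f"FAR={far:.2f}  FRR={frr:.2f}")
```

Raising the threshold trades false accepts for false rejects, so the first rate never increases along the list while the second never decreases.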
.. include:: links.rst
.. vim: set fileencoding=utf-8 :
.. _bob_main_page:
......@@ -9,23 +8,6 @@
Bob_ is a free signal-processing and machine learning toolbox originally
developed by the Biometrics group at `Idiap`_ Research Institute, Switzerland.
The toolbox is written in a mix of `Python`_ and `C++`_ and is designed both
to be efficient and to reduce development time. It is composed of a reasonably
large number of `packages`_ that implement tools for image, audio & video
processing, machine learning & pattern recognition, and many other
task-specific tools.
The documentation of Bob has moved. Please visit for
up-to-date links.
.. todolist::
.. toctree::
:maxdepth: 2
Bob's Wiki <>
.. include:: links.rst
.. _bob.install:
Installation Instructions
We offer pre-compiled binary installations of Bob using `conda`_ for Linux and
MacOS 64-bit operating systems.
#. Please install `conda`_ (miniconda is preferred) and get familiar with it.
#. Make sure you have an up-to-date `conda`_ installation (conda 4.4 and above
is needed) with the **correct configuration** by running the commands
.. code:: sh
$ conda update -n base conda
$ conda config --set show_channel_urls True
#. Create an environment for Bob:
.. code:: sh
$ conda create --name bob_py3 --override-channels \
-c -c defaults \
python=3 bob
$ conda activate bob_py3
$ conda config --env --add channels defaults
$ conda config --env --add channels
#. Install the Bob packages that you need in that environment:
.. code:: sh
$ conda install ...
**Repeat the last two steps for every conda environment that you create.**
For a comprehensive list of packages that are either part of |project| or use
|project|, please visit `packages`_.
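One way to check, from inside the activated environment, that the packages you installed are visible to Python is sketched below. The helper ``is_importable`` is hypothetical, and the module names are simply the ones used in the Iris example; adapt the list to whatever you installed.

```python
import importlib.util


def is_importable(name):
    """Return True if the module `name` can be found in this environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names whose parent package is missing.
        return False


for module in ("bob.learn.linear", "bob.measure", "bob.db.iris"):
    print(module, "available" if is_importable(module) else "missing")
```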
.. warning::
Be aware that if you use packages from our channel and other user/community
channels (especially ``conda-forge``) in one environment, you may end up
with a broken environment. We can only guarantee that the packages in our
channel are compatible with the ``defaults`` channel.
.. note::
Bob does not work on Windows and hence no conda packages are available for
it. It will not work even if you install it from source. If you are an
experienced user and manage to make Bob work on Windows, please let us know
through our `mailing list`_.
.. note::
Bob has been reported to run on arm processors (e.g. Raspberry Pi) but is
not installable with conda. Please see :ref:`bob.source` for instructions
on how to install Bob from source.
Installing older versions of Bob
Since Bob 4, you can easily select the Bob version that you want to install
using conda. For example:
.. code:: sh
$ conda install bob=4.0.0
will install the version of ```` that was associated with the Bob
4.0.0 release.
Bob packages that were released before Bob 4 are not easily installable. Here,
we provide conda environment files (**Linux 64-bit only**) that will install
all Bob packages associated with an older release of Bob:
=========== ==============================================================
Bob Version Environment Files
=========== ==============================================================
2.6.2 :download:`envs/v262py27.yaml`, :download:`envs/v262py35.yaml`
2.7.0 :download:`envs/v270py27.yaml`, :download:`envs/v270py35.yaml`
3.0.0 :download:`envs/v300py27.yaml`, :download:`envs/v300py36.yaml`
=========== ==============================================================
To install them, download one of the files above and run:
.. code:: sh
$ conda env create --file v300py36.yaml
Details (Advanced Users)
Since Bob 4, the ``bob`` conda package is just a meta package that pins all
packages to a specific version. Installing ``bob`` will not install anything;
it will just impose pinnings in your environment. Normally, installations of
Bob packages should work without installing ``bob`` itself. For example,
.. code:: sh
$ conda create --name env_name --override-channels \
-c -c defaults \
should always create a working environment. If it doesn't, please let us know.
.. include:: links.rst
.. _anaconda:
.. _Artistic-2.0:
.. _Blitz++:
.. _Bob:
.. _Boost:
.. _BSD-2-Clause:
.. _BSD-3-Clause:
.. _BSL-1.0:
.. _c++:
.. _CMake:
.. _conda:
.. _Dvipng:
.. _FFMpeg:
.. _fftw:
.. _giflib:
.. _GPL-2.0:
.. _GPL-3.0:
.. _HDF5 License:
.. _HDF5:
.. _idiap:
.. _install:
.. _IPython:
.. _Lapack:
.. _LaTeX:
.. _LGPL-2.1:
.. _libAV:
.. _libjpeg:
.. _libpng license:
.. _libpng:
.. _libtiff:
.. _mailing list:
.. _MatIO:
.. _Matplotlib:
.. _miniconda:
.. _MIT:
.. _nose:
.. _NumPy Reference:
.. _NumPy:
.. _packages:
.. _Pillow:
.. _pkg-config:
.. _Python-2.0:
.. _python:
.. _SciPy:
.. _Setuptools:
.. _Sphinx:
.. _SQLAlchemy:
.. _SQLite:
.. _VLFeat:
.. _Wiki:
List of Bob packages
Bob is organized in several independent python packages.
* You can
`search PyPI <>`_
for a comprehensive list of packages **that either use Bob or are part of
* Also, we maintain a list of active `packages`_.
.. include:: links.rst
.. _bob.source:
Compiling from Source
Please refer to :ref:`bob.buildout` and :ref:`bob.extension` for a complete
guide on how to install, develop existing, and create new |project| packages.
.. include:: links.rst