.. vim: set fileencoding=utf-8 :

.. testsetup:: *

   import numpy
   numpy.set_printoptions(precision=3, suppress=True)

   import bob.learn.em

   import os
   import tempfile
   current_directory = os.path.realpath(os.curdir)
   temp_dir = tempfile.mkdtemp(prefix='bob_doctest_')
   os.chdir(temp_dir)

============
 User guide
============

The EM algorithm is an iterative method for estimating the parameters of
statistical models that depend on unobserved latent variables. Each EM
iteration alternates between an expectation (E) step, which builds the
expectation of the log-likelihood evaluated using the current estimate of the
parameters, and a maximization (M) step, which computes the parameters that
maximize the expected log-likelihood found in the E step. These parameter
estimates are then used to determine the distribution of the latent variables
in the next E step [8]_.

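To make the E/M alternation concrete, below is a minimal, self-contained
sketch of EM for a two-component, one-dimensional Gaussian mixture written
with plain ``numpy``. It is only illustrative and independent of the Bob API;
the data and variable names are hypothetical.

.. code-block:: python

   import numpy

   numpy.random.seed(0)
   # Toy 1-D data drawn from two Gaussians
   x = numpy.concatenate([numpy.random.normal(-2, 1, 100),
                          numpy.random.normal(3, 1, 100)])

   # Initial guesses for the weights, means and variances
   w = numpy.array([0.5, 0.5])
   mu = numpy.array([-1.0, 1.0])
   var = numpy.array([1.0, 1.0])

   for _ in range(50):
       # E-step: responsibility of each component for each sample
       lik = w * numpy.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
       lik = lik / numpy.sqrt(2 * numpy.pi * var)
       resp = lik / lik.sum(axis=1, keepdims=True)
       # M-step: re-estimate the parameters from the responsibilities
       n = resp.sum(axis=0)
       w = n / len(x)
       mu = (resp * x[:, None]).sum(axis=0) / n
       var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / n
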
*Machines* and *trainers* are the core components of Bob's machine learning
packages. *Machines* represent statistical models or other functions defined by
parameters that can be learned by *trainers* or manually set. Below you will
find machine/trainer guides for learning techniques available in this package.


K-Means
-------
.. _kmeans:

**k-means** [7]_ is a clustering method which aims to partition a set of
:math:`N` observations into :math:`C` clusters of equal variance by minimizing
the cost function
:math:`J = \sum_{i=0}^{N} \min_{\mu_j \in C} ||x_i - \mu_j||`, where
:math:`\mu_j` is a given mean (also called centroid) and :math:`x_i` is an
observation.
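
As a quick illustration of the cost function above, the following ``numpy``
snippet evaluates :math:`J` for a set of observations and candidate centroids
(the arrays below are hypothetical example values):

.. code-block:: python

   import numpy

   # Hypothetical observations (rows) and two candidate centroids
   x = numpy.array([[3., -3., 100.], [4., -4., 98.], [-7., 7., -100.]])
   centroids = numpy.array([[3.5, -3.5, 99.], [-6., 6., -100.5]])

   # Distance of every observation to every centroid, shape (N, C)
   distances = numpy.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)

   # J: each observation contributes the distance to its closest centroid
   J = distances.min(axis=1).sum()
   print(J)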

This implementation has two stopping criteria: either the maximum number of
iterations is reached, or the difference between the values of :math:`J` in
successive iterations falls below a convergence threshold.

In this implementation, training consists of defining the statistical model,
called a machine (:py:class:`bob.learn.em.KMeansMachine`), which is then
learned via a trainer (:py:class:`bob.learn.em.KMeansTrainer`).

Below is a snippet on how to train a k-means model using Bob_.

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> import bob.learn.em
   >>> import numpy
   >>> data = numpy.array(
   ...     [[3,-3,100],
   ...      [4,-4,98],
   ...      [3.5,-3.5,99],
   ...      [-7,7,-100],
   ...      [-5,5,-101]], dtype='float64')
   >>> # Create a k-means machine with k=2 clusters and a dimensionality of 3
   >>> kmeans_machine = bob.learn.em.KMeansMachine(2, 3)
   >>> kmeans_trainer = bob.learn.em.KMeansTrainer()
   >>> max_iterations = 200
   >>> convergence_threshold = 1e-5
   >>> # Train the KMeansMachine
   >>> bob.learn.em.train(kmeans_trainer, kmeans_machine, data,
   ...     max_iterations=max_iterations,
   ...     convergence_threshold=convergence_threshold)
   >>> print(kmeans_machine.means)
   [[ -6.   6.  -100.5]
    [  3.5 -3.5   99. ]]


Below is an intuition (source code + plot) of k-means training using the Iris
flower `dataset <https://en.wikipedia.org/wiki/Iris_flower_data_set>`_.

.. plot:: plot/plot_kmeans.py
   :include-source: False



Gaussian mixture models
-----------------------


A Gaussian mixture model (`GMM <http://en.wikipedia.org/wiki/Mixture_model>`_)
is a probabilistic model for density estimation. It assumes that all the data
points are generated from a mixture of a finite number of Gaussian
distributions. More formally, a GMM can be defined as:
:math:`P(x|\Theta) = \sum_{c=0}^{C} \omega_c \mathcal{N}(x | \mu_c, \sigma_c)`,
where :math:`\Theta = \{ \omega_c, \mu_c, \sigma_c \}`.
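
The density above can be evaluated directly with ``numpy``. The snippet below
is a small sketch of :math:`P(x|\Theta)` for a diagonal-covariance GMM; the
parameter values are hypothetical and the code does not use the Bob API:

.. code-block:: python

   import numpy

   # Hypothetical GMM parameters: 2 components of dimensionality 3
   weights = numpy.array([0.3, 0.7])
   means = numpy.array([[0., 0., 0.], [1., 1., 1.]])
   variances = numpy.ones((2, 3)) * 0.5   # diagonal covariances

   def gmm_density(x):
       # Per-component diagonal Gaussian densities, then their weighted sum
       norm = numpy.prod(2 * numpy.pi * variances, axis=1) ** -0.5
       expo = numpy.exp(-0.5 * ((x - means) ** 2 / variances).sum(axis=1))
       return numpy.sum(weights * norm * expo)

   print(gmm_density(numpy.array([0.5, 0.2, -0.1])))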

This statistical model is defined in the class
:py:class:`bob.learn.em.GMMMachine` as below.

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> import bob.learn.em
   >>> # Create a GMM with k=2 Gaussians and a dimensionality of 3
   >>> gmm_machine = bob.learn.em.GMMMachine(2, 3)


There are plenty of ways to estimate :math:`\Theta`; the next subsections
explain some that are implemented in Bob.


Maximum likelihood Estimator (MLE)
==================================
.. _mle:

In statistics, maximum likelihood estimation (MLE) is a method of estimating
the parameters of a statistical model given observations, by finding the
:math:`\Theta` that maximizes :math:`P(x|\Theta)` for all :math:`x` in your
dataset [9]_. This optimization is done by the **Expectation-Maximization**
(EM) algorithm [8]_ and it is implemented by
:py:class:`bob.learn.em.ML_GMMTrainer`.

A very nice explanation of the EM algorithm for maximum likelihood estimation
can be found in this
`Mathematical Monk <https://www.youtube.com/watch?v=AnbiNaVp3eQ>`_ YouTube
video.

Below is a snippet on how to train a GMM using the maximum likelihood
estimator.


.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> import bob.learn.em
   >>> import numpy
   >>> data = numpy.array(
   ...     [[3,-3,100],
   ...      [4,-4,98],
   ...      [3.5,-3.5,99],
   ...      [-7,7,-100],
   ...      [-5,5,-101]], dtype='float64')
   >>> # Create a GMM model (machine) with k=2 Gaussians
   >>> # and a dimensionality equal to 3
   >>> gmm_machine = bob.learn.em.GMMMachine(2, 3)
   >>> # Using the MLE trainer to train the GMM:
   >>> # True, True, True means update means/variances/weights at each
   >>> # iteration
   >>> gmm_trainer = bob.learn.em.ML_GMMTrainer(True, True, True)
   >>> # Setting some means to start the training.
   >>> # In practice, the output of kmeans is a good start for the MLE training
   >>> gmm_machine.means = numpy.array(
   ...     [[ -4.,   2.3,  -10.5],
   ...      [  2.5, -4.5,   59. ]])
   >>> max_iterations = 200
   >>> convergence_threshold = 1e-5
   >>> # Training
   >>> bob.learn.em.train(gmm_trainer, gmm_machine, data,
   ...                    max_iterations=max_iterations,
   ...                    convergence_threshold=convergence_threshold)
   >>> print(gmm_machine.means)
   [[ -6.   6.  -100.5]
    [  3.5 -3.5   99. ]]

Below is an intuition of a GMM trained with the maximum likelihood estimator
using the Iris flower
`dataset <https://en.wikipedia.org/wiki/Iris_flower_data_set>`_.

.. plot:: plot/plot_ML.py
   :include-source: False


Maximum a posteriori Estimator (MAP)
====================================
.. _map:

Closely related to the MLE, Maximum a posteriori probability (MAP) estimation
is an estimate that equals the mode of the posterior distribution, obtained by
incorporating a prior distribution into its loss function [10]_. Commonly this
prior distribution (the values of :math:`\Theta`) is estimated with MLE. This
optimization is done by the **Expectation-Maximization** (EM) algorithm [8]_
and it is implemented by :py:class:`bob.learn.em.MAP_GMMTrainer`.

A compact way to write relevance MAP adaptation is by using GMM supervector
notation (this will be useful in the next subsections). The GMM supervector
notation consists of taking the parameters of :math:`\Theta` (weights, means
and covariance matrices) of a GMM and concatenating each of them into a single
vector or matrix. For each Gaussian component :math:`c`, we can represent the
MAP adaptation as follows: :math:`\mu_i = m + d_i`, where :math:`m` is our
prior and :math:`d_i` is the class offset.

Below is a snippet on how to train a GMM using the MAP estimator.


.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> import bob.learn.em
   >>> import numpy
   >>> data = numpy.array(
   ...     [[3,-3,100],
   ...      [4,-4,98],
   ...      [3.5,-3.5,99],
   ...      [-7,7,-100],
   ...      [-5,5,-101]], dtype='float64')
   >>> # Creating a fake prior
   >>> prior_gmm = bob.learn.em.GMMMachine(2, 3)
   >>> # Set some random means for the example
   >>> prior_gmm.means = numpy.array(
   ...     [[ -4.,   2.3,  -10.5],
   ...      [  2.5, -4.5,   59. ]])
   >>> # Creating the model for the adapted GMM
   >>> adapted_gmm = bob.learn.em.GMMMachine(2, 3)
   >>> # Creating the MAP trainer
   >>> gmm_trainer = bob.learn.em.MAP_GMMTrainer(prior_gmm, relevance_factor=4)
   >>>
   >>> max_iterations = 200
   >>> convergence_threshold = 1e-5
   >>> # Training
   >>> bob.learn.em.train(gmm_trainer, adapted_gmm, data,
   ...                    max_iterations=max_iterations,
   ...                    convergence_threshold=convergence_threshold)
   >>> print(adapted_gmm.means)
    [[ -4.667   3.533 -40.5  ]
     [  2.929  -4.071  76.143]]
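
Relating this back to the supervector notation above, the class offset
:math:`d` is simply the difference between the adapted and the prior means.
Below is a small ``numpy`` sketch using the values from the example above
(plain arrays, not the Bob objects):

.. code-block:: python

   import numpy

   # Prior and MAP-adapted means from the example above (2 Gaussians, dim. 3)
   prior_means = numpy.array([[-4., 2.3, -10.5], [2.5, -4.5, 59.]])
   adapted_means = numpy.array([[-4.667, 3.533, -40.5], [2.929, -4.071, 76.143]])

   # Per-component class offsets d and their supervector (flattened) form
   d = adapted_means - prior_means
   print(d.flatten())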

Below is an intuition of the GMM trained with the MAP estimator using the Iris
flower `dataset <https://en.wikipedia.org/wiki/Iris_flower_data_set>`_.

.. plot:: plot/plot_MAP.py
   :include-source: False


Session Variability Modeling with Gaussian Mixture Models
---------------------------------------------------------

In the aforementioned GMM based algorithms there is no explicit modeling of
session variability. This section will introduce some session variability
algorithms built on top of GMMs.


GMM statistics
==============

Before introducing session variability modeling for GMM based algorithms, we
must introduce a component called :py:class:`bob.learn.em.GMMStats`. This
component is useful for some computations in the next sections.
:py:class:`bob.learn.em.GMMStats` is a container that solves Equations 8, 9
and 10 in [Reynolds2000]_ (also called the zeroth-, first- and second-order
GMM statistics).

Given a GMM (:math:`\Theta`) and a set of samples :math:`x_t`, this component
accumulates statistics for each Gaussian component :math:`c`.

Below is a one-to-one relationship between the statistics in [Reynolds2000]_
and the properties in :py:class:`bob.learn.em.GMMStats`:

   - Eq (8) is :py:class:`bob.learn.em.GMMStats.n`:
     :math:`n_c=\sum\limits_{t=1}^T Pr(c | x_t)` (also called responsibilities)
   - Eq (9) is :py:class:`bob.learn.em.GMMStats.sum_px`:
     :math:`E_c(x)=\frac{1}{n(c)}\sum\limits_{t=1}^T Pr(c | x_t)x_t`
   - Eq (10) is :py:class:`bob.learn.em.GMMStats.sum_pxx`:
     :math:`E_c(x^2)=\frac{1}{n(c)}\sum\limits_{t=1}^T Pr(c | x_t)x_t^2`

where :math:`T` is the number of samples used to generate the stats.
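
To make these three quantities concrete, here is a small, self-contained
``numpy`` sketch that computes the zeroth-, first- and second-order statistics
from a matrix of responsibilities. The arrays are hypothetical and the
computation is independent of the :py:class:`bob.learn.em.GMMStats` container
itself:

.. code-block:: python

   import numpy

   # Hypothetical samples (T=4, dimension 3) and responsibilities Pr(c|x_t)
   # for C=2 Gaussian components (each row sums to one)
   x = numpy.array([[0., 0.3, -0.2], [0.4, 0.1, 0.15],
                    [1.2, 1.4, 1.], [0.8, 1., 1.]])
   resp = numpy.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])

   n = resp.sum(axis=0)                     # Eq (8): zeroth-order statistics
   first = resp.T @ x / n[:, None]          # Eq (9): E_c(x)
   second = resp.T @ (x ** 2) / n[:, None]  # Eq (10): E_c(x^2)
   print(n, first, second, sep="\n")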

The snippet below shows how to accumulate these statistics given a prior GMM.


.. doctest::
   :options: +NORMALIZE_WHITESPACE

    >>> import bob.learn.em
    >>> import numpy
    >>> numpy.random.seed(10)
    >>>
    >>> data = numpy.array(
    ...     [[0, 0.3, -0.2],
    ...      [0.4, 0.1, 0.15],
    ...      [-0.3, -0.1, 0],
    ...      [1.2, 1.4, 1],
    ...      [0.8, 1., 1]], dtype='float64')
    >>> # Creating a fake prior with 2 Gaussians of dimension 3
    >>> prior_gmm = bob.learn.em.GMMMachine(2, 3)
    >>> prior_gmm.means = numpy.vstack((numpy.random.normal(0, 0.5, (1, 3)),
    ...                                 numpy.random.normal(1, 0.5, (1, 3))))
    >>> # All nice and round diagonal covariance
    >>> prior_gmm.variances = numpy.ones((2, 3)) * 0.5
    >>> prior_gmm.weights = numpy.array([0.3, 0.7])
    >>> # Creating the container
    >>> gmm_stats_container = bob.learn.em.GMMStats(2, 3)
    >>> for d in data:
    ...    prior_gmm.acc_statistics(d, gmm_stats_container)
    >>>
    >>> # Printing the responsibilities
    >>> print(gmm_stats_container.n/gmm_stats_container.t)
     [ 0.429  0.571]


Inter-Session Variability
=========================
.. _isv:

Inter-Session Variability (ISV) modeling [3]_ [2]_ is a session variability
modeling technique built on top of the Gaussian mixture modeling approach. It
hypothesizes that within-class variations are embedded in a linear subspace of
the GMM means space, and that these variations can be suppressed by an offset
w.r.t. each mean during the MAP adaptation.

In this generative model, each sample is assumed to have been generated by a
GMM mean supervector with the following shape:
:math:`\mu_{i, j} = m + Ux_{i, j} + Dz_{i}`, where :math:`m` is our prior,
:math:`Ux_{i, j}` is the session offset that we want to suppress and
:math:`Dz_{i}` is the class offset (with all session effects suppressed).

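To get a feel for the shapes involved, the following is a small ``numpy``
sketch of this decomposition in supervector form. All dimensions and values
are hypothetical and the code does not use the Bob API:

.. code-block:: python

   import numpy

   numpy.random.seed(0)
   C, D, ru = 2, 3, 2                  # Gaussian components, feature dim, rank of U

   m = numpy.random.randn(C * D)       # prior mean supervector, shape (C*D,)
   U = numpy.random.randn(C * D, ru)   # within-class (session) subspace
   d = numpy.random.randn(C * D)       # diagonal of the matrix D

   x_ij = numpy.random.randn(ru)       # session factor of sample j of class i
   z_i = numpy.random.randn(C * D)     # latent class variable of class i

   # ISV supervector for sample (i, j): mu = m + U x_ij + D z_i
   mu_ij = m + U @ x_ij + d * z_i
   print(mu_ij.reshape(C, D))          # back to per-component means
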
All possible sources of session variation are embedded in the matrix
:math:`U`. Below is an intuition of what is modeled with :math:`U` in the Iris
flower `dataset <https://en.wikipedia.org/wiki/Iris_flower_data_set>`_. The
arrows :math:`U_{1}`, :math:`U_{2}` and :math:`U_{3}` are the directions of the
within-class variations, with respect to each Gaussian component, that will be
suppressed a posteriori.

.. plot:: plot/plot_ISV.py
   :include-source: False


The ISV statistical model is stored in the container
:py:class:`bob.learn.em.ISVBase` and the training is performed by
:py:class:`bob.learn.em.ISVTrainer`. The snippet below shows how to train an
inter-session variability model.


.. doctest::
   :options: +NORMALIZE_WHITESPACE

    >>> import bob.learn.em
    >>> import numpy
    >>> numpy.random.seed(10)
    >>>
    >>> # Generating some fake data
    >>> data_class1 = numpy.random.normal(0, 0.5, (10, 3))
    >>> data_class2 = numpy.random.normal(-0.2, 0.2, (10, 3))
    >>> data = [data_class1, data_class2]

    >>> # Creating a fake prior with 2 gaussians of dimension 3
    >>> prior_gmm = bob.learn.em.GMMMachine(2, 3)
    >>> prior_gmm.means = numpy.vstack((numpy.random.normal(0, 0.5, (1, 3)),
    ...                                 numpy.random.normal(1, 0.5, (1, 3))))
    >>> # All nice and round diagonal covariance
    >>> prior_gmm.variances = numpy.ones((2, 3)) * 0.5
    >>> prior_gmm.weights = numpy.array([0.3, 0.7])
    >>> # The input to the ISV training is the statistics of the GMM
    >>> gmm_stats_per_class = []
    >>> for d in data:
    ...   stats = []
    ...   for i in d:
    ...     gmm_stats_container = bob.learn.em.GMMStats(2, 3)
    ...     prior_gmm.acc_statistics(i, gmm_stats_container)
    ...     stats.append(gmm_stats_container)
    ...   gmm_stats_per_class.append(stats)

    >>> # Finally doing the ISV training
    >>> subspace_dimension_of_u = 2
    >>> relevance_factor = 4
    >>> isvbase = bob.learn.em.ISVBase(prior_gmm, subspace_dimension_of_u)
    >>> trainer = bob.learn.em.ISVTrainer(relevance_factor)
    >>> bob.learn.em.train(trainer, isvbase, gmm_stats_per_class,
    ...                    max_iterations=50)
    >>> # Printing the session offset w.r.t each Gaussian component
    >>> print(isvbase.u)
      [[-0.01  -0.027]
      [-0.002 -0.004]
      [ 0.028  0.074]
      [ 0.012  0.03 ]
      [ 0.033  0.085]
      [ 0.046  0.12 ]]


Joint Factor Analysis
=====================
.. _jfa:

Joint Factor Analysis (JFA) [1]_ [2]_ is an extension of ISV. Besides the
within-class assumption (modeled with :math:`U`), it also hypothesizes that
between-class variations are embedded in a low-rank rectangular matrix
:math:`V`. In the supervector notation, this modeling has the following shape:
:math:`\mu_{i, j} = m + Ux_{i, j} + Vy_{i} + Dz_{i}`.

Below is an intuition of what is modeled with :math:`U` and :math:`V` in the
Iris flower `dataset <https://en.wikipedia.org/wiki/Iris_flower_data_set>`_.
The arrows :math:`V_{1}`, :math:`V_{2}` and :math:`V_{3}` are the directions of
the between-class variations with respect to each Gaussian component that will
be added a posteriori.


.. plot:: plot/plot_JFA.py
   :include-source: False

The JFA statistical model is stored in the container
:py:class:`bob.learn.em.JFABase` and the training is performed by
:py:class:`bob.learn.em.JFATrainer`. The snippet below shows how to train a
joint factor analysis model.

.. doctest::
   :options: +NORMALIZE_WHITESPACE

    >>> import bob.learn.em
    >>> import numpy
    >>> numpy.random.seed(10)
    >>>
    >>> # Generating some fake data
    >>> data_class1 = numpy.random.normal(0, 0.5, (10, 3))
    >>> data_class2 = numpy.random.normal(-0.2, 0.2, (10, 3))
    >>> data = [data_class1, data_class2]

    >>> # Creating a fake prior with 2 Gaussians of dimension 3
    >>> prior_gmm = bob.learn.em.GMMMachine(2, 3)
    >>> prior_gmm.means = numpy.vstack((numpy.random.normal(0, 0.5, (1, 3)),
    ...                                 numpy.random.normal(1, 0.5, (1, 3))))
    >>> # All nice and round diagonal covariance
    >>> prior_gmm.variances = numpy.ones((2, 3)) * 0.5
    >>> prior_gmm.weights = numpy.array([0.3, 0.7])
    >>>
    >>> # The input to the JFA training is the statistics of the GMM
    >>> gmm_stats_per_class = []
    >>> for d in data:
    ...   stats = []
    ...   for i in d:
    ...     gmm_stats_container = bob.learn.em.GMMStats(2, 3)
    ...     prior_gmm.acc_statistics(i, gmm_stats_container)
    ...     stats.append(gmm_stats_container)
    ...   gmm_stats_per_class.append(stats)
    >>>
    >>> # Finally doing the JFA training
    >>> subspace_dimension_of_u = 2
    >>> subspace_dimension_of_v = 2
    >>> relevance_factor = 4
    >>> jfabase = bob.learn.em.JFABase(prior_gmm, subspace_dimension_of_u,
    ...                                subspace_dimension_of_v)
    >>> trainer = bob.learn.em.JFATrainer()
    >>> bob.learn.em.train_jfa(trainer, jfabase, gmm_stats_per_class,
    ...                        max_iterations=50)

    >>> # Printing the between-class subspace V w.r.t each Gaussian component
    >>> print(jfabase.v)
     [[ 0.003 -0.006]
      [ 0.041 -0.084]
      [-0.261  0.53 ]
      [-0.252  0.51 ]
      [-0.387  0.785]
      [-0.36   0.73 ]]

Total variability Modelling
===========================
.. _ivector:

Total Variability (TV) modeling [4]_ is a front-end initially introduced for
speaker recognition, which aims at describing samples by vectors of low
dimensionality called ``i-vectors``. The model consists of a subspace :math:`T`
and a residual diagonal covariance matrix :math:`\Sigma`, that are then used to
extract i-vectors, and is built upon the GMM approach. In the supervector
notation this modeling has the following shape: :math:`\mu = m + Tv`.

Below is an intuition of the data from the Iris flower
`dataset <https://en.wikipedia.org/wiki/Iris_flower_data_set>`_, embedded in
the i-vector space.

.. plot:: plot/plot_iVector.py
   :include-source: False


The i-vector statistical model is stored in the container
:py:class:`bob.learn.em.IVectorMachine` and the training is performed by
:py:class:`bob.learn.em.IVectorTrainer`. The snippet below shows how to train
a total variability model.

.. doctest::
   :options: +NORMALIZE_WHITESPACE

    >>> import bob.learn.em
    >>> import numpy
    >>> numpy.random.seed(10)
    >>>
    >>> # Generating some fake data
    >>> data_class1 = numpy.random.normal(0, 0.5, (10, 3))
    >>> data_class2 = numpy.random.normal(-0.2, 0.2, (10, 3))
    >>> data = [data_class1, data_class2]
    >>>
    >>> # Creating a fake prior with 2 gaussians of dimension 3
    >>> prior_gmm = bob.learn.em.GMMMachine(2, 3)
    >>> prior_gmm.means = numpy.vstack((numpy.random.normal(0, 0.5, (1, 3)),
    ...                                 numpy.random.normal(1, 0.5, (1, 3))))
    >>> # All nice and round diagonal covariance
    >>> prior_gmm.variances = numpy.ones((2, 3)) * 0.5
    >>> prior_gmm.weights = numpy.array([0.3, 0.7])
    >>>
    >>> # The input to the TV training is the statistics of the GMM
    >>> gmm_stats_per_class = []
    >>> for d in data:
    ...     for i in d:
    ...       gmm_stats_container = bob.learn.em.GMMStats(2, 3)
    ...       prior_gmm.acc_statistics(i, gmm_stats_container)
    ...       gmm_stats_per_class.append(gmm_stats_container)
    >>>
    >>> # Finally doing the TV training
    >>> subspace_dimension_of_t = 2
    >>>
    >>> ivector_trainer = bob.learn.em.IVectorTrainer(update_sigma=True)
    >>> ivector_machine = bob.learn.em.IVectorMachine(
    ...     prior_gmm, subspace_dimension_of_t, 10e-5)
    >>> # train IVector model
    >>> bob.learn.em.train(ivector_trainer, ivector_machine,
    ...                    gmm_stats_per_class, 500)
    >>>
    >>> # Printing the total variability subspace T w.r.t each Gaussian component
    >>> print(ivector_machine.t)
     [[ 0.11  -0.203]
      [-0.124  0.014]
      [ 0.296  0.674]
      [ 0.447  0.174]
      [ 0.425  0.583]
      [ 0.394  0.794]]

Linear Scoring
==============
.. _linearscoring:

In :ref:`MAP <map>` adaptation, :ref:`ISV <isv>` and :ref:`JFA <jfa>`, a
traditional way to do scoring is via the log-likelihood ratio between the
adapted model and the prior, as follows:

.. math::
   score = \ln(P(x | \Theta)) - \ln(P(x | \Theta_{prior})),


(with :math:`\Theta` varying for each approach).

A simplification proposed by [Glembek2009]_, called linear scoring,
approximates this ratio using a first-order Taylor series as follows:

.. math::
   score = \frac{\mu - \mu_{prior}}{\sigma_{prior}} f * (\mu_{prior} + U_x),

where :math:`\mu` is the GMM mean supervector (of the prior and the adapted
model), :math:`\sigma` is the variance supervector, :math:`f` is the first
order GMM statistics (:py:class:`bob.learn.em.GMMStats.sum_px`) and
:math:`U_x` is a possible channel offset (:ref:`ISV <isv>`).

This scoring technique is implemented in :py:func:`bob.learn.em.linear_scoring`.
The snippet below shows how to compute scores using this approximation.

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> import bob.learn.em
   >>> import numpy
   >>> # Defining a fake prior
   >>> prior_gmm = bob.learn.em.GMMMachine(3, 2)
   >>> prior_gmm.means = numpy.array([[1, 1], [2, 2.1], [3, 3]])
   >>> # Defining a fake adapted model
   >>> adapted_gmm = bob.learn.em.GMMMachine(3,2)
   >>> adapted_gmm.means = numpy.array([[1.5, 1.5], [2.5, 2.5], [2, 2]])
   >>> # Defining an input
   >>> input = numpy.array([[1.5, 1.5], [1.6, 1.6]])
   >>> #Accumulating statistics of the GMM
   >>> stats = bob.learn.em.GMMStats(3, 2)
   >>> prior_gmm.acc_statistics(input, stats)
   >>> score = bob.learn.em.linear_scoring(
   ...     [adapted_gmm], prior_gmm, [stats], [],
   ...     frame_length_normalisation=True)
   >>> print(score)
    [[ 0.254]]


Probabilistic Linear Discriminant Analysis (PLDA)
-------------------------------------------------

Probabilistic Linear Discriminant Analysis [5]_ is a probabilistic model that
incorporates components describing both between-class and within-class
variations. Given a mean :math:`\mu`, between-class and within-class subspaces
:math:`F` and :math:`G` and residual noise :math:`\epsilon` with zero mean and
diagonal covariance matrix :math:`\Sigma`, the model assumes that a sample
:math:`x_{i,j}` is generated by the following process:

.. math::

   x_{i,j} = \mu + F h_{i} + G w_{i,j} + \epsilon_{i,j}


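To illustrate this generative process, here is a small ``numpy`` sketch that
samples a few points from the model above; all dimensions and parameter values
are hypothetical and unrelated to the machines trained below:

.. code-block:: python

   import numpy

   numpy.random.seed(0)
   D, rF, rG = 3, 1, 2                 # feature dim., rank of F, rank of G

   mu = numpy.zeros(D)                 # global mean
   F = numpy.random.randn(D, rF)       # between-class subspace
   G = numpy.random.randn(D, rG)       # within-class subspace
   sigma = 0.1 * numpy.ones(D)         # diagonal of the residual covariance

   h_i = numpy.random.randn(rF)        # latent identity variable of class i
   samples = []
   for _ in range(3):                  # three samples of the same class
       w_ij = numpy.random.randn(rG)   # latent within-class variable
       eps = numpy.random.randn(D) * numpy.sqrt(sigma)
       samples.append(mu + F @ h_i + G @ w_ij + eps)
   print(numpy.vstack(samples))
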
An Expectation-Maximization algorithm can be used to learn the parameters of
this model (:math:`\mu`, :math:`F`, :math:`G` and :math:`\Sigma`). As these
parameters can be shared between classes, there is a specific container class
for this purpose, which is :py:class:`bob.learn.em.PLDABase`. The process is
described in detail in [6]_.

Let us consider a training set of two classes, each with 3 samples of
dimensionality 3.

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> data1 = numpy.array(
   ...     [[3,-3,100],
   ...      [4,-4,50],
   ...      [40,-40,150]], dtype=numpy.float64)
   >>> data2 = numpy.array(
   ...     [[3,6,-50],
   ...      [4,8,-100],
   ...      [40,79,-800]], dtype=numpy.float64)
   >>> data = [data1,data2]

Learning a PLDA model can be performed by instantiating the class
:py:class:`bob.learn.em.PLDATrainer`, and calling the
:py:func:`bob.learn.em.train` function.

.. doctest::

   >>> # This creates a PLDABase container for input feature of dimensionality
   >>> # 3 and with subspaces F and G of rank 1 and 2, respectively.
   >>> pldabase = bob.learn.em.PLDABase(3,1,2)

   >>> trainer = bob.learn.em.PLDATrainer()
   >>> bob.learn.em.train(trainer, pldabase, data, max_iterations=10)

Once trained, this PLDA model can be used to compute the log-likelihood of a
set of samples given some hypothesis. For this purpose, a
:py:class:`bob.learn.em.PLDAMachine` should be instantiated. Then, the
log-likelihood that a set of samples share the same latent identity variable
:math:`h_{i}` (i.e. the samples are coming from the same identity/class) is
obtained by calling the
:py:meth:`bob.learn.em.PLDAMachine.compute_log_likelihood()` method.

.. doctest::

   >>> plda = bob.learn.em.PLDAMachine(pldabase)
   >>> samples = numpy.array(
   ...     [[3.5,-3.4,102],
   ...      [4.5,-4.3,56]], dtype=numpy.float64)
   >>> loglike = plda.compute_log_likelihood(samples)

If separate models for different classes need to be enrolled, each of them with
a set of enrollment samples, then several instances of
:py:class:`bob.learn.em.PLDAMachine` need to be created and enrolled using
the :py:meth:`bob.learn.em.PLDATrainer.enroll()` method as follows.

.. doctest::

   >>> plda1 = bob.learn.em.PLDAMachine(pldabase)
   >>> samples1 = numpy.array(
   ...     [[3.5,-3.4,102],
   ...      [4.5,-4.3,56]], dtype=numpy.float64)
   >>> trainer.enroll(plda1, samples1)
   >>> plda2 = bob.learn.em.PLDAMachine(pldabase)
   >>> samples2 = numpy.array(
   ...     [[3.5,7,-49],
   ...      [4.5,8.9,-99]], dtype=numpy.float64)
   >>> trainer.enroll(plda2, samples2)

Afterwards, the joint log-likelihood of the enrollment samples and of one or
several test samples can be computed as previously described, separately for
each model.

.. doctest::

   >>> sample = numpy.array([3.2,-3.3,58], dtype=numpy.float64)
   >>> l1 = plda1.compute_log_likelihood(sample)
   >>> l2 = plda2.compute_log_likelihood(sample)

In a verification scenario, there are two possible hypotheses:

#. :math:`x_{test}` and :math:`x_{enroll}` share the same class.
#. :math:`x_{test}` and :math:`x_{enroll}` are from different classes.

Using the method :py:meth:`bob.learn.em.PLDAMachine.log_likelihood_ratio` or
its alias, the ``__call__`` operator, the corresponding log-likelihood ratio
will be computed, which is defined more formally by:
:math:`s = \ln(P(x_{test},x_{enroll})) - \ln(P(x_{test})P(x_{enroll}))`

.. doctest::

   >>> s1 = plda1(sample)
   >>> s2 = plda2(sample)

.. testcleanup:: *

  import shutil
  os.chdir(current_directory)
  shutil.rmtree(temp_dir)


Score Normalization
-------------------

Score normalization aims to compensate for statistical variations in output
scores due to changes in the conditions across different enrollment and probe
samples. This is achieved by scaling distributions of system output scores to
better facilitate the application of a single, global threshold for
authentication.

Bob implements three different strategies to normalize scores; these
strategies are presented in the next subsections.

Z-Norm
======
.. _znorm:

Given a score :math:`s_i`, Z-Norm [Auckenthaler2000]_ and [Mariethoz2005]_
(zero-normalization) scales this value by the mean (:math:`\mu`) and standard
deviation (:math:`\sigma`) of an impostor score distribution. This score
distribution can be computed beforehand and the normalization is defined as
follows.

.. math::

   zs_i = \frac{s_i - \mu}{\sigma}


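The formula above needs only a couple of lines of ``numpy``. The sketch below
(with made-up scores) only illustrates the arithmetic and is independent of the
:py:func:`bob.learn.em.znorm` helper:

.. code-block:: python

   import numpy

   raw_scores = numpy.array([1.2, 0.3, 2.5])             # scores to normalize
   impostor_scores = numpy.array([0.1, 0.4, 0.2, 0.3])   # impostor distribution

   mu, sigma = impostor_scores.mean(), impostor_scores.std()
   z_scores = (raw_scores - mu) / sigma
   print(z_scores)
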
This scoring technique is implemented in :py:func:`bob.learn.em.znorm`. Below
is an example of score normalization using :py:func:`bob.learn.em.znorm`.

.. plot:: plot/plot_Znorm.py
   :include-source: True

.. note::

   Observe how the scores were scaled in the plot above.


T-Norm
======
.. _tnorm:

T-norm [Auckenthaler2000]_ and [Mariethoz2005]_ (test-normalization) operates
in a probe-centric manner. While in the Z-Norm :math:`\mu` and :math:`\sigma`
are estimated using an impostor set of models and its scores, the T-norm
computes these statistics using the current probe sample against a set of
models in a cohort :math:`\Theta_{c}`. A cohort can be any semantic grouping
that is sensible for your recognition task, such as sex (males and females),
ethnicity, age, etc. It is defined as follows.

.. math::

   ts_i = \frac{s_i - \mu}{\sigma}

where :math:`s_i` is :math:`P(x_i | \Theta)` (the score given the claimed
model), :math:`\mu = \frac{ \sum\limits_{i=0}^{N} P(x_i | \Theta_{c}) }{N}`
(:math:`\Theta_{c}` are the models of one cohort) and :math:`\sigma` is the
standard deviation computed using the same criteria used to compute
:math:`\mu`.
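
As with the Z-Norm, the arithmetic is straightforward. Below is a small
``numpy`` sketch (with made-up scores) in which :math:`\mu` and :math:`\sigma`
are computed per probe from its scores against a cohort of models,
independently of the :py:func:`bob.learn.em.tnorm` helper:

.. code-block:: python

   import numpy

   # Rows: probe samples; columns: scores against each cohort model (made up)
   cohort_scores = numpy.array([[0.1, 0.4, 0.2],
                                [0.9, 1.1, 1.0]])
   raw_scores = numpy.array([1.2, 2.5])   # probe scores against the claimed model

   mu = cohort_scores.mean(axis=1)        # per-probe cohort mean
   sigma = cohort_scores.std(axis=1)      # per-probe cohort standard deviation
   t_scores = (raw_scores - mu) / sigma
   print(t_scores)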


This scoring technique is implemented in :py:func:`bob.learn.em.tnorm`. Below
is an example of score normalization using :py:func:`bob.learn.em.tnorm`.

.. plot:: plot/plot_Tnorm.py
   :include-source: True


.. note::

   T-norm introduces extra computation during scoring, as the probe samples
   need to be compared to each cohort model in order to have :math:`\mu` and
   :math:`\sigma`.


ZT-Norm
=======
.. _ztnorm:

ZT-Norm [Auckenthaler2000]_ and [Mariethoz2005]_ consists of the application of
:ref:`Z-Norm <znorm>` followed by a :ref:`T-Norm <tnorm>` and it is implemented
in :py:func:`bob.learn.em.ztnorm`.

Below is an example of score normalization using
:py:func:`bob.learn.em.ztnorm`.

.. plot:: plot/plot_ZTnorm.py
   :include-source: True

.. note::

   Observe how the scores were scaled in the plot above.


.. Place here your external references
.. include:: links.rst
.. [1] http://dx.doi.org/10.1109/TASL.2006.881693
.. [2] http://publications.idiap.ch/index.php/publications/show/2606
.. [3] http://dx.doi.org/10.1016/j.csl.2007.05.003
.. [4] http://dx.doi.org/10.1109/TASL.2010.2064307
.. [5] http://dx.doi.org/10.1109/ICCV.2007.4409052
.. [6] http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.38
.. [7] http://en.wikipedia.org/wiki/K-means_clustering
.. [8] http://en.wikipedia.org/wiki/Expectation-maximization_algorithm
.. [9] http://en.wikipedia.org/wiki/Maximum_likelihood
.. [10] http://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation