.. vim: set fileencoding=utf-8 :

.. testsetup:: *

   import numpy
   numpy.set_printoptions(precision=3, suppress=True)

   import bob.learn.em

   import os
   import tempfile
   current_directory = os.path.realpath(os.curdir)
   temp_dir = tempfile.mkdtemp(prefix='bob_doctest_')
   os.chdir(temp_dir)

============
 User guide
============

The EM algorithm is an iterative method that estimates parameters for
statistical models, where the model depends on unobserved latent variables.
The EM iteration alternates between an expectation (E) step, which creates a
function for the expectation of the log-likelihood evaluated using the
current estimate of the parameters, and a maximization (M) step, which
computes parameters maximizing the expected log-likelihood found in the E
step. These parameter estimates are then used to determine the distribution
of the latent variables in the next E step [8]_.

*Machines* and *trainers* are the core components of Bob's machine learning
packages. *Machines* represent statistical models or other functions defined
by parameters that can be learned by *trainers* or manually set. Below you
will find machine/trainer guides for the learning techniques available in
this package.


K-Means
-------
.. _kmeans:

**k-means** [7]_ is a clustering method which aims to partition a set of
:math:`N` observations into :math:`C` clusters with equal variance by
minimizing the cost function
:math:`J = \sum_{i=0}^{N} \min_{\mu_j \in C} ||x_i - \mu_j||`, where
:math:`\mu_j` is a given mean (also called centroid) and :math:`x_i` is an
observation.

This implementation has two stopping criteria: the first one is reaching the
maximum number of iterations; the second one is when the difference between
the values of :math:`J` in successive iterations falls below a convergence
threshold.

In this implementation, the training consists in the definition of the
statistical model, called machine (:py:class:`bob.learn.em.KMeansMachine`),
which is then learned via a trainer (:py:class:`bob.learn.em.KMeansTrainer`).

The snippet below shows how to train a k-means machine using Bob_.

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> import bob.learn.em
   >>> import numpy
   >>> data = numpy.array(
   ...     [[3, -3, 100],
   ...      [4, -4, 98],
   ...      [3.5, -3.5, 99],
   ...      [-7, 7, -100],
   ...      [-5, 5, -101]], dtype='float64')
   >>> # Create a k-means machine with k=2 clusters and a feature
   >>> # dimensionality of 3
   >>> kmeans_machine = bob.learn.em.KMeansMachine(2, 3)
   >>> kmeans_trainer = bob.learn.em.KMeansTrainer()
   >>> max_iterations = 200
   >>> convergence_threshold = 1e-5
   >>> # Train the KMeansMachine
   >>> bob.learn.em.train(kmeans_trainer, kmeans_machine, data,
   ...                    max_iterations=max_iterations,
   ...                    convergence_threshold=convergence_threshold)
   >>> print(kmeans_machine.means)
   [[  -6.     6.  -100.5]
    [   3.5   -3.5   99. ]]


Below is an intuition (source code + plot) of a k-means training using the
Iris flower dataset.

.. plot:: plot/plot_kmeans.py
   :include-source: False


Gaussian mixture models
-----------------------

A Gaussian mixture model (GMM) is a probabilistic model for density
estimation. It assumes that all the data points are generated from a mixture
of a finite number of Gaussian distributions. More formally, a GMM can be
defined as
:math:`P(x|\Theta) = \sum_{c=0}^{C} \omega_c \mathcal{N}(x | \mu_c, \sigma_c)`,
where :math:`\Theta = \{ \omega_c, \mu_c, \sigma_c \}`.

This statistical model is defined in the class
:py:class:`bob.learn.em.GMMMachine` as below.

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> import bob.learn.em
   >>> # Create a GMM with k=2 Gaussians and a feature dimensionality of 3
   >>> gmm_machine = bob.learn.em.GMMMachine(2, 3)


There are plenty of ways to estimate :math:`\Theta`; the next subsections
explain some of those implemented in Bob.


Maximum likelihood Estimator (MLE)
==================================
.. _mle:

In statistics, maximum likelihood estimation (MLE) is a method of estimating
the parameters of a statistical model given observations, by finding the
:math:`\Theta` that maximizes :math:`P(x|\Theta)` for all :math:`x` in your
dataset [9]_. This optimization is done by the **Expectation-Maximization**
(EM) algorithm [8]_ and it is implemented by
:py:class:`bob.learn.em.ML_GMMTrainer`.

A very nice explanation of the EM algorithm for maximum likelihood estimation
can be found in the "Mathematical Monk" YouTube series.
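Concretely, assuming independent and identically distributed samples
:math:`x_1, \dots, x_T`, MLE searches for the parameters that maximize the
total log-likelihood of the training set:

.. math::

   \Theta^{*} = \arg\max_{\Theta} \sum_{t=1}^{T} \ln P(x_t | \Theta)

and each EM iteration is guaranteed not to decrease this quantity until
convergence.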
The snippet below shows how to train a GMM using the maximum likelihood
estimator.

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> import bob.learn.em
   >>> import numpy
   >>> data = numpy.array(
   ...     [[3, -3, 100],
   ...      [4, -4, 98],
   ...      [3.5, -3.5, 99],
   ...      [-7, 7, -100],
   ...      [-5, 5, -101]], dtype='float64')
   >>> # Create a GMM (machine) with k=2 Gaussians and a feature
   >>> # dimensionality of 3
   >>> gmm_machine = bob.learn.em.GMMMachine(2, 3)
   >>> # Using the MLE trainer to train the GMM:
   >>> # True, True, True means update means/variances/weights at each
   >>> # iteration
   >>> gmm_trainer = bob.learn.em.ML_GMMTrainer(True, True, True)
   >>> # Setting some means to start the training.
   >>> # In practice, the output of k-means is a good start for the MLE training
   >>> gmm_machine.means = numpy.array(
   ...     [[-4., 2.3, -10.5],
   ...      [2.5, -4.5, 59.]])
   >>> max_iterations = 200
   >>> convergence_threshold = 1e-5
   >>> # Training
   >>> bob.learn.em.train(gmm_trainer, gmm_machine, data,
   ...                    max_iterations=max_iterations,
   ...                    convergence_threshold=convergence_threshold)
   >>> print(gmm_machine.means)
   [[  -6.     6.  -100.5]
    [   3.5   -3.5   99. ]]

Below is an intuition of a GMM trained with the maximum likelihood estimator
using the Iris flower dataset.

.. plot:: plot/plot_ML.py
   :include-source: False


Maximum a posteriori Estimator (MAP)
====================================
.. _map:

Closely related to MLE, maximum a posteriori probability (MAP) estimation
equals the mode of the posterior distribution, obtained by incorporating a
prior distribution in its loss function [10]_. Commonly this prior
distribution (the values of :math:`\Theta`) is estimated with MLE. This
optimization is done by the **Expectation-Maximization** (EM) algorithm [8]_
and it is implemented by :py:class:`bob.learn.em.MAP_GMMTrainer`.

A compact way to write relevance MAP adaptation is by using GMM supervector
notation (this will be useful in the next subsections). The GMM supervector
notation consists in taking the parameters of :math:`\Theta` (weights, means
and covariance matrices) of a GMM and creating a single vector or matrix to
represent each of them.
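For instance, the mean supervector is simply the per-component mean vectors
stacked into one long vector. The minimal sketch below (reusing the
2-component, 3-dimensional ``gmm_machine`` trained above) builds it with
plain NumPy array manipulation; no dedicated API call is involved.

.. code-block:: python

   # Stack the 2 x 3 matrix of per-component means into a single
   # 6-dimensional mean supervector.
   mean_supervector = gmm_machine.means.flatten()

   # The weight and (diagonal) variance supervectors are built the same way.
   weight_supervector = gmm_machine.weights
   variance_supervector = gmm_machine.variances.flatten()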
For each Gaussian component :math:`c`, we can represent the MAP adaptation as
:math:`\mu_i = m + d_i`, where :math:`m` is our prior and :math:`d_i` is the
class offset.

The snippet below shows how to train a GMM using the MAP estimator.

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> import bob.learn.em
   >>> import numpy
   >>> data = numpy.array(
   ...     [[3, -3, 100],
   ...      [4, -4, 98],
   ...      [3.5, -3.5, 99],
   ...      [-7, 7, -100],
   ...      [-5, 5, -101]], dtype='float64')
   >>> # Creating a fake prior
   >>> prior_gmm = bob.learn.em.GMMMachine(2, 3)
   >>> # Set some random means for the example
   >>> prior_gmm.means = numpy.array(
   ...     [[-4., 2.3, -10.5],
   ...      [2.5, -4.5, 59.]])
   >>> # Creating the model for the adapted GMM
   >>> adapted_gmm = bob.learn.em.GMMMachine(2, 3)
   >>> # Creating the MAP trainer
   >>> gmm_trainer = bob.learn.em.MAP_GMMTrainer(prior_gmm, relevance_factor=4)
   >>>
   >>> max_iterations = 200
   >>> convergence_threshold = 1e-5
   >>> # Training
   >>> bob.learn.em.train(gmm_trainer, adapted_gmm, data,
   ...                    max_iterations=max_iterations,
   ...                    convergence_threshold=convergence_threshold)
   >>> print(adapted_gmm.means)
   [[ -4.667   3.533  -40.5  ]
    [  2.929  -4.071   76.143]]

Below is an intuition of a GMM trained with the MAP estimator using the Iris
flower dataset.

.. plot:: plot/plot_MAP.py
   :include-source: False


Session Variability Modeling with Gaussian Mixture Models
----------------------------------------------------------

In the aforementioned GMM-based algorithms there is no explicit modeling of
session variability. This section introduces some session variability
algorithms built on top of GMMs.


GMM statistics
==============

Before introducing session variability for GMM-based algorithms, we must
introduce a component called :py:class:`bob.learn.em.GMMStats`. This
component is used in several computations in the next sections.
:py:class:`bob.learn.em.GMMStats` is a container that stores the quantities
of Equations 8, 9 and 10 in [Reynolds2000]_ (also called the zeroth, first
and second order GMM statistics).

Given a GMM (:math:`\Theta`) and a set of samples :math:`x_t`, this component
accumulates statistics for each Gaussian component :math:`c`.
The list below gives the one-to-one relationship between the statistics in
[Reynolds2000]_ and the properties of :py:class:`bob.learn.em.GMMStats`:

- Eq (8) is :py:class:`bob.learn.em.GMMStats.n`:
  :math:`n_c = \sum\limits_{t=1}^{T} Pr(c | x_t)` (also called
  responsibilities)
- Eq (9) is :py:class:`bob.learn.em.GMMStats.sum_px`:
  :math:`E_c(x) = \frac{1}{n_c} \sum\limits_{t=1}^{T} Pr(c | x_t) x_t`
- Eq (10) is :py:class:`bob.learn.em.GMMStats.sum_pxx`:
  :math:`E_c(x^2) = \frac{1}{n_c} \sum\limits_{t=1}^{T} Pr(c | x_t) x_t^2`

where :math:`T` is the number of samples used to generate the statistics.

The snippet below shows how to accumulate these statistics given a prior GMM.

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> import bob.learn.em
   >>> import numpy
   >>> numpy.random.seed(10)
   >>>
   >>> data = numpy.array(
   ...     [[0, 0.3, -0.2],
   ...      [0.4, 0.1, 0.15],
   ...      [-0.3, -0.1, 0],
   ...      [1.2, 1.4, 1],
   ...      [0.8, 1., 1]], dtype='float64')
   >>> # Creating a fake prior with 2 Gaussians of dimension 3
   >>> prior_gmm = bob.learn.em.GMMMachine(2, 3)
   >>> prior_gmm.means = numpy.vstack((numpy.random.normal(0, 0.5, (1, 3)),
   ...                                 numpy.random.normal(1, 0.5, (1, 3))))
   >>> # All nice and round diagonal covariance
   >>> prior_gmm.variances = numpy.ones((2, 3)) * 0.5
   >>> prior_gmm.weights = numpy.array([0.3, 0.7])
   >>> # Creating the container
   >>> gmm_stats_container = bob.learn.em.GMMStats(2, 3)
   >>> for d in data:
   ...     prior_gmm.acc_statistics(d, gmm_stats_container)
   >>>
   >>> # Printing the responsibilities
   >>> print(gmm_stats_container.n / gmm_stats_container.t)
   [ 0.429  0.571]


Inter-Session Variability
=========================
.. _isv:

Inter-Session Variability (ISV) modeling [3]_ [2]_ is a session variability
modeling technique built on top of the Gaussian mixture modeling approach. It
hypothesizes that within-class variations are embedded in a linear subspace
of the GMM mean supervector space, and that these variations can be
suppressed by an offset w.r.t. each mean during the MAP adaptation.
In this generative model, each sample is assumed to have been generated by a
GMM mean supervector of the form
:math:`\mu_{i, j} = m + U x_{i, j} + D z_{i}`, where :math:`m` is our prior,
:math:`U x_{i, j}` is the session offset that we want to suppress and
:math:`D z_{i}` is the class offset (with all session effects suppressed).

All possible sources of session variation are embedded in the matrix
:math:`U`. Below is an intuition of what is modeled with :math:`U` in the
Iris flower dataset. The arrows :math:`U_{1}`, :math:`U_{2}` and
:math:`U_{3}` are the directions of the within-class variations, with respect
to each Gaussian component, that will be suppressed a posteriori.

.. plot:: plot/plot_ISV.py
   :include-source: False


The ISV statistical model is stored in the container
:py:class:`bob.learn.em.ISVBase` and the training is performed by
:py:class:`bob.learn.em.ISVTrainer`. The snippet below shows how to train an
Inter-Session Variability model.

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> import bob.learn.em
   >>> import numpy
   >>> numpy.random.seed(10)
   >>>
   >>> # Generating some fake data
   >>> data_class1 = numpy.random.normal(0, 0.5, (10, 3))
   >>> data_class2 = numpy.random.normal(-0.2, 0.2, (10, 3))
   >>> data = [data_class1, data_class2]
   >>> # Creating a fake prior with 2 Gaussians of dimension 3
   >>> prior_gmm = bob.learn.em.GMMMachine(2, 3)
   >>> prior_gmm.means = numpy.vstack((numpy.random.normal(0, 0.5, (1, 3)),
   ...                                 numpy.random.normal(1, 0.5, (1, 3))))
   >>> # All nice and round diagonal covariance
   >>> prior_gmm.variances = numpy.ones((2, 3)) * 0.5
   >>> prior_gmm.weights = numpy.array([0.3, 0.7])
   >>> # The input to the ISV training is the statistics of the GMM
   >>> gmm_stats_per_class = []
   >>> for d in data:
   ...     stats = []
   ...     for i in d:
   ...         gmm_stats_container = bob.learn.em.GMMStats(2, 3)
   ...         prior_gmm.acc_statistics(i, gmm_stats_container)
   ...         stats.append(gmm_stats_container)
   ...     gmm_stats_per_class.append(stats)
   >>> # Finally doing the ISV training
   >>> subspace_dimension_of_u = 2
   >>> relevance_factor = 4
   >>> isvbase = bob.learn.em.ISVBase(prior_gmm, subspace_dimension_of_u)
   >>> trainer = bob.learn.em.ISVTrainer(relevance_factor)
   >>> bob.learn.em.train(trainer, isvbase, gmm_stats_per_class,
   ...                    max_iterations=50)
   >>> # Printing the session offset w.r.t. each Gaussian component
   >>> print(isvbase.u)
   [[-0.01  -0.027]
    [-0.002 -0.004]
    [ 0.028  0.074]
    [ 0.012  0.03 ]
    [ 0.033  0.085]
    [ 0.046  0.12 ]]


Joint Factor Analysis
=====================
.. _jfa:

Joint Factor Analysis (JFA) [1]_ [2]_ is an extension of ISV. Besides the
within-class assumption (modeled with :math:`U`), it also hypothesizes that
between-class variations are embedded in a low-rank rectangular matrix
:math:`V`. In the supervector notation, this modeling has the following form:
:math:`\mu_{i, j} = m + U x_{i, j} + V y_{i} + D z_{i}`.

Below is an intuition of what is modeled with :math:`U` and :math:`V` in the
Iris flower dataset. The arrows :math:`V_{1}`, :math:`V_{2}` and
:math:`V_{3}` are the directions of the between-class variations, with
respect to each Gaussian component, that will be added a posteriori.

.. plot:: plot/plot_JFA.py
   :include-source: False

The JFA statistical model is stored in the container
:py:class:`bob.learn.em.JFABase` and the training is performed by
:py:class:`bob.learn.em.JFATrainer`. The snippet below shows how to train a
JFA model.

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> import bob.learn.em
   >>> import numpy
   >>> numpy.random.seed(10)
   >>>
   >>> # Generating some fake data
   >>> data_class1 = numpy.random.normal(0, 0.5, (10, 3))
   >>> data_class2 = numpy.random.normal(-0.2, 0.2, (10, 3))
   >>> data = [data_class1, data_class2]
   >>> # Creating a fake prior with 2 Gaussians of dimension 3
   >>> prior_gmm = bob.learn.em.GMMMachine(2, 3)
   >>> prior_gmm.means = numpy.vstack((numpy.random.normal(0, 0.5, (1, 3)),
   ...                                 numpy.random.normal(1, 0.5, (1, 3))))
   >>> # All nice and round diagonal covariance
   >>> prior_gmm.variances = numpy.ones((2, 3)) * 0.5
   >>> prior_gmm.weights = numpy.array([0.3, 0.7])
   >>>
   >>> # The input to the JFA training is the statistics of the GMM
   >>> gmm_stats_per_class = []
   >>> for d in data:
   ...     stats = []
   ...     for i in d:
   ...         gmm_stats_container = bob.learn.em.GMMStats(2, 3)
   ...         prior_gmm.acc_statistics(i, gmm_stats_container)
   ...         stats.append(gmm_stats_container)
   ...     gmm_stats_per_class.append(stats)
   >>>
   >>> # Finally doing the JFA training
   >>> subspace_dimension_of_u = 2
   >>> subspace_dimension_of_v = 2
   >>> relevance_factor = 4
   >>> jfabase = bob.learn.em.JFABase(prior_gmm, subspace_dimension_of_u,
   ...                                subspace_dimension_of_v)
   >>> trainer = bob.learn.em.JFATrainer()
   >>> bob.learn.em.train_jfa(trainer, jfabase, gmm_stats_per_class,
   ...                        max_iterations=50)
   >>> # Printing the between-class offset w.r.t. each Gaussian component
   >>> print(jfabase.v)
   [[ 0.003 -0.006]
    [ 0.041 -0.084]
    [-0.261  0.53 ]
    [-0.252  0.51 ]
    [-0.387  0.785]
    [-0.36   0.73 ]]

Total variability Modelling
===========================
.. _ivector:

Total Variability (TV) modeling [4]_ is a front-end initially introduced for
speaker recognition, which aims at describing samples by vectors of low
dimensionality called i-vectors. The model consists of a subspace :math:`T`
and a residual diagonal covariance matrix :math:`\Sigma`, that are then used
to extract i-vectors, and is built upon the GMM approach. In the supervector
notation this modeling has the following form: :math:`\mu = m + T v`.

Below is an intuition of the data from the Iris flower dataset embedded in
the i-vector space.

.. plot:: plot/plot_iVector.py
   :include-source: False


The i-vector statistical model is stored in the container
:py:class:`bob.learn.em.IVectorMachine` and the training is performed by
:py:class:`bob.learn.em.IVectorTrainer`. The snippet below shows how to train
a Total Variability model.

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> import bob.learn.em
   >>> import numpy
   >>> numpy.random.seed(10)
   >>>
   >>> # Generating some fake data
   >>> data_class1 = numpy.random.normal(0, 0.5, (10, 3))
   >>> data_class2 = numpy.random.normal(-0.2, 0.2, (10, 3))
   >>> data = [data_class1, data_class2]
   >>>
   >>> # Creating a fake prior with 2 Gaussians of dimension 3
   >>> prior_gmm = bob.learn.em.GMMMachine(2, 3)
   >>> prior_gmm.means = numpy.vstack((numpy.random.normal(0, 0.5, (1, 3)),
   ...                                 numpy.random.normal(1, 0.5, (1, 3))))
   >>> # All nice and round diagonal covariance
   >>> prior_gmm.variances = numpy.ones((2, 3)) * 0.5
   >>> prior_gmm.weights = numpy.array([0.3, 0.7])
   >>>
   >>> # The input to the TV training is the statistics of the GMM
   >>> gmm_stats_per_class = []
   >>> for d in data:
   ...     for i in d:
   ...         gmm_stats_container = bob.learn.em.GMMStats(2, 3)
   ...         prior_gmm.acc_statistics(i, gmm_stats_container)
   ...         gmm_stats_per_class.append(gmm_stats_container)
   >>>
   >>> # Finally doing the TV training
   >>> subspace_dimension_of_t = 2
   >>>
   >>> ivector_trainer = bob.learn.em.IVectorTrainer(update_sigma=True)
   >>> ivector_machine = bob.learn.em.IVectorMachine(
   ...     prior_gmm, subspace_dimension_of_t, 10e-5)
   >>> # train IVector model
   >>> bob.learn.em.train(ivector_trainer, ivector_machine,
   ...                    gmm_stats_per_class, 500)
   >>>
   >>> # Printing the i-vector subspace w.r.t. each Gaussian component
   >>> print(ivector_machine.t)
   [[ 0.11  -0.203]
    [-0.124  0.014]
    [ 0.296  0.674]
    [ 0.447  0.174]
    [ 0.425  0.583]
    [ 0.394  0.794]]

Linear Scoring
==============
.. _linearscoring:

In :ref:`MAP <map>` adaptation, :ref:`ISV <isv>` and :ref:`JFA <jfa>`, a
traditional way to do scoring is via the log-likelihood ratio between the
adapted model and the prior, as follows:

.. math::

   score = \ln(P(x | \Theta)) - \ln(P(x | \Theta_{prior})),

(with :math:`\Theta` varying for each approach).

A simplification proposed by [Glembek2009]_, called linear scoring,
approximates this ratio using a first-order Taylor series, as follows:

.. math::

   score = \frac{\mu - \mu_{prior}}{\sigma_{prior}} f * (\mu_{prior} + U_x),

where :math:`\mu` is the GMM mean supervector (of the prior and the adapted
model), :math:`\sigma` is the variance supervector, :math:`f` is the first
order GMM statistics (:py:class:`bob.learn.em.GMMStats.sum_px`) and
:math:`U_x` is a possible channel offset (:ref:`ISV <isv>`).

This scoring technique is implemented in
:py:func:`bob.learn.em.linear_scoring`. The snippet below shows how to
compute scores using this approximation.
.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> import bob.learn.em
   >>> import numpy
   >>> # Defining a fake prior
   >>> prior_gmm = bob.learn.em.GMMMachine(3, 2)
   >>> prior_gmm.means = numpy.array([[1, 1], [2, 2.1], [3, 3]])
   >>> # Defining a fake adapted model
   >>> adapted_gmm = bob.learn.em.GMMMachine(3, 2)
   >>> adapted_gmm.means = numpy.array([[1.5, 1.5], [2.5, 2.5], [2, 2]])
   >>> # Defining an input
   >>> input = numpy.array([[1.5, 1.5], [1.6, 1.6]])
   >>> # Accumulating statistics of the GMM
   >>> stats = bob.learn.em.GMMStats(3, 2)
   >>> prior_gmm.acc_statistics(input, stats)
   >>> score = bob.learn.em.linear_scoring(
   ...     [adapted_gmm], prior_gmm, [stats], [],
   ...     frame_length_normalisation=True)
   >>> print(score)
   [[ 0.254]]


Probabilistic Linear Discriminant Analysis (PLDA)
-------------------------------------------------

Probabilistic Linear Discriminant Analysis [5]_ is a probabilistic model that
incorporates components describing both between-class and within-class
variations. Given a mean :math:`\mu`, between-class and within-class
subspaces :math:`F` and :math:`G` and residual noise :math:`\epsilon` with
zero mean and diagonal covariance matrix :math:`\Sigma`, the model assumes
that a sample :math:`x_{i,j}` is generated by the following process:

.. math::

   x_{i,j} = \mu + F h_{i} + G w_{i,j} + \epsilon_{i,j}


An Expectation-Maximization algorithm can be used to learn the parameters of
this model: :math:`\mu`, :math:`F`, :math:`G` and :math:`\Sigma`. As these
parameters can be shared between classes, there is a specific container class
for this purpose, which is :py:class:`bob.learn.em.PLDABase`. The process is
described in detail in [6]_.

Let us consider a training set of two classes, each with 3 samples of
dimensionality 3.

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> data1 = numpy.array(
   ...     [[3, -3, 100],
   ...      [4, -4, 50],
   ...      [40, -40, 150]], dtype=numpy.float64)
   >>> data2 = numpy.array(
   ...     [[3, 6, -50],
   ...      [4, 8, -100],
   ...      [40, 79, -800]], dtype=numpy.float64)
   >>> data = [data1, data2]

Learning a PLDA model can be performed by instantiating the class
:py:class:`bob.learn.em.PLDATrainer` and calling the
:py:meth:`bob.learn.em.train` method.

.. doctest::

   >>> # This creates a PLDABase container for input features of
   >>> # dimensionality 3 and with subspaces F and G of rank 1 and 2,
   >>> # respectively.
   >>> pldabase = bob.learn.em.PLDABase(3, 1, 2)

   >>> trainer = bob.learn.em.PLDATrainer()
   >>> bob.learn.em.train(trainer, pldabase, data, max_iterations=10)

Once trained, this PLDA model can be used to compute the log-likelihood of a
set of samples given some hypothesis. For this purpose, a
:py:class:`bob.learn.em.PLDAMachine` should be instantiated. Then, the
log-likelihood that a set of samples share the same latent identity variable
:math:`h_{i}` (i.e. the samples are coming from the same identity/class) is
obtained by calling the
:py:meth:`bob.learn.em.PLDAMachine.compute_log_likelihood` method.

.. doctest::

   >>> plda = bob.learn.em.PLDAMachine(pldabase)
   >>> samples = numpy.array(
   ...     [[3.5, -3.4, 102],
   ...      [4.5, -4.3, 56]], dtype=numpy.float64)
   >>> loglike = plda.compute_log_likelihood(samples)

If separate models for different classes need to be enrolled, each of them
with a set of enrollment samples, then several instances of
:py:class:`bob.learn.em.PLDAMachine` need to be created and enrolled using
the :py:meth:`bob.learn.em.PLDATrainer.enroll` method as follows.

.. doctest::

   >>> plda1 = bob.learn.em.PLDAMachine(pldabase)
   >>> samples1 = numpy.array(
   ...     [[3.5, -3.4, 102],
   ...      [4.5, -4.3, 56]], dtype=numpy.float64)
   >>> trainer.enroll(plda1, samples1)
   >>> plda2 = bob.learn.em.PLDAMachine(pldabase)
   >>> samples2 = numpy.array(
   ...     [[3.5, 7, -49],
   ...      [4.5, 8.9, -99]], dtype=numpy.float64)
   >>> trainer.enroll(plda2, samples2)

Afterwards, the joint log-likelihood of the enrollment samples and of one or
several test samples can be computed as previously described, and this
separately for each model.

.. doctest::

   >>> sample = numpy.array([3.2, -3.3, 58], dtype=numpy.float64)
   >>> l1 = plda1.compute_log_likelihood(sample)
   >>> l2 = plda2.compute_log_likelihood(sample)

In a verification scenario, there are two possible hypotheses:

#. :math:`x_{test}` and :math:`x_{enroll}` share the same class.
#. :math:`x_{test}` and :math:`x_{enroll}` are from different classes.

Using the method :py:meth:`bob.learn.em.PLDAMachine.log_likelihood_ratio` or
its alias, the ``__call__`` operator, the corresponding log-likelihood ratio
will be computed, which is defined more formally by:
:math:`s = \ln(P(x_{test}, x_{enroll})) - \ln(P(x_{test}) P(x_{enroll}))`

.. doctest::

   >>> s1 = plda1(sample)
   >>> s2 = plda2(sample)
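In a verification pipeline, these log-likelihood ratios would then typically
be compared against a decision threshold. The sketch below is only an
illustration: the threshold value is arbitrary and would normally be tuned on
a separate development set.

.. code-block:: python

   # Hypothetical decision threshold (illustrative value only).
   threshold = 0.0

   # Verification: accept the claim that ``sample`` belongs to the class
   # enrolled in ``plda1`` only if its ratio exceeds the threshold.
   accept = s1 >= threshold

   # Closed-set identification: pick the enrolled model with the largest ratio.
   best_model = 'class 1' if s1 >= s2 else 'class 2'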
.. testcleanup:: *

   import shutil
   os.chdir(current_directory)
   shutil.rmtree(temp_dir)


Score Normalization
-------------------

Score normalization aims to compensate for statistical variations in output
scores due to changes in the conditions across different enrollment and probe
samples. This is achieved by scaling distributions of system output scores to
better facilitate the application of a single, global threshold for
authentication.

Bob implements three different strategies to normalize scores; these
strategies are presented in the next subsections.

Z-Norm
======
.. _znorm:

Given a score :math:`s_i`, Z-Norm [Auckenthaler2000]_ and [Mariethoz2005]_
(zero normalization) scales this value by the mean (:math:`\mu`) and standard
deviation (:math:`\sigma`) of an impostor score distribution. This score
distribution can be computed beforehand, and the normalization is defined as
follows:

.. math::

   zs_i = \frac{s_i - \mu}{\sigma}


This normalization technique is implemented in :py:func:`bob.learn.em.znorm`.
Below is an example of score normalization using
:py:func:`bob.learn.em.znorm`.

.. plot:: plot/plot_Znorm.py
   :include-source: True

.. note::

   Observe how the scores were scaled in the plot above.


T-Norm
======
.. _tnorm:

T-Norm [Auckenthaler2000]_ and [Mariethoz2005]_ (test normalization) operates
in a probe-centric manner. While in Z-Norm :math:`\mu` and :math:`\sigma` are
estimated using an impostor set of models and their scores, T-Norm computes
these statistics using the current probe sample against a set of models in a
cohort :math:`\Theta_{c}`. A cohort can be any semantic organization that is
sensible for your recognition task, such as sex (males and females),
ethnicity, age, etc., and the normalization is defined as follows:

.. math::

   ts_i = \frac{s_i - \mu}{\sigma}


where :math:`s_i` is :math:`P(x_i | \Theta)` (the score given the claimed
model), :math:`\mu = \frac{\sum\limits_{i=0}^{N} P(x_i | \Theta_{c})}{N}`
(:math:`\Theta_{c}` being the models of one cohort) and :math:`\sigma` is the
standard deviation computed using the same criterion used to compute
:math:`\mu`.


This normalization technique is implemented in :py:func:`bob.learn.em.tnorm`.
Below is an example of score normalization using
:py:func:`bob.learn.em.tnorm`.

.. plot:: plot/plot_Tnorm.py
   :include-source: True

.. note::

   T-Norm introduces extra computation during scoring, as the probe samples
   need to be compared to each cohort model in order to have :math:`\mu` and
   :math:`\sigma`.
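The normalization itself is only the arithmetic given above. The minimal
NumPy sketch below illustrates it with made-up numbers and bypasses
:py:func:`bob.learn.em.tnorm` entirely:

.. code-block:: python

   import numpy

   # Made-up raw scores of two probe samples against their claimed models.
   raw_scores = numpy.array([2.0, 0.9])

   # Made-up scores of the same two probes against a cohort of three models
   # (one row per probe, one column per cohort model).
   cohort_scores = numpy.array([[1.2, 0.8, 1.0],
                                [0.3, 0.5, 0.1]])

   # Per-probe impostor statistics estimated from the cohort.
   mu = cohort_scores.mean(axis=1)
   sigma = cohort_scores.std(axis=1)

   # T-normalized scores.
   t_normalized = (raw_scores - mu) / sigma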
ZT-Norm
=======
.. _ztnorm:

ZT-Norm [Auckenthaler2000]_ and [Mariethoz2005]_ consists in the application
of :ref:`Z-Norm <znorm>` followed by a :ref:`T-Norm <tnorm>` and it is
implemented in :py:func:`bob.learn.em.ztnorm`.

Below is an example of score normalization using
:py:func:`bob.learn.em.ztnorm`.

.. plot:: plot/plot_ZTnorm.py
   :include-source: True

.. note::

   Observe how the scores were scaled in the plot above.


.. Place here your external references
.. include:: links.rst
.. [1] http://dx.doi.org/10.1109/TASL.2006.881693
.. [2] http://publications.idiap.ch/index.php/publications/show/2606
.. [3] http://dx.doi.org/10.1016/j.csl.2007.05.003
.. [4] http://dx.doi.org/10.1109/TASL.2010.2064307
.. [5] http://dx.doi.org/10.1109/ICCV.2007.4409052
.. [6] http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.38
.. [7] http://en.wikipedia.org/wiki/K-means_clustering
.. [8] http://en.wikipedia.org/wiki/Expectation-maximization_algorithm
.. [9] http://en.wikipedia.org/wiki/Maximum_likelihood
.. [10] http://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation