.. vim: set fileencoding=utf-8 :
.. Andre Anjos
.. Tue 15 Oct 17:41:52 2013

.. testsetup:: *

   import numpy
   positives = numpy.random.normal(1,1,100)
   negatives = numpy.random.normal(-1,1,100)
   import matplotlib
   if not hasattr(matplotlib, 'backends'):
     matplotlib.use('pdf') # non-interactive avoids exception on display
   import bob.measure

============
 User Guide
============

Methods in the :py:mod:`bob.measure` module can help you to quickly and
easily evaluate error for multi-class or binary classification problems. If
you are not yet familiar with the aspects of performance evaluation, we
recommend the following papers and book chapters for an overview of some of
the implemented methods:

* Bengio, S., Keller, M., Mariéthoz, J. (2004). `The Expected Performance
  Curve`_. International Conference on Machine Learning ICML Workshop on ROC
  Analysis in Machine Learning, 136(1), 1963–1966.
* Martin, A., Doddington, G., Kamm, T., Ordowski, M., & Przybocki, M.
  (1997). `The DET curve in assessment of detection task performance`_.
  Fifth European Conference on Speech Communication and Technology
  (pp. 1895-1898).
* Li, S., Jain, A.K. (2005). Handbook of Face Recognition, Chapter 14,
  Springer.

Overview
--------

A classifier is subject to two types of errors: either the real
access/signal is rejected (false rejection) or an impostor attack/false
access is accepted (false acceptance). A possible way to measure the
detection performance is to use the Half Total Error Rate (HTER), which
combines the False Rejection Rate (FRR) and the False Acceptance Rate (FAR)
and is defined by the following formula:

.. math::

   HTER(\tau, \mathcal{D}) = \frac{FAR(\tau, \mathcal{D}) + FRR(\tau, \mathcal{D})}{2} \quad \textrm{[\%]}

where :math:`\mathcal{D}` denotes the dataset used. Since both the FAR and
the FRR depend on the threshold :math:`\tau`, they are strongly related to
each other: increasing the FAR will reduce the FRR and vice-versa. For this
reason, results are often presented using either a Receiver Operating
Characteristic (ROC) or a Detection-Error Tradeoff (DET) plot; these two
plots basically present the FAR versus the FRR for different values of the
threshold. Another widely used measure to summarise the performance of a
system is the Equal Error Rate (EER), defined as the point along the ROC or
DET curve where the FAR equals the FRR.

However, it was noted by Bengio et al. (2004) that ROC and DET curves may be
misleading when comparing systems. Hence, the so-called Expected Performance
Curve (EPC) was proposed, which consists of an unbiased estimate of the
reachable performance of a system at various operating points. Indeed, in
real-world scenarios, the threshold :math:`\tau` has to be set a priori:
this is typically done using a development set (also called cross-validation
set). Nevertheless, the optimal threshold can be different depending on the
relative importance given to the FAR and the FRR. Hence, in the EPC
framework, the cost :math:`\beta \in [0;1]` is defined as the trade-off
between the FAR and FRR. The optimal threshold :math:`\tau^*` is then
computed using different values of :math:`\beta`, corresponding to different
operating points:

.. math::

   \tau^{*} = \arg\!\min_{\tau} \quad \beta \cdot \textrm{FAR}(\tau, \mathcal{D}_{d}) + (1-\beta) \cdot \textrm{FRR}(\tau, \mathcal{D}_{d})

where :math:`\mathcal{D}_{d}` denotes the development set, which should be
completely separate from the evaluation set :math:`\mathcal{D}`.

Performance for different values of :math:`\beta` is then computed on the
test set :math:`\mathcal{D}_{t}` using the previously derived threshold.
Note that setting :math:`\beta` to 0.5 yields the Half Total Error Rate
(HTER) as defined in the first equation.
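To make these definitions concrete, here is a minimal, numpy-only sketch (it
does not use :py:mod:`bob.measure` at all; the threshold value and the
convention that ties count as acceptances are illustrative assumptions) that
evaluates the FAR, FRR and HTER of toy score distributions at a fixed
threshold:

.. code-block:: py

   import numpy

   # toy scores: negatives (impostors) distributed to the left of the
   # positives (true accesses)
   negatives = numpy.random.normal(-1, 1, 100)
   positives = numpy.random.normal(1, 1, 100)

   tau = 0.0  # an arbitrary, a-priori threshold

   # FAR: fraction of negatives wrongly accepted (score at or above tau)
   FAR = (negatives >= tau).mean()
   # FRR: fraction of positives wrongly rejected (score below tau)
   FRR = (positives < tau).mean()
   # HTER: the equally-weighted combination of both error rates
   HTER = (FAR + FRR) / 2.0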
.. note::

   Most of the methods available in this module require as input a set of
   two :py:class:`numpy.ndarray` objects that contain the scores obtained by
   the classification system to be evaluated, without specific order. The
   methods are defined to deal with two-class problems. Therefore, in this
   setting, and throughout this manual, the **negatives** represent the
   impostor attacks or false class accesses (that is, when a sample of class
   A is given to the classifier of another class, such as class B). The
   second set, referred to as the **positives**, represents the true class
   accesses or signal responses of the classifier.

   The vectors are called this way because the procedures implemented in
   this module expect the scores of the **negatives** to be statistically
   distributed to the left of the signal scores (the **positives**). If that
   is not the case, one should either invert the input to the methods or
   multiply all available scores by -1, in order to have them inverted.

   The input to create these two vectors is generated by experiments
   conducted by the user and normally sits in files that may need some
   parsing before these vectors can be extracted. While it is not possible
   to provide a parser for every individual file that may be generated in
   different experimental frameworks, we do provide a few parsers for the
   formats we use the most. Please refer to the documentation of
   :py:mod:`bob.measure.load` for a list of formats and details.

   In the remainder of this section we assume you have successfully parsed
   and loaded your scores into two 1D ``float64`` vectors and are ready to
   evaluate the performance of the classifier.
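If your experimental framework produces a score format that
:py:mod:`bob.measure.load` does not cover, writing a small parser yourself
is usually enough. The sketch below assumes a hypothetical text file
``scores.txt`` with one ``<label> <score>`` pair per line; adapt the
splitting logic to your own format:

.. code-block:: py

   import numpy

   # hypothetical format: one "<label> <score>" pair per line, where the
   # label is either "genuine" or "impostor" -- adapt to your own files
   neg, pos = [], []
   with open('scores.txt') as f:
       for line in f:
           label, score = line.split()
           (pos if label == 'genuine' else neg).append(float(score))

   negatives = numpy.array(neg, dtype='float64')
   positives = numpy.array(pos, dtype='float64')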
Verification
------------

To count the number of correctly classified positives and negatives you can
use the following techniques:

.. doctest::

   >>> # negatives, positives = parse_my_scores(...) # write a parser if not provided!
   >>> T = 0.0 # threshold: later we explain how one can calculate it
   >>> correct_negatives = bob.measure.correctly_classified_negatives(negatives, T)
   >>> FAR = 1 - (float(correct_negatives.sum())/negatives.size)
   >>> correct_positives = bob.measure.correctly_classified_positives(positives, T)
   >>> FRR = 1 - (float(correct_positives.sum())/positives.size)

We also provide a method to calculate the FAR and FRR in a single shot:

.. doctest::

   >>> FAR, FRR = bob.measure.farfrr(negatives, positives, T)

The threshold ``T`` is normally calculated by looking at the distribution of
negatives and positives in a development (or validation) set, selecting a
threshold that matches a certain criterion, and applying this derived
threshold to the test (or evaluation) set. This technique gives a better
overview of the generalization of a method. We implement different
techniques for the calculation of the threshold:

* Threshold for the EER:

  .. doctest::

     >>> T = bob.measure.eer_threshold(negatives, positives)

* Threshold for the minimum HTER:

  .. doctest::

     >>> T = bob.measure.min_hter_threshold(negatives, positives)

* Threshold for the minimum weighted error rate (MWER) given a certain cost
  :math:`\beta`:

  .. doctest:: python

     >>> cost = 0.3 # or "beta"
     >>> T = bob.measure.min_weighted_error_rate_threshold(negatives, positives, cost)

  .. note::

     Setting ``cost`` to 0.5 is equivalent to using
     :py:func:`bob.measure.min_hter_threshold`.

.. note::

   Many functions in :py:mod:`bob.measure` have an ``is_sorted`` parameter,
   which defaults to ``False`` throughout. However, these functions need
   sorted positive and/or negative scores. If the scores are not sorted in
   ascending order, they will be copied internally -- twice! To avoid the
   copies, you may sort the scores in ascending order beforehand, e.g., by:

   .. doctest:: python

      >>> negatives.sort()
      >>> positives.sort()
      >>> t = bob.measure.min_weighted_error_rate_threshold(negatives, positives, cost, is_sorted = True)
      >>> assert T == t
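Putting these pieces together, an unbiased evaluation tunes the threshold on
the development set only and applies it, unchanged, to the evaluation set.
Here is a minimal sketch of that workflow, using synthetic scores in place
of your parsed ones:

.. code-block:: py

   import numpy
   import bob.measure

   # synthetic development and evaluation scores, in place of parsed ones
   dev_neg = numpy.random.normal(-1, 1, 100)
   dev_pos = numpy.random.normal(1, 1, 100)
   test_neg = numpy.random.normal(-1.1, 1, 100)
   test_pos = numpy.random.normal(0.9, 1, 100)

   # the threshold is fixed a priori, on the development set only...
   T = bob.measure.eer_threshold(dev_neg, dev_pos)

   # ...and applied, unchanged, to the held-out evaluation set
   FAR, FRR = bob.measure.farfrr(test_neg, test_pos, T)
   HTER = (FAR + FRR) / 2.0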
Identification
--------------

For identification, the Recognition Rate is one of the standard measures. To
compute recognition rates, you can use the
:py:func:`bob.measure.recognition_rate` function. This function expects a
relatively complex data structure, which is the same as for the CMC_ below.
For each probe item, the scores for negative and positive comparisons are
computed and collected for all probe items:

.. doctest::

   >>> rr_scores = []
   >>> for probe in range(10):
   ...   pos = numpy.random.normal(1, 1, 1)
   ...   neg = numpy.random.normal(0, 1, 19)
   ...   rr_scores.append((neg, pos))
   >>> rr = bob.measure.recognition_rate(rr_scores, rank=1)

For open set identification, two different error measures are defined
according to Li and Jain (2005). The first is the
:py:func:`bob.measure.detection_identification_rate`, which counts the
number of correctly classified in-gallery probe items. The second is the
:py:func:`bob.measure.false_alarm_rate`, which counts how often an
out-of-gallery probe item was incorrectly accepted. Both rates can be
computed using the same data structure, with one exception: both functions
require that at least one probe item exists which has no corresponding
gallery item, i.e., for which the positives are empty or ``None``
(continuing from the example above):

.. doctest::

   >>> for probe in range(10):
   ...   pos = None
   ...   neg = numpy.random.normal(-2, 1, 10)
   ...   rr_scores.append((neg, pos))
   >>> dir = bob.measure.detection_identification_rate(rr_scores, threshold = 0, rank=1)
   >>> far = bob.measure.false_alarm_rate(rr_scores, threshold = 0)


Plotting
--------

An image is worth a thousand words, they say. You can combine the
capabilities of Matplotlib_ with |project| to plot a number of curves;
however, you must have that package installed. In this section we describe a
few recipes.

ROC
===

The Receiver Operating Characteristic (ROC) curve is one of the oldest plots
in town. To plot an ROC curve, in possession of your **negatives** and
**positives**, just do something along the lines of:

.. doctest::

   >>> from matplotlib import pyplot
   >>> # we assume you have your negatives and positives already split
   >>> npoints = 100
   >>> bob.measure.plot.roc(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test') # doctest: +SKIP
   >>> pyplot.xlabel('FAR (%)') # doctest: +SKIP
   >>> pyplot.ylabel('FRR (%)') # doctest: +SKIP
   >>> pyplot.grid(True)
   >>> pyplot.show() # doctest: +SKIP

You should see an image like the following one:
.. plot::

   import numpy
   numpy.random.seed(42)
   import bob.measure
   from matplotlib import pyplot
   positives = numpy.random.normal(1,1,100)
   negatives = numpy.random.normal(-1,1,100)
   npoints = 100
   bob.measure.plot.roc(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test')
   pyplot.grid(True)
   pyplot.xlabel('FAR (%)')
   pyplot.ylabel('FRR (%)')
   pyplot.title('ROC')

As can be observed, plotting methods live in the namespace
:py:mod:`bob.measure.plot`. They work like :py:func:`matplotlib.pyplot.plot`
itself, except that, instead of receiving the x and y point coordinates as
parameters, they receive the two :py:class:`numpy.ndarray` arrays with
negatives and positives, as well as an indication of the number of points
the curve must contain. As with the :py:func:`matplotlib.pyplot.plot`
command, you can pass optional parameters for the line, as shown in the
example, to set up its color, shape and even the label. For an overview of
the accepted keywords, please refer to the Matplotlib_ documentation. Other
plot properties such as the plot title, axis labels, grids and legends
should be controlled directly using the relevant Matplotlib_ controls.

DET
===

A DET curve can be drawn using commands similar to the ones for the ROC
curve:

.. doctest::

   >>> from matplotlib import pyplot
   >>> # we assume you have your negatives and positives already split
   >>> npoints = 100
   >>> bob.measure.plot.det(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test') # doctest: +SKIP
   >>> bob.measure.plot.det_axis([0.01, 40, 0.01, 40]) # doctest: +SKIP
   >>> pyplot.xlabel('FAR (%)') # doctest: +SKIP
   >>> pyplot.ylabel('FRR (%)') # doctest: +SKIP
   >>> pyplot.grid(True)
   >>> pyplot.show() # doctest: +SKIP

This will produce an image like the following one:

.. plot::

   import numpy
   numpy.random.seed(42)
   import bob.measure
   from matplotlib import pyplot
   positives = numpy.random.normal(1,1,100)
   negatives = numpy.random.normal(-1,1,100)
   npoints = 100
   bob.measure.plot.det(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test')
   bob.measure.plot.det_axis([0.1, 80, 0.1, 80])
   pyplot.grid(True)
   pyplot.xlabel('FAR (%)')
   pyplot.ylabel('FRR (%)')
   pyplot.title('DET')
.. note::

   If you wish to reset axis zooming, you must use the Gaussian scale rather
   than the visual marks shown on the plot, which are just there for display
   purposes. The real axis scale is based on the
   :py:func:`bob.measure.ppndf` method. For example, if you wish to set the
   x and y axes to display data between 1% and 40%, here is the recipe:

   .. doctest::

      >>> # AFTER you plot the DET curve, just set the axis in this way:
      >>> pyplot.axis([bob.measure.ppndf(k/100.0) for k in (1, 40, 1, 40)]) # doctest: +SKIP

   We provide a convenient way for you to do the above in this module. So,
   optionally, you may use the :py:func:`bob.measure.plot.det_axis` method
   like this:

   .. doctest::

      >>> bob.measure.plot.det_axis([1, 40, 1, 40]) # doctest: +SKIP

EPC
===

Drawing an EPC requires that both the development set negatives and
positives are provided alongside the test (or evaluation) set ones. Because
of this, the API is slightly modified:

.. doctest::

   >>> bob.measure.plot.epc(dev_neg, dev_pos, test_neg, test_pos, npoints, color=(0,0,0), linestyle='-') # doctest: +SKIP
   >>> pyplot.show() # doctest: +SKIP

This will produce an image like the following one:

.. plot::

   import numpy
   numpy.random.seed(42)
   import bob.measure
   from matplotlib import pyplot
   dev_pos = numpy.random.normal(1,1,100)
   dev_neg = numpy.random.normal(-1,1,100)
   test_pos = numpy.random.normal(0.9,1,100)
   test_neg = numpy.random.normal(-1.1,1,100)
   npoints = 100
   bob.measure.plot.epc(dev_neg, dev_pos, test_neg, test_pos, npoints, color=(0,0,0), linestyle='-')
   pyplot.grid(True)
   pyplot.title('EPC')

CMC
===

The Cumulative Match Characteristics (CMC) curve estimates the probability
that the correct model is among the *N* models with the highest similarity
to a given probe. A CMC curve can be plotted using the
:py:func:`bob.measure.plot.cmc` function. The CMC can be calculated from a
relatively complex data structure, which defines a pair of positive and
negative scores **per probe**:
.. plot::

   import numpy
   numpy.random.seed(42)
   import bob.measure
   from matplotlib import pyplot
   cmc_scores = []
   for probe in range(10):
     positives = numpy.random.normal(1, 1, 1)
     negatives = numpy.random.normal(0, 1, 19)
     cmc_scores.append((negatives, positives))
   bob.measure.plot.cmc(cmc_scores, logx=False)
   pyplot.grid(True)
   pyplot.title('CMC')
   pyplot.xlabel('Rank')
   pyplot.xticks([1,5,10,20])
   pyplot.xlim([1,20])
   pyplot.ylim([0,100])
   pyplot.ylabel('Probability of Recognition (%)')

Usually, there is only a single positive score per probe, but this is not a
fixed restriction.

.. note::

   The complex data structure can be read from our default 4- or 5-column
   score files using the :py:func:`bob.measure.load.cmc` function.


Detection & Identification Curve
================================

The detection & identification curve is designed to evaluate open set
identification tasks. It can be plotted using the
:py:func:`bob.measure.plot.detection_identification_curve` function, but it
requires at least one open-set probe, i.e., a probe for which no
corresponding positive score exists; the FAR values are computed over these
probes. Here, we plot the detection and identification curve for rank 1, so
that the recognition rate for FAR=1 will be identical to the rank-one
:py:func:`bob.measure.recognition_rate` obtained in the CMC plot above.

.. plot::

   import numpy
   numpy.random.seed(42)
   import bob.measure
   from matplotlib import pyplot
   cmc_scores = []
   for probe in range(1000):
     positives = numpy.random.normal(1, 1, 1)
     negatives = numpy.random.normal(0, 1, 19)
     cmc_scores.append((negatives, positives))
   for probe in range(1000):
     negatives = numpy.random.normal(-1, 1, 10)
     cmc_scores.append((negatives, None))
   bob.measure.plot.detection_identification_curve(cmc_scores, rank=1, logx=True)
   pyplot.xlabel('False Alarm Rate')
   pyplot.xlim([0.0001, 1])
   pyplot.ylabel('Detection & Identification Rate (%)')
   pyplot.ylim([0,1])


Fine-tuning
===========

The methods inside :py:mod:`bob.measure.plot` are only provided as a
Matplotlib_ wrapper around equivalent methods in :py:mod:`bob.measure` that
only calculate the points, without doing any plotting. You may prefer to
tweak the plotting or even use a different plotting system such as gnuplot.
Have a look at the implementations in :py:mod:`bob.measure.plot` to
understand how to use the |project| methods to compute the curves and
interlace them in the way that best suits you.
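For example, assuming :py:func:`bob.measure.roc` is the point calculator
wrapped by :py:func:`bob.measure.plot.roc` and returns a 2-row array with
FAR values in the first row and FRR values in the second (check the
implementation to confirm this layout in your version), you could compute
the points once and control the styling entirely yourself:

.. code-block:: py

   import numpy
   import bob.measure
   from matplotlib import pyplot

   negatives = numpy.random.normal(-1, 1, 100)
   positives = numpy.random.normal(1, 1, 100)

   # compute the curve points only; no plotting happens here
   points = bob.measure.roc(negatives, positives, 100)

   # plot (or export to gnuplot or any other tool) with your own styling
   pyplot.plot(100.0 * points[0, :], 100.0 * points[1, :], 'k--', label='dev')
   pyplot.xlabel('FAR (%)')
   pyplot.ylabel('FRR (%)')
   pyplot.legend()
   pyplot.show()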
Full applications
-----------------

We provide a few scripts that can be used to quickly evaluate a set of
scores. We present these scripts in this section. The scripts take as input
either a 4-column or a 5-column data format, as specified in the
documentation of :py:func:`bob.measure.load.four_column` or
:py:func:`bob.measure.load.five_column`.

To calculate the threshold using a certain criterion (EER, minimum HTER or
weighted error rate) on a set, after setting up |project|, just do:

.. code-block:: sh

   $ bob_eval_threshold.py development-scores-4col.txt
   Threshold: -0.004787956164
   FAR : 6.731% (35/520)
   FRR : 6.667% (26/390)
   HTER: 6.699%

The output presents the threshold together with the FAR, FRR and HTER on the
given set, calculated using that threshold. The relative counts of FAs and
FRs are also displayed between parentheses.

To evaluate the performance of a new score file with a given threshold, use
the application ``bob_apply_threshold.py``:

.. code-block:: sh

   $ bob_apply_threshold.py -0.0047879 test-scores-4col.txt
   FAR : 2.115% (11/520)
   FRR : 7.179% (28/390)
   HTER: 4.647%

In this case, only the error figures are presented. You can conduct the
evaluation and plotting of development (and test) set data using our
combined ``bob_compute_perf.py`` script. You pass both sets and it does the
rest:

.. code-block:: sh

   $ bob_compute_perf.py development-scores-4col.txt test-scores-4col.txt
   [Min. criterion: EER] Threshold on Development set: -4.787956e-03
          | Development     | Test
   -------+-----------------+------------------
   FAR    | 6.731% (35/520) | 2.500% (13/520)
   FRR    | 6.667% (26/390) | 6.154% (24/390)
   HTER   | 6.699%          | 4.327%
   [Min. criterion: Min. HTER] Threshold on Development set: 3.411070e-03
          | Development     | Test
   -------+-----------------+------------------
   FAR    | 4.231% (22/520) | 1.923% (10/520)
   FRR    | 7.949% (31/390) | 7.692% (30/390)
   HTER   | 6.090%          | 4.808%
   [Plots] Performance curves => 'curves.pdf'

Inside that script we evaluate two different thresholds, based on the EER
and on the minimum HTER on the development set, and apply them to the test
set. As can be seen from the toy example above, the system generalizes
reasonably well. A single PDF file is generated, containing an EPC as well
as ROC and DET plots of the system.

Use the ``--help`` option on the above-cited scripts to find out about more
options.
Score file conversion
---------------------

Sometimes it is required to export the score files generated by Bob to a
different format, e.g., to be able to generate a plot comparing Bob's
systems with other systems. In this package, we provide source code to
convert between different types of score files.

Bob to OpenBR
=============

One of the supported formats is the matrix format that the National
Institute of Standards and Technology (NIST) uses, and which is supported by
OpenBR_. The scores are stored in two binary matrices, where the first
matrix (usually with a ``.mtx`` filename extension) contains the raw scores,
while a second mask matrix (extension ``.mask``) contains information on
which scores are positives and which are negatives.

To convert from Bob's four-column or five-column score file to a pair of
these matrices, you can use the :py:func:`bob.measure.openbr.write_matrix`
function. In the simplest way, this function takes a score file
``'five-column-score-file'`` and writes the pair ``'openbr.mtx'``,
``'openbr.mask'`` of OpenBR-compatible files:

.. code-block:: py

   >>> bob.measure.openbr.write_matrix('five-column-score-file', 'openbr.mtx', 'openbr.mask', score_file_format = '5column')

In this way, the score file will be parsed and the matrices will be written
in the same order as obtained from the score file. For most applications
this should be sufficient but, as the identity information is lost in the
matrix files, no deeper analysis is possible anymore when just using the
matrices. To enforce an order of the models and probes inside the matrices,
you can use the ``model_names`` and ``probe_names`` parameters of
:py:func:`bob.measure.openbr.write_matrix` (see the sketch at the end of
this section):

* The ``probe_names`` parameter lists the path elements stored in the score
  files, which are the fourth column in a ``'5column'`` file and the third
  column in a ``'4column'`` file, see
  :py:func:`bob.measure.load.five_column` and
  :py:func:`bob.measure.load.four_column`.

* The ``model_names`` parameter is a bit more complicated. In a
  ``'5column'`` format score file, the model names are defined by the second
  column of that file, see :py:func:`bob.measure.load.five_column`. In a
  ``'4column'`` format score file, the model information is not contained,
  only the client information of the model. Hence, for the ``'4column'``
  format, ``model_names`` actually lists the client ids found in the first
  column, see :py:func:`bob.measure.load.four_column`.

.. warning::

   The model information is lost, but required to write the matrix files. In
   the ``'4column'`` format, we use client ids instead of the model
   information. Hence, when several models exist per client, this function
   will not work as expected.

Additionally, there are fields in the matrix files which define the gallery
and probe list files that were used to generate the matrix. These file names
can be selected with the ``gallery_file_name`` and ``probe_file_name``
keyword parameters of :py:func:`bob.measure.openbr.write_matrix`.

Finally, OpenBR defines a specific ``'search'`` score file format, which is
designed to be used to compute CMC curves. The score matrix contains a
descendingly sorted and possibly truncated list of scores, i.e., for each
probe, a sorted list of all scores for the models is generated. To generate
this special score file format, you can specify the ``search`` parameter,
which defines the number of highest scores per probe that should be kept. If
the ``search`` parameter is set to a negative value, all scores will be
kept. If the ``search`` parameter is higher than the actual number of
models, NaN scores will be appended, and the corresponding mask values will
be set to 0 (i.e., to be ignored).
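As a sketch, here is how such a call could look; the file name and the
identifier lists below are hypothetical and must match the entries actually
present in your score file:

.. code-block:: py

   import bob.measure.openbr

   # hypothetical orderings; every name must occur in the score file
   my_models = ['model-1', 'model-2', 'model-3']  # 2nd column of a '5column' file
   my_probes = ['probe-a', 'probe-b']             # 4th column of a '5column' file

   bob.measure.openbr.write_matrix(
       'five-column-score-file', 'openbr.mtx', 'openbr.mask',
       model_names=my_models, probe_names=my_probes,
       score_file_format='5column')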
OpenBR to Bob
=============

On the other hand, you might also want to generate a Bob-compatible (four-
or five-column) score file based on a pair of OpenBR matrix and mask files.
This is possible by using the
:py:func:`bob.measure.openbr.write_score_file` function. In its most basic
form, it takes the given pair of matrix and mask files, as well as the
desired output score file:

.. code-block:: py

   >>> bob.measure.openbr.write_score_file('openbr.mtx', 'openbr.mask', 'four-column-score-file')

This score file is sufficient to compute a CMC curve (see CMC_); however, it
does not contain relevant client ids or paths for models and probes.
Particularly, it assumes that each client has exactly one associated model.

To add or correct this information, you can use additional parameters of
:py:func:`bob.measure.openbr.write_score_file`. Client ids of models and
probes can be added using the ``models_ids`` and ``probes_ids`` keyword
arguments. The length of these lists must be identical to the number of
models and probes as given in the matrix files, **and they must be in the
same order as used to compute the OpenBR matrix**. This implies that the
same same-client and different-client pairs as indicated by the OpenBR mask
will be generated, which is checked inside the function.

To add model and probe path information, use the ``model_names`` and
``probe_names`` parameters, which need to have the same size and order as
``models_ids`` and ``probes_ids``. This information is simply stored in the
score file, and no further check is applied.

.. note::

   The ``model_names`` parameter is used only when writing score files in
   ``score_file_format='5column'``; in the ``'4column'`` format, this
   parameter is ignored.

.. include:: links.rst

.. Place your references here:

.. _The Expected Performance Curve: http://publications.idiap.ch/downloads/reports/2005/bengio_2005_icml.pdf
.. _The DET curve in assessment of detection task performance: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.117.4489&rep=rep1&type=pdf
.. _openbr: http://openbiometrics.org