.. vim: set fileencoding=utf-8 :
.. Andre Anjos
.. Tue 15 Oct 17:41:52 2013

.. testsetup:: *

   import numpy
   positives = numpy.random.normal(1,1,100)
   negatives = numpy.random.normal(-1,1,100)
   import matplotlib
   if not hasattr(matplotlib, 'backends'):
       matplotlib.use('pdf')  # non-interactive backend avoids exception on display
   import bob.measure

============
 User Guide
============

Methods in the :py:mod:`bob.measure` module can help you to quickly and easily
evaluate error for multi-class or binary classification problems. If you are
not yet familiarized with aspects of performance evaluation, we recommend the
following papers and book chapters for an overview of some of the implemented
methods.

* Bengio, S., Keller, M., Mariéthoz, J. (2004). `The Expected Performance
  Curve`_. International Conference on Machine Learning ICML Workshop on ROC
  Analysis in Machine Learning, 136(1), 1963–1966.
* Martin, A., Doddington, G., Kamm, T., Ordowski, M., & Przybocki, M. (1997).
  `The DET curve in assessment of detection task performance`_. Fifth European
  Conference on Speech Communication and Technology (pp. 1895-1898).
* Li, S., Jain, A.K. (2005), Handbook of Face Recognition, Chapter 14, Springer

Overview
--------

A classifier is subject to two types of errors: either the real access/signal
is rejected (false negative) or an impostor attack/a false access is accepted
(false positive).
A possible way to measure the detection performance is to use the Half Total
Error Rate (HTER), which combines the False Negative Rate (FNR) and the False
Positive Rate (FPR) and is defined by the following formula:

.. math::

   HTER(\tau, \mathcal{D}) = \frac{FPR(\tau, \mathcal{D}) + FNR(\tau, \mathcal{D})}{2} \quad \textrm{[\%]}


where :math:`\mathcal{D}` denotes the dataset used. Since both the FPR and the
FNR depend on the threshold :math:`\tau`, they are strongly related to each
other: increasing the FPR will reduce the FNR and vice-versa. For this reason,
results are often presented using either a Receiver Operating Characteristic
(ROC) or a Detection-Error Tradeoff (DET) plot; these two plots basically
present the FPR versus the FNR for different values of the threshold. Another
widely used measure to summarise the performance of a system is the Equal
Error Rate (EER), defined as the point along the ROC or DET curve where the
FPR equals the FNR.

However, it was noted by Bengio et al. (2004) that ROC and DET curves may be
misleading when comparing systems. Hence, the so-called Expected Performance
Curve (EPC) was proposed and consists of an unbiased estimate of the reachable
performance of a system at various operating points. Indeed, in real-world
scenarios, the threshold :math:`\tau` has to be set a priori: this is
typically done using a development set (also called cross-validation set).
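As a quick sanity check, the HTER above can be evaluated with plain NumPy on
synthetic scores. This is only an illustrative sketch: the decision convention
used here (accept when the score is at or above :math:`\tau`) is an assumption
for illustration; the library's own functions are introduced below.

```python
import numpy

# Synthetic scores: negatives distributed to the left of positives.
numpy.random.seed(0)
negatives = numpy.random.normal(-1, 1, 100)
positives = numpy.random.normal(1, 1, 100)

tau = 0.0  # an arbitrary decision threshold

# FPR: fraction of negatives (wrongly) accepted at threshold tau.
FPR = float((negatives >= tau).sum()) / negatives.size
# FNR: fraction of positives (wrongly) rejected at threshold tau.
FNR = float((positives < tau).sum()) / positives.size

# HTER, as in the formula above.
HTER = (FPR + FNR) / 2.0
```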
Nevertheless, the optimal threshold can be different depending on the relative
importance given to the FPR and the FNR. Hence, in the EPC framework, the cost
:math:`\beta \in [0;1]` is defined as the trade-off between the FPR and FNR.
The optimal threshold :math:`\tau^*` is then computed using different values
of :math:`\beta`, corresponding to different operating points:

.. math::

   \tau^{*} = \arg\!\min_{\tau} \quad \beta \cdot \textrm{FPR}(\tau, \mathcal{D}_{d}) + (1-\beta) \cdot \textrm{FNR}(\tau, \mathcal{D}_{d})


where :math:`\mathcal{D}_{d}` denotes the development set, which should be
completely separate from the evaluation set :math:`\mathcal{D}`.

Performance for different values of :math:`\beta` is then computed on the
evaluation set :math:`\mathcal{D}_{t}` using the previously derived threshold.
Note that setting :math:`\beta` to 0.5 yields the Half Total Error Rate (HTER)
as defined in the first equation.

.. note::

   Most of the methods available in this module require as input a set of 2
   :py:class:`numpy.ndarray` objects that contain the scores obtained by the
   classification system to be evaluated, without specific order. Most of the
   classes defined deal with two-class problems.
   Therefore, in this setting, and throughout this manual, we have defined
   that the **negatives** represent the impostor attacks or false class
   accesses (that is, when a sample of class A is given to the classifier of
   another class, such as class B). The second set, referred to as the
   **positives**, represents the true class accesses or signal response of the
   classifier. The vectors are called this way because the procedures
   implemented in this module expect the scores of the **negatives** to be
   statistically distributed to the left of the signal scores (the
   **positives**). If that is not the case, one should either invert the input
   to the methods or multiply all scores available by -1, in order to have
   them inverted.

   The input to create these two vectors is generated by experiments conducted
   by the user and normally sits in files that may need some parsing before
   these vectors can be extracted. While it is not possible to provide a
   parser for every individual file that may be generated in different
   experimental frameworks, we do provide a parser for a generic two-column
   format where the first column contains -1/1 for negative/positive and the
   second column contains score values. Please refer to the documentation of
   :py:func:`bob.measure.load.split` for more details.

   In the remainder of this section we assume you have successfully parsed and
   loaded your scores in two 1D float64 vectors and are ready to evaluate the
   performance of the classifier.
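For illustration only, here is a minimal sketch of how the generic two-column
format could be split by hand with plain NumPy. The inline file contents below
are made up; in practice you would simply call
:py:func:`bob.measure.load.split` on your score file.

```python
import numpy

# Made-up contents of a score file in the generic two-column format:
# first column: -1 (negative) or 1 (positive); second column: the score.
content = """\
-1 -0.53
-1 -1.21
 1  0.88
 1  1.14
"""

# Parse each line into (label, score) and split on the label's sign.
rows = numpy.array([line.split() for line in content.strip().splitlines()],
                   dtype='float64')
negatives = rows[rows[:, 0] < 0, 1]  # scores labelled -1
positives = rows[rows[:, 0] > 0, 1]  # scores labelled 1
```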
Verification
------------

To count the number of correctly classified positives and negatives you can
use the following techniques:

.. doctest::

   >>> # negatives, positives = parse_my_scores(...) # write parser if not provided!
   >>> T = 0.0 #Threshold: later we explain how one can calculate these
   >>> correct_negatives = bob.measure.correctly_classified_negatives(negatives, T)
   >>> FPR = 1 - (float(correct_negatives.sum())/negatives.size)
   >>> correct_positives = bob.measure.correctly_classified_positives(positives, T)
   >>> FNR = 1 - (float(correct_positives.sum())/positives.size)

We do provide a method to calculate the FPR and FNR in a single shot:

.. doctest::

   >>> FPR, FNR = bob.measure.farfrr(negatives, positives, T)

The threshold T is normally calculated by looking at the distribution of
negatives and positives in a development (or validation) set, selecting a
threshold that matches a certain criterion, and applying this derived
threshold to the evaluation set. This technique gives a better overview of the
generalization of a method.
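To make the development/evaluation protocol explicit, here is a self-contained
sketch in plain NumPy. The score sets are synthetic, and the brute-force EER
scan is only a crude stand-in for the library's own threshold functions
presented below:

```python
import numpy

numpy.random.seed(0)
# Synthetic development and evaluation score sets.
dev_neg = numpy.random.normal(-1, 1, 100)
dev_pos = numpy.random.normal(1, 1, 100)
eval_neg = numpy.random.normal(-1, 1, 100)
eval_pos = numpy.random.normal(1, 1, 100)

def fpr_fnr(neg, pos, tau):
    """Error rates at threshold tau (accept when score >= tau)."""
    return (neg >= tau).mean(), (pos < tau).mean()

# 1. Derive the threshold on the development set only: scan candidate
#    thresholds (the scores themselves) for the point where FPR ~ FNR.
candidates = numpy.concatenate((dev_neg, dev_pos))
T = min(candidates,
        key=lambda t: abs(numpy.subtract(*fpr_fnr(dev_neg, dev_pos, t))))

# 2. Apply the frozen threshold to the evaluation set and report there.
FPR, FNR = fpr_fnr(eval_neg, eval_pos, T)
HTER = (FPR + FNR) / 2.0
```

This separation is what makes the reported error rates an honest estimate of
generalization: the evaluation set never influences the threshold.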
We implement different techniques for the calculation of the threshold:

* Threshold for the EER

  .. doctest::

     >>> T = bob.measure.eer_threshold(negatives, positives)

* Threshold for the minimum HTER

  .. doctest::

     >>> T = bob.measure.min_hter_threshold(negatives, positives)

* Threshold for the minimum weighted error rate (MWER) given a certain cost
  :math:`\beta`.

  .. doctest:: python

     >>> cost = 0.3 #or "beta"
     >>> T = bob.measure.min_weighted_error_rate_threshold(negatives, positives, cost)

  .. note::

     Setting ``cost`` to 0.5 is equivalent to using
     :py:func:`bob.measure.min_hter_threshold`.

.. important::

   Often, it is not numerically possible to match the requested criteria for
   calculating the threshold based on the provided scores. Instead, the
   closest possible threshold is returned. For example, using
   :any:`bob.measure.eer_threshold` **will not** give you a threshold where
   :math:`FPR == FNR`.
   Hence, you cannot report :math:`FPR` or :math:`FNR` instead of the
   :math:`EER`; you should report :math:`(FPR+FNR)/2` instead. This is also
   true for :any:`bob.measure.far_threshold` and
   :any:`bob.measure.frr_threshold`. The threshold returned by those functions
   does not guarantee that using that threshold you will get the requested
   :math:`FPR` or :math:`FNR` value. Instead, you should recalculate using
   :any:`bob.measure.farfrr`.

.. note::

   Many functions in ``bob.measure`` have an ``is_sorted`` parameter, which
   defaults to ``False``, throughout. However, these functions need sorted
   positive and/or negative scores. If scores are not sorted in ascending
   order, internally, they will be copied -- twice! To avoid the scores being
   copied, you might want to sort them in ascending order beforehand, e.g.,
   by:

   .. doctest:: python

      >>> negatives.sort()
      >>> positives.sort()
      >>> t = bob.measure.min_weighted_error_rate_threshold(negatives, positives, cost, is_sorted = True)
      >>> assert T == t

Identification
--------------

For identification, the Recognition Rate is one of the standard measures. To
compute recognition rates, you can use the
:py:func:`bob.measure.recognition_rate` function. This function expects a
relatively complex data structure, which is the same as for the CMC_ below.
For each probe item, the scores for negative and positive comparisons are
computed, and collected for all probe items:

.. doctest::

   >>> rr_scores = []
   >>> for probe in range(10):
   ...   pos = numpy.random.normal(1, 1, 1)
   ...   neg = numpy.random.normal(0, 1, 19)
   ...   rr_scores.append((neg, pos))
   >>> rr = bob.measure.recognition_rate(rr_scores, rank=1)

For open set identification, according to Li and Jain (2005), there are two
different error measures defined. The first measure is the
:py:func:`bob.measure.detection_identification_rate`, which counts the number
of correctly classified in-gallery probe items. The second measure is the
:py:func:`bob.measure.false_alarm_rate`, which counts how often an
out-of-gallery probe item was incorrectly accepted. Both rates can be computed
using the same data structure, with one exception: both functions require
that at least one probe item exists which has no according gallery item,
i.e., where the positives are empty or ``None`` (continued from above...):

.. doctest::

   >>> for probe in range(10):
   ...   pos = None
   ...   neg = numpy.random.normal(-2, 1, 10)
   ...   rr_scores.append((neg, pos))
   >>> dir = bob.measure.detection_identification_rate(rr_scores, threshold = 0, rank=1)
   >>> far = bob.measure.false_alarm_rate(rr_scores, threshold = 0)

Confidence interval
-------------------

A confidence interval for parameter :math:`x` consists of a lower estimate
:math:`L` and an upper estimate :math:`U`, such that the probability of the
true value being within the interval estimate is equal to :math:`\alpha`. For
example, a 95% confidence interval (i.e. :math:`\alpha = 0.95`) for a
parameter :math:`x` is given by :math:`[L, U]` such that
.. math::

   Prob(x \in [L,U]) = 95\%

The smaller the test size, the wider the confidence interval will be; and the
greater :math:`\alpha`, the wider the confidence interval will be.

`The Clopper-Pearson interval`_, a common method for calculating confidence
intervals, is a function of the number of successes, the number of trials and
the confidence value :math:`\alpha`. It is implemented as
:py:func:`bob.measure.utils.confidence_for_indicator_variable` and is based on
the cumulative probabilities of the binomial distribution. This method is
quite conservative, meaning that the true coverage rate of a 95%
Clopper-Pearson interval may be well above 95%.

For example, we want to evaluate the reliability of a system to identify
registered persons. Let's say that among 10,000 accepted transactions, 9856
are true matches. The 95% confidence interval for the true match rate is then:

.. doctest:: python

   >>> bob.measure.utils.confidence_for_indicator_variable(9856, 10000)
   (0.98306835053282549, 0.98784270928084694)

meaning there is a 95% probability that the true match rate is inside
:math:`[0.983, 0.988]`.

Plotting
--------

An image is worth 1000 words, they say.
You can combine the capabilities of Matplotlib_ with |project| to plot a
number of curves, provided that package is installed. In this section we
describe a few recipes.

ROC
===

The Receiver Operating Characteristic (ROC) curve is one of the oldest plots
in town. To plot an ROC curve, in possession of your **negatives** and
**positives**, just do something along the lines of:

.. doctest::

   >>> from matplotlib import pyplot
   >>> # we assume you have your negatives and positives already split
   >>> npoints = 100
   >>> bob.measure.plot.roc(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test') # doctest: +SKIP
   >>> pyplot.xlabel('FPR (%)') # doctest: +SKIP
   >>> pyplot.ylabel('FNR (%)') # doctest: +SKIP
   >>> pyplot.grid(True)
   >>> pyplot.show() # doctest: +SKIP

You should see an image like the following one:
.. plot::

   import numpy
   numpy.random.seed(42)
   import bob.measure
   from matplotlib import pyplot
   positives = numpy.random.normal(1,1,100)
   negatives = numpy.random.normal(-1,1,100)
   npoints = 100
   bob.measure.plot.roc(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test')
   pyplot.grid(True)
   pyplot.xlabel('FPR (%)')
   pyplot.ylabel('FNR (%)')
   pyplot.title('ROC')

As can be observed, plotting methods live in the namespace
:py:mod:`bob.measure.plot`. They work like :py:func:`matplotlib.pyplot.plot`
itself, except that instead of receiving the x and y point coordinates as
parameters, they receive the two :py:class:`numpy.ndarray` arrays with
negatives and positives, as well as an indication of the number of points the
curve must contain. As in the :py:func:`matplotlib.pyplot.plot` command, you
can pass optional parameters for the line as shown in the example to set up
its color, shape and even the label. For an overview of the keywords
accepted, please refer to Matplotlib_'s documentation. Other plot properties
such as the plot title, axis labels, grids and legends should be controlled
directly using the relevant Matplotlib_ controls.

DET
===

A DET curve can be drawn using commands similar to the ones for the ROC
curve:
.. doctest::

   >>> from matplotlib import pyplot
   >>> # we assume you have your negatives and positives already split
   >>> npoints = 100
   >>> bob.measure.plot.det(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test') # doctest: +SKIP
   >>> bob.measure.plot.det_axis([0.01, 40, 0.01, 40]) # doctest: +SKIP
   >>> pyplot.xlabel('FPR (%)') # doctest: +SKIP
   >>> pyplot.ylabel('FNR (%)') # doctest: +SKIP
   >>> pyplot.grid(True)
   >>> pyplot.show() # doctest: +SKIP

This will produce an image like the following one:

.. plot::

   import numpy
   numpy.random.seed(42)
   import bob.measure
   from matplotlib import pyplot
   positives = numpy.random.normal(1,1,100)
   negatives = numpy.random.normal(-1,1,100)
   npoints = 100
   bob.measure.plot.det(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test')
   bob.measure.plot.det_axis([0.1, 80, 0.1, 80])
   pyplot.grid(True)
   pyplot.xlabel('FPR (%)')
   pyplot.ylabel('FNR (%)')
   pyplot.title('DET')

.. note::

   If you wish to reset axis zooming, you must use the Gaussian scale rather
   than the visual marks shown on the plot, which are just there for display
   purposes. The real axis scale is based on the :py:func:`bob.measure.ppndf`
   method.
   For example, if you wish to set the x and y axes to display data between 1%
   and 40%, here is the recipe:

   .. doctest::

      >>> #AFTER you plot the DET curve, just set the axis in this way:
      >>> pyplot.axis([bob.measure.ppndf(k/100.0) for k in (1, 40, 1, 40)]) # doctest: +SKIP

   We provide a convenient way for you to do the above in this module. So,
   optionally, you may use the :py:func:`bob.measure.plot.det_axis` method
   like this:

   .. doctest::

      >>> bob.measure.plot.det_axis([1, 40, 1, 40]) # doctest: +SKIP

EPC
===

Drawing an EPC requires that both the development set negatives and positives
are provided alongside the evaluation set ones. Because of this, the API is
slightly modified:

.. doctest::

   >>> bob.measure.plot.epc(dev_neg, dev_pos, test_neg, test_pos, npoints, color=(0,0,0), linestyle='-') # doctest: +SKIP
   >>> pyplot.show() # doctest: +SKIP

This will produce an image like the following one:
.. plot::

   import numpy
   numpy.random.seed(42)
   import bob.measure
   from matplotlib import pyplot
   dev_pos = numpy.random.normal(1,1,100)
   dev_neg = numpy.random.normal(-1,1,100)
   test_pos = numpy.random.normal(0.9,1,100)
   test_neg = numpy.random.normal(-1.1,1,100)
   npoints = 100
   bob.measure.plot.epc(dev_neg, dev_pos, test_neg, test_pos, npoints, color=(0,0,0), linestyle='-')
   pyplot.grid(True)
   pyplot.title('EPC')

CMC
===

The Cumulative Match Characteristics (CMC) curve estimates the probability
that the correct model is among the *N* models with the highest similarity to
a given probe. A CMC curve can be plotted using the
:py:func:`bob.measure.plot.cmc` function. The CMC can be calculated from a
relatively complex data structure, which defines a pair of positive and
negative scores **per probe**:
.. plot::

   import numpy
   numpy.random.seed(42)
   import bob.measure
   from matplotlib import pyplot

   cmc_scores = []
   for probe in range(10):
       positives = numpy.random.normal(1, 1, 1)
       negatives = numpy.random.normal(0, 1, 19)
       cmc_scores.append((negatives, positives))
   bob.measure.plot.cmc(cmc_scores, logx=False)
   pyplot.grid(True)
   pyplot.title('CMC')
   pyplot.xlabel('Rank')
   pyplot.xticks([1,5,10,20])
   pyplot.xlim([1,20])
   pyplot.ylim([0,100])
   pyplot.ylabel('Probability of Recognition (%)')

Usually, there is only a single positive score per probe, but this is not a
fixed restriction.

Detection & Identification Curve
================================

The detection & identification curve is designed to evaluate open set
identification tasks. It can be plotted using the
:py:func:`bob.measure.plot.detection_identification_curve` function, but it
requires at least one open-set probe, i.e., one for which no corresponding
positive score exists, and for which the FPR values are computed. Here, we
plot the detection and identification curve for rank 1, so that the
recognition rate for FPR=1 will be identical to the rank one
:py:func:`bob.measure.recognition_rate` obtained in the CMC plot above.
.. plot::

   import numpy
   numpy.random.seed(42)
   import bob.measure
   from matplotlib import pyplot

   cmc_scores = []
   for probe in range(1000):
       positives = numpy.random.normal(1, 1, 1)
       negatives = numpy.random.normal(0, 1, 19)
       cmc_scores.append((negatives, positives))
   for probe in range(1000):
       negatives = numpy.random.normal(-1, 1, 10)
       cmc_scores.append((negatives, None))
   bob.measure.plot.detection_identification_curve(cmc_scores, rank=1, logx=True)
   pyplot.xlabel('False Alarm Rate')
   pyplot.xlim([0.0001, 1])
   pyplot.ylabel('Detection & Identification Rate (%)')
   pyplot.ylim([0,1])

Fine-tuning
===========

The methods inside :py:mod:`bob.measure.plot` are only provided as a
Matplotlib_ wrapper to equivalent methods in :py:mod:`bob.measure` that can
only calculate the points without doing any plotting. You may prefer to tweak
the plotting or even use a different plotting system such as gnuplot. Have a
look at the implementations at :py:mod:`bob.measure.plot` to understand how
to use the |project| methods to compute the curves and interlace that in the
way that best suits you.

.. _bob.measure.command_line:

Full applications
-----------------

Commands under ``bob measure`` can be used to quickly evaluate a set of scores
and generate plots. We present these commands in this section.
The commands take as input a generic 2-column data format as specified in the
documentation of :py:func:`bob.measure.load.split`.

Metrics
=======

To calculate the threshold using a certain criterion (EER (default) or
min.HTER) on a development set and to evaluate the performance of the system
at that threshold on an evaluation set, after setting up |project|, just do:

.. code-block:: sh

   ./bin/bob measure metrics ./MTest1/scores-{dev,eval} -e
   [Min. criterion: EER ] Threshold on Development set ./MTest1/scores-dev: -1.373550e-02
   bob.measure@2018-06-29 10:20:14,177 -- ERROR: NaNs scores (1.0%) were found in ./MTest1/scores-dev
   bob.measure@2018-06-29 10:20:14,177 -- ERROR: NaNs scores (1.0%) were found in ./MTest1/scores-eval
   ===================  ================  ================
   ..                   Development       Evaluation
   ===================  ================  ================
   False Positive Rate  15.5% (767/4942)  15.5% (767/4942)
   False Negative Rate  15.5% (769/4954)  15.5% (769/4954)
   Precision            0.8               0.8
   Recall               0.8               0.8
   F1-score             0.8               0.8
   ===================  ================  ================

The output will present the threshold together with the FPR, FNR, Precision,
Recall, F1-score and HTER on the given set, calculated using that threshold.
The relative counts of false accepts and false rejects are also displayed
between parentheses.

.. note::
   Several score files can be given at once and the metrics will be computed
   for each of them separately. Development and evaluation files must be given
   in pairs. When evaluation files are provided, the ``--eval`` flag must be
   given.
To evaluate the performance of a new score file with a given threshold, use
``--thres``:

.. code-block:: sh

   ./bin/bob measure metrics ./MTest1/scores-eval --thres 0.006
   [Min. criterion: user provided] Threshold on Development set ./MTest1/scores-eval: 6.000000e-03
   bob.measure@2018-06-29 10:22:06,852 -- ERROR: NaNs scores (1.0%) were found in ./MTest1/scores-eval
   ===================  ================
   ..                   Development
   ===================  ================
   False Positive Rate  15.2% (751/4942)
   False Negative Rate  16.1% (796/4954)
   Precision            0.8
   Recall               0.8
   F1-score             0.8
   ===================  ================

.. note::
   The table format can be changed using the ``--tablefmt`` option, the
   default format being ``rst``. Please refer to ``bob measure metrics
   --help`` for more details.

Plots
=====

Customizable plotting commands are available in the :py:mod:`bob.measure`
module. They take a list of development and/or evaluation files and generate a
single PDF file containing the plots. Available plots are:

* roc (receiver operating characteristic)
* det (detection error trade-off)
* epc (expected performance curve)
* hist (histograms of positives and negatives)

Use the ``--help`` option on the above-cited commands to find out about more
options.
For example, to generate a DET curve from development and evaluation datasets:

.. code-block:: sh

   $ bob measure det -e -v --output "my_det.pdf" -ts "DetDev1,DetEval1,DetDev2,DetEval2" \
     dev-1.txt eval-1.txt dev-2.txt eval-2.txt

where ``my_det.pdf`` will contain DET plots for the two experiments.

.. note::
   By default, ``det`` and ``roc`` plot development and evaluation curves on
   different plots. You can force gathering everything in the same plot using
   the ``--no-split`` option.

.. note::
   The ``--figsize`` and ``--style`` options are two powerful options that can
   dramatically change the appearance of your figures. Try them! (e.g.
   ``--figsize 12,10 --style grayscale``)

Evaluate
========

A convenient command ``evaluate`` is provided to generate multiple metrics and
plots for a list of experiments. It generates two metrics outputs with EER and
min-HTER criteria along with ``roc``, ``det``, ``epc`` and ``hist`` plots for
each experiment. For example:

.. code-block:: sh

   $ bob measure evaluate -e -v -l 'my_metrics.txt' -o 'my_plots.pdf' {sys1,sys2}/{dev,eval}

will output metrics and plots for the two experiments (dev and eval pairs) in
``my_metrics.txt`` and ``my_plots.pdf``, respectively.

.. include:: links.rst
.. Place your references here:

.. _The Expected Performance Curve: http://publications.idiap.ch/downloads/reports/2005/bengio_2005_icml.pdf
.. _The DET curve in assessment of detection task performance: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.117.4489&rep=rep1&type=pdf
.. _The Clopper-Pearson interval: https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Clopper-Pearson_interval