.. vim: set fileencoding=utf-8 :
.. Andre Anjos <andre.dos.anjos@gmail.com>
.. Tue 15 Oct 17:41:52 2013

.. testsetup:: *

  import numpy
  numpy.random.seed(42) # fix the seed so that the doctest outputs below are reproducible
  positives = numpy.random.normal(1,1,100)
  negatives = numpy.random.normal(-1,1,100)
  import matplotlib
  if not hasattr(matplotlib, 'backends'):
    matplotlib.use('pdf') #non-interactive avoids exception on display
  import bob.measure

============
 User Guide
============

Methods in the :py:mod:`bob.measure` module can help you to quickly and easily
evaluate error for multi-class or binary classification problems. If you are
not yet familiarized with aspects of performance evaluation, we recommend the
following papers and book chapters for an overview of some of the implemented
methods.

* Bengio, S., Keller, M., Mariéthoz, J. (2004). `The Expected Performance
  Curve`_.  International Conference on Machine Learning ICML Workshop on ROC
  Analysis in Machine Learning, 136(1), 1963-1966.
* Martin, A., Doddington, G., Kamm, T., Ordowski, M., & Przybocki, M. (1997).
  `The DET curve in assessment of detection task performance`_. Fifth European
  Conference on Speech Communication and Technology (pp. 1895-1898).
* Li, S., Jain, A.K. (2005), `Handbook of Face Recognition`, Chapter 14, Springer


Overview
--------

A classifier is subject to two types of errors: either the real access/signal
is rejected (false rejection) or an impostor attack/a false access is accepted
(false acceptance). A possible way to measure the detection performance is to
use the Half Total Error Rate (HTER), which combines the False Rejection Rate
(FRR) and the False Acceptance Rate (FAR) and is defined by the following
formula:

.. math::

   HTER(\tau, \mathcal{D}) = \frac{FAR(\tau, \mathcal{D}) + FRR(\tau, \mathcal{D})}{2} \quad \textrm{[\%]}

where :math:`\mathcal{D}` denotes the dataset used. Since both the FAR and the
FRR depend on the threshold :math:`\tau`, they are strongly related to each
other: increasing the FAR will reduce the FRR and vice-versa. For this reason,
results are often presented using either a Receiver Operating Characteristic
(ROC) or a Detection-Error Tradeoff (DET) plot; these two plots basically
present the FAR versus the FRR for different values of the threshold. Another
widely used measure to summarise the performance of a system is the Equal Error
Rate (EER), defined as the point along the ROC or DET curve where the FAR
equals the FRR.
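
For instance, with the score vectors ``negatives`` and ``positives`` at hand
(see the note below), the EER threshold and the HTER at that threshold can be
estimated along the following lines, using functions detailed later in this
guide:

.. doctest::

   >>> T = bob.measure.eer_threshold(negatives, positives)
   >>> FAR, FRR = bob.measure.farfrr(negatives, positives, T)
   >>> HTER = (FAR + FRR) / 2.0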

However, it was noted by Bengio et al. (2004) that ROC and DET curves may be
misleading when comparing systems. Hence, the so-called Expected Performance
Curve (EPC) was proposed and consists of an unbiased estimate of the reachable
performance of a system at various operating points.  Indeed, in real-world
scenarios, the threshold :math:`\tau` has to be set a priori: this is typically
done using a development set (also called cross-validation set). Nevertheless,
the optimal threshold can be different depending on the relative importance
given to the FAR and the FRR. Hence, in the EPC framework, the cost
:math:`\beta \in [0;1]` is defined as the trade-off between the FAR and FRR. The
optimal threshold :math:`\tau^*` is then computed using different values of
:math:`\beta`, corresponding to different operating points:

.. math::
  \tau^{*} = \arg\!\min_{\tau} \quad \beta \cdot \textrm{FAR}(\tau, \mathcal{D}_{d}) + (1-\beta) \cdot \textrm{FRR}(\tau, \mathcal{D}_{d})

where :math:`\mathcal{D}_{d}` denotes the development set and should be
completely separate from the evaluation set :math:`\mathcal{D}`.

Performance for different values of :math:`\beta` is then computed on the test
set :math:`\mathcal{D}_{t}` using the previously derived threshold. Note that
setting :math:`\beta` to 0.5 yields the Half Total Error Rate (HTER) as
defined in the first equation.
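
In code, one operating point of the EPC can be sketched as follows: the
threshold is fixed on the development set for a given :math:`\beta` and then
applied to the evaluation set (for illustration only, we re-use the same score
vectors for both sets; the functions used here are detailed later in this
guide):

.. doctest::

   >>> beta = 0.7  # relative cost given to the FAR
   >>> dev_neg, dev_pos = negatives, positives    # development set scores
   >>> test_neg, test_pos = negatives, positives  # evaluation set scores (illustration only)
   >>> tau = bob.measure.min_weighted_error_rate_threshold(dev_neg, dev_pos, beta)
   >>> far, frr = bob.measure.farfrr(test_neg, test_pos, tau)
   >>> wer = beta * far + (1 - beta) * frr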

.. note::

  Most of the methods available in this module require as input a set of two
  :py:class:`numpy.ndarray` objects that contain the scores obtained by the
  classification system to be evaluated, in no particular order. Most of the
  classes and methods are defined to deal with two-class problems. Therefore,
  in this setting, and throughout this manual, the **negatives** represent the
  impostor attacks or false accesses to the classifier (that is, when a sample
  of class A is given to the classifier of another class, such as class B).
  The second set, referred to as the **positives**, represents the true class
  accesses or signal responses of the classifier. The vectors are named this
  way because the procedures implemented in this module expect the scores of
  the **negatives** to be statistically distributed to the left of the signal
  scores (the **positives**). If that is not the case, you should either
  invert the input to the methods or multiply all available scores by -1 in
  order to have them inverted.

  The input to create these two vectors is generated by experiments conducted
  by the user and normally sits in files that may need some parsing before
  these vectors can be extracted.

  While it is not possible to provide a parser for every individual file that
  may be generated in different experimental frameworks, we do provide a few
  parsers for formats we use the most. Please refer to the documentation of
  :py:mod:`bob.measure.load` for a list of formats and details.

  In the remainder of this section we assume you have successfully parsed and
  loaded your scores in two 1D float64 vectors and are ready to evaluate the
  performance of the classifier.
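
  For example, if your scores sit in files using our 4-column format (described
  later in this guide), one possible way to obtain the two vectors is through
  :py:func:`bob.measure.load.split_four_column` -- assuming this loader matches
  your file layout; the file name below is just a placeholder:

  .. code-block:: py

     >>> negatives, positives = bob.measure.load.split_four_column('dev-scores-4col.txt')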

Verification
------------

To count the number of correctly classified positives and negatives you can use
the following techniques:

.. doctest::

   >>> # negatives, positives = parse_my_scores(...) # write parser if not provided!
   >>> T = 0.0 #Threshold: later we explain how one can calculate these
   >>> correct_negatives = bob.measure.correctly_classified_negatives(negatives, T)
   >>> FAR = 1 - (float(correct_negatives.sum())/negatives.size)
   >>> correct_positives = bob.measure.correctly_classified_positives(positives, T)
   >>> FRR = 1 - (float(correct_positives.sum())/positives.size)

We do provide a method to calculate the FAR and FRR in a single shot:

.. doctest::

   >>> FAR, FRR = bob.measure.farfrr(negatives, positives, T)

The threshold ``T`` is normally calculated by looking at the distribution of
negatives and positives in a development (or validation) set, selecting a
threshold that matches a certain criterion and applying this derived threshold
to the test (or evaluation) set. This technique gives a better overview of the
generalization of a method. We implement different techniques for the
calculation of the threshold:

* Threshold for the EER

  .. doctest::

    >>> T = bob.measure.eer_threshold(negatives, positives)

* Threshold for the minimum HTER

  .. doctest::

    >>> T = bob.measure.min_hter_threshold(negatives, positives)

* Threshold for the minimum weighted error rate (MWER) given a certain cost
  :math:`\beta`.

  .. doctest:: python

     >>> cost = 0.3 #or "beta"
     >>> T = bob.measure.min_weighted_error_rate_threshold(negatives, positives, cost)

  .. note::

     Setting the cost to 0.5 is equivalent to using
     :py:func:`bob.measure.min_hter_threshold`.

.. note::
   Many functions in ``bob.measure`` have an ``is_sorted`` parameter, which defaults to ``False``.
   Internally, however, these functions require the ``positives`` and/or ``negatives`` scores to be sorted.
   If the scores are not sorted in ascending order, they will be copied internally -- twice!
   To avoid copying the scores, you might want to sort them in ascending order yourself, e.g.:

   .. doctest:: python

      >>> negatives.sort()
      >>> positives.sort()
      >>> t = bob.measure.min_weighted_error_rate_threshold(negatives, positives, cost, is_sorted = True)
      >>> assert T == t

Identification
--------------

For identification, the Recognition Rate is one of the standard measures.
To compute recognition rates, you can use the :py:func:`bob.measure.recognition_rate` function.
This function expects a relatively complex data structure, which is the same as for the `CMC`_ below.
For each probe item, the scores for negative and positive comparisons are computed, and collected for all probe items:

.. doctest::

   >>> rr_scores = []
   >>> for probe in range(10):
   ...   pos = numpy.random.normal(1, 1, 1)
   ...   neg = numpy.random.normal(0, 1, 19)
   ...   rr_scores.append((neg, pos))
   >>> bob.measure.recognition_rate(rr_scores, rank=1)
   0.3

For open-set identification, two different error measures are defined, according to Li and Jain (2005).
The first measure is the :py:func:`bob.measure.detection_identification_rate`, which counts the number of correctly classified in-gallery probe items.
The second measure is the :py:func:`bob.measure.false_alarm_rate`, which counts how often an out-of-gallery probe item was incorrectly accepted.
Both rates can be computed using the same data structure, with one addition:
both functions require that at least one probe item exists which has no corresponding gallery item, i.e., for which the positives are empty or ``None``:

(continued from above...)

.. doctest::

   >>> for probe in range(10):
   ...   pos = None
   ...   neg = numpy.random.normal(-2, 1, 10)
   ...   rr_scores.append((neg, pos))
   >>> bob.measure.detection_identification_rate(rr_scores, threshold = 0, rank=1)
   0.3
   >>> bob.measure.false_alarm_rate(rr_scores, threshold = 0)
   0.2


Plotting
--------

An image is worth a thousand words, they say. You can combine the capabilities
of `Matplotlib`_ with |project| to plot a number of curves, but you must have
that package installed. In this section we describe a few recipes.

ROC
===

The Receiver Operating Characteristic (ROC) curve is one of the oldest plots in
town. To plot an ROC curve, in possession of your **negatives** and
**positives**, just do something along the lines of:

.. doctest::

   >>> from matplotlib import pyplot
   >>> # we assume you have your negatives and positives already split
   >>> npoints = 100
   >>> bob.measure.plot.roc(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test') # doctest: +SKIP
   >>> pyplot.xlabel('FAR (%)') # doctest: +SKIP
   >>> pyplot.ylabel('FRR (%)') # doctest: +SKIP
   >>> pyplot.grid(True)
   >>> pyplot.show() # doctest: +SKIP

You should see an image like the following one:

.. plot::

   import numpy
   numpy.random.seed(42)
   import bob.measure
   from matplotlib import pyplot

   positives = numpy.random.normal(1,1,100)
   negatives = numpy.random.normal(-1,1,100)
   npoints = 100
   bob.measure.plot.roc(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test')
   pyplot.grid(True)
   pyplot.xlabel('FAR (%)')
   pyplot.ylabel('FRR (%)')
   pyplot.title('ROC')

As can be observed, plotting methods live in the namespace
:py:mod:`bob.measure.plot`. They work like :py:func:`matplotlib.pyplot.plot`
itself, except that, instead of receiving the x and y point coordinates as
parameters, they receive the two :py:class:`numpy.ndarray` arrays with
negatives and positives, as well as an indication of the number of points the
curve must contain.

As with the :py:func:`matplotlib.pyplot.plot` command, you can pass optional
parameters for the line, as shown in the example, to set up its color, shape
and even the label. For an overview of the accepted keywords, please refer to
the `Matplotlib`_ documentation. Other plot properties, such as the plot title,
axis labels, grids and legends, should be controlled directly using the
relevant `Matplotlib`_ controls.
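
For example, a legend, a title and a hard copy of the figure can be produced
with standard `Matplotlib`_ calls (shown here only as an illustration, to be
adapted at will):

.. doctest::

   >>> pyplot.title('ROC for my classifier') # doctest: +SKIP
   >>> pyplot.legend() # doctest: +SKIP
   >>> pyplot.savefig('roc.pdf') # doctest: +SKIP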

DET
===

A DET curve can be drawn using commands similar to the ones used for the ROC curve:

.. doctest::

  >>> from matplotlib import pyplot
  >>> # we assume you have your negatives and positives already split
  >>> npoints = 100
  >>> bob.measure.plot.det(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test') # doctest: +SKIP
  >>> bob.measure.plot.det_axis([0.01, 40, 0.01, 40]) # doctest: +SKIP
  >>> pyplot.xlabel('FAR (%)') # doctest: +SKIP
  >>> pyplot.ylabel('FRR (%)') # doctest: +SKIP
  >>> pyplot.grid(True)
  >>> pyplot.show() # doctest: +SKIP

This will produce an image like the following one:

.. plot::

   import numpy
   numpy.random.seed(42)
   import bob.measure
   from matplotlib import pyplot

   positives = numpy.random.normal(1,1,100)
   negatives = numpy.random.normal(-1,1,100)

   npoints = 100
   bob.measure.plot.det(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test')
   bob.measure.plot.det_axis([0.1, 80, 0.1, 80])
   pyplot.grid(True)
   pyplot.xlabel('FAR (%)')
   pyplot.ylabel('FRR (%)')
   pyplot.title('DET')

.. note::

  If you wish to reset the axis zoom, you must use the Gaussian scale rather
  than the visual marks shown on the plot, which are only there for display
  purposes. The real axis scale is based on the :py:func:`bob.measure.ppndf`
  method. For example, if you wish to set the x and y axes to display data
  between 1% and 40%, here is the recipe:

  .. doctest::

    >>> #AFTER you plot the DET curve, just set the axis in this way:
    >>> pyplot.axis([bob.measure.ppndf(k/100.0) for k in (1, 40, 1, 40)]) # doctest: +SKIP

  We provide a convenient way for you to do the above in this module. So,
  optionally, you may use the ``bob.measure.plot.det_axis`` method like this:

  .. doctest::

    >>> bob.measure.plot.det_axis([1, 40, 1, 40]) # doctest: +SKIP

EPC
===

Drawing an EPC requires that both the development set negatives and positives are provided alongside
the test (or evaluation) set ones. Because of this, the API is slightly modified:

.. doctest::

  >>> bob.measure.plot.epc(dev_neg, dev_pos, test_neg, test_pos, npoints, color=(0,0,0), linestyle='-') # doctest: +SKIP
  >>> pyplot.show() # doctest: +SKIP

This will produce an image like the following one:

.. plot::

   import numpy
   numpy.random.seed(42)
   import bob.measure
   from matplotlib import pyplot

   dev_pos = numpy.random.normal(1,1,100)
   dev_neg = numpy.random.normal(-1,1,100)
   test_pos = numpy.random.normal(0.9,1,100)
   test_neg = numpy.random.normal(-1.1,1,100)
   npoints = 100
   bob.measure.plot.epc(dev_neg, dev_pos, test_neg, test_pos, npoints, color=(0,0,0), linestyle='-')
   pyplot.grid(True)
   pyplot.title('EPC')


CMC
===

The Cumulative Match Characteristics (CMC) curve estimates the probability that the correct model is in the *N* models with the highest similarity to a given probe.
A CMC curve can be plotted using the :py:func:`bob.measure.plot.cmc` function.
The CMC can be calculated from a relatively complex data structure, which defines a pair of positive and negative scores **per probe**:

.. plot::

   import numpy
   numpy.random.seed(42)
   import bob.measure
   from matplotlib import pyplot

   cmc_scores = []
   for probe in range(10):
     positives = numpy.random.normal(1, 1, 1)
     negatives = numpy.random.normal(0, 1, 19)
     cmc_scores.append((negatives, positives))
   bob.measure.plot.cmc(cmc_scores, logx=False)
   pyplot.title('CMC')
   pyplot.xlabel('Rank')
   pyplot.xticks([1,5,10,20])
   pyplot.xlim([1,20])
   pyplot.ylim([0,100])
   pyplot.ylabel('Probability of Recognition (%)')

Usually, there is only a single positive score per probe, but this is not a fixed restriction.

.. note::
   The complex data structure can be read from our default 4 or 5 column score files using the :py:func:`bob.measure.load.cmc_four_column` or :py:func:`bob.measure.load.cmc_five_column` function.
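
   For example, assuming your scores sit in a file ``scores-4col.txt`` in the
   4-column format (the file name is just a placeholder), the structure used
   above could be loaded and plotted as sketched here:

   .. code-block:: py

      >>> cmc_scores = bob.measure.load.cmc_four_column('scores-4col.txt')
      >>> bob.measure.plot.cmc(cmc_scores)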


Detection & Identification Curve
================================

The detection & identification curve is designed to evaluate open-set identification tasks.
It can be plotted using the :py:func:`bob.measure.plot.detection_identification_curve` function, but it requires at least one open-set probe, i.e., a probe for which no corresponding positive score exists; the FAR values are computed based on these probes.
Here, we plot the detection & identification curve for rank 1, so that the recognition rate at FAR=1 will be identical to the rank-one :py:func:`bob.measure.recognition_rate` obtained in the CMC plot above.

.. plot::

   import numpy
   numpy.random.seed(42)
   import bob.measure
   from matplotlib import pyplot

   cmc_scores = []
   for probe in range(10):
     positives = numpy.random.normal(1, 1, 1)
     negatives = numpy.random.normal(0, 1, 19)
     cmc_scores.append((negatives, positives))
   for probe in range(10):
     negatives = numpy.random.normal(-1, 1, 10)
     cmc_scores.append((negatives, None))

   bob.measure.plot.detection_identification_curve(cmc_scores, rank=1, logx=True)
   pyplot.xlabel('False Alarm Rate')
   pyplot.ylabel('Detection & Identification Rate (%)')
   pyplot.ylim([0,100])



Fine-tuning
===========

The methods inside :py:mod:`bob.measure.plot` are only provided as
`Matplotlib`_ wrappers around equivalent methods in :py:mod:`bob.measure` that
only calculate the points, without doing any plotting. You may prefer to tweak
the plotting or even use a different plotting system such as gnuplot. Have a
look at the implementations in :py:mod:`bob.measure.plot` to understand how
to use the |project| methods to compute the curves and integrate them in the
way that best suits you.
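
For instance, the points of a ROC curve can be computed directly with
:py:func:`bob.measure.roc` and fed to any plotting tool of your choosing. The
sketch below assumes the function returns a 2xN array with the FAR values in
the first row and the FRR values in the second:

.. doctest::

   >>> from matplotlib import pyplot # doctest: +SKIP
   >>> curve = bob.measure.roc(negatives, positives, 100) # doctest: +SKIP
   >>> pyplot.plot(curve[0], curve[1]) # doctest: +SKIP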

Full applications
-----------------

We do provide a few scripts that can be used to quickly evaluate a set of
scores. We present these scripts in this section. The scripts take as input
either a 4-column or 5-column data format as specified in the documentation of
:py:func:`bob.measure.load.four_column` or
:py:func:`bob.measure.load.five_column`.
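
For reference, a 4-column score file is a plain-text file in which each line
contains the claimed identity, the real identity, a label identifying the probe
and the score, separated by white-space. Please double-check the linked
documentation; the snippet below is only an illustration with made-up values:

.. code-block:: text

   client1 client1 probe_01 0.87
   client1 client2 probe_02 -0.32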

To calculate the threshold using a certain criterion (EER, min.HTER or weighted
Error Rate) on a set, after setting up |project|, just do:

.. code-block:: sh

  $ bob_eval_threshold.py --scores=development-scores-4col.txt
  Threshold: -0.004787956164
  FAR : 6.731% (35/520)
  FRR : 6.667% (26/390)
  HTER: 6.699%

The output will present the threshold together with the FAR, FRR and HTER on
the given set, calculated using that threshold. The relative counts of FAs
and FRs are also displayed in parentheses.

To evaluate the performance of a new score file with a given threshold, use the
application ``bob_apply_threshold.py``:

.. code-block:: sh

  $ bob_apply_threshold.py --scores=test-scores-4col.txt --threshold=-0.0047879
  FAR : 2.115% (11/520)
  FRR : 7.179% (28/390)
  HTER: 4.647%

In this case, only the error figures are presented. You can conduct the
evaluation and plotting of development and test set data using our combined
``bob_compute_perf.py`` script. You pass both sets and it does the rest:

.. code-block:: sh

  $ bob_compute_perf.py --devel=development-scores-4col.txt --test=test-scores-4col.txt
  [Min. criterium: EER] Threshold on Development set: -4.787956e-03
         | Development     | Test
  -------+-----------------+------------------
    FAR  | 6.731% (35/520) | 2.500% (13/520)
    FRR  | 6.667% (26/390) | 6.154% (24/390)
    HTER | 6.699%          | 4.327%
  [Min. criterium: Min. HTER] Threshold on Development set: 3.411070e-03
         | Development     | Test
  -------+-----------------+------------------
    FAR  | 4.231% (22/520) | 1.923% (10/520)
    FRR  | 7.949% (31/390) | 7.692% (30/390)
    HTER | 6.090%          | 4.808%
  [Plots] Performance curves => 'curves.pdf'

Inside that script we evaluate two different thresholds, based on the EER and
the minimum HTER of the development set, and apply them to the test set. As
can be seen from the toy example above, the system generalizes reasonably well.
A single PDF file is generated, containing an EPC as well as ROC and DET plots
of the system.

Use the ``--help`` option on the above-cited scripts to find out more about
their options.


Score file conversion
---------------------

Sometimes, it is required to export the score files generated by Bob to a different format, e.g., to be able to generate a plot comparing Bob's systems with other systems.
In this package, we provide source code to convert between different types of score files.

Bob to OpenBR
=============

One of the supported formats is the matrix format that the National Institute of Standards and Technology (NIST) uses, and which is supported by OpenBR_.
The scores are stored in two binary matrices, where the first matrix (usually with a ``.mtx`` filename extension) contains the raw scores, while a second mask matrix (extension ``.mask``) contains information on which scores are positives and which are negatives.

To convert from Bob's four column or five column score file to a pair of these matrices, you can use the :py:func:`bob.measure.openbr.write_matrix` function.
In its simplest form, this function takes a score file ``'five-column-score-file'`` and writes the pair ``'openbr.mtx', 'openbr.mask'`` of OpenBR-compatible files:

.. code-block:: py

   >>> bob.measure.openbr.write_matrix('five-column-score-file', 'openbr.mtx', 'openbr.mask', score_file_format = '5column')

In this way, the score file will be parsed and the matrices will be written in the same order that is obtained from the score file.

For most applications this should be sufficient. However, as the identity information is lost in the matrix files, no deeper analysis is possible when using only the matrices.
To enforce an order of the models and probes inside the matrices, you can use the ``model_names`` and ``probe_names`` parameters of :py:func:`bob.measure.openbr.write_matrix`:

* The ``probe_names`` parameter lists the ``path`` elements stored in the score files, which are the fourth column in a ``5column`` file, and the third column in a ``4column`` file, see :py:func:`bob.measure.load.five_column` and :py:func:`bob.measure.load.four_column`.

* The ``model_names`` parameter is a bit more complicated.
  In a ``5column`` format score file, the model names are defined by the second column of that file, see :py:func:`bob.measure.load.five_column`.
  In a ``4column`` format score file, the model information is not contained; only the client information of the model is available.
  Hence, for the ``4column`` format, the ``model_names`` actually lists the client ids found in the first column, see :py:func:`bob.measure.load.four_column`.

  .. warning::
     The model information is lost, but required to write the matrix files.
     In the ``4column`` format, we use client ids instead of the model information.
     Hence, when several models exist per client, this function will not work as expected.

Additionally, there are fields in the matrix files, which define the gallery and probe list files that were used to generate the matrix.
These file names can be selected with the ``gallery_file_name`` and ``probe_file_name`` keyword parameters of :py:func:`bob.measure.openbr.write_matrix`.

Finally, OpenBR defines a specific ``'search'`` score file format, which is designed to be used to compute CMC curves.
The score matrix contains a descendingly sorted and possibly truncated list of scores, i.e., for each probe, a sorted list of all model scores is generated.
To generate this special score file format, you can specify the ``search`` parameter.
It specifies the number of highest scores per probe that should be kept.
If the ``search`` parameter is set to a negative value, all scores will be kept.
If the ``search`` parameter is higher than the actual number of models, ``NaN`` scores will be appended, and the corresponding mask values will be set to ``0`` (i.e., to be ignored).
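
Putting these options together, a call to
:py:func:`bob.measure.openbr.write_matrix` could be sketched as follows -- the
model and probe lists are placeholders that you would normally derive from your
own experiment:

.. code-block:: py

   >>> model_names = ['model_1', 'model_2']            # desired model order (hypothetical)
   >>> probe_names = ['probe/path_1', 'probe/path_2']  # desired probe order (hypothetical)
   >>> bob.measure.openbr.write_matrix('five-column-score-file', 'openbr.mtx', 'openbr.mask',
   ...     model_names = model_names, probe_names = probe_names,
   ...     score_file_format = '5column',
   ...     gallery_file_name = 'gallery.lst', probe_file_name = 'probes.lst',
   ...     search = 50)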


OpenBR to Bob
=============

On the other hand, you might also want to generate a Bob-compatible (four or five column) score file based on a pair of OpenBR matrix and mask files.
This is possible by using the :py:func:`bob.measure.openbr.write_score_file` function.
In its basic form, it takes the given pair of matrix and mask files, as well as the desired output score file:

.. code-block:: py

   >>> bob.measure.openbr.write_score_file('openbr.mtx', 'openbr.mask', 'four-column-score-file')

This score file is sufficient to compute a CMC curve (see `CMC`_); however, it does not contain relevant client ids or paths for models and probes.
In particular, it assumes that each client has exactly one associated model.

To add or correct this information, you can use additional parameters of :py:func:`bob.measure.openbr.write_score_file`.
Client ids of models and probes can be added using the ``models_ids`` and ``probes_ids`` keyword arguments.
The length of these lists must be identical to the number of models and probes as given in the matrix files, **and they must be in the same order as used to compute the OpenBR matrix**.
This implies that the same same-client and different-client pairs as indicated by the OpenBR mask will be generated, which is checked inside the function.

To add model and probe path information, use the ``model_names`` and ``probe_names`` parameters, which need to have the same size and order as ``models_ids`` and ``probes_ids``.
This information is simply stored in the score file, and no further check is applied.

.. note:: The ``model_names`` parameter is used only when writing score files in ``score_file_format='5column'``; in the ``'4column'`` format, this parameter is ignored.
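
Putting it all together, a more complete call could be sketched as follows --
all lists are placeholders and must be in the same order as used to compute the
OpenBR matrix:

.. code-block:: py

   >>> models_ids = ['1', '2']                                   # client ids of the models (hypothetical)
   >>> probes_ids = ['1', '1', '2']                              # client ids of the probes (hypothetical)
   >>> model_names = ['s1/model', 's2/model']                    # model paths, only used in the '5column' format
   >>> probe_names = ['s1/probe_1', 's1/probe_2', 's2/probe_1']  # probe paths (hypothetical)
   >>> bob.measure.openbr.write_score_file('openbr.mtx', 'openbr.mask', 'five-column-score-file',
   ...     models_ids = models_ids, probes_ids = probes_ids,
   ...     model_names = model_names, probe_names = probe_names,
   ...     score_file_format = '5column')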


.. include:: links.rst

.. Place your references here:

.. _`The Expected Performance Curve`: http://publications.idiap.ch/downloads/reports/2005/bengio_2005_icml.pdf
.. _`The DET curve in assessment of detection task performance`: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.117.4489&rep=rep1&type=pdf
.. _openbr: http://openbiometrics.org