From 647700e5ea4f95052aab580d6e5ac99b4bcfe6bf Mon Sep 17 00:00:00 2001
From: Theophile GENTILHOMME
Date: Fri, 29 Jun 2018 10:33:20 +0200
Subject: [PATCH] [doc][guide] Update documentation

---
 doc/guide.rst | 134 +++++++++++++++++++++++
 1 file changed, 61 insertions(+), 73 deletions(-)

diff --git a/doc/guide.rst b/doc/guide.rst
index 5df02a0..2ac48b8 100644
--- a/doc/guide.rst
+++ b/doc/guide.rst
@@ -35,26 +35,26 @@ Overview

A classifier is subject to two types of errors, either the real access/signal
-is rejected (false rejection) or an impostor attack/a false access is accepted
-(false acceptance). A possible way to measure the detection performance is to
-use the Half Total Error Rate (HTER), which combines the False Rejection Rate
-(FRR) and the False Acceptance Rate (FAR) and is defined in the following
+is rejected (false negative) or an impostor attack/a false access is accepted
+(false positive). A possible way to measure the detection performance is to
+use the Half Total Error Rate (HTER), which combines the False Negative Rate
+(FNR) and the False Positive Rate (FPR) and is defined in the following
formula:
.. math::
- HTER(\tau, \mathcal{D}) = \frac{FAR(\tau, \mathcal{D}) + FRR(\tau, \mathcal{D})}{2} \quad \textrm{[\%]}
+ HTER(\tau, \mathcal{D}) = \frac{FPR(\tau, \mathcal{D}) + FNR(\tau, \mathcal{D})}{2} \quad \textrm{[\%]}
-where :math:`\mathcal{D}` denotes the dataset used. Since both the FAR and the
-FRR depends on the threshold :math:`\tau`, they are strongly related to each
-other: increasing the FAR will reduce the FRR and vice-versa. For this reason,
+where :math:`\mathcal{D}` denotes the dataset used. Since both the FPR and the
+FNR depend on the threshold :math:`\tau`, they are strongly related to each
+other: increasing the FPR will reduce the FNR and vice-versa. For this reason,
results are often presented using either a Receiver Operating Characteristic
(ROC) or a Detection-Error Tradeoff (DET) plot, these two plots basically
-present the FAR versus the FRR for different values of the threshold. Another
+present the FPR versus the FNR for different values of the threshold. Another
widely used measure to summarise the performance of a system is the Equal Error
-Rate (EER), defined as the point along the ROC or DET curve where the FAR
-equals the FRR.
+Rate (EER), defined as the point along the ROC or DET curve where the FPR
+equals the FNR.
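The definitions above can be illustrated with a minimal, library-free sketch. It assumes that a score greater than or equal to the threshold is accepted as a positive; the helper names ``fpr_fnr`` and ``hter`` and the toy scores are hypothetical and are not part of ``bob.measure``:

```python
def fpr_fnr(negatives, positives, threshold):
    """Return (FPR, FNR) as fractions in [0, 1] for one threshold.

    Assumption: a score >= threshold is accepted as a positive.
    """
    # Negatives that are wrongly accepted
    false_positives = sum(1 for s in negatives if s >= threshold)
    # Positives that are wrongly rejected
    false_negatives = sum(1 for s in positives if s < threshold)
    return false_positives / len(negatives), false_negatives / len(positives)


def hter(negatives, positives, threshold):
    """Half Total Error Rate: the mean of FPR and FNR at one threshold."""
    fpr, fnr = fpr_fnr(negatives, positives, threshold)
    return (fpr + fnr) / 2.0


# Toy scores, made up for illustration only
negatives = [-1.2, -0.7, -0.3, 0.1, 0.4]
positives = [-0.2, 0.3, 0.8, 1.1, 1.5]

# Raising the threshold lowers the FPR and raises the FNR
for t in (-0.5, 0.0, 0.5):
    fpr, fnr = fpr_fnr(negatives, positives, t)
    print("T=%+.1f  FPR=%.2f  FNR=%.2f  HTER=%.2f"
          % (t, fpr, fnr, hter(negatives, positives, t)))
```

Sweeping the threshold in this way yields the points from which a ROC or DET curve is drawn.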
However, it was noted by Bengio et al. (2004) that ROC and DET curves may be
misleading when comparing systems. Hence, the so-called Expected Performance
@@ -63,13 +63,13 @@ performance of a system at various operating points. Indeed, in real-world
scenarios, the threshold :math:`\tau` has to be set a priori: this is typically
done using a development set (also called cross-validation set). Nevertheless,
the optimal threshold can be different depending on the relative importance
-given to the FAR and the FRR. Hence, in the EPC framework, the cost
-:math:`\beta \in [0;1]` is defined as the tradeoff between the FAR and FRR.
+given to the FPR and the FNR. Hence, in the EPC framework, the cost
+:math:`\beta \in [0;1]` is defined as the tradeoff between the FPR and FNR.
The optimal threshold :math:`\tau^*` is then computed using different values of
:math:`\beta`, corresponding to different operating points:
.. math::
- \tau^{*} = \arg\!\min_{\tau} \quad \beta \cdot \textrm{FAR}(\tau, \mathcal{D}_{d}) + (1-\beta) \cdot \textrm{FRR}(\tau, \mathcal{D}_{d})
+ \tau^{*} = \arg\!\min_{\tau} \quad \beta \cdot \textrm{FPR}(\tau, \mathcal{D}_{d}) + (1-\beta) \cdot \textrm{FNR}(\tau, \mathcal{D}_{d})
where :math:`\mathcal{D}_{d}` denotes the development set and should be
@@ -122,15 +122,15 @@ the following techniques:
>>> # negatives, positives = parse_my_scores(...) # write parser if not provided!
>>> T = 0.0 #Threshold: later we explain how one can calculate these
>>> correct_negatives = bob.measure.correctly_classified_negatives(negatives, T)
- >>> FAR = 1 - (float(correct_negatives.sum())/negatives.size)
+ >>> FPR = 1 - (float(correct_negatives.sum())/negatives.size)
>>> correct_positives = bob.measure.correctly_classified_positives(positives, T)
- >>> FRR = 1 - (float(correct_positives.sum())/positives.size)
+ >>> FNR = 1 - (float(correct_positives.sum())/positives.size)
-We do provide a method to calculate the FAR and FRR in a single shot:
+We do provide a method to calculate the FPR and FNR in a single shot:
.. doctest::
- >>> FAR, FRR = bob.measure.farfrr(negatives, positives, T)
+ >>> FPR, FNR = bob.measure.farfrr(negatives, positives, T)
The threshold ``T`` is normally calculated by looking at the distribution of
negatives and positives in a development (or validation) set, selecting a
@@ -170,12 +170,12 @@ calculation of the threshold:
calculating the threshold based on the provided scores. Instead, the closest
possible threshold is returned. For example, using
:any:`bob.measure.eer_threshold` **will not** give you a threshold where
- :math:`FAR == FRR`. Hence, you cannot report :math:`FAR` or :math:`FRR`
- instead of :math:`EER`; you should report :math:`(FAR+FRR)/2` instead. This
+ :math:`FPR == FNR`. Hence, you cannot report :math:`FPR` or :math:`FNR`
+ instead of :math:`EER`; you should report :math:`(FPR+FNR)/2` instead. This
is also true for :any:`bob.measure.far_threshold` and
:any:`bob.measure.frr_threshold`. The threshold returned by those functions
does not guarantee that using that threshold you will get the requested
- :math:`FAR` or :math:`FRR` value. Instead, you should recalculate using
+ :math:`FPR` or :math:`FNR` value. Instead, you should recalculate using
:any:`bob.measure.farfrr`.
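To see why a finite set of scores rarely admits a threshold where both rates are exactly equal, consider the following sketch. It is a brute-force stand-in written for illustration only: ``closest_eer_threshold`` is a hypothetical helper, it is not how :any:`bob.measure.eer_threshold` is implemented, and it assumes that a score greater than or equal to the threshold counts as a positive:

```python
def rates(negatives, positives, threshold):
    """Return (FPR, FNR) as fractions; a score >= threshold is a positive."""
    fpr = sum(1 for s in negatives if s >= threshold) / len(negatives)
    fnr = sum(1 for s in positives if s < threshold) / len(positives)
    return fpr, fnr


def closest_eer_threshold(negatives, positives):
    """Hypothetical helper: pick the observed score minimising |FPR - FNR|."""
    candidates = sorted(set(negatives) | set(positives))

    def gap(threshold):
        fpr, fnr = rates(negatives, positives, threshold)
        return abs(fpr - fnr)

    return min(candidates, key=gap)


# Toy scores, made up for illustration only
negatives = [-1.0, -0.5, 0.2]
positives = [-0.1, 0.4, 0.9, 1.3]

T = closest_eer_threshold(negatives, positives)
FPR, FNR = rates(negatives, positives, T)

# The two rates differ, so neither can be reported as "the EER";
# their mean is the value to report instead.
print("T=%.1f  FPR=%.3f  FNR=%.3f  (FPR+FNR)/2=%.3f"
      % (T, FPR, FNR, (FPR + FNR) / 2))
```

With three negatives and four positives, the FPR can only take multiples of 1/3 and the FNR multiples of 1/4, so the two only coincide at 0 or 1.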
.. note::
@@ -280,8 +280,8 @@ town. To plot an ROC curve, in possession of your **negatives** and
>>> # we assume you have your negatives and positives already split
>>> npoints = 100
>>> bob.measure.plot.roc(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test') # doctest: +SKIP
- >>> pyplot.xlabel('FAR (%)') # doctest: +SKIP
- >>> pyplot.ylabel('FRR (%)') # doctest: +SKIP
+ >>> pyplot.xlabel('FPR (%)') # doctest: +SKIP
+ >>> pyplot.ylabel('FNR (%)') # doctest: +SKIP
>>> pyplot.grid(True)
>>> pyplot.show() # doctest: +SKIP
@@ -299,8 +299,8 @@ You should see an image like the following one:
npoints = 100
bob.measure.plot.roc(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test')
pyplot.grid(True)
- pyplot.xlabel('FAR (%)')
- pyplot.ylabel('FRR (%)')
+ pyplot.xlabel('FPR (%)')
+ pyplot.ylabel('FNR (%)')
pyplot.title('ROC')
As can be observed, plotting methods live in the namespace
@@ -329,8 +329,8 @@ A DET curve can be drawn using similar commands such as the ones for the ROC cur
>>> npoints = 100
>>> bob.measure.plot.det(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test') # doctest: +SKIP
>>> bob.measure.plot.det_axis([0.01, 40, 0.01, 40]) # doctest: +SKIP
- >>> pyplot.xlabel('FAR (%)') # doctest: +SKIP
- >>> pyplot.ylabel('FRR (%)') # doctest: +SKIP
+ >>> pyplot.xlabel('FPR (%)') # doctest: +SKIP
+ >>> pyplot.ylabel('FNR (%)') # doctest: +SKIP
>>> pyplot.grid(True)
>>> pyplot.show() # doctest: +SKIP
@@ -350,8 +350,8 @@ This will produce an image like the following one:
bob.measure.plot.det(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test')
bob.measure.plot.det_axis([0.1, 80, 0.1, 80])
pyplot.grid(True)
- pyplot.xlabel('FAR (%)')
- pyplot.ylabel('FRR (%)')
+ pyplot.xlabel('FPR (%)')
+ pyplot.ylabel('FNR (%)')
pyplot.title('DET')
.. note::
@@ -444,9 +444,9 @@ The detection & identification curve is designed to evaluate open set
identification tasks. It can be plotted using the
:py:func:`bob.measure.plot.detection_identification_curve` function, but it
requires at least one open-set probe, i.e., where no corresponding positive
-score exists, for which the FAR values are computed. Here, we plot the
+score exists, for which the FPR values are computed. Here, we plot the
detection and identification curve for rank 1, so that the recognition rate for
-FAR=1 will be identical to the rank one :py:func:`bob.measure.recognition_rate`
+FPR=1 will be identical to the rank one :py:func:`bob.measure.recognition_rate`
obtained in the CMC plot above.
.. plot::
@@ -498,24 +498,26 @@ Metrics
=======
To calculate the threshold using a certain criterion (EER (default) or min.HTER)
-on a set, after setting up project, just do:
+on a development set and measure the resulting performance
+on an evaluation set, after setting up your project, just do:
.. code-block:: sh
-   $ bob measure metrics dev1.txt
-   [Min. criterion: EER] Threshold on Development set `dev1.txt`: 8.025286e-03
-   ==== ===================
-   ..   Development dev1
-   ==== ===================
-   FtA  0.000%
-   FMR  6.263% (31/495)
-   FNMR 6.208% (28/451)
-   FAR  5.924%
-   FRR  11.273%
-   HTER 8.599%
-   ==== ===================
-
-The output will present the threshold together with the FtA, FMR, FMNR, FAR, FRR and
+   ./bin/bob measure metrics ./MTest1/scores{dev,eval} -e
+   [Min. criterion: EER ] Threshold on Development set `./MTest1/scoresdev`: 1.373550e-02
+   bob.measure@2018-06-29 10:20:14,177 -- ERROR: NaNs scores (1.0%) were found in ./MTest1/scoresdev
+   bob.measure@2018-06-29 10:20:14,177 -- ERROR: NaNs scores (1.0%) were found in ./MTest1/scoreseval
+   =================== ================ ================
+   ..                  Development      Evaluation
+   =================== ================ ================
+   False Positive Rate 15.5% (767/4942) 15.5% (767/4942)
+   False Negative Rate 15.5% (769/4954) 15.5% (769/4954)
+   Precision           0.8              0.8
+   Recall              0.8              0.8
+   F1-score            0.8              0.8
+   =================== ================ ================
+
+The output will present the threshold together with the FPR, FNR, Precision, Recall, F1-score and
HTER on the given set, calculated using such a threshold. The relative counts of FAs
and FRs are also displayed in parentheses.
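As a rough guide to how the Precision, Recall and F1-score rows relate to the raw scores, here is a minimal sketch using the standard textbook definitions. The function ``precision_recall_f1`` and the toy scores are hypothetical, a score greater than or equal to the threshold is assumed to be a positive, and the command-line tool may compute or round these values differently:

```python
def precision_recall_f1(negatives, positives, threshold):
    """Return (precision, recall, F1) at one threshold.

    Assumption: a score >= threshold is classified as a positive.
    """
    tp = sum(1 for s in positives if s >= threshold)  # true positives
    fp = sum(1 for s in negatives if s >= threshold)  # false positives
    fn = sum(1 for s in positives if s < threshold)   # false negatives

    # Guard against empty denominators on degenerate inputs
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy scores, made up for illustration only
negatives = [-1.2, -0.7, -0.3, 0.1, 0.4]
positives = [-0.2, 0.3, 0.8, 1.1, 1.5]

precision, recall, f1 = precision_recall_f1(negatives, positives, 0.0)
print("Precision=%.3f  Recall=%.3f  F1-score=%.3f" % (precision, recall, f1))
```

Note that recall equals ``1 - FNR``, so the Recall row carries the same information as the False Negative Rate row.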
@@ -531,37 +533,23 @@ To evaluate the performance of a new score file with a given threshold, use
.. code-block:: sh
-   $ bob measure metrics --thres 0.006 eval1.txt
-   [Min. criterion: user provider] Threshold on Development set `eval1`: 6.000000e-03
-   ==== ====================
-   ..   Development eval1
-   ==== ====================
-   FtA  0.000%
-   FMR  5.010% (24/479)
-   FNMR 6.977% (33/473)
-   FAR  4.770%
-   FRR  11.442%
-   HTER 8.106%
-   ==== ====================
+   ./bin/bob measure metrics ./MTest1/scoreseval --thres 0.006
+   [Min. criterion: user provided] Threshold on Development set `./MTest1/scoreseval`: 6.000000e-03
+   bob.measure@2018-06-29 10:22:06,852 -- ERROR: NaNs scores (1.0%) were found in ./MTest1/scoreseval
+   =================== ================
+   ..                  Development
+   =================== ================
+   False Positive Rate 15.2% (751/4942)
+   False Negative Rate 16.1% (796/4954)
+   Precision           0.8
+   Recall              0.8
+   F1-score            0.8
+   =================== ================
+
You can simultaneously conduct the threshold computation and its performance
on an evaluation set:
-.. code-block:: sh
-
-   $ bob measure metrics -e dev1.txt eval1.txt
-   [Min. criterion: EER] Threshold on Development set `dev1`: 8.025286e-03
-   ==== =================== ===============
-   ..   Development dev1    Eval. eval1
-   ==== =================== ===============
-   FtA  0.000%              0.000%
-   FMR  6.263% (31/495)     5.637% (27/479)
-   FNMR 6.208% (28/451)     6.131% (29/473)
-   FAR  5.924%              5.366%
-   FRR  11.273%             10.637%
-   HTER 8.599%              8.001%
-   ==== =================== ===============
-
.. note::
Table format can be changed using ``--tablefmt`` option, the default format
being ``rst``. Please refer to ``bob measure metrics --help`` for more details.

--
2.21.0