Commit 647700e5 authored by Theophile GENTILHOMME

[doc][guide] Update documentation

parent 37312171
Pipeline #21467 passed with stage in 10 minutes
@@ -35,26 +35,26 @@ Overview

Overview
--------

A classifier is subject to two types of errors: either the real access/signal
is rejected (false negative) or an impostor attack/a false access is accepted
(false positive). A possible way to measure the detection performance is to
use the Half Total Error Rate (HTER), which combines the False Negative Rate
(FNR) and the False Positive Rate (FPR) and is defined by the following
formula:

.. math::

   HTER(\tau, \mathcal{D}) = \frac{FPR(\tau, \mathcal{D}) + FNR(\tau, \mathcal{D})}{2} \quad \textrm{[\%]}

where :math:`\mathcal{D}` denotes the dataset used. Since both the FPR and the
FNR depend on the threshold :math:`\tau`, they are strongly related to each
other: increasing the FPR will reduce the FNR and vice versa. For this reason,
results are often presented using either a Receiver Operating Characteristic
(ROC) or a Detection-Error Tradeoff (DET) plot; both plots essentially present
the FPR versus the FNR for different values of the threshold. Another widely
used measure to summarise the performance of a system is the Equal Error Rate
(EER), defined as the point along the ROC or DET curve where the FPR equals
the FNR.
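
To make the trade-off concrete, the minimal sketch below evaluates the FPR,
FNR and HTER of a small, made-up score set at three thresholds. It is plain
Python rather than |project| code; the helper names ``error_rates`` and
``hter`` are hypothetical, and it assumes that a score greater than or equal
to the threshold is accepted.

.. code-block:: python

   def error_rates(negatives, positives, threshold):
       """Return (FPR, FNR) at ``threshold``.

       Assumed convention: a score >= threshold is accepted as positive.
       """
       false_positives = sum(1 for s in negatives if s >= threshold)
       false_negatives = sum(1 for s in positives if s < threshold)
       return false_positives / len(negatives), false_negatives / len(positives)


   def hter(negatives, positives, threshold):
       """Half Total Error Rate: the mean of FPR and FNR at ``threshold``."""
       fpr, fnr = error_rates(negatives, positives, threshold)
       return (fpr + fnr) / 2


   # Hypothetical toy scores: impostor attacks (negatives) and real accesses (positives).
   negatives = [-2.0, -1.5, -0.8, -0.3, 0.4]
   positives = [-0.5, 0.1, 0.6, 1.2, 2.0]

   # Raising the threshold lowers the FPR but raises the FNR, and vice versa.
   for threshold in (-1.0, 0.0, 1.0):
       fpr, fnr = error_rates(negatives, positives, threshold)
       print(f"tau={threshold:+.1f}  FPR={fpr:.0%}  FNR={fnr:.0%}  "
             f"HTER={hter(negatives, positives, threshold):.0%}")

Moving the threshold from -1 to +1 trades false positives for false negatives
(FPR/FNR go from 60 %/0 % to 0 %/60 %), while the HTER is lowest, at 20 %, for
the intermediate threshold.
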

However, Bengio et al. (2004) noted that ROC and DET curves may be
misleading when comparing systems. Hence, the so-called Expected Performance

@@ -63,13 +63,13 @@ performance of a system at various operating points. Indeed, in real-world

scenarios, the threshold :math:`\tau` has to be set a priori: this is typically
done using a development set (also called cross-validation set). Nevertheless,
the optimal threshold can differ depending on the relative importance given to
the FPR and the FNR. Hence, in the EPC framework, the cost
:math:`\beta \in [0;1]` is defined as the trade-off between the FPR and the
FNR. The optimal threshold :math:`\tau^*` is then computed using different
values of :math:`\beta`, corresponding to different operating points:

.. math::

   \tau^{*} = \arg\!\min_{\tau} \quad \beta \cdot \textrm{FPR}(\tau, \mathcal{D}_{d}) + (1-\beta) \cdot \textrm{FNR}(\tau, \mathcal{D}_{d})

where :math:`\mathcal{D}_{d}` denotes the development set and should be

@@ -122,15 +122,15 @@ the following techniques:

   >>> # negatives, positives = parse_my_scores(...) # write parser if not provided!
   >>> T = 0.0 # Threshold: later we explain how one can calculate these
   >>> correct_negatives = bob.measure.correctly_classified_negatives(negatives, T)
   >>> FPR = 1 - (float(correct_negatives.sum())/negatives.size)
   >>> correct_positives = bob.measure.correctly_classified_positives(positives, T)
   >>> FNR = 1 - (float(correct_positives.sum())/positives.size)

We also provide a function that calculates the FPR and FNR in a single call:

.. doctest::

   >>> FPR, FNR = bob.measure.farfrr(negatives, positives, T)

The threshold ``T`` is normally calculated by looking at the distribution of
negatives and positives in a development (or validation) set, selecting a

@@ -170,12 +170,12 @@ calculation of the threshold:

calculating the threshold based on the provided scores. Instead, the closest
possible threshold is returned. For example, using
:any:`bob.measure.eer_threshold` **will not** give you a threshold where
:math:`FPR = FNR`. Hence, you cannot report :math:`FPR` or :math:`FNR`
instead of :math:`EER`; you should report :math:`(FPR+FNR)/2` instead. This
is also true for :any:`bob.measure.far_threshold` and
:any:`bob.measure.frr_threshold`: the threshold returned by those functions
does not guarantee that you will obtain exactly the requested :math:`FPR` or
:math:`FNR` value. Instead, you should recompute the actual rates using
:any:`bob.measure.farfrr`.
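
The caution above can be reproduced with a small sketch. With a finite set of
scores, the FPR and FNR only change at observed score values, so a threshold
search can return the candidate where the two rates are closest, but in
general not one where they are equal. The function
``approximate_eer_threshold`` below is a hypothetical, simplified stand-in
written in plain Python; it does not reproduce the algorithm of
:any:`bob.measure.eer_threshold`, and it assumes that scores greater than or
equal to the threshold are accepted.

.. code-block:: python

   def error_rates(negatives, positives, threshold):
       # Assumed convention: a score >= threshold is accepted.
       fpr = sum(s >= threshold for s in negatives) / len(negatives)
       fnr = sum(s < threshold for s in positives) / len(positives)
       return fpr, fnr


   def approximate_eer_threshold(negatives, positives):
       """Return the observed score that minimises |FPR - FNR|.

       Both rates are piecewise constant and only change at observed scores,
       so trying those values as candidate thresholds is sufficient here.
       """
       candidates = sorted(set(negatives) | set(positives))

       def gap(t):
           fpr, fnr = error_rates(negatives, positives, t)
           return abs(fpr - fnr)

       return min(candidates, key=gap)


   # Hypothetical scores with different numbers of negatives and positives.
   negatives = [-1.2, -0.4, 0.1, 0.5]
   positives = [-0.2, 0.3, 0.8]

   t = approximate_eer_threshold(negatives, positives)
   fpr, fnr = error_rates(negatives, positives, t)
   print(f"threshold={t}  FPR={fpr:.1%}  FNR={fnr:.1%}  (FPR+FNR)/2={(fpr + fnr) / 2:.1%}")

Here the selected threshold is 0.3, with an FPR of 25 % and an FNR of about
33 %, so neither rate alone is the EER; their average, about 29 %, is the
value to report.
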

.. note::

@@ -280,8 +280,8 @@ town. To plot an ROC curve, in possession of your **negatives** and

   >>> # we assume you have your negatives and positives already split
   >>> npoints = 100
   >>> bob.measure.plot.roc(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test') # doctest: +SKIP
   >>> pyplot.xlabel('FPR (%)') # doctest: +SKIP
   >>> pyplot.ylabel('FNR (%)') # doctest: +SKIP
   >>> pyplot.grid(True)
   >>> pyplot.show() # doctest: +SKIP
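
Conceptually, an ROC curve is just the pair (FPR, FNR) evaluated over a grid of
thresholds. The sketch below assumes, for illustration only, that ``npoints``
thresholds are spaced evenly between the lowest and the highest observed score
and that scores greater than or equal to the threshold are accepted;
``bob.measure.plot.roc`` may choose its thresholds differently, and the helper
name ``roc_points`` is hypothetical.

.. code-block:: python

   def roc_points(negatives, positives, npoints=100):
       """Return a list of (FPR, FNR) pairs over ``npoints`` evenly spaced thresholds.

       Assumptions: a score >= threshold is accepted, and the thresholds span
       the range of all observed scores.
       """
       if npoints < 2:
           raise ValueError("npoints must be at least 2")
       lo = min(min(negatives), min(positives))
       hi = max(max(negatives), max(positives))
       step = (hi - lo) / (npoints - 1)
       points = []
       for i in range(npoints):
           t = lo + i * step
           fpr = sum(s >= t for s in negatives) / len(negatives)
           fnr = sum(s < t for s in positives) / len(positives)
           points.append((fpr, fnr))
       return points


   # Hypothetical toy scores.
   negatives = [-2.0, -1.5, -0.8, -0.3, 0.4]
   positives = [-0.5, 0.1, 0.6, 1.2, 2.0]
   curve = roc_points(negatives, positives, npoints=5)
   for fpr, fnr in curve:
       print(f"FPR={fpr:.0%}  FNR={fnr:.0%}")

With ``npoints=5`` the thresholds are -2, -1, 0, 1 and 2: the FPR falls from
100 % to 0 % while the FNR rises from 0 % to 80 %. Passing the two coordinate
lists to ``pyplot.plot`` gives the same kind of FPR-versus-FNR curve.
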

@@ -299,8 +299,8 @@ You should see an image like the following one:

   npoints = 100
   bob.measure.plot.roc(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test')
   pyplot.grid(True)
   pyplot.xlabel('FPR (%)')
   pyplot.ylabel('FNR (%)')
   pyplot.title('ROC')

As can be observed, plotting methods live in the namespace

@@ -329,8 +329,8 @@ A DET curve can be drawn using similar commands such as the ones for the ROC cur

   >>> npoints = 100
   >>> bob.measure.plot.det(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test') # doctest: +SKIP
   >>> bob.measure.plot.det_axis([0.01, 40, 0.01, 40]) # doctest: +SKIP
   >>> pyplot.xlabel('FPR (%)') # doctest: +SKIP
   >>> pyplot.ylabel('FNR (%)') # doctest: +SKIP
   >>> pyplot.grid(True)
   >>> pyplot.show() # doctest: +SKIP
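
A DET plot differs from an ROC plot mainly in its axes: both rates are mapped
through the inverse of the standard normal cumulative distribution function
(the normal deviate), which stretches the low-error region. The sketch below
illustrates that transform with Python's ``statistics.NormalDist``; it is not
the code used by ``bob.measure.plot.det``, and the helper name
``to_normal_deviate`` is hypothetical.

.. code-block:: python

   from statistics import NormalDist

   _STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


   def to_normal_deviate(rate_percent):
       """Map an error rate given in percent onto the normal deviate scale.

       The inverse normal CDF diverges at 0 % and 100 %, so the rate must lie
       strictly between them; this is one reason DET axis limits start at a
       small positive value such as 0.01 % rather than at 0.
       """
       p = rate_percent / 100.0
       if not 0.0 < p < 1.0:
           raise ValueError("rate must be strictly between 0 and 100 percent")
       return _STANDARD_NORMAL.inv_cdf(p)


   # Where some values inside the [0.01, 40] axis limits used above would land.
   for tick in (0.01, 0.1, 1, 5, 20, 40):
       print(f"{tick:>5} %  ->  {to_normal_deviate(tick):+.2f}")
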

@@ -350,8 +350,8 @@ This will produce an image like the following one:

   bob.measure.plot.det(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test')
   bob.measure.plot.det_axis([0.1, 80, 0.1, 80])
   pyplot.grid(True)
   pyplot.xlabel('FPR (%)')
   pyplot.ylabel('FNR (%)')
   pyplot.title('DET')

.. note::

@@ -444,9 +444,9 @@ The detection & identification curve is designed to evaluate open set

identification tasks. It can be plotted using the
:py:func:`bob.measure.plot.detection_identification_curve` function, but it
requires at least one open-set probe, i.e., a probe for which no corresponding
positive score exists; the FPR values are computed from these probes. Here, we
plot the detection and identification curve for rank 1, so that the
recognition rate for FPR=1 will be identical to the rank-one
:py:func:`bob.measure.recognition_rate` obtained in the CMC plot above.
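
The following sketch illustrates, under simplifying assumptions, how a
detection and identification rate at rank 1 can be computed for one target
FPR. It assumes each probe is given as a pair
``(negative_scores, positive_score)``, with ``positive_score`` set to ``None``
for open-set probes; this data layout and the helper name ``dir_at_rank1`` are
hypothetical and do not mirror the input format of
:py:func:`bob.measure.plot.detection_identification_curve`.

.. code-block:: python

   import math


   def dir_at_rank1(probes, fpr):
       """Detection and identification rate at rank 1 for a target FPR.

       ``probes`` is a list of ``(negative_scores, positive_score)`` pairs, where
       ``negative_scores`` is a non-empty list and ``positive_score`` is ``None``
       for open-set probes. Assumed convention: a score is accepted when it is
       strictly greater than the threshold.
       """
       open_set = sorted((max(neg) for neg, pos in probes if pos is None), reverse=True)
       closed_set = [(neg, pos) for neg, pos in probes if pos is not None]
       if not open_set or not closed_set:
           raise ValueError("need at least one open-set and one closed-set probe")

       # Allow at most floor(fpr * N_open) open-set probes to be (falsely) accepted.
       allowed = math.floor(fpr * len(open_set))
       threshold = open_set[allowed] if allowed < len(open_set) else -math.inf

       # A closed-set probe counts if its positive score is accepted and ranks first.
       hits = sum(
           1 for neg, pos in closed_set
           if pos > threshold and all(pos > s for s in neg)
       )
       return hits / len(closed_set)


   probes = [
       ([0.1, 0.3], 0.75),   # correct identity ranked first, high score
       ([0.2, 0.7], 0.6),    # an impostor outranks the correct identity
       ([0.4, 0.1], 0.42),   # ranked first, but with a low score
       ([0.8, 0.2], None),   # open-set probe: no positive score
       ([0.45, 0.3], None),  # open-set probe
   ]
   for fpr in (0.0, 0.5, 1.0):
       print(f"FPR={fpr:.1f}  DIR={dir_at_rank1(probes, fpr):.2f}")

At FPR=1 the threshold accepts every score, so the result reduces to the
fraction of closed-set probes whose correct identity is ranked first, i.e. the
rank-one recognition rate (2/3 in this toy example).
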

.. plot::

@@ -498,24 +498,26 @@ Metrics

Metrics
=======

To calculate the threshold on a development set using a given criterion (EER,
the default, or minimum HTER), and to evaluate the performance obtained with
that threshold on an evaluation set, after setting up |project|, just run:

.. code-block:: sh

   $ ./bin/bob measure metrics ./MTest1/scores-{dev,eval} -e
   [Min. criterion: EER ] Threshold on Development set `./MTest1/scores-dev`: -1.373550e-02
   bob.measure@2018-06-29 10:20:14,177 -- ERROR: NaNs scores (1.0%) were found in ./MTest1/scores-dev
   bob.measure@2018-06-29 10:20:14,177 -- ERROR: NaNs scores (1.0%) were found in ./MTest1/scores-eval
   =================== ================ ================
   ..                  Development      Evaluation
   =================== ================ ================
   False Positive Rate 15.5% (767/4942) 15.5% (767/4942)
   False Negative Rate 15.5% (769/4954) 15.5% (769/4954)
   Precision           0.8              0.8
   Recall              0.8              0.8
   F1-score            0.8              0.8
   =================== ================ ================

The output presents the threshold together with the FPR, FNR, precision,
recall and F1-score on each given set, calculated using that threshold. The
numbers of false positives and false negatives are also displayed in
parentheses, relative to the total numbers of negatives and positives.
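
As a consistency check on the table, the development counts (767 false
positives, and 769 false negatives out of 4954 positives) give, with the
standard definitions of precision, recall and F1-score, about 0.845 for each
metric, which the table rounds to 0.8. The sketch below reproduces this
arithmetic in plain Python rather than |project| code; the helper name
``precision_recall_f1`` is hypothetical.

.. code-block:: python

   def precision_recall_f1(false_positives, false_negatives, positives):
       """Standard precision, recall and F1-score computed from error counts.

       ``positives`` is the total number of positive samples, so the number of
       true positives is ``positives - false_negatives``.
       """
       true_positives = positives - false_negatives
       precision = true_positives / (true_positives + false_positives)
       recall = true_positives / positives  # equal to 1 - FNR
       f1 = 2 * precision * recall / (precision + recall)
       return precision, recall, f1


   # Counts from the "Development" column of the example output.
   precision, recall, f1 = precision_recall_f1(
       false_positives=767, false_negatives=769, positives=4954
   )
   print(f"Precision {precision:.3f}  Recall {recall:.3f}  F1-score {f1:.3f}")
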

@@ -531,37 +533,23 @@ To evaluate the performance of a new score file with a given threshold, use

.. code-block:: sh

   $ ./bin/bob measure metrics ./MTest1/scores-eval --thres 0.006
   [Min. criterion: user provided] Threshold on Development set `./MTest1/scores-eval`: 6.000000e-03
   bob.measure@2018-06-29 10:22:06,852 -- ERROR: NaNs scores (1.0%) were found in ./MTest1/scores-eval
   =================== ================
   ..                  Development
   =================== ================
   False Positive Rate 15.2% (751/4942)
   False Negative Rate 16.1% (796/4954)
   Precision           0.8
   Recall              0.8
   F1-score            0.8
   =================== ================
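
Evaluating with ``--thres`` amounts to applying a threshold that was fixed
beforehand, without re-tuning it on the new scores. The sketch below
illustrates this protocol with the minimum-HTER criterion: the threshold is
selected on hypothetical development scores and then applied unchanged to
hypothetical evaluation scores. It is plain Python, not the |project|
implementation; the helper names are hypothetical, and it assumes that scores
greater than or equal to the threshold are accepted.

.. code-block:: python

   def error_rates(negatives, positives, threshold):
       # Assumed convention: a score >= threshold is accepted.
       fpr = sum(s >= threshold for s in negatives) / len(negatives)
       fnr = sum(s < threshold for s in positives) / len(positives)
       return fpr, fnr


   def min_hter_threshold(negatives, positives):
       """Pick, among the observed scores, the threshold with the lowest HTER."""
       candidates = sorted(set(negatives) | set(positives))
       return min(candidates, key=lambda t: sum(error_rates(negatives, positives, t)) / 2)


   # Hypothetical development and evaluation scores.
   dev_neg, dev_pos = [-1.0, -0.6, -0.1, 0.3], [-0.2, 0.4, 0.9, 1.5]
   eval_neg, eval_pos = [-0.9, -0.4, 0.2, 0.5], [0.0, 0.35, 1.1, 1.3]

   threshold = min_hter_threshold(dev_neg, dev_pos)       # tuned on development only
   fpr, fnr = error_rates(eval_neg, eval_pos, threshold)  # applied as-is to evaluation
   print(f"threshold={threshold}  eval FPR={fpr:.0%}  FNR={fnr:.0%}  "
         f"HTER={(fpr + fnr) / 2:.1%}")

On these toy scores the development set selects 0.4 (HTER 12.5 %), while the
same threshold yields an HTER of 37.5 % on the evaluation set, which is why
the threshold is tuned on development data and the performance reported on
separate evaluation data.
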

As in the first example of this section, passing both a development and an
evaluation score file together with the ``-e`` option computes the threshold
on the development set and evaluates its performance on the evaluation set.

.. note::

   The table format can be changed using the ``--tablefmt`` option; the default
   format is ``rst``. Please refer to ``bob measure metrics --help`` for more
   details.