bob / bob.measure · Commit 647700e5
authored Jun 29, 2018 by Theophile GENTILHOMME
[doc][guide] Update documentation
parent 37312171
Pipeline #21467 passed in 10 minutes

Showing 1 changed file with 61 additions and 73 deletions: doc/guide.rst (+61, -73)
@@ -35,26 +35,26 @@ Overview
 --------
 A classifier is subject to two types of errors: either the real access/signal
-is rejected (false rejection) or an impostor attack/a false access is accepted
-(false acceptance). A possible way to measure the detection performance is to
-use the Half Total Error Rate (HTER), which combines the False Rejection Rate
-(FRR) and the False Acceptance Rate (FAR) and is defined in the following
+is rejected (false negative) or an impostor attack/a false access is accepted
+(false positive). A possible way to measure the detection performance is to
+use the Half Total Error Rate (HTER), which combines the False Negative Rate
+(FNR) and the False Positive Rate (FPR) and is defined in the following
 formula:

 .. math::

-   HTER(\tau, \mathcal{D}) = \frac{FAR(\tau, \mathcal{D}) + FRR(\tau, \mathcal{D})}{2} \quad \textrm{[\%]}
+   HTER(\tau, \mathcal{D}) = \frac{FPR(\tau, \mathcal{D}) + FNR(\tau, \mathcal{D})}{2} \quad \textrm{[\%]}

-where :math:`\mathcal{D}` denotes the dataset used. Since both the FAR and the
-FRR depend on the threshold :math:`\tau`, they are strongly related to each
-other: increasing the FAR will reduce the FRR and vice-versa. For this reason,
+where :math:`\mathcal{D}` denotes the dataset used. Since both the FPR and the
+FNR depend on the threshold :math:`\tau`, they are strongly related to each
+other: increasing the FPR will reduce the FNR and vice-versa. For this reason,
 results are often presented using either a Receiver Operating Characteristic
 (ROC) or a Detection-Error Tradeoff (DET) plot; these two plots basically
-present the FAR versus the FRR for different values of the threshold. Another
+present the FPR versus the FNR for different values of the threshold. Another
 widely used measure to summarise the performance of a system is the Equal Error
-Rate (EER), defined as the point along the ROC or DET curve where the FAR
-equals the FRR.
+Rate (EER), defined as the point along the ROC or DET curve where the FPR
+equals the FNR.
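
To see the HTER definition in action, here is a minimal sketch of the computation; the synthetic Gaussian scores are an assumption for illustration only, while ``bob.measure.farfrr`` is the library call documented further down in this guide:

.. code-block:: python

    import numpy
    import bob.measure

    # synthetic scores, for illustration only: impostors around -1, genuine around +1
    rng = numpy.random.RandomState(0)
    negatives = rng.normal(-1.0, 1.0, 1000)  # impostor / false-access scores
    positives = rng.normal(+1.0, 1.0, 1000)  # real-access scores

    tau = 0.0  # an arbitrary decision threshold
    fpr, fnr = bob.measure.farfrr(negatives, positives, tau)
    hter = (fpr + fnr) / 2.0  # HTER(tau, D) = (FPR(tau, D) + FNR(tau, D)) / 2
    print('FPR = %.2f%%, FNR = %.2f%%, HTER = %.2f%%'
          % (100 * fpr, 100 * fnr, 100 * hter))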

 However, it was noted by Bengio et al. (2004) that ROC and DET curves may be
 misleading when comparing systems. Hence, the so-called Expected Performance
@@ -63,13 +63,13 @@ performance of a system at various operating points. Indeed, in real-world
 scenarios, the threshold :math:`\tau` has to be set a priori: this is typically
 done using a development set (also called cross-validation set). Nevertheless,
 the optimal threshold can be different depending on the relative importance
-given to the FAR and the FRR. Hence, in the EPC framework, the cost
-:math:`\beta \in [0;1]` is defined as the trade-off between the FAR and FRR.
+given to the FPR and the FNR. Hence, in the EPC framework, the cost
+:math:`\beta \in [0;1]` is defined as the trade-off between the FPR and FNR.
 The optimal threshold :math:`\tau^*` is then computed using different values of
 :math:`\beta`, corresponding to different operating points:

 .. math::

-   \tau^{*} = \arg\!\min_{\tau} \quad \beta \cdot \textrm{FAR}(\tau, \mathcal{D}_{d}) + (1-\beta) \cdot \textrm{FRR}(\tau, \mathcal{D}_{d})
+   \tau^{*} = \arg\!\min_{\tau} \quad \beta \cdot \textrm{FPR}(\tau, \mathcal{D}_{d}) + (1-\beta) \cdot \textrm{FNR}(\tau, \mathcal{D}_{d})
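
This minimisation can be emulated by brute force; a minimal sketch, assuming the same kind of synthetic development scores as above and using a plain grid search over candidate thresholds rather than the library's own optimiser:

.. code-block:: python

    import numpy
    import bob.measure

    rng = numpy.random.RandomState(0)
    dev_negatives = rng.normal(-1.0, 1.0, 1000)  # development impostor scores
    dev_positives = rng.normal(+1.0, 1.0, 1000)  # development genuine scores

    def weighted_threshold(negatives, positives, beta, n_steps=1000):
        """Grid search for tau* = argmin_tau beta*FPR(tau) + (1-beta)*FNR(tau)."""
        lo = min(negatives.min(), positives.min())
        hi = max(negatives.max(), positives.max())
        best_tau, best_cost = lo, float('inf')
        for tau in numpy.linspace(lo, hi, n_steps):
            fpr, fnr = bob.measure.farfrr(negatives, positives, tau)
            cost = beta * fpr + (1.0 - beta) * fnr
            if cost < best_cost:
                best_tau, best_cost = tau, cost
        return best_tau

    for beta in (0.1, 0.5, 0.9):  # different operating points
        print(beta, weighted_threshold(dev_negatives, dev_positives, beta))

In practice, prefer the library's own threshold functions (e.g. ``bob.measure.min_weighted_error_rate_threshold``, if your version provides it), which operate on the observed scores rather than a fixed grid.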

 where :math:`\mathcal{D}_{d}` denotes the development set and should be
@@ -122,15 +122,15 @@ the following techniques:

    >>> # negatives, positives = parse_my_scores(...) # write parser if not provided!
    >>> T = 0.0 #Threshold: later we explain how one can calculate these
    >>> correct_negatives = bob.measure.correctly_classified_negatives(negatives, T)
-   >>> FAR = 1 - (float(correct_negatives.sum())/negatives.size)
+   >>> FPR = 1 - (float(correct_negatives.sum())/negatives.size)
    >>> correct_positives = bob.measure.correctly_classified_positives(positives, T)
-   >>> FRR = 1 - (float(correct_positives.sum())/positives.size)
+   >>> FNR = 1 - (float(correct_positives.sum())/positives.size)

-We do provide a method to calculate the FAR and FRR in a single shot:
+We do provide a method to calculate the FPR and FNR in a single shot:

 .. doctest::

-   >>> FAR, FRR = bob.measure.farfrr(negatives, positives, T)
+   >>> FPR, FNR = bob.measure.farfrr(negatives, positives, T)

 The threshold ``T`` is normally calculated by looking at the distribution of
 negatives and positives in a development (or validation) set, selecting a
@@ -170,12 +170,12 @@ calculation of the threshold:
 calculating the threshold based on the provided scores. Instead, the closest
 possible threshold is returned. For example, using
 :any:`bob.measure.eer_threshold` **will not** give you a threshold where
-:math:`FAR == FRR`. Hence, you cannot report :math:`FAR` or :math:`FRR`
-instead of :math:`EER`; you should report :math:`(FAR+FRR)/2` instead. This
+:math:`FPR == FNR`. Hence, you cannot report :math:`FPR` or :math:`FNR`
+instead of :math:`EER`; you should report :math:`(FPR+FNR)/2` instead. This
 is also true for :any:`bob.measure.far_threshold` and
 :any:`bob.measure.frr_threshold`. The threshold returned by those functions
 does not guarantee that using that threshold you will get the requested
-:math:`FAR` or :math:`FRR` value. Instead, you should recalculate using
+:math:`FPR` or :math:`FNR` value. Instead, you should recalculate using
 :any:`bob.measure.farfrr`.
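
Concretely, the safe pattern is to recompute the rates at whatever threshold comes back; a minimal sketch, again over synthetic scores (only ``eer_threshold`` and ``farfrr`` are library calls, both documented in this guide):

.. code-block:: python

    import numpy
    import bob.measure

    rng = numpy.random.RandomState(0)
    negatives = rng.normal(-1.0, 1.0, 1000)
    positives = rng.normal(+1.0, 1.0, 1000)

    # the returned threshold is only the closest achievable one on these scores,
    # so FPR and FNR will usually differ slightly; report their average
    tau = bob.measure.eer_threshold(negatives, positives)
    fpr, fnr = bob.measure.farfrr(negatives, positives, tau)
    print('tau = %f, FPR = %.4f, FNR = %.4f, EER ~ %.4f'
          % (tau, fpr, fnr, (fpr + fnr) / 2.0))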

 .. note::
@@ -280,8 +280,8 @@ town. To plot an ROC curve, in possession of your **negatives** and

    >>> # we assume you have your negatives and positives already split
    >>> npoints = 100
    >>> bob.measure.plot.roc(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test') # doctest: +SKIP
-   >>> pyplot.xlabel('FAR (%)') # doctest: +SKIP
-   >>> pyplot.ylabel('FRR (%)') # doctest: +SKIP
+   >>> pyplot.xlabel('FPR (%)') # doctest: +SKIP
+   >>> pyplot.ylabel('FNR (%)') # doctest: +SKIP
    >>> pyplot.grid(True)
    >>> pyplot.show() # doctest: +SKIP
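
The doctest above can be turned into a self-contained script; a minimal sketch, where the synthetic scores and the matplotlib import are assumptions and the plotting calls are the ones shown in the hunk:

.. code-block:: python

    import numpy
    import bob.measure
    import bob.measure.plot
    from matplotlib import pyplot

    rng = numpy.random.RandomState(0)
    negatives = rng.normal(-1.0, 1.0, 1000)  # impostor scores
    positives = rng.normal(+1.0, 1.0, 1000)  # genuine scores

    bob.measure.plot.roc(negatives, positives, 100,
                         color=(0, 0, 0), linestyle='-', label='test')
    pyplot.grid(True)
    pyplot.xlabel('FPR (%)')
    pyplot.ylabel('FNR (%)')
    pyplot.title('ROC')
    pyplot.legend()
    pyplot.show()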
@@ -299,8 +299,8 @@ You should see an image like the following one:
    npoints = 100
    bob.measure.plot.roc(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test')
    pyplot.grid(True)
-   pyplot.xlabel('FAR (%)')
-   pyplot.ylabel('FRR (%)')
+   pyplot.xlabel('FPR (%)')
+   pyplot.ylabel('FNR (%)')
    pyplot.title('ROC')

 As can be observed, plotting methods live in the namespace
@@ -329,8 +329,8 @@ A DET curve can be drawn using similar commands such as the ones for the ROC cur

    >>> npoints = 100
    >>> bob.measure.plot.det(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test') # doctest: +SKIP
    >>> bob.measure.plot.det_axis([0.01, 40, 0.01, 40]) # doctest: +SKIP
-   >>> pyplot.xlabel('FAR (%)') # doctest: +SKIP
-   >>> pyplot.ylabel('FRR (%)') # doctest: +SKIP
+   >>> pyplot.xlabel('FPR (%)') # doctest: +SKIP
+   >>> pyplot.ylabel('FNR (%)') # doctest: +SKIP
    >>> pyplot.grid(True)
    >>> pyplot.show() # doctest: +SKIP
@@ -350,8 +350,8 @@ This will produce an image like the following one:
    bob.measure.plot.det(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test')
    bob.measure.plot.det_axis([0.1, 80, 0.1, 80])
    pyplot.grid(True)
-   pyplot.xlabel('FAR (%)')
-   pyplot.ylabel('FRR (%)')
+   pyplot.xlabel('FPR (%)')
+   pyplot.ylabel('FNR (%)')
    pyplot.title('DET')

 .. note::
@@ -444,9 +444,9 @@ The detection & identification curve is designed to evaluate open set
 identification tasks. It can be plotted using the
 :py:func:`bob.measure.plot.detection_identification_curve` function, but it
 requires at least one open-set probe, i.e., where no corresponding positive
-score exists, for which the FAR values are computed. Here, we plot the
+score exists, for which the FPR values are computed. Here, we plot the
 detection and identification curve for rank 1, so that the recognition rate for
-FAR=1 will be identical to the rank one :py:func:`bob.measure.recognition_rate`
+FPR=1 will be identical to the rank one :py:func:`bob.measure.recognition_rate`
 obtained in the CMC plot above.

 .. plot::
@@ -498,24 +498,26 @@ Metrics
 =======
 To calculate the threshold using a certain criterion (EER (default) or min.HTER)
-on a set, after setting up |project|, just do:
+on a development set and conduct the threshold computation and its performance
+on an evaluation set, after setting up |project|, just do:

 .. code-block:: sh

-    $ bob measure metrics dev-1.txt
-    [Min. criterion: EER] Threshold on Development set `dev-1.txt`: -8.025286e-03
-    ====  ===================
-    ..    Development dev-1
-    ====  ===================
-    FtA   0.000%
-    FMR   6.263% (31/495)
-    FNMR  6.208% (28/451)
-    FAR   5.924%
-    FRR   11.273%
-    HTER  8.599%
-    ====  ===================
+    ./bin/bob measure metrics ./MTest1/scores-{dev,eval} -e
+    [Min. criterion: EER ] Threshold on Development set `./MTest1/scores-dev`: -1.373550e-02
+    bob.measure@2018-06-29 10:20:14,177 -- ERROR: NaNs scores (1.0%) were found in ./MTest1/scores-dev
+    bob.measure@2018-06-29 10:20:14,177 -- ERROR: NaNs scores (1.0%) were found in ./MTest1/scores-eval
+    ===================  ================  ================
+    ..                   Development       Evaluation
+    ===================  ================  ================
+    False Positive Rate  15.5% (767/4942)  15.5% (767/4942)
+    False Negative Rate  15.5% (769/4954)  15.5% (769/4954)
+    Precision            0.8               0.8
+    Recall               0.8               0.8
+    F1-score             0.8               0.8
+    ===================  ================  ================

-The output will present the threshold together with the FtA, FMR, FNMR, FAR, FRR and
+The output will present the threshold together with the FPR, FNR, Precision, Recall, F1-score and
 HTER on the given set, calculated using such a threshold. The relative counts of FAs
 and FRs are also displayed in parentheses.
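
Those counts are just the rates applied to the number of trials on each side; a minimal sketch of the conversion, with synthetic scores and array sizes chosen only to mirror the table above:

.. code-block:: python

    import numpy
    import bob.measure

    rng = numpy.random.RandomState(0)
    negatives = rng.normal(-1.0, 1.0, 4942)  # impostor trials, as in "(767/4942)"
    positives = rng.normal(+1.0, 1.0, 4954)  # genuine trials, as in "(769/4954)"

    tau = bob.measure.eer_threshold(negatives, positives)
    fpr, fnr = bob.measure.farfrr(negatives, positives, tau)
    print('False Positive Rate  %.1f%% (%d/%d)'
          % (100 * fpr, round(fpr * negatives.size), negatives.size))
    print('False Negative Rate  %.1f%% (%d/%d)'
          % (100 * fnr, round(fnr * positives.size), positives.size))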
@@ -531,37 +533,23 @@ To evaluate the performance of a new score file with a given threshold, use

 .. code-block:: sh

-    $ bob measure metrics --thres 0.006 eval-1.txt
-    [Min. criterion: user provider] Threshold on Development set `eval-1`: 6.000000e-03
-    ====  ====================
-    ..    Development eval-1
-    ====  ====================
-    FtA   0.000%
-    FMR   5.010% (24/479)
-    FNMR  6.977% (33/473)
-    FAR   4.770%
-    FRR   11.442%
-    HTER  8.106%
-    ====  ====================
+    ./bin/bob measure metrics ./MTest1/scores-eval --thres 0.006
+    [Min. criterion: user provided] Threshold on Development set `./MTest1/scores-eval`: 6.000000e-03
+    bob.measure@2018-06-29 10:22:06,852 -- ERROR: NaNs scores (1.0%) were found in ./MTest1/scores-eval
+    ===================  ================
+    ..                   Development
+    ===================  ================
+    False Positive Rate  15.2% (751/4942)
+    False Negative Rate  16.1% (796/4954)
+    Precision            0.8
+    Recall               0.8
+    F1-score             0.8
+    ===================  ================

 You can simultaneously conduct the threshold computation and its performance
 on an evaluation set:

-.. code-block:: sh
-
-    $ bob measure metrics -e dev-1.txt eval-1.txt
-    [Min. criterion: EER] Threshold on Development set `dev-1`: -8.025286e-03
-    ====  ===================  ===============
-    ..    Development dev-1    Eval. eval-1
-    ====  ===================  ===============
-    FtA   0.000%               0.000%
-    FMR   6.263% (31/495)      5.637% (27/479)
-    FNMR  6.208% (28/451)      6.131% (29/473)
-    FAR   5.924%               5.366%
-    FRR   11.273%              10.637%
-    HTER  8.599%               8.001%
-    ====  ===================  ===============
-
 .. note::

    Table format can be changed using the ``--tablefmt`` option, the default format
    being ``rst``. Please refer to ``bob measure metrics --help`` for more details.