### [doc][guide] Update documentation

parent 37312171
Pipeline #21467 passed with stage
in 10 minutes
@@ -35,26 +35,26 @@ Overview
 --------

 A classifier is subject to two types of errors, either the real access/signal
-is rejected (false rejection) or an impostor attack/a false access is accepted
-(false acceptance). A possible way to measure the detection performance is to
-use the Half Total Error Rate (HTER), which combines the False Rejection Rate
-(FRR) and the False Acceptance Rate (FAR) and is defined in the following
+is rejected (false negative) or an impostor attack/a false access is accepted
+(false positive). A possible way to measure the detection performance is to
+use the Half Total Error Rate (HTER), which combines the False Negative Rate
+(FNR) and the False Positive Rate (FPR) and is defined in the following
 formula:

 .. math::

-  HTER(\tau, \mathcal{D}) = \frac{FAR(\tau, \mathcal{D}) + FRR(\tau, \mathcal{D})}{2} \quad \textrm{[\%]}
+  HTER(\tau, \mathcal{D}) = \frac{FPR(\tau, \mathcal{D}) + FNR(\tau, \mathcal{D})}{2} \quad \textrm{[\%]}

-where :math:`\mathcal{D}` denotes the dataset used. Since both the FAR and the
-FRR depends on the threshold :math:`\tau`, they are strongly related to each
-other: increasing the FAR will reduce the FRR and vice-versa. For this reason,
+where :math:`\mathcal{D}` denotes the dataset used. Since both the FPR and the
+FNR depends on the threshold :math:`\tau`, they are strongly related to each
+other: increasing the FPR will reduce the FNR and vice-versa. For this reason,
 results are often presented using either a Receiver Operating Characteristic
 (ROC) or a Detection-Error Tradeoff (DET) plot, these two plots basically
-present the FAR versus the FRR for different values of the threshold. Another
+present the FPR versus the FNR for different values of the threshold. Another
 widely used measure to summarise the performance of a system is the Equal Error
-Rate (EER), defined as the point along the ROC or DET curve where the FAR
-equals the FRR.
+Rate (EER), defined as the point along the ROC or DET curve where the FPR
+equals the FNR.

 However, it was noted in by Bengio et al. (2004) that ROC and DET curves may be
 misleading when comparing systems. Hence, the so-called Expected Performance
@@ -63,13 +63,13 @@ performance of a system at various operating points. Indeed, in real-world
 scenarios, the threshold :math:`\tau` has to be set a priori: this is typically
 done using a development set (also called cross-validation set). Nevertheless,
 the optimal threshold can be different depending on the relative importance
-given to the FAR and the FRR. Hence, in the EPC framework, the cost
-:math:`\beta \in [0;1]` is defined as the trade-off between the FAR and FRR.
+given to the FPR and the FNR. Hence, in the EPC framework, the cost
+:math:`\beta \in [0;1]` is defined as the trade-off between the FPR and FNR.
 The optimal threshold :math:`\tau^*` is then computed using different values of
 :math:`\beta`, corresponding to different operating points:

 .. math::

-  \tau^{*} = \arg\!\min_{\tau} \quad \beta \cdot \textrm{FAR}(\tau, \mathcal{D}_{d}) + (1-\beta) \cdot \textrm{FRR}(\tau, \mathcal{D}_{d})
+  \tau^{*} = \arg\!\min_{\tau} \quad \beta \cdot \textrm{FPR}(\tau, \mathcal{D}_{d}) + (1-\beta) \cdot \textrm{FNR}(\tau, \mathcal{D}_{d})

 where :math:`\mathcal{D}_{d}` denotes the development set and should be
@@ -122,15 +122,15 @@ the following techniques:
   >>> # negatives, positives = parse_my_scores(...) # write parser if not provided!
   >>> T = 0.0 #Threshold: later we explain how one can calculate these
   >>> correct_negatives = bob.measure.correctly_classified_negatives(negatives, T)
-  >>> FAR = 1 - (float(correct_negatives.sum())/negatives.size)
+  >>> FPR = 1 - (float(correct_negatives.sum())/negatives.size)
   >>> correct_positives = bob.measure.correctly_classified_positives(positives, T)
-  >>> FRR = 1 - (float(correct_positives.sum())/positives.size)
+  >>> FNR = 1 - (float(correct_positives.sum())/positives.size)

-We do provide a method to calculate the FAR and FRR in a single shot:
+We do provide a method to calculate the FPR and FNR in a single shot:

 .. doctest::

-  >>> FAR, FRR = bob.measure.farfrr(negatives, positives, T)
+  >>> FPR, FNR = bob.measure.farfrr(negatives, positives, T)

 The threshold T is normally calculated by looking at the distribution of
 negatives and positives in a development (or validation) set, selecting a
@@ -170,12 +170,12 @@ calculation of the threshold:
 calculating the threshold based on the provided scores. Instead, the closest
 possible threshold is returned. For example, using
 :any:`bob.measure.eer_threshold` **will not** give you a threshold where
-:math:`FAR == FRR`. Hence, you cannot report :math:`FAR` or :math:`FRR`
-instead of :math:`EER`; you should report :math:`(FAR+FRR)/2` instead. This
+:math:`FPR == FNR`. Hence, you cannot report :math:`FPR` or :math:`FNR`
+instead of :math:`EER`; you should report :math:`(FPR+FNR)/2` instead. This
 is also true for :any:`bob.measure.far_threshold` and
 :any:`bob.measure.frr_threshold`. The threshold returned by those functions
 does not guarantee that using that threshold you will get the requested
-:math:`FAR` or :math:`FRR` value. Instead, you should recalculate using
+:math:`FPR` or :math:`FNR` value. Instead, you should recalculate using
 :any:`bob.measure.farfrr`.

 .. note::
@@ -280,8 +280,8 @@ town. To plot an ROC curve, in possession of your **negatives** and
   >>> # we assume you have your negatives and positives already split
   >>> npoints = 100
   >>> bob.measure.plot.roc(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test') # doctest: +SKIP
-  >>> pyplot.xlabel('FAR (%)') # doctest: +SKIP
-  >>> pyplot.ylabel('FRR (%)') # doctest: +SKIP
+  >>> pyplot.xlabel('FPR (%)') # doctest: +SKIP
+  >>> pyplot.ylabel('FNR (%)') # doctest: +SKIP
   >>> pyplot.grid(True)
   >>> pyplot.show() # doctest: +SKIP

@@ -299,8 +299,8 @@ You should see an image like the following one:
   npoints = 100
   bob.measure.plot.roc(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test')
   pyplot.grid(True)
-  pyplot.xlabel('FAR (%)')
-  pyplot.ylabel('FRR (%)')
+  pyplot.xlabel('FPR (%)')
+  pyplot.ylabel('FNR (%)')
   pyplot.title('ROC')

 As can be observed, plotting methods live in the namespace
@@ -329,8 +329,8 @@ A DET curve can be drawn using similar commands such as the ones for the ROC cur
   >>> npoints = 100
   >>> bob.measure.plot.det(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test') # doctest: +SKIP
   >>> bob.measure.plot.det_axis([0.01, 40, 0.01, 40]) # doctest: +SKIP
-  >>> pyplot.xlabel('FAR (%)') # doctest: +SKIP
-  >>> pyplot.ylabel('FRR (%)') # doctest: +SKIP
+  >>> pyplot.xlabel('FPR (%)') # doctest: +SKIP
+  >>> pyplot.ylabel('FNR (%)') # doctest: +SKIP
   >>> pyplot.grid(True)
   >>> pyplot.show() # doctest: +SKIP

@@ -350,8 +350,8 @@ This will produce an image like the following one:
   bob.measure.plot.det(negatives, positives, npoints, color=(0,0,0), linestyle='-', label='test')
   bob.measure.plot.det_axis([0.1, 80, 0.1, 80])
   pyplot.grid(True)
-  pyplot.xlabel('FAR (%)')
-  pyplot.ylabel('FRR (%)')
+  pyplot.xlabel('FPR (%)')
+  pyplot.ylabel('FNR (%)')
   pyplot.title('DET')

 .. note::
@@ -444,9 +444,9 @@ The detection & identification curve is designed to evaluate open set
 identification tasks. It can be plotted using the
 :py:func:`bob.measure.plot.detection_identification_curve` function, but it
 requires at least one open-set probe, i.e., where no corresponding positive
-score exists, for which the FAR values are computed. Here, we plot the
+score exists, for which the FPR values are computed. Here, we plot the
 detection and identification curve for rank 1, so that the recognition rate for
-FAR=1 will be identical to the rank one :py:func:`bob.measure.recognition_rate`
+FPR=1 will be identical to the rank one :py:func:`bob.measure.recognition_rate`
 obtained in the CMC plot above.

 .. plot::
@@ -498,24 +498,26 @@ Metrics
 =======

 To calculate the threshold using a certain criterion (EER (default) or min.HTER)
-on a set, after setting up |project|, just do:
+on a development set and conduct the threshold computation and its performance
+on an evaluation set, after setting up |project|, just do:

 .. code-block:: sh

-  $ bob measure metrics dev-1.txt
-  [Min. criterion: EER] Threshold on Development set dev-1.txt: -8.025286e-03
-  ====  ===================
-  ..    Development dev-1
-  ====  ===================
-  FtA   0.000%
-  FMR   6.263% (31/495)
-  FNMR  6.208% (28/451)
-  FAR   5.924%
-  FRR   11.273%
-  HTER  8.599%
-  ====  ===================
+  ./bin/bob measure metrics ./MTest1/scores-{dev,eval} -e
+  [Min. criterion: EER ] Threshold on Development set ./MTest1/scores-dev: -1.373550e-02
+  bob.measure@2018-06-29 10:20:14,177 -- ERROR: NaNs scores (1.0%) were found in ./MTest1/scores-dev
+  bob.measure@2018-06-29 10:20:14,177 -- ERROR: NaNs scores (1.0%) were found in ./MTest1/scores-eval
+  ===================  ================  ================
+  ..                   Development       Evaluation
+  ===================  ================  ================
+  False Positive Rate  15.5% (767/4942)  15.5% (767/4942)
+  False Negative Rate  15.5% (769/4954)  15.5% (769/4954)
+  Precision            0.8               0.8
+  Recall               0.8               0.8
+  F1-score             0.8               0.8
+  ===================  ================  ================

-The output will present the threshold together with the FtA, FMR, FMNR, FAR, FRR and
+The output will present the threshold together with the FPR, FNR, Precision, Recall, F1-score and
 HTER on the given set, calculated using such a threshold. The relative counts of FAs
 and FRs are also displayed between parenthesis.
@@ -531,37 +533,23 @@ To evaluate the performance of a new score file with a given threshold, use
 .. code-block:: sh

-  $ bob measure metrics --thres 0.006 eval-1.txt
-  [Min. criterion: user provider] Threshold on Development set eval-1: 6.000000e-03
-  ====  ====================
-  ..    Development eval-1
-  ====  ====================
-  FtA   0.000%
-  FMR   5.010% (24/479)
-  FNMR  6.977% (33/473)
-  FAR   4.770%
-  FRR   11.442%
-  HTER  8.106%
-  ====  ====================
+  ./bin/bob measure metrics ./MTest1/scores-eval --thres 0.006
+  [Min. criterion: user provided] Threshold on Development set ./MTest1/scores-eval: 6.000000e-03
+  bob.measure@2018-06-29 10:22:06,852 -- ERROR: NaNs scores (1.0%) were found in ./MTest1/scores-eval
+  ===================  ================
+  ..                   Development
+  ===================  ================
+  False Positive Rate  15.2% (751/4942)
+  False Negative Rate  16.1% (796/4954)
+  Precision            0.8
+  Recall               0.8
+  F1-score             0.8
+  ===================  ================

 You can simultaneously conduct the threshold computation and its performance
 on an evaluation set:

-.. code-block:: sh
-
-  $ bob measure metrics -e dev-1.txt eval-1.txt
-  [Min. criterion: EER] Threshold on Development set dev-1: -8.025286e-03
-  ====  ===================  ===============
-  ..    Development dev-1    Eval. eval-1
-  ====  ===================  ===============
-  FtA   0.000%               0.000%
-  FMR   6.263% (31/495)      5.637% (27/479)
-  FNMR  6.208% (28/451)      6.131% (29/473)
-  FAR   5.924%               5.366%
-  FRR   11.273%              10.637%
-  HTER  8.599%               8.001%
-  ====  ===================  ===============
-

 .. note::

 Table format can be changed using --tablefmt option, the default format
 being rst. Please refer to bob measure metrics --help for more details.
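To make the renamed quantities concrete, the following is a minimal pure-Python sketch of the FPR, FNR, HTER and the EPC-style threshold selection described in the changed text. It deliberately does not call `bob.measure`: the helper names (`fpr_fnr`, `hter`, `weighted_error_threshold`) and the toy scores are hypothetical, and the sketch assumes that a score greater than or equal to the threshold is classified as positive.

```python
def fpr_fnr(negatives, positives, threshold):
    """Return (FPR, FNR) as fractions in [0, 1].

    Assumption: a score >= threshold is accepted as positive.
    """
    false_positives = sum(1 for s in negatives if s >= threshold)
    false_negatives = sum(1 for s in positives if s < threshold)
    return false_positives / len(negatives), false_negatives / len(positives)


def hter(negatives, positives, threshold):
    """Half Total Error Rate: the mean of FPR and FNR at the given threshold."""
    fpr, fnr = fpr_fnr(negatives, positives, threshold)
    return (fpr + fnr) / 2


def weighted_error_threshold(negatives, positives, beta):
    """Pick the threshold minimising beta * FPR + (1 - beta) * FNR.

    Only the observed scores (plus one value above the maximum, which
    rejects everything) are tried as candidates. Ties go to the lowest
    candidate.
    """
    candidates = sorted(set(negatives) | set(positives))
    candidates.append(candidates[-1] + 1.0)

    def cost(threshold):
        fpr, fnr = fpr_fnr(negatives, positives, threshold)
        return beta * fpr + (1 - beta) * fnr

    return min(candidates, key=cost)


# Hypothetical toy scores; a real use would parse them from score files.
dev_negatives = [-2.0, -1.0, -0.5, 0.5]
dev_positives = [0.25, 1.0, 1.5, 2.0]
eval_negatives = [-1.5, -0.75, 0.1, 1.2]
eval_positives = [0.3, 0.9, 1.1, 2.5]

# The threshold is chosen on the development set only ...
tau = weighted_error_threshold(dev_negatives, dev_positives, beta=0.8)

# ... and then applied unchanged to the evaluation set.
print("threshold:", tau)
print("eval FPR, FNR:", fpr_fnr(eval_negatives, eval_positives, tau))
print("eval HTER:", hter(eval_negatives, eval_positives, tau))
```

A larger `beta` penalises false positives more heavily and therefore tends to push the selected threshold upwards; repeating the selection for several values of `beta` and recording the evaluation-set HTER each time gives the points of an EPC-like curve.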