Commit be481619 authored by Manuel Günther

Introduced Detection and Identification Rate curve (see Handbook of Face Recognition)

parent 4099b17d
......@@ -79,32 +79,38 @@ def relevance (input, machine):
return retval
def recognition_rate(cmc_scores, threshold=None):
"""recognition_rate(cmc_scores, threshold) -> RR
def recognition_rate(cmc_scores, rank = None, threshold=None):
"""recognition_rate(cmc_scores, threshold) -> RR
Calculates the recognition rate from the given input, which is identical
to the rank 1 (C)MC value.
to the CMC value for the given ``rank``.
The input has a specific format, which is a list of two-element tuples. Each
of the tuples contains the negative and the positive scores for one test
item. To read the lists from score files in 4 or 5 column format, please use
the :py:func:`bob.measure.load.cmc_four_column` or
:py:func:`bob.measure.load.cmc_five_column` function.
This function requires that at least one positive example is provided for each pair -- a property that is assured by these functions.
If **threshold** is set to `None`, the recognition rate is defined as the number of test items, for which the
positive score is greater than or equal to all negative scores, divided by
the number of all test items. If several positive scores for one test item exist, the **highest** score is taken.
If **threshold** assumes one value, the recognition rate is defined as the number of test items, for which the
If **threshold** is given, the recognition rate is defined as the number of test items, for which the
positive score is greater than or equal to all negative scores and the threshold divided by
the number of all test items. If several positive scores for one test item exist, the **highest** score is taken.
**Parameters:**
``cmc_scores`` : CMC scores loaded with one of the functions (:py:func:`bob.measure.load.cmc_four_column` or :py:func:`bob.measure.load.cmc_five_column`)
``threshold`` : Decision threshold. If `None`, the decision threshold will be the **highest** positive score.
``cmc_scores`` : [(array_like(1D, float), array_like(1D, float))]
CMC scores loaded with one of the functions (:py:func:`bob.measure.load.cmc_four_column` or :py:func:`bob.measure.load.cmc_five_column`)
``rank`` : int or ``None``
The rank for which the recognition rate should be computed. If ``None``, rank 1 is used.
``threshold`` : float or ``None``
Decision threshold. If ``None``, no threshold is applied, and only the rank of the **highest** positive score decides whether a test item counts as correct.
**Returns:**
``RR`` : float
......@@ -114,30 +120,27 @@ def recognition_rate(cmc_scores, threshold=None):
if not cmc_scores:
return 0.
correct = 0.
if rank is None:
rank = 1
correct = 0
for neg, pos in cmc_scores:
if((type(pos)!=float) and (len(pos) == 0)):
raise ValueError("For the CMC computation at least one positive score per pair is necessary.")
#If threshold is none, let's use the highest positive score as the decision threshold
if(threshold is None):
# get the maximum positive score for the current probe item
# (usually, there is only one positive score, but just in case...)
max_pos = numpy.max(pos)
# check if the positive score is smaller than all negative scores
if (neg < max_pos).all():
correct += 1.
else:
#If threshold is NOT None, we have an openset identification
max_pos = numpy.max(pos)
if((threshold < max_pos) and (neg < max_pos).all()):
correct += 1.
# return relative number of correctly matched scores
return correct / float(len(cmc_scores))
# get the maximum positive score for the current probe item
# (usually, there is only one positive score, but just in case...)
max_pos = numpy.max(pos)
# count the number of negative scores that are higher than the best positive score
index = numpy.sum(neg >= max_pos)
if index < rank and (threshold is None or threshold <= max_pos):
correct += 1
return correct / float(len(cmc_scores))
def cmc(cmc_scores):
def cmc(cmc_scores, threshold = None):
"""cmc(cmc_scores) -> curve
Calculates the cumulative match characteristic (CMC) from the given input.
......@@ -152,14 +155,15 @@ def cmc(cmc_scores):
calculated. The rank is computed as the number of negative scores that are
higher than the positive score. If several positive scores for one test item
exist, the **highest** positive score is taken. The CMC finally computes how
many test items have rank r or higher.
many test items have rank r or better, divided by the total number of test items.
**Parameters:**
``cmc_scores`` : [(array_like(1D, float), array_like(1D, float))]
A list of tuples, where each tuple contains the ``negative`` and ``positive`` scores for one probe of the database
``threshold`` : Decision threshold. If `None`, the decision threshold will be the **highest** positive score.
``threshold`` : float or ``None``
Decision threshold. If ``None``, no threshold is applied, and every test item is counted at its rank.
**Returns:**
......@@ -177,16 +181,17 @@ def cmc(cmc_scores):
for neg, pos in cmc_scores:
if((type(pos)!=float) and (len(pos) == 0)):
raise ValueError("For the CMC computation at least one positive score is necessary. Please review who you are loading the scores. You must set `load_only_negatives=False` in the :py:func:`bob.measure.load.cmc_four_column` or `:py:func:`bob.measure.load.cmc_five_column` methods.")
raise ValueError("For the CMC computation at least one positive score per pair is necessary.")
# get the maximum positive score for the current probe item
# (usually, there is only one positive score, but just in case...)
# (usually, there is only one positive score, but just in case...)
max_pos = numpy.max(pos)
# count the number of negative scores that are higher than the best positive score
# count the number of negative scores that are higher than the best positive score
index = numpy.sum(neg >= max_pos)
match_characteristic[index] += 1
if threshold is None or threshold <= max_pos:
match_characteristic[index] += 1
# cumulate
cumulative_match_characteristic = numpy.ndarray(match_characteristic.shape, numpy.float64)
count = 0.
......@@ -197,7 +202,6 @@ def cmc(cmc_scores):
return cumulative_match_characteristic
def get_config():
"""Returns a string containing the configuration information.
"""
......
......@@ -3,10 +3,34 @@
# Chakka Murali Mohan, Trainee, IDIAP Research Institute, Switzerland.
# Mon 23 May 2011 14:36:14 CEST
def log_values(min_step = -4, counts_per_step = 4):
"""log_values(min_step, counts_per_step) -> log_list
This function computes log-scaled values between :math:`10^{M}` and 1 (inclusive), where :math:`M` is the ``min_step`` argument, which needs to be a negative integer.
The integral ``counts_per_step`` value defines how many values between two adjacent powers of 10 will be created.
The total number of values will be ``-min_step * counts_per_step + 1``.
**Parameters:**
``min_step`` : int (negative)
The power of 10 that will be the minimum value. E.g., the default ``-4`` will result in the first number to be :math:`10^{-4}` = ``0.0001`` or ``0.01%``
``counts_per_step`` : int (positive)
The number of values that will be put between two adjacent powers of 10.
With the default value ``4`` (and default values of ``min_step``), we will get ``log_list[0] == 1e-4``, ``log_list[4] == 1e-3``, ..., ``log_list[16] == 1``.
**Returns**
``log_list`` : [float]
A list of logarithmically scaled values between :math:`10^{M}` and 1.
"""
import math
return [math.pow(10., i * 1./counts_per_step) for i in range(min_step*counts_per_step,0)] + [1.]
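As a quick check of the docstring above, the following sketch copies the function body from this hunk and inspects the result for the default arguments. It only uses the standard library; the variable name ``values`` is chosen here for illustration.

```python
import math

def log_values(min_step=-4, counts_per_step=4):
    """Copy of the function added in this hunk, repeated here for illustration."""
    return [math.pow(10., i * 1. / counts_per_step)
            for i in range(min_step * counts_per_step, 0)] + [1.]

values = log_values()

# -min_step * counts_per_step + 1 == 17 values in total
print(len(values))

# every 4th entry is a full power of ten: 1e-4, 1e-3, ..., 1
print([values[i] for i in range(0, len(values), 4)])
```

The list is strictly increasing and ends exactly at ``1.``, because the final value is appended as a literal rather than computed.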
"""Methods to plot error analysis figures such as ROC, precision-recall curve, EPC and DET"""
def roc(negatives, positives, npoints=100, CAR=False, **kwargs):
"""Plots Receiver Operating Charactaristic (ROC) curve.
"""Plots Receiver Operating Characteristic (ROC) curve.
This method will call ``matplotlib`` to plot the ROC curve for a system which
contains a particular set of negatives (impostors) and positives (clients)
......@@ -51,6 +75,46 @@ def roc(negatives, positives, npoints=100, CAR=False, **kwargs):
return pyplot.semilogx(100.0*out[0,:], 100.0*(1-out[1,:]), **kwargs)
def roc_for_far(negatives, positives, far_values = log_values(), **kwargs):
"""Plots Receiver Operating Characteristic (ROC) curve for the given list of False Acceptance Rates (FAR).
This method will call ``matplotlib`` to plot the ROC curve for a system which
contains a particular set of negatives (impostors) and positives (clients)
scores. We use the standard :py:func:`matplotlib.pyplot.semilogx` command. All parameters
passed with exception of the three first parameters of this method will be
directly passed to the plot command.
The plot will represent the False Acceptance Rate (FAR) on the horizontal axis and the Correct Acceptance Rate (CAR) on the vertical axis.
The values for the axis will be computed using :py:func:`bob.measure.roc_for_far`.
.. note::
This function does not initiate and save the figure instance, it only
issues the plotting command. You are responsible for setting up and
saving the figure as you see fit.
**Parameters:**
``negatives, positives`` : array_like(1D, float)
The list of negative and positive scores forwarded to :py:func:`bob.measure.roc_for_far`
``far_values`` : [float]
The values for the FAR, where the CAR should be plotted; each value should be in range [0,1].
``kwargs`` : keyword arguments
Extra plotting parameters, which are passed directly to :py:func:`matplotlib.pyplot.plot`.
**Returns:**
The return value is the matplotlib line that was added as defined by :py:func:`matplotlib.pyplot.semilogx`.
"""
from matplotlib import pyplot
from . import roc_for_far as calc
out = calc(negatives, positives, far_values)
return pyplot.semilogx(100.0*out[0,:], 100.0*(1-out[1,:]), **kwargs)
def precision_recall_curve(negatives, positives, npoints=100, **kwargs):
"""Plots Precision-Recall curve.
......@@ -341,3 +405,50 @@ def cmc(cmc_scores, logx = True, **kwargs):
pyplot.plot(range(1, len(out)+1), out * 100, **kwargs)
return len(out)
def detection_identification_rate(cmc_scores, far_values = log_values(), rank = None, logx = True, **kwargs):
"""Plots the Detection & Identification rate curve over the FAR for the given FAR values.
This curve is designed to be used in an open set identification protocol, and is defined in Chapter 14.1 of [LiJain2005]_.
**Parameters:**
``cmc_scores`` : [(array_like(1D, float), array_like(1D, float))]
See :py:func:`bob.measure.cmc`
``far_values`` : [float]
The values for the FAR, where the detection & identification rate should be plotted; each value should be in range [0,1].
``rank`` : int or ``None``
The rank for which the curve should be plotted. If ``None``, rank 1 is assumed.
``logx`` : bool
Plot the FAR axis in logarithmic scale using :py:func:`matplotlib.pyplot.semilogx` or in linear scale using :py:func:`matplotlib.pyplot.plot`? (Default: ``True``)
``kwargs`` : keyword arguments
Extra plotting parameters, which are passed directly to :py:func:`matplotlib.pyplot.plot` or :py:func:`matplotlib.pyplot.semilogx`.
**Returns:**
The return value is the ``matplotlib`` line that was added as defined by :py:func:`matplotlib.pyplot.plot`.
.. [LiJain2005] **Stan Li and Anil K. Jain**, *Handbook of Face Recognition*, Springer, 2005
"""
from matplotlib import pyplot
from . import far_threshold, recognition_rate
# get all negative scores and sort them to compute the FAR thresholds
negatives = sorted(n for neg,pos in cmc_scores for n in neg)
# compute thresholds based on FAR values
thresholds = [far_threshold(negatives, [], v, True) for v in far_values]
# compute recognition rates based on threshold for the given rank
rates = [100.*recognition_rate(cmc_scores, rank, t) for t in thresholds]
# plot curve
if logx:
return pyplot.semilogx(far_values, rates, **kwargs)
else:
return pyplot.plot(far_values, rates, **kwargs)
......@@ -30,6 +30,7 @@ def parse_command_line(command_line_options):
parser.add_argument('-s', '--score-file', required = True, help = 'The score file in 4 or 5 column format to test.')
parser.add_argument('-o', '--output-pdf-file', default = 'cmc.pdf', help = 'The PDF file to write.')
parser.add_argument('-l', '--log-x-scale', action='store_true', help = 'Plot logarithmic Rank axis.')
parser.add_argument('-r', '--rank', type=int, help = 'Plot Detection & Identification rate curve for the given rank instead of the CMC curve.')
parser.add_argument('-x', '--no-plot', action = 'store_true', help = 'Do not print a PDF file, but only report the results.')
parser.add_argument('-p', '--parser', default = '4column', choices = ('4column', '5column'), help = 'The type of the score file.')
......@@ -57,7 +58,7 @@ def main(command_line_options = None):
data = {'4column' : load.cmc_four_column, '5column' : load.cmc_five_column}[args.parser](args.score_file)
# compute recognition rate
rr = recognition_rate(data)
rr = recognition_rate(data, args.rank)
print("Recognition rate for score file", args.score_file, "is %3.2f%%" % (rr * 100))
if not args.no_plot:
......@@ -71,19 +72,34 @@ def main(command_line_options = None):
# CMC
fig = mpl.figure()
max_rank = plot.cmc(data, color=(0,0,1), linestyle='--', dashes=(6,2), logx = args.log_x_scale)
mpl.title("CMC Curve")
if args.log_x_scale:
mpl.xlabel('Rank (log)')
if args.rank is None:
max_rank = plot.cmc(data, color=(0,0,1), linestyle='--', dashes=(6,2), logx = args.log_x_scale)
mpl.title("CMC Curve")
if args.log_x_scale:
mpl.xlabel('Rank (log)')
else:
mpl.xlabel('Rank')
mpl.ylabel('Recognition Rate in %')
ticks = [int(t) for t in mpl.xticks()[0]]
mpl.xticks(ticks, ticks)
mpl.xlim([1, max_rank])
else:
mpl.xlabel('Rank')
mpl.ylabel('Recognition Rate in %')
plot.detection_identification_rate(data, rank = args.rank, color=(0,0,1), linestyle='--', dashes=(6,2), logx = args.log_x_scale)
mpl.title("Detection & Identification Curve")
if args.log_x_scale:
mpl.xlabel('False Acceptance Rate (log) in %')
else:
mpl.xlabel('False Acceptance Rate in %')
mpl.ylabel('Detection & Identification Rate in %')
ticks = ["%s"%(t*100) for t in mpl.xticks()[0]]
mpl.xticks(mpl.xticks()[0], ticks)
mpl.xlim([1e-4, 1])
mpl.grid(True, color=(0.3,0.3,0.3))
mpl.ylim(ymax=101)
# convert log-scale ticks to normal numbers
ticks = [int(t) for t in mpl.xticks()[0]]
mpl.xticks(ticks, ticks)
mpl.xlim([0.9, max_rank + 0.1])
pp.savefig(fig)
pp.close()
......
......@@ -72,3 +72,4 @@ def test_compute_cmc():
from .script.plot_cmc import main
nose.tools.eq_(main(['--self-test', '--score-file', SCORES_4COL_CMC, '--log-x-scale']), 0)
nose.tools.eq_(main(['--self-test', '--score-file', SCORES_5COL_CMC, '--parser', '5column']), 0)
nose.tools.eq_(main(['--self-test', '--score-file', SCORES_4COL_CMC, '--rank', '1']), 0)
......@@ -202,6 +202,7 @@ You should see an image like the following one:
.. plot::
import numpy
numpy.random.seed(42)
import bob.measure
from matplotlib import pyplot
......@@ -215,13 +216,13 @@ You should see an image like the following one:
pyplot.title('ROC')
As can be observed, plotting methods live in the namespace
:py:mod:`bob.measure.plot`. They work like `Matplotlib`_'s `plot()`_ method
:py:mod:`bob.measure.plot`. They work like the :py:func:`matplotlib.pyplot.plot`
itself, except that instead of receiving the x and y point coordinates as
parameters, they receive the two :py:class:`numpy.ndarray` arrays with
negatives and positives, as well as an indication of the number of points the
curve must contain.
As in `Matplotlib`_'s `plot()`_ command, you can pass optional parameters for
As in the :py:func:`matplotlib.pyplot.plot` command, you can pass optional parameters for
the line as shown in the example to setup its color, shape and even the label.
For an overview of the keywords accepted, please refer to the `Matplotlib`_'s
Documentation. Other plot properties such as the plot title, axis labels,
......@@ -250,6 +251,7 @@ This will produce an image like the following one:
.. plot::
import numpy
numpy.random.seed(42)
import bob.measure
from matplotlib import pyplot
......@@ -300,6 +302,7 @@ This will produce an image like the following one:
.. plot::
import numpy
numpy.random.seed(42)
import bob.measure
from matplotlib import pyplot
......@@ -323,26 +326,55 @@ The CMC can be calculated from a relatively complex data structure, which define
.. plot::
import numpy
numpy.random.seed(42)
import bob.measure
from matplotlib import pyplot
scores = []
cmc_scores = []
for probe in range(10):
positives = numpy.random.normal(1, 1, 1)
negatives = numpy.random.normal(0, 1, 19)
scores.append((negatives, positives))
bob.measure.plot.cmc(scores, logx=False)
cmc_scores.append((negatives, positives))
bob.measure.plot.cmc(cmc_scores, logx=False)
pyplot.title('CMC')
pyplot.xlabel('Rank')
pyplot.xticks([1,5,10,20])
pyplot.xlim([1,20])
pyplot.ylim([0,100])
pyplot.ylabel('Probability of Recognition (%)')
Usually, there is only a single positive score per probe, but this is not a fixed restriction.
.. note::
The complex data structure can be read from our default 4 or 5 column score files using the :py:func:`bob.measure.load.cmc_four_column` or :py:func:`bob.measure.load.cmc_five_column` function.
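
To make the data structure more concrete, here is a minimal sketch of how a CMC curve can be derived from a list of ``(negatives, positives)`` tuples. It mirrors the counting logic shown in this commit but is not the library code: the name ``cmc_sketch`` is hypothetical, and plain Python lists stand in for the :py:class:`numpy.ndarray` objects that the loading functions return.

```python
from itertools import accumulate

def cmc_sketch(cmc_scores):
    """Cumulative match characteristic for a list of (negatives, positives) tuples."""
    # the worst possible rank index equals the largest number of negatives of any probe
    max_negatives = max(len(neg) for neg, _ in cmc_scores)
    counts = [0] * (max_negatives + 1)
    for neg, pos in cmc_scores:
        # if a probe has several positive scores, only the highest one counts
        best_positive = max(pos)
        # index 0 corresponds to rank 1: no negative reaches the best positive score
        index = sum(1 for n in neg if n >= best_positive)
        counts[index] += 1
    total = len(cmc_scores)
    # cumulate and normalize by the number of probes
    return [c / total for c in accumulate(counts)]

cmc_scores = [
    ([0.1, 0.2, 0.3], [0.9]),        # rank 1
    ([0.5, 0.8, 0.1], [0.6]),        # rank 2 (one negative is higher)
    ([0.7, 0.9, 0.2], [0.3, 0.4]),   # rank 3 (best positive is 0.4)
    ([0.2, 0.1, 0.0], [0.5]),        # rank 1
]
curve = cmc_sketch(cmc_scores)
print(curve)  # [0.5, 0.75, 1.0, 1.0]
```

The first entry of the curve is the rank 1 recognition rate; the last entry is always 1 when no threshold is applied.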
Detection & Identification Curve
================================
The detection & identification curve is designed to evaluate open set identification tasks.
It can be plotted using the :py:func:`bob.measure.plot.detection_identification_rate` function.
Here, we plot the detection & identification curve for rank 1, so that the recognition rate for FAR=1 will be identical to the rank one recognition rate obtained in the CMC plot above.
.. plot::
import numpy
numpy.random.seed(42)
import bob.measure
from matplotlib import pyplot
cmc_scores = []
for probe in range(10):
positives = numpy.random.normal(1, 1, 1)
negatives = numpy.random.normal(0, 1, 19)
cmc_scores.append((negatives, positives))
bob.measure.plot.detection_identification_rate(cmc_scores, rank=1, logx=True)
pyplot.xlabel('FAR')
pyplot.ylabel('Detection & Identification Rate (%)')
pyplot.ylim([0,100])
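
The relation between the threshold, the FAR and the detection & identification rate can be sketched in plain Python. This is an illustration under stated assumptions, not the library implementation: both function names are hypothetical, a score counts as accepted when it is greater than or equal to the threshold, and the FAR is computed over the pooled negative scores of all probes. Unlike the plotting function, which derives thresholds from the requested FAR values, the sketch sweeps a few thresholds directly.

```python
def detection_identification_rate_sketch(cmc_scores, threshold, rank=1):
    """Fraction of probes identified within ``rank`` whose best positive passes ``threshold``."""
    correct = 0
    for neg, pos in cmc_scores:
        best_positive = max(pos)
        # number of negative scores that reach the best positive score
        higher = sum(1 for n in neg if n >= best_positive)
        if higher < rank and threshold <= best_positive:
            correct += 1
    return correct / len(cmc_scores)

def false_acceptance_rate_sketch(cmc_scores, threshold):
    """Fraction of all pooled negative scores that pass ``threshold``."""
    negatives = [n for neg, _ in cmc_scores for n in neg]
    return sum(1 for n in negatives if n >= threshold) / len(negatives)

cmc_scores = [
    ([0.1, 0.2, 0.3], [0.9]),
    ([0.5, 0.8, 0.1], [0.6]),
    ([0.7, 0.9, 0.2], [0.4]),
    ([0.2, 0.1, 0.0], [0.5]),
]

for threshold in (0.0, 0.45, 0.85, 1.0):
    far = false_acceptance_rate_sketch(cmc_scores, threshold)
    rate = detection_identification_rate_sketch(cmc_scores, threshold, rank=1)
    print("threshold=%.2f  FAR=%.3f  rate=%.2f" % (threshold, far, rate))
```

With the lowest threshold, all negatives are accepted (FAR=1) and the rate equals the rank 1 recognition rate of these toy scores; raising the threshold lowers both values.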
Fine-tuning
============
......@@ -497,5 +529,4 @@ These information are simply stored in the score file, and no further check is a
.. _`The Expected Performance Curve`: http://publications.idiap.ch/downloads/reports/2005/bengio_2005_icml.pdf
.. _`The DET curve in assessment of detection task performance`: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.117.4489&rep=rep1&type=pdf
.. _`plot()`: http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot
.. _openbr: http://openbiometrics.org
......@@ -92,6 +92,7 @@ Plotting
bob.measure.plot.epc
bob.measure.plot.precision_recall_curve
bob.measure.plot.cmc
bob.measure.plot.detection_identification_rate
OpenBR conversions
------------------
......