Commit 4422a1ac authored by André Anjos

Improve documentation

parent 4011fbe8
......@@ -13,10 +13,9 @@ from . import openbr
import numpy
def mse (estimation, target):
"""mse(estimation, target) -> error
"""Mean square error between a set of outputs and target values
Calculates the mean square error between a set of outputs and target
values using the following formula:
Uses the formula:
.. math::
......@@ -26,14 +25,30 @@ def mse (estimation, target):
have 2 dimensions. Different examples are organized as rows while different
features in the estimated values or targets are organized as different
columns.
Parameters:
estimation (array): an N-dimensional array that corresponds to the value
estimated by your procedure
target (array): an N-dimensional array that corresponds to the expected
value
Returns:
float: The average of the squared error between the estimated value and the
target
"""
return numpy.mean((estimation - target)**2, 0)
def rmse (estimation, target):
"""rmse(estimation, target) -> error
"""Calculates the root mean square error between a set of outputs and target
Calculates the root mean square error between a set of outputs and target
values using the following formula:
Uses the formula:
.. math::
......@@ -43,14 +58,30 @@ def rmse (estimation, target):
have 2 dimensions. Different examples are organized as rows while different
features in the estimated values or targets are organized as different
columns.
Parameters:
estimation (array): an N-dimensional array that corresponds to the value
estimated by your procedure
target (array): an N-dimensional array that corresponds to the expected
value
Returns:
float: The square-root of the average of the squared error between the
estimated value and the target
"""
return numpy.sqrt(mse(estimation, target))
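As a quick check of the two functions above, the following minimal sketch re-declares `mse` and `rmse` as they appear in this diff and applies them to a small 2D array (examples as rows, features as columns). The sample values are made up for illustration.

```python
import numpy

def mse(estimation, target):
    # Mean over axis 0 (the examples): one error value per feature column
    return numpy.mean((estimation - target) ** 2, 0)

def rmse(estimation, target):
    # Element-wise square root of the per-column mean square error
    return numpy.sqrt(mse(estimation, target))

# Two examples (rows) with two features (columns); illustrative values only
estimation = numpy.array([[1.0, 2.0], [3.0, 4.0]])
target = numpy.array([[1.0, 0.0], [1.0, 4.0]])

print(mse(estimation, target))
print(rmse(estimation, target))
```

Because the mean is taken along axis 0, a 2D input yields one entry per feature column rather than a single scalar.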
def relevance (input, machine):
"""relevance (input, machine) -> relevances
"""Calculates the relevance of every input feature to the estimation process
Calculates the relevance of every input feature to the estimation process
using the following definition from:
Uses the formula:
Neural Triggering System Operating on High Resolution Calorimetry
Information, Anjos et al, April 2006, Nuclear Instruments and Methods in
......@@ -65,6 +96,22 @@ def relevance (input, machine):
input vectors. For this to work, the `input` parameter has to be a 2D array
with features arranged column-wise while different examples are arranged
row-wise.
Parameters:
input (array): an N-dimensional array that corresponds to the value
estimated by your model
machine (object): A machine that can be called to "process" your input
Returns:
array: A 1D float array as large as the number of columns (second
dimension) of your input array, estimating the "relevance" of each input
column (or feature) to the score provided by the machine.
"""
o = machine(input)
......@@ -80,51 +127,74 @@ def relevance (input, machine):
def recognition_rate(cmc_scores, threshold = None, rank = 1):
"""recognition_rate(cmc_scores, rank, threshold) -> RR
"""Calculates the recognition rate from the given input
Calculates the recognition rate from the given input, which is identical
to the CMC value for the given ``rank``.
It is identical to the CMC value for the given ``rank``.
The input has a specific format, which is a list of two-element tuples.
Each of the tuples contains the negative :math:`\\{S_p^-\\}` and the positive :math:`\\{S_p^+\\}` scores for one probe item :math:`p`, or ``None`` in case of open set recognition.
To read the lists from score files in 4 or 5 column format, please use the :py:func:`bob.measure.load.cmc_four_column` or :py:func:`bob.measure.load.cmc_five_column` function.
The input has a specific format, which is a list of two-element tuples. Each
of the tuples contains the negative :math:`\\{S_p^-\\}` and the positive
:math:`\\{S_p^+\\}` scores for one probe item :math:`p`, or ``None`` in case
of open set recognition. To read the lists from score files in 4 or 5 column
format, please use the :py:func:`bob.measure.load.cmc_four_column` or
:py:func:`bob.measure.load.cmc_five_column` function.
If **threshold** is set to ``None``, the rank 1 recognition rate is defined as the number of test items, for which the highest positive :math:`\\max\\{S_p^+\\}` score is greater than or equal to all negative scores, divided by the number of all probe items :math:`P`:
If ``threshold`` is set to ``None``, the rank 1 recognition rate is defined
as the number of test items, for which the highest positive
:math:`\\max\\{S_p^+\\}` score is greater than or equal to all negative
scores, divided by the number of all probe items :math:`P`:
.. math::
\\mathrm{RR} = \\frac{1}{P} \\sum_{p=1}^{P} \\begin{cases} 1 & \\mathrm{if } \\max\\{S_p^+\\} >= \\max\\{S_p^-\\}\\\\ 0 & \\mathrm{otherwise} \\end{cases}
For a given rank :math:`r>1`, up to :math:`r` negative scores that are higher than the highest positive score are allowed to still count as correctly classified in the top :math:`r` rank.
For a given rank :math:`r>1`, up to :math:`r` negative scores that are higher
than the highest positive score are allowed to still count as correctly
classified in the top :math:`r` rank.
If ``threshold`` :math:`\\theta` is given, **all** scores below threshold will be filtered out.
Hence, if all positive scores are below threshold :math:`\\max\\{S_p^+\\} < \\theta`, the probe will be misclassified **at any rank**.
If ``threshold`` :math:`\\theta` is given, **all** scores below threshold
will be filtered out. Hence, if all positive scores are below threshold
:math:`\\max\\{S_p^+\\} < \\theta`, the probe will be misclassified **at any
rank**.
For open set recognition, i.e., when there exists a tuple including negative scores without corresponding positive scores (``None``), and **all** negative scores are below ``threshold`` :math:`\\max\\{S_p^-\\} < \\theta`, the probe item is correctly rejected, **and it does not count into the denominator** :math:`P`.
When no ``threshold`` is provided, the open set probes will **always** count as misclassified, regardless of the ``rank``.
For open set recognition, i.e., when there exists a tuple including negative
scores without corresponding positive scores (``None``), and **all** negative
scores are below ``threshold`` :math:`\\max\\{S_p^-\\} < \\theta`, the probe
item is correctly rejected, **and it does not count into the denominator**
:math:`P`. When no ``threshold`` is provided, the open set probes will
**always** count as misclassified, regardless of the ``rank``.
.. warning::
For open set tests, this rate does not correspond to a standard rate.
Please use :py:func:`detection_identification_rate` and :py:func:`false_alarm_rate` instead.
Please use :py:func:`detection_identification_rate` and
:py:func:`false_alarm_rate` instead.
Parameters:
**Parameters:**
cmc_scores (list): A list in the format ``[(negatives, positives), ...]``
containing the CMC scores loaded with one of the functions
(:py:func:`bob.measure.load.cmc_four_column` or
:py:func:`bob.measure.load.cmc_five_column`).
``cmc_scores`` : [(array_like(1D, float), array_like(1D, float))]
CMC scores loaded with one of the functions (:py:func:`bob.measure.load.cmc_four_column` or :py:func:`bob.measure.load.cmc_five_column`).
Each pair contains the ``negative`` and the ``positive`` scores for **one probe item**.
Each pair can contain up to one empty array (or ``None``), i.e., in case of open set recognition.
Each pair contains the ``negative`` and the ``positive`` scores for **one
probe item**. Each pair can contain up to one empty array (or ``None``),
i.e., in case of open set recognition.
``threshold`` : float or ``None``
Decision threshold. If not ``None``, **all** scores will be filtered by the threshold.
In an open set recognition problem, all open set scores (negatives with no corresponding positive) for which all scores are below threshold, will be counted as correctly rejected and **removed** from the probe list (i.e., the denominator).
threshold (Optional[float]): Decision threshold. If not ``None``, **all**
scores will be filtered by the threshold. In an open set recognition
problem, all open set scores (negatives with no corresponding positive)
for which all scores are below threshold, will be counted as correctly
rejected and **removed** from the probe list (i.e., the denominator).
``rank`` : int or ``None``
The rank for which the recognition rate should be computed, 1 by default.
rank (Optional[int]):
The rank for which the recognition rate should be computed, 1 by default.
**Returns:**
``RR`` : float
The (open set) recognition rate for the given rank, a value between 0 and 1.
Returns:
float: The (open set) recognition rate for the given rank, a value between
0 and 1.
"""
# If no scores are given, the recognition rate is exactly 0.
if not cmc_scores:
......@@ -184,9 +254,7 @@ def recognition_rate(cmc_scores, threshold = None, rank = 1):
def cmc(cmc_scores):
"""cmc(cmc_scores) -> curve
Calculates the cumulative match characteristic (CMC) from the given input.
"""Calculates the cumulative match characteristic (CMC) from the given input.
The input has a specific format, which is a list of two-element tuples. Each
of the tuples contains the negative and the positive scores for one probe
......@@ -194,25 +262,38 @@ def cmc(cmc_scores):
the :py:func:`bob.measure.load.cmc_four_column` or
:py:func:`bob.measure.load.cmc_five_column` function.
For each probe item, the rank :math:`r` of the positive score is
calculated. The rank is computed as the number of negative scores that are
higher than the positive score. If several positive scores for one test item
exist, the **highest** positive score is taken. The CMC finally computes how
many test items have rank r or higher, divided by the total number of test values.
For each probe item, the rank :math:`r` of the positive score is calculated.
The rank is computed as the number of negative scores
that are higher than the positive score. If several positive scores for one
test item exist, the **highest** positive score is taken. The CMC finally
computes how many test items have rank r or higher, divided by the total
number of test values.
.. note::
The CMC is not available for open set classification.
Please use the :py:func:`detection_identification_rate` and :py:func:`false_alarm_rate` instead.
**Parameters:**
The CMC is not available for open set classification. Please use the
:py:func:`detection_identification_rate` and :py:func:`false_alarm_rate`
instead.
Parameters:
cmc_scores (list): A list in the format ``[(negatives, positives), ...]``
containing the CMC scores loaded with one of the functions
(:py:func:`bob.measure.load.cmc_four_column` or
:py:func:`bob.measure.load.cmc_five_column`).
``cmc_scores`` : [(array_like(1D, float), array_like(1D, float))]
A list of tuples, where each tuple contains the ``negative`` and ``positive`` scores for one probe of the database
Each pair contains the ``negative`` and the ``positive`` scores for **one
probe item**. Each pair can contain up to one empty array (or ``None``),
i.e., in case of open set recognition.
**Returns:**
``curve`` : array_like(2D, float)
The CMC curve, with the Rank in the first column and the number of correctly classified clients (in this rank) in the second column.
Returns:
array: A 2D float array representing the CMC curve, with the Rank in the
first column and the number of correctly classified clients (in this
rank) in the second column.
"""
# If no scores are given, we cannot plot anything
......@@ -243,33 +324,41 @@ def cmc(cmc_scores):
def detection_identification_rate(cmc_scores, threshold, rank = 1):
"""detection_identification_rate(cmc_scores, threshold, rank) -> dir
"""Computes the `detection and identification rate` for the given threshold.
This value is designed to be used in an open set identification protocol, and
defined in Chapter 14.1 of [LiJain2005]_.
Although the detection and identification rate is designed to be computed on
an open set protocol, it uses only the probe elements, for which a
corresponding gallery element exists. For closed set identification
protocols, this function is identical to :py:func:`recognition_rate`. The
only difference is that for this function, a ``threshold`` for the scores
needs to be defined, while for :py:func:`recognition_rate` it is optional.
Parameters:
Computes the `detection and identification rate` for the given threshold.
This value is designed to be used in an open set identification protocol, and defined in Chapter 14.1 of [LiJain2005]_.
cmc_scores (list): A list in the format ``[(negatives, positives), ...]``
containing the CMC scores loaded with one of the functions
(:py:func:`bob.measure.load.cmc_four_column` or
:py:func:`bob.measure.load.cmc_five_column`).
Although the detection and identification rate is designed to be computed on an open set protocol, it uses only the probe elements, for which a corresponding gallery element exists.
For closed set identification protocols, this function is identical to :py:func:`recognition_rate`.
The only difference is that for this function, a ``threshold`` for the scores needs to be defined, while for :py:func:`recognition_rate` it is optional.
Each pair contains the ``negative`` and the ``positive`` scores for **one
probe item**. Each pair can contain up to one empty array (or ``None``),
i.e., in case of open set recognition.
**Parameters:**
threshold (float): The decision threshold :math:`\\tau`.
``cmc_scores`` : [(array_like(1D, float), array_like(1D, float))]
CMC scores loaded with one of the functions (:py:func:`bob.measure.load.cmc_four_column` or :py:func:`bob.measure.load.cmc_five_column`).
Each pair contains the ``negative`` and the ``positive`` scores for **one probe item**.
There needs to be at least one probe item, for which positive and negative scores exist.
rank (Optional[int]): The rank for which the curve should be plotted
``threshold`` : float
The decision threshold :math:`\\tau`.
``rank`` : int
The rank for which the curve should be plotted, by default 1.
Returns:
**Returns:**
float: The detection and identification rate for the given threshold.
``dir`` : float
The detection and identification rate for the given threshold.
"""
# count the correctly classified probes
correct = 0
counter = 0
......@@ -299,27 +388,34 @@ def detection_identification_rate(cmc_scores, threshold, rank = 1):
def false_alarm_rate(cmc_scores, threshold):
"""false_alarm_rate(cmc_scores, threshold) -> far
"""Computes the `false alarm rate` for the given threshold.
This value is designed to be used in an open set identification protocol, and
defined in Chapter 14.1 of [LiJain2005]_.
The false alarm rate is designed to be computed on an open set protocol, it
uses only the probe elements, for which **no** corresponding gallery element
exists.
Parameters:
Computes the `false alarm rate` for the given threshold.
This value is designed to be used in an open set identification protocol, and defined in Chapter 14.1 of [LiJain2005]_.
cmc_scores (list): A list in the format ``[(negatives, positives), ...]``
containing the CMC scores loaded with one of the functions
(:py:func:`bob.measure.load.cmc_four_column` or
:py:func:`bob.measure.load.cmc_five_column`).
The false alarm rate is designed to be computed on an open set protocol, it uses only the probe elements, for which **no** corresponding gallery element exists.
Each pair contains the ``negative`` and the ``positive`` scores for **one
probe item**. Each pair can contain up to one empty array (or ``None``),
i.e., in case of open set recognition.
**Parameters:**
threshold (float): The decision threshold :math:`\\tau`.
``cmc_scores`` : [(array_like(1D, float), array_like(1D, float))]
CMC scores loaded with one of the functions (:py:func:`bob.measure.load.cmc_four_column` or :py:func:`bob.measure.load.cmc_five_column`).
Each pair contains the ``negative`` and the ``positive`` scores for **one probe item**.
There needs to be at least one probe item, for which only negative scores exist.
``threshold`` : float
The decision threshold :math:`\\tau`.
Returns:
**Returns:**
float: The false alarm rate.
``far`` : float
The false alarm rate.
"""
incorrect = 0
counter = 0
......
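The closed-set, rank 1 case of the recognition rate documented above (no threshold, every probe has both negative and positive scores) can be sketched as follows. `rank1_recognition_rate` is a hypothetical helper written for illustration, not the `bob.measure` implementation, and the scores are made up.

```python
import numpy

def rank1_recognition_rate(cmc_scores):
    """Hypothetical helper: closed set, rank 1, no threshold."""
    # As in the function above, an empty score list gives a rate of exactly 0
    if not cmc_scores:
        return 0.0
    correct = 0
    for negatives, positives in cmc_scores:
        # A probe counts as correct when its highest positive score is
        # greater than or equal to all of its negative scores
        if numpy.max(positives) >= numpy.max(negatives):
            correct += 1
    return correct / len(cmc_scores)

# One (negatives, positives) pair per probe item; illustrative values only
cmc_scores = [
    (numpy.array([0.1, 0.3]), numpy.array([0.9])),   # best positive wins
    (numpy.array([0.8, 0.2]), numpy.array([0.5])),   # a negative is higher
    (numpy.array([0.4]), numpy.array([0.4, 0.1])),   # tie counts as correct
    (numpy.array([0.6, 0.7]), numpy.array([0.65])),  # a negative is higher
]
print(rank1_recognition_rate(cmc_scores))
```

The sketch deliberately leaves out thresholds, ranks above 1, and open set probes (``None`` entries), all of which the documented function handles.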
#!/usr/bin/env python
# vim: set fileencoding=utf-8 :
# Manuel Guenther <Manuel.Guenther@idiap.ch>
# Thu May 16 11:41:49 CEST 2013
#
# Copyright (C) 2011-2013 Idiap Research Institute, Martigny, Switzerland
"""Measures for calibration"""
import math
import numpy
def cllr(negatives, positives):
"""cllr(negatives, positives) -> cllr
"""Cost of log likelihood ratio as defined by the Bosaris toolkit
Computes the 'cost of log likelihood ratio' (:math:`C_{llr}`) measure as
given in the Bosaris toolkit
Parameters:
negatives (array): 1D float array that contains the scores of the
"negative" (noise, non-class) samples of your classifier.
Computes the 'cost of log likelihood ratio' (:math:`C_{llr}`) measure as given in the Bosaris toolkit
positives (array): 1D float array that contains the scores of the
"positive" (signal, class) samples of your classifier.
**Parameters:**
``negatives, positives`` : array_like(1D, float)
The scores computed by comparing elements from different classes and the same class, respectively.
Returns:
**Returns**
float: The computed :math:`C_{llr}` value.
``cllr`` : float
The computed :math:`C_{llr}` value.
"""
sum_pos, sum_neg = 0., 0.
for pos in positives:
......@@ -34,19 +38,25 @@ def cllr(negatives, positives):
def min_cllr(negatives, positives):
"""min_cllr(negatives, positives) -> min_cllr
"""Minimum cost of log likelihood ratio as defined by the Bosaris toolkit
Computes the 'minimum cost of log likelihood ratio' (:math:`C_{llr}^{min}`)
measure as given in the Bosaris toolkit
Parameters:
negatives (array): 1D float array that contains the scores of the
"negative" (noise, non-class) samples of your classifier.
Computes the 'minimum cost of log likelihood ratio' (:math:`C_{llr}^{min}`) measure as given in the Bosaris toolkit
positives (array): 1D float array that contains the scores of the
"positive" (signal, class) samples of your classifier.
**Parameters:**
``negatives, positives`` : array_like(1D, float)
The scores computed by comparing elements from different classes and the same class, respectively.
Returns:
**Returns**
float: The computed :math:`C_{llr}^{min}` value.
``min_cllr`` : float
The computed :math:`C_{llr}^{min}` value.
"""
from bob.math import pavx
......
#!/usr/bin/env python
# vim: set fileencoding=utf-8 :
# Andre Anjos <andre.anjos@idiap.ch>
# Mon 23 May 2011 16:23:05 CEST
"""A set of utilities to load score files with different formats.
......@@ -13,23 +12,30 @@ import os
import logging
logger = logging.getLogger('bob.measure')
def open_file(filename, mode='rt'):
"""open_file(filename) -> file_like
"""Opens the given score file for reading.
Score files might be raw text files, or a tar-file including a single score
file inside.
Parameters:
filename (str, file): The name of the score file to open, or a file-like
object open for reading. If a file name is given, the corresponding file
might be a raw text file or a (compressed) tar file containing a raw text
file.
Opens the given score file for reading.
Score files might be raw text files, or a tar-file including a single score file inside.
**Parameters:**
Returns:
``filename`` : str or file-like
The name of the score file to open, or a file-like object open for reading.
If a file name is given, the corresponding file might be a raw text file or a (compressed) tar file containing a raw text file.
**Returns:**
file: A read-only file-like object as it would be returned by
:py:func:`open`.
``file_like`` : file-like
A read-only file-like object as it would be returned by open().
"""
if not isinstance(filename, str) and hasattr(filename, 'read'):
# It seems that this is an open file
return filename
......@@ -54,32 +60,36 @@ def open_file(filename, mode='rt'):
def four_column(filename):
"""four_column(filename) -> claimed_id, real_id, test_label, score
"""Loads a score set from a single file and yields its lines
Loads a score set from a single file and yields its lines (to avoid loading
the score file at once into memory). This function verifies that all fields
are correctly placed and contain valid fields. The score file must contain
the following information in each line:
.. code-block:: text
claimed_id real_id test_label score
Loads a score set from a single file and yields its lines (to avoid loading the score file at once into memory).
This function verifies that all fields are correctly placed and contain valid fields.
The score file must contain the following information in each line:
claimed_id real_id test_label score
Parameters:
**Parameters:**
filename (str, file): The file object that will be opened with
:py:func:`open_file` containing the scores.
``filename`` : str or file-like
The file object that will be opened with :py:func:`open_file` containing the scores.
**Yields:**
Returns:
``claimed_id`` : str
The claimed identity -- the client name of the model that was used in the comparison
str: The claimed identity -- the client name of the model that was used in
the comparison
``real_id`` : str
The real identity -- the client name of the probe that was used in the comparison
str: The real identity -- the client name of the probe that was used in the
comparison
``test_label`` : str
A label of the probe -- usually the probe file name, or the probe id
str: A label of the probe -- usually the probe file name, or the probe id
float: The result of the comparison of the model and the probe
``score`` : float
The result of the comparison of the model and the probe
"""
for i, l in enumerate(open_file(filename)):
......@@ -97,57 +107,67 @@ def four_column(filename):
def split_four_column(filename):
"""split_four_column(filename) -> negatives, positives
"""Loads a score set from a single file and splits the scores
Loads a score set from a single file and splits the scores
between negatives and positives. The score file has to respect the 4 column
format as defined in the method :py:func:`four_column`.
Loads a score set from a single file and splits the scores between negatives
and positives. The score file has to respect the 4 column format as defined
in the method :py:func:`four_column`.
This method avoids loading and allocating memory for the strings present in
the file. We only keep the scores.
**Parameters:**
``filename`` : str or file-like
The file that will be opened with :py:func:`open_file` containing the scores.
Parameters:
filename (str, file): The file object that will be opened with
:py:func:`open_file` containing the scores.
**Returns:**
Returns:
``negatives`` : array_like(1D, float)
The list of ``score``'s, for which the ``claimed_id`` and the ``real_id`` differed (see :py:func:`four_column`).
negatives (array): 1D float array containing the list of scores, for which
the ``claimed_id`` and the ``real_id`` differed (see
:py:func:`four_column`)
positives (array): 1D float array containing the list of scores, for which
the ``claimed_id`` and the ``real_id`` are identical (see
:py:func:`four_column`)
``positives`` : array_like(1D, float)
The list of ``score``'s, for which the ``claimed_id`` and the ``real_id`` are identical (see :py:func:`four_column`).
"""
score_lines = load_score(filename, 4)
return get_negatives_positives(score_lines)
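The split described in the docstring above can be illustrated with a minimal, self-contained sketch. `split_scores` is a hypothetical stand-in that reads four-column lines from any iterable of strings; it does not reproduce the validation done by `load_score` or `get_negatives_positives`, whose bodies are not shown in this diff.

```python
import io
import numpy

def split_scores(lines):
    """Hypothetical stand-in for the negatives/positives split."""
    negatives, positives = [], []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # this sketch simply skips blank or malformed lines
        claimed_id, real_id, test_label, score = fields
        # Identical identities give a positive score, differing ones a negative
        if claimed_id == real_id:
            positives.append(float(score))
        else:
            negatives.append(float(score))
    # Only the scores are kept; the identity strings are discarded
    return numpy.array(negatives), numpy.array(positives)

# In-memory score file: claimed_id real_id test_label score (made-up values)
score_file = io.StringIO(
    "alice alice probe1 0.9\n"
    "alice bob probe2 0.2\n"
    "bob bob probe3 0.7\n"
    "bob alice probe4 0.1\n"
)
negatives, positives = split_scores(score_file)
print(negatives, positives)
```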
def cmc_four_column(filename):
"""cmc_four_column(filename) -> cmc_scores
"""Loads scores to compute CMC curves from a file in four column format.
Loads scores to compute CMC curves from a file in four column format.
The four column file needs to be in the same format as described in :py:func:`four_column`,
and the ``test_label`` (column 3) has to contain the test/probe file name or a probe id.
The four column file needs to be in the same format as described in
:py:func:`four_column`, and the ``test_label`` (column 3) has to contain the
test/probe file name or a probe id.
This function returns a list of tuples.
For each probe file, the tuple consists of a list of negative scores and a list of positive scores.
Usually, the list of positive scores should contain only one element, but more are allowed.
The result of this function can directly be passed to, e.g., the :py:func:`bob.measure.cmc` function.
This function returns a list of tuples. For each probe file, the tuple
consists of a list of negative scores and a list of positive scores.
Usually, the list of positive scores should contain only one element, but
more are allowed. The result of this function can directly be passed to,
e.g., the :py:func:`bob.measure.cmc` function.
**Parameters:**
Parameters:
``filename`` : str or file-like
The file that will be opened with :py:func:`open_file` containing the scores.
filename (str, file): The file object that will be opened with
:py:func:`open_file` containing the scores.
**Returns:**
Returns:
``cmc_scores`` : [(negatives, positives)]
A list of tuples, where each tuple contains the ``negative`` and ``positive`` scores for one probe of the database.
Both ``negatives`` and ``positives`` can be either a 1D :py:class:`numpy.ndarray` of type ``float``, or ``None``.
list: A list of tuples, where each tuple contains the ``negative`` and
``positive`` scores for one probe of the database. Both ``negatives`` and
``positives`` can be either a 1D :py:class:`numpy.ndarray` of type
``float``, or ``None``.
"""
# extract positives and negatives
pos_dict = {}