error.h 17.1 KB
Newer Older
André Anjos's avatar
André Anjos committed
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
/**
 * @date Wed Apr 20 08:02:30 2011 +0200
 * @author Andre Anjos <andre.anjos@idiap.ch>
 *
 * @brief A set of methods that evaluates error from score sets
 *
 * Copyright (C) Idiap Research Institute, Martigny, Switzerland
 */

#ifndef BOB_MEASURE_ERROR_H
#define BOB_MEASURE_ERROR_H

#include <blitz/array.h>
#include <utility>
#include <vector>

namespace bob { namespace measure {

  /**
   * Calculates the FA ratio and the FR ratio given positive and negative
   * scores and a threshold. 'positives' holds the score information for
   * samples that are labeled to belong to a certain class (a.k.a., "signal"
   * or "client"). 'negatives' holds the score information for samples that are
   * labeled *not* to belong to the class (a.k.a., "noise" or "impostor").
   *
   * It is expected that 'positive' scores are, at least by design, greater
   * than 'negative' scores. So, every positive value that falls bellow the
   * threshold is considered a false-rejection (FR). 'negative' samples that
   * fall above the threshold are considered a false-accept (FA).
   *
   * Positives that fall on the threshold (exactly) are considered correctly
   * classified. Negatives that fall on the threshold (exactly) are considered
   * *incorrectly* classified. This equivalent to setting the comparison like
   * this pseudo-code:
   *
   * foreach (positive as K) if K < threshold: falseRejectionCount += 1
   * foreach (negative as K) if K >= threshold: falseAcceptCount += 1
   *
   * The 'threshold' value does not necessarily have to fall in the range
   * covered by the input scores (negatives and positives altogether), but if
   * it does not, the output will be either (1.0, 0.0) or (0.0, 1.0)
   * depending on the side the threshold falls.
   *
   * The output is in form of a std::pair of two double-precision real numbers.
   * The numbers range from 0 to 1. The first element of the pair is the
   * false-accept ratio. The second element of the pair is the false-rejection
   * ratio.
   *
   * It is possible that scores are inverted in the negative/positive sense. In
   * some setups the designer may have setup the system so 'positive' samples
   * have a smaller score than the 'negative' ones. In this case, make sure you
   * normalize the scores so positive samples have greater scores before
   * feeding them into this method.
   */
  std::pair<double, double> farfrr(const blitz::Array<double,1>& negatives,
      const blitz::Array<double,1>& positives, double threshold);

  /**
   * Calculates the precision and recall (sensitiveness) values given positive and negative
   * scores and a threshold. 'positives' holds the score information for
   * samples that are labeled to belong to a certain class (a.k.a., "signal"
   * or "client"). 'negatives' holds the score information for samples that are
   * labeled *not* to belong to the class (a.k.a., "noise" or "impostor").
   *
65
   * For more precise details about how the method considers error rates, please refer to the documentation of the method bob.measure.farfrr.
André Anjos's avatar
André Anjos committed
66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114
   *
   * It is possible that scores are inverted in the negative/positive sense. In
   * some setups the designer may have setup the system so 'positive' samples
   * have a smaller score than the 'negative' ones. In this case, make sure you
   * normalize the scores so positive samples have greater scores before
   * feeding them into this method.
   */
  std::pair<double, double> precision_recall(const blitz::Array<double,1>& negatives,
      const blitz::Array<double,1>& positives, double threshold);

  /**
   * This method computes F score of the accuracy of the classification. It is a weighted mean of precision and recall measurements. The weight parameter needs to be non-negative real value. In case the weight parameter is 1, the F-score is called F1 score and is a harmonic mean between precision and recall values.
   */
  double f_score(const blitz::Array<double,1>& negatives,
    const blitz::Array<double,1>& positives, double threshold, double weight) ;

  /**
   * This method returns a blitz::Array composed of booleans that pin-point
   * which positives where correctly classified in a 'positive' score sample,
   * given a threshold. It runs the formula:
   *
   * foreach (element k in positive)
   *   if positive[k] >= threshold: returnValue[k] = true
   *   else: returnValue[k] = false
   */
  inline blitz::Array<bool,1> correctlyClassifiedPositives
    (const blitz::Array<double,1>& positives, double threshold) {
      return blitz::Array<bool,1>(positives >= threshold);
    }

  /**
   * This method returns a blitz::Array composed of booleans that pin-point
   * which negatives where correctly classified in a 'negative' score sample,
   * given a threshold. It runs the formula:
   *
   * foreach (element k in negative)
   *   if negative[k] < threshold: returnValue[k] = true
   *   else: returnValue[k] = false
   */
  inline blitz::Array<bool,1> correctlyClassifiedNegatives
    (const blitz::Array<double,1>& negatives, double threshold) {
      return blitz::Array<bool,1>(negatives < threshold);
    }

  /**
   * This method can calculate a threshold based on a set of scores (positives
   * and negatives) given a certain minimization criteria, input as a
   * functional predicate. For a discussion on 'positive' and 'negative' see
   * bob::measure::farfrr().
115
   * Here, it is expected that the positives and the negatives are sorted ascendantly.
André Anjos's avatar
André Anjos committed
116 117 118 119 120 121 122 123 124 125 126
   *
   * The predicate method gives back the current minimum given false-acceptance
   * (FA) and false-rejection (FR) ratios for the input data. As a predicate,
   * it has to be a non-member method or a pre-configured functor where we can
   * use operator(). The API for the method is:
   *
   * double predicate(double fa_ratio, double fr_ratio);
   *
   * Please note that this method will only work with single-minimum smooth
   * predicates.
   *
127 128 129 130 131
   * The minimization is carried out in a data-driven way.
   * Starting from the lowest score (might be a positive or a negative), it
   * increases the threshold based on the distance between the current score
   * and the following higher score (also keeping track of duplicate scores)
   * and computes the predicate for each possible threshold.
André Anjos's avatar
André Anjos committed
132
   *
133 134
   * Finally, that threshold is returned, for which the predicate returned the
   * lowest value.
André Anjos's avatar
André Anjos committed
135
   */
136 137
  template <typename T>
  double minimizingThreshold(const blitz::Array<double,1>& negatives, const blitz::Array<double,1>& positives, T& predicate){
138 139
    if (!negatives.size() || !positives.size()) throw std::runtime_error("Cannot compute threshold when no positives or no negatives are given");

140
    // iterate over the whole set of points
141
    auto pos_it = positives.begin(), neg_it = negatives.begin();
142 143 144 145 146 147 148 149

    // iterate over all possible far and frr points and compute the predicate for each possible threshold...
    double min_predicate = 1e8;
    double min_threshold = 1e8;
    double current_predicate = 1e8;
    // we start with the extreme values for far and frr
    double far = 1., frr = 0.;
    // the decrease/increase for far/frr when moving one negative/positive
150
    double far_decrease = 1./negatives.extent(0), frr_increase = 1./positives.extent(0);
151 152 153
    // we start with the threshold based on the minimum score
    double current_threshold = std::min(*pos_it, *neg_it);
    // now, iterate over both lists, in a sorted order
154
    while (pos_it != positives.end() && neg_it != negatives.end()){
155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174
      // compute predicate
      current_predicate = predicate(far, frr);
      if (current_predicate <= min_predicate){
        min_predicate = current_predicate;
        min_threshold = current_threshold;
      }
      if (*pos_it >= *neg_it){
        // compute current threshold
        current_threshold = *neg_it;
        // go to the next negative value
        ++neg_it;
        far -= far_decrease;
      } else {
        // compute current threshold
        current_threshold = *pos_it;
        // go to the next positive
        ++pos_it;
        frr += frr_increase;
      }
      // increase positive and negative as long as they contain the same value
175
      while (neg_it != negatives.end() && *neg_it == current_threshold) {
176 177 178 179
        // go to next negative
        ++neg_it;
        far -= far_decrease;
      }
180
      while (pos_it != positives.end() && *pos_it == current_threshold) {
181 182 183 184 185
        // go to next positive
        ++pos_it;
        frr += frr_increase;
      }
      // compute a new threshold based on the center between last and current score, if we are not already at the end of the score lists
186 187
      if (neg_it != negatives.end() || pos_it != positives.end()){
        if (neg_it != negatives.end() && pos_it != positives.end())
188
          current_threshold += std::min(*pos_it, *neg_it);
189
        else if (neg_it != negatives.end())
190 191 192 193 194 195 196 197 198 199 200 201 202
          current_threshold += *neg_it;
        else
          current_threshold += *pos_it;
        current_threshold /= 2;
      }
    } // while

    // now, we have reached the end of one list (usually the negatives)
    // so, finally compute predicate for the last time
    current_predicate = predicate(far, frr);
    if (current_predicate < min_predicate){
      min_predicate = current_predicate;
      min_threshold = current_threshold;
André Anjos's avatar
André Anjos committed
203 204
    }

205 206 207 208
    // return the best threshold found
    return min_threshold;
  }

André Anjos's avatar
André Anjos committed
209 210 211 212 213 214
  /**
   * Calculates the threshold that is, as close as possible, to the
   * equal-error-rate (EER) given the input data. The EER should be the point
   * where the FAR equals the FRR. Graphically, this would be equivalent to the
   * intersection between the R.O.C. (or D.E.T.) curves and the identity.
   */
215
  double eerThreshold(const blitz::Array<double,1>& negatives, const blitz::Array<double,1>& positives, bool isSorted = false);
André Anjos's avatar
André Anjos committed
216 217

  /**
218
   * Calculates the equal-error-rate (EER) given the input data, on the ROC
André Anjos's avatar
André Anjos committed
219 220 221
   * Convex Hull, as performed in the Bosaris toolkit.
   * (https://sites.google.com/site/bosaristoolkit/)
   */
222
  double eerRocch(const blitz::Array<double,1>& negatives, const blitz::Array<double,1>& positives);
André Anjos's avatar
André Anjos committed
223 224 225 226 227 228 229 230 231 232 233 234 235 236

  /**
   * Calculates the threshold that minimizes the error rate, given the input
   * data. An optional parameter 'cost' determines the relative importance
   * between false-accepts and false-rejections. This number should be between
   * 0 and 1 and will be clipped to those extremes.
   *
   * The value to minimize becomes:
   *
   * ER_cost = [cost * FAR] + [(1-cost) * FRR]
   *
   * The higher the cost, the higher the importance given to *not* making
   * mistakes classifying negatives/noise/impostors.
   */
237
  double minWeightedErrorRateThreshold(const blitz::Array<double,1>& negatives, const blitz::Array<double,1>& positives, double cost, bool isSorted = false);
André Anjos's avatar
André Anjos committed
238 239 240 241

  /**
   * Calculates the minWeightedErrorRateThreshold() when the cost is 0.5.
   */
242 243
  inline double minHterThreshold(const blitz::Array<double,1>& negatives, const blitz::Array<double,1>& positives, bool isSorted = false) {
    return minWeightedErrorRateThreshold(negatives, positives, 0.5, isSorted);
André Anjos's avatar
André Anjos committed
244 245 246 247 248 249 250 251 252 253 254 255 256
  }

  /**
   * Computes the threshold such that the real FAR is as close as possible
   * to the requested far_value.
   *
   * @param negatives The impostor scores to be used for computing the FAR
   * @param positives The client scores; ignored by this function
   * @param far_value The FAR value where the threshold should be computed
   *
   * @return The computed threshold
   */
  double farThreshold(const blitz::Array<double,1>& negatives,
257
      const blitz::Array<double,1>& positives, double far_value, bool isSorted = false);
André Anjos's avatar
André Anjos committed
258 259 260 261 262 263 264 265 266 267 268 269

  /**
   * Computes the threshold such that the real FRR is as close as possible
   * to the requested frr_value.
   *
   * @param negatives The impostor scores; ignored by this function
   * @param positives The client scores to be used for computing the FRR
   * @param frr_value The FRR value where the threshold should be computed
   *
   * @return The computed threshold
   */
  double frrThreshold(const blitz::Array<double,1>& negatives,
270
      const blitz::Array<double,1>& positives, double frr_value, bool isSorted = false);
André Anjos's avatar
André Anjos committed
271 272 273 274 275 276 277 278 279 280 281 282

  /**
   * Calculates the ROC curve given a set of positive and negative scores and a
   * number of desired points. Returns a two-dimensional blitz::Array of
   * doubles that express the X (FRR) and Y (FAR) coordinates in this order.
   * The points in which the ROC curve are calculated are distributed
   * uniformly in the range [min(negatives, positives), max(negatives,
   * positives)].
   */
  blitz::Array<double,2> roc
    (const blitz::Array<double,1>& negatives,
     const blitz::Array<double,1>& positives, size_t points);
283

André Anjos's avatar
André Anjos committed
284 285 286 287 288 289 290 291 292 293 294 295 296
  /**
   * Calculates the precision-recall curve given a set of positive and negative scores and a
   * number of desired points. Returns a two-dimensional blitz::Array of
   * doubles that express the X (precision) and Y (recall) coordinates in this order.
   * The points in which the curve is calculated are distributed
   * uniformly in the range [min(negatives, positives), max(negatives,
   * positives)].
   */
  blitz::Array<double,2> precision_recall_curve
    (const blitz::Array<double,1>& negatives,
     const blitz::Array<double,1>& positives, size_t points);

  /**
297 298
   * Calculates the ROC Convex Hull (ROCCH) given a set of positive and
   * negative scores and a number of desired points. Returns a
André Anjos's avatar
André Anjos committed
299 300 301 302 303 304 305 306 307 308 309 310
   * two-dimensional blitz::Array of doubles that contain the coordinates
   * of the vertices of the ROC Convex Hull (the first row is for "pmiss"
   * and the second row is for "pfa").
   * Reference: Bosaris toolkit
   * (https://sites.google.com/site/bosaristoolkit/)
   */
  blitz::Array<double,2> rocch
    (const blitz::Array<double,1>& negatives,
     const blitz::Array<double,1>& positives);

  /**
   * Calculates the Equal Error Rate (EER) on the ROC Convex Hull (ROCCH)
311
   * from the 2-row matrices containing the pmiss and pfa vectors
André Anjos's avatar
André Anjos committed
312 313
   * (which is the output of the bob::measure::rocch()).
   * Note: pmiss and pfa contain the coordinates of the vertices of the
314
   *       ROC Convex Hull.
André Anjos's avatar
André Anjos committed
315 316 317 318 319 320 321 322 323 324 325 326 327
   * Reference: Bosaris toolkit
   * (https://sites.google.com/site/bosaristoolkit/)
   */
  double rocch2eer(const blitz::Array<double,2>& pmiss_pfa);

  /**
   * Calculates the ROC curve given a set of positive and negative scores at
   * the given FAR coordinates. Returns a two-dimensional blitz::Array of
   * doubles that express the X (FAR) and Y (CAR) coordinates in this order.
   */
  blitz::Array<double,2> roc_for_far(
      const blitz::Array<double,1>& negatives,
      const blitz::Array<double,1>& positives,
328 329
      const blitz::Array<double,1>& far_list,
      bool isSorted = false);
André Anjos's avatar
André Anjos committed
330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388

  /**
   * Returns the Deviate Scale equivalent of a false rejection/acceptance
   * ratio.
   *
   * The algorithm that calculates the deviate scale is based on function
   * ppndf() from the NIST package DETware version 2.1, freely available on the
   * internet. Please consult it for more details.
   */
  double ppndf(double value);

  /**
   * Calculates the DET curve given a set of positive and negative scores and a
   * number of desired points. Returns a two-dimensional blitz::Array of
   * doubles that express on its rows:
   *
   * 0: X axis values in the normal deviate scale for the false-rejections
   * 1: Y axis values in the normal deviate scale for the false-accepts
   *
   * You can plot the results using your preferred tool to first create a plot
   * using rows 0 and 1 from the returned value and then place replace the X/Y
   * axis annotation using a pre-determined set of tickmarks as recommended by
   * NIST.
   *
   * The algorithm that calculates the deviate scale is based on function
   * ppndf() from the NIST package DETware version 2.1, freely available on the
   * internet. Please consult it for more details.
   *
   * By 20.04.2011, you could find such package here:
   * http://www.itl.nist.gov/iad/mig/tools/
   */
  blitz::Array<double,2> det
    (const blitz::Array<double,1>& negatives,
     const blitz::Array<double,1>& positives, size_t points);

  /**
   * Calculates the EPC curve given a set of positive and negative scores and a
   * number of desired points. Returns a two-dimensional blitz::Array of
   * doubles that express the X (cost) and Y (HTER on the test set given the
   * min. HTER threshold on the development set) coordinates in this order.
   * Please note that, in order to calculate the EPC curve, one needs two sets
   * of data comprising a development set and a test set. The minimum weighted
   * error is calculated on the development set and then applied to the test
   * set to evaluate the half-total error rate at that position.
   *
   * The EPC curve plots the HTER on the test set for various values of 'cost'.
   * For each value of 'cost', a threshold is found that provides the minimum
   * weighted error (see minWeightedErrorRateThreshold()) on the development
   * set. Each threshold is consecutively applied to the test set and the
   * resulting HTER values are plotted in the EPC.
   *
   * The cost points in which the EPC curve are calculated are distributed
   * uniformly in the range [0.0, 1.0].
   */
  blitz::Array<double,2> epc
    (const blitz::Array<double,1>& dev_negatives,
     const blitz::Array<double,1>& dev_positives,
     const blitz::Array<double,1>& test_negatives,
     const blitz::Array<double,1>& test_positives,
389 390
     size_t points,
     bool isSorted = false);
André Anjos's avatar
André Anjos committed
391 392 393 394

}}

#endif /* BOB_MEASURE_ERROR_H */