bob / bob.measure / Issues / #27

FAR and FRR thresholds are computed even when there is no data support

I recently came across a situation where FAR (and FRR) thresholds were computed although they should not have been. Imagine the negative score distribution [0.5, 0.6, 0.7, 0.8, 0.9, 1., 1., 1., 1., 1.], for which a threshold should be computed for FAR=0.1. Our current implementation of bob.measure.far_threshold returns the threshold 1. However, this threshold does not give a false acceptance rate of 0.1 but of 0.5, since five of the ten negatives are equal to 1 and are therefore accepted. In fact, there is no (data-driven) threshold that would provide a false acceptance rate of 0.1.
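To make the numbers concrete, here is a minimal NumPy-only sketch that lists the FAR reachable at each candidate threshold for this distribution. It does not call bob.measure; far_at is a hypothetical helper, and it assumes a negative counts as a false accept when its score is greater than or equal to the threshold.

```python
import numpy as np

def far_at(negatives, threshold):
    """Hypothetical helper: fraction of negatives with score >= threshold.

    Assumes 'score >= threshold' means 'accepted'.
    """
    negatives = np.asarray(negatives, dtype=float)
    return float(np.mean(negatives >= threshold))

negatives = [0.5, 0.6, 0.7, 0.8, 0.9, 1., 1., 1., 1., 1.]

# Candidate thresholds: every distinct negative score, plus one value
# just above the largest negative (where no negative is accepted).
candidates = np.append(np.unique(negatives), np.nextafter(1.0, np.inf))
achievable = {float(t): far_at(negatives, t) for t in candidates}

for threshold, far in achievable.items():
    print(f"threshold={threshold!r}: FAR={far}")
```

Under these assumptions the FAR drops from 0.5 at threshold 1 directly to 0 just above it, so 0.1 never appears among the reachable values.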

A similar issue arises when the number of data points is not sufficient to support the requested FAR. From only 10 data points, you cannot provide a (data-driven) threshold for FAR=0.05, since the smallest non-zero rate that 10 negatives can represent is 0.1, while our current implementation happily provides one.
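The sample-size argument can be sketched as a simple check. With n negatives, an empirical FAR is always k/n for some integer k, so a requested value that is not such a multiple cannot be met exactly. The function name far_is_representable is hypothetical, and the check assumes the requested FAR is meant as an exact rate; it is a necessary condition only, since ties (as in the example above) can rule out further values.

```python
def far_is_representable(far_value, n_negatives, tol=1e-9):
    """Hypothetical check: is far_value equal to k / n_negatives for an integer k?

    Necessary but not sufficient: tied scores can make some k unreachable.
    """
    k = far_value * n_negatives
    return abs(k - round(k)) < tol

print(far_is_representable(0.05, 10))  # 0.05 * 10 = 0.5 is not an integer
print(far_is_representable(0.1, 10))   # 0.1 * 10 = 1 is an integer
```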

There are two possible solutions for this issue. First, we can simply return a threshold that is just slightly higher than the largest negative (or slightly lower than the smallest positive when computing the FRR threshold). This will indeed provide a solution, but the resulting threshold is not justified by any data point and might be arbitrarily wrong, e.g., when applied to other test data.

Instead, we should just return NaN, since we really cannot compute a justified threshold for the requested FAR or FRR values.
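A rough sketch of what the NaN behaviour could look like, written in plain NumPy rather than as a patch to bob.measure. The name far_threshold_or_nan and the tolerance parameter are assumptions for illustration, as is the convention that negatives with score >= threshold are accepted and that the requested FAR must be matched exactly by a threshold located at an observed score.

```python
import math
import numpy as np

def far_threshold_or_nan(negatives, far_value, tol=1e-9):
    """Hypothetical helper: data-supported threshold for far_value, else NaN.

    Only observed negative scores are considered as thresholds, and the
    FAR at the threshold must equal far_value (within tol).
    """
    negatives = np.asarray(negatives, dtype=float)
    n = negatives.size
    if n == 0:
        return math.nan
    for t in np.unique(negatives):
        far = np.count_nonzero(negatives >= t) / n
        if abs(far - far_value) <= tol:
            return float(t)
    return math.nan  # no observed score yields the requested FAR

tied = [0.5, 0.6, 0.7, 0.8, 0.9, 1., 1., 1., 1., 1.]
print(far_threshold_or_nan(tied, 0.1))   # no support in the data
print(far_threshold_or_nan(tied, 0.5))   # supported by the score 1.0
```

In this sketch FAR=0 also yields NaN, because a threshold above the largest negative is not backed by any data point; whether that case should be treated differently is a design decision left open here.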
