FAR and FRR thresholds are computed even when there is no data support
I recently came across a situation where FAR (and FRR) thresholds were computed although they should not have been. Consider the negative score distribution [0.5, 0.6, 0.7, 0.8, 0.9, 1., 1., 1., 1., 1.], and suppose a threshold should be computed for FAR=0.1. Our current implementation of bob.measure.far_threshold returns the threshold 1. However, this threshold does not yield a false acceptance rate of 0.1, but of 0.5. In fact, no (data-driven) threshold provides a false acceptance rate of 0.1 for this distribution.
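To make the example concrete, here is a minimal sketch of the false acceptance rate computation. The `far` helper is hypothetical, written only to illustrate the issue; it is not bob.measure's actual implementation:

```python
import numpy as np

def far(negatives, threshold):
    """Fraction of negative scores accepted at the given threshold
    (scores >= threshold count as false accepts)."""
    negatives = np.asarray(negatives)
    return np.count_nonzero(negatives >= threshold) / len(negatives)

negatives = [0.5, 0.6, 0.7, 0.8, 0.9, 1., 1., 1., 1., 1.]
# The returned threshold 1 accepts all five scores tied at 1.
print(far(negatives, 1.0))  # 0.5, not the requested 0.1
```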
A similar issue arises when the number of data points is insufficient for the requested threshold to be computed. From only 10 data points, you cannot derive a (data-driven) threshold for FAR=0.05, yet our current implementation happily provides one.
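The counting argument behind this can be sketched briefly: with n negative scores, every data-driven threshold accepts an integer number of them, so the achievable FAR values are exactly the multiples of 1/n, and FAR=0.05 is unreachable from 10 points:

```python
# With n = 10 negative scores, a threshold accepts k of them for some
# integer k, so the only achievable FAR values are k / 10.
n = 10
achievable = [k / n for k in range(n + 1)]
print(achievable)       # 0.0, 0.1, ..., 1.0
print(0.05 in achievable)  # False: no threshold can realize FAR=0.05
```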
There are two possible solutions to this issue. First, we could simply return a threshold that is slightly higher than the largest negative (or slightly lower than the smallest positive when computing the FRR threshold). This would indeed provide a value, but it is not justified by the data and might be arbitrarily wrong when applied to other test data.
Instead, we should simply return NaN, since we really cannot compute a justified threshold for the requested FAR or FRR value.
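The proposed behaviour could be sketched as follows, assuming a simplified, hypothetical far_threshold that only considers observed scores as candidate thresholds (the real bob.measure.far_threshold has a different signature and internals):

```python
import math
import numpy as np

def far_threshold(negatives, far_value):
    """Return the smallest observed score whose FAR is <= far_value,
    or NaN when no data-driven threshold can satisfy the request."""
    negatives = np.sort(np.asarray(negatives))
    n = len(negatives)
    for threshold in negatives:
        # FAR at this candidate: fraction of negatives it accepts
        if np.count_nonzero(negatives >= threshold) / n <= far_value:
            return float(threshold)
    # No observed score achieves the requested FAR
    return float('nan')

negatives = [0.5, 0.6, 0.7, 0.8, 0.9, 1., 1., 1., 1., 1.]
print(far_threshold(negatives, 0.5))  # 1.0: FAR=0.5 is achievable
print(far_threshold(negatives, 0.1))  # nan: FAR=0.1 is not
```

With this behaviour, callers can detect the unsupported request with math.isnan and handle it explicitly rather than silently using a wrong threshold.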