FAR and FRR thresholds are computed even when there is no data support
I recently came across a situation where FAR (and FRR) thresholds were computed although they should not have been.
Imagine the negative score distribution [0.5, 0.6, 0.7, 0.8, 0.9, 1., 1., 1., 1., 1.], and suppose a threshold should be computed for FAR=0.1. Our current implementation of bob.measure.far_threshold will return the threshold 1. However, this threshold does not give us a false acceptance rate of 0.1, but of 0.5 (five of the ten negatives score at or above it). In fact, there is no (data-driven) threshold that would provide a false acceptance rate of 0.1 for this distribution.
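To make this concrete, here is a minimal sketch (plain Python, not the actual bob.measure code; far_at is a hypothetical helper) that enumerates every FAR value a data-driven threshold can achieve on this distribution:

```python
# Not bob.measure itself: enumerate all FAR values achievable by any
# data-driven threshold on the example negative score distribution.
negatives = [0.5, 0.6, 0.7, 0.8, 0.9, 1., 1., 1., 1., 1.]

def far_at(threshold, negatives):
    # FAR = fraction of negatives accepted, i.e., scoring >= threshold
    return sum(s >= threshold for s in negatives) / len(negatives)

# Candidate thresholds: each distinct score, plus one above the maximum.
candidates = sorted(set(negatives)) + [max(negatives) + 1e-6]
achievable = sorted({far_at(t, negatives) for t in candidates})
print(achievable)  # [0.0, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
```

The achievable FAR values jump directly from 0.0 to 0.5 because the five tied scores at 1.0 are accepted or rejected together; 0.1 is simply not reachable.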
A similar issue arises when the number of data points is insufficient for a given threshold to be computed. From only 10 data points, you cannot derive a (data-driven) threshold for FAR=0.05, yet our current implementation happily provides one.
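The granularity argument can be sketched in one line (plain Python, an illustration rather than library code): with N negative scores, each additional accepted negative changes the FAR by 1/N, so the smallest non-zero FAR any threshold can produce is 1/N.

```python
# With N negatives, FAR values come in steps of 1/N, so the smallest
# non-zero FAR is 1/N. FAR=0.05 would require at least 20 negatives.
negatives = [0.1 * i for i in range(1, 11)]  # 10 example scores
smallest_nonzero_far = 1 / len(negatives)
print(smallest_nonzero_far)  # 0.1, which is larger than the requested 0.05
```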
There are two possible solutions for this issue. First, we could simply return a threshold that is just slightly higher than the largest negative (or slightly lower than the smallest positive when computing the FRR threshold). This would indeed provide a value, but it is not justified by the data and might be arbitrarily wrong, e.g., when applied to other test data.
Instead, we should just return
NaN, since we really cannot compute a justified threshold for the requested FAR or FRR values.
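A sketch of the proposed behavior could look as follows (a hypothetical replacement written in plain Python, not the actual bob.measure implementation; it only accepts FAR values that some data-driven threshold achieves exactly, and returns NaN otherwise):

```python
import math

def far_threshold(negatives, far_value):
    """Hypothetical sketch: return the threshold whose FAR on the given
    negatives is exactly far_value, or NaN if no such threshold exists."""
    scores = sorted(negatives)
    n = len(scores)
    # Only distinct score values are meaningful threshold candidates.
    for t in sorted(set(scores)):
        far = sum(s >= t for s in scores) / n
        if far == far_value:
            return t
    # No data-driven threshold attains the requested FAR.
    return float("nan")

negatives = [0.5, 0.6, 0.7, 0.8, 0.9, 1., 1., 1., 1., 1.]
print(far_threshold(negatives, 0.5))                 # 1.0
print(math.isnan(far_threshold(negatives, 0.1)))     # True
```

Callers would then have to check for NaN before applying the threshold, which makes the lack of data support explicit instead of silently returning an unjustified value.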