I have recently come across a situation where FAR (and FRR) thresholds were computed although they should not have been.
Imagine the negative score distribution [0.5, 0.6, 0.7, 0.8, 0.9, 1., 1., 1., 1., 1.], and suppose a threshold should be computed for FAR=0.1. Our current implementation of bob.measure.far_threshold will return the threshold 1. However, this threshold does not give us a false acceptance rate of 0.1, but of 0.5. In fact, there is no (data-driven) threshold that would provide a false acceptance rate of 0.1.
A similar issue arises when the number of data points is not sufficient for a given threshold to be computed.
From only 10 data points, you cannot provide a (data-driven) threshold for FAR=0.05, yet our current implementation happily provides one.
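To make this concrete, here is a small plain-Python sketch (independent of bob.measure) that enumerates the FAR values actually achievable with data-driven thresholds on the ten scores above:

```python
# The ten negative scores from the example above; the five tied scores of 1.0
# mean that no data-driven threshold can achieve FAR = 0.1.
negatives = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.0, 1.0, 1.0, 1.0]

def far(threshold, negatives):
    # false acceptance rate: fraction of negatives at or above the threshold
    return sum(1 for s in negatives if s >= threshold) / len(negatives)

# every data-driven threshold is one of the scores themselves, so these are
# the only achievable FAR values:
achievable = sorted({far(t, negatives) for t in negatives})
print(achievable)  # [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
```

The smallest achievable FAR is 0.5, and neither 0.1 nor 0.05 appears in the set.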
There are two possible solutions for this issue.
First, we can simply return a threshold that is just slightly higher than the largest negative (or slightly lower than the lowest positive when computing the FRR threshold). This will indeed provide a value, but it is not justified by the data points and might be arbitrarily wrong, e.g., when applied to other test data.
Instead, we should just return NaN, since we really cannot compute a justified threshold for the requested FAR or FRR values.
ping @andre.anjos, @sebastien.marcel, @akomaty Do you have an opinion about this? Do you think it is reasonable to return NaN when there is no support in the data, or do you think it would be better to return a threshold that is just outside the observed scores (slightly higher than the highest negative for FAR, or slightly lower than the lowest positive for FRR)?
The current implementation is wrong for sure. I came across this issue when trying to plot a detection & identification rate (DIR) curve with a relatively low number of open-set probes (i.e., probes with no match in the gallery). I used bob.measure.plot.detection_identification_curve for the plots. I have attached a plot that shows my concern in more detail: dir.pdf
I have evaluated a couple of methods (details are not important) where several negative scores for open-set probes were identical, namely 1 (since they were probabilities). Thus, there is no real threshold that would provide a low false alarm rate. However, bob.measure.far_threshold was returning 1 for those low false alarm rates. When using this threshold on the in-gallery probes, many of them are classified correctly (as classification includes scores that are exactly at the threshold). Now, as the threshold is identical for all low FAR values, the DIR is also identical, leading to straight lines on the left-hand side of the attached plot.
When looking at the attached plot, one would assume that alpha=1 has the best performance, since it has a higher DIR for low FAR. However, this is a false conclusion, simply because such low false alarm rates cannot be achieved by this classifier. Hence, the correct solution would be to stop the plot at the point where the line becomes straight.
With my current implementation in the related branch, exactly this will happen.
Unfortunately, this has some more implications, which might question some of our older results. For example, when we have only a few negative scores, a threshold for low false acceptance rates cannot be reliably estimated and, hence, some of our ROC plots would change. Handling of the NaN values that are now returned by bob.measure.far_threshold needs to be done in all of our ROC, DET and related plots.
hi @mguenther, how about throwing a warning when bob.measure.far_threshold cannot compute the threshold and returns NaN? I am using this new implementation and my code stopped working with no error or warning.
Also, why not raise an exception? What is the point of returning NaNs anyways? Nothing is going to work with a NaN. Instead it makes you chase the code to see where the NaN appeared first.
@amohammadi First, I think a warning can be emitted easily; I just didn't want to emit it every time this happens. Imagine computing an ROC or DIR plot where thresholds cannot be computed for several FAR values. Hence, you would emit this warning several times, and these warnings might flood your output.
Second, returning NaN should be the preferred way over raising exceptions, for two reasons.
When plotting with matplotlib, NaNs are just not plotted. Hence, when you plot an ROC or a DIR curve, the plot will automatically stop on the left-hand side, i.e., where the data are insufficient to compute thresholds for the requested FARs. We have done that in this paper, for example: https://arxiv.org/pdf/1705.01567.pdf
This code is implemented in C++. A general rule for C++ code is that exceptions should only be raised for fatal errors, which is not the case here. Since this function returns a double, a proper way to signal errors is by returning NaN.
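The first point can be illustrated without actually calling matplotlib: its line plot drops points whose value is NaN, which is equivalent to the following hand filtering (threshold values taken from the ten-score example above, where no FAR below 0.5 is achievable):

```python
import math

# requested FAR values and the corresponding (hypothetical) thresholds for
# the ten-score example; FARs below 0.5 are unachievable, hence NaN
fars = [0.05, 0.1, 0.2, 0.5, 1.0]
thresholds = [float("nan"), float("nan"), float("nan"), 1.0, 0.5]

# matplotlib skips NaN points, so only these pairs would appear in the plot,
# and the curve automatically ends where the data become insufficient:
plotted = [(f, t) for f, t in zip(fars, thresholds) if not math.isnan(t)]
print(plotted)  # [(0.5, 1.0), (1.0, 0.5)]
```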
I understand that handling NaN values is not very simple. In the Python bindings, we might turn the NaN return value into None, if desired. The positive point is that handling None in Python is much easier than handling NaN, but None is no longer a proper floating point value.
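A hypothetical binding helper for that conversion could be as simple as:

```python
import math

def to_none(value):
    """Hypothetical Python-binding helper: map the NaN sentinel to None."""
    return None if math.isnan(value) else value

threshold = to_none(float("nan"))
if threshold is None:  # much easier to test for than NaN
    print("no data-driven threshold available")
```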
> First, I think a warning can be emitted easily. I just didn't want to emit a warning for every time that this happens.
I understand that it is annoying, but it is still better than not seeing anything. Also, you may see this warning only two or three times at most when plotting a DIR curve.
```
In [107]: bob.measure.far_threshold(neg, pos, 0.1)
Out[107]: 8.0  # wrong
In [108]: bob.measure.farfrr(neg, pos, 8.0)
Out[108]: (0.2, 0.0)  # wrong
```
In this example, I am asking for a threshold that gives 10% FAR. However, when I use the returned threshold, I get 20% FAR. This is wrong. The returned threshold must be `9.0`, not `8.0`.
I still don't understand. In the issue I said that you shouldn't be able to get a threshold for 5% FAR (0.05, see above). 50% FAR is fine with 9 points.
I have to double-check the implementation of far_threshold. I might have introduced a `<=` where there should only be a `<`.
Oh, so you are only checking in case of low limits ... This is what I did not get. Thanks for explaining.
I thought that when you have 9 samples, you could only get thresholds for these FAR values: np.arange(9+1)/9 == [0., 0.11111111, 0.22222222, 0.33333333, 0.44444444, 0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.0], and 0.5 (50%) is not there.
Yes. For 9 values you can still define a threshold that gives you (at least) 50% FAR. I am checking only the low limits for FAR and FRR.
I am currently checking why your test code fails. I was hoping that I was through with the implementation of FAR and FRR, but it seems that it is still incorrect.
> Yes. For 9 values you can still define a threshold that gives you (at least) 50% FAR. I am checking only the low limits for FAR and FRR.
Yes, now I get it. But please note that when you ask for an FAR of 50%, you should receive a threshold that gives you at most 50%. This is important: when deploying a biometric system, you always need to make sure that your FAR and FRR values are lower than a certain amount, not at least that value.
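The "at most" semantics can be checked on a hypothetical set of nine distinct negative scores (not the actual test data from the session above):

```python
# nine hypothetical negative scores, distinct for simplicity
negatives = [1, 2, 3, 4, 5, 6, 7, 8, 9]

def far(threshold):
    # fraction of negatives accepted at this threshold
    return sum(1 for s in negatives if s >= threshold) / len(negatives)

# Requesting FAR = 0.5 must yield a threshold whose measured FAR does not
# exceed 0.5: here the threshold 6 with FAR = 4/9, never 5 with FAR = 5/9.
threshold = min(t for t in negatives if far(t) <= 0.5)
print(threshold, far(threshold))  # 6 0.4444444444444444
```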