eer_threshold doesn't behave as expected
Created by: siebenkopf
Hi there,
I have found out that the eer_threshold function (as well as the min_hter_threshold function) does not perform as expected in case of highly unbalanced scores. In particular, when I use a special low-valued error indicator such as -10000, the returned threshold is garbage. Consider the following example:
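import bob.measure

# 100 regular scores per class, plus one error indicator (-10000) in each
positives = [1.]*100 + [-10000.]
negatives = [-1.]*100 + [-10000.]
threshold = bob.measure.eer_threshold(negatives, positives)
# print each threshold followed by the (FAR, FRR) pair measured at it
print(threshold, bob.measure.farfrr(negatives, positives, threshold))
print(0, bob.measure.farfrr(negatives, positives, 0))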
The outputs of the two print commands are:
-4999.5 (0.9900990099009901, 0.009900990099009901)
0 (0.0, 0.009900990099009901)
So the second output, where I estimated a threshold of 0 myself, is much more suitable than the first output, which was generated by eer_threshold: it gives a FAR of 0 instead of 0.99 at the same FRR (min_hter_threshold behaves exactly the same).
I suggest that we rethink our implementation of the thresholds. I am pretty sure that smarter ways exist (I implemented some during my PhD...)
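To make the suggestion concrete, here is a minimal sketch of one possible approach. It is not the current bob.measure implementation and not necessarily what I had in mind from the PhD work. It only tries thresholds that lie between distinct observed scores and picks the one where FAR and FRR are closest, preferring the lower total error on ties. The names far_frr, candidate_thresholds and eer_threshold_sketch are hypothetical, and the convention that a score >= threshold counts as accepted is an assumption.

```python
def far_frr(negatives, positives, threshold):
    """Return (FAR, FRR), assuming scores >= threshold are accepted."""
    far = sum(1 for s in negatives if s >= threshold) / float(len(negatives))
    frr = sum(1 for s in positives if s < threshold) / float(len(positives))
    return far, frr


def candidate_thresholds(negatives, positives):
    """Midpoints between neighbouring distinct scores, plus one value
    below the lowest score and one above the highest score."""
    scores = sorted(set(negatives) | set(positives))
    mids = [(a + b) / 2.0 for a, b in zip(scores, scores[1:])]
    return [scores[0] - 1.0] + mids + [scores[-1] + 1.0]


def eer_threshold_sketch(negatives, positives):
    """Pick the candidate where |FAR - FRR| is smallest; among equally
    good candidates, prefer the one with the smaller FAR + FRR."""
    def cost(t):
        far, frr = far_frr(negatives, positives, t)
        return (abs(far - frr), far + frr)
    return min(candidate_thresholds(negatives, positives), key=cost)


# Same data as in the example above
positives = [1.0] * 100 + [-10000.0]
negatives = [-1.0] * 100 + [-10000.0]
threshold = eer_threshold_sketch(negatives, positives)
print(threshold, far_frr(negatives, positives, threshold))
```

On the example data the candidates are -10001, -5000.5, 0 and 2, and the sketch selects 0, the same threshold I estimated by hand. It evaluates every candidate against all scores, so it is quadratic in the number of distinct scores and only meant to illustrate the selection criterion.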
Cheers, Manuel