Picking a "single" threshold during evaluation is hard
Ideally, we should have a flexible threshold selection mechanism:
- If the user provides a floating-point number, we apply it to all splits
- If the user provides a split name, we calculate the threshold a priori on that set, then apply it to all the other sets
- If the user provides no input concerning thresholds, then the strategy should be this:
  - If there is a split named `train`, then the threshold is calculated on this set, and always applied a posteriori
  - If there is a split named `validation`, then the threshold is calculated on this set, and always applied a posteriori
  - For all other splits, we use the `validation` split threshold that was calculated. If no `validation` split is present, we default to half-way between min(labels) and max(labels).
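
The selection rules above could be sketched roughly as follows. This is only an illustration, not an existing implementation: the names `resolve_thresholds`, `best_accuracy_threshold` and `midpoint` are hypothetical, each split is assumed to be a `(labels, scores)` pair with binary 0/1 labels, and the accuracy-maximising calibration is a placeholder for whatever criterion we end up using. The fallback is assumed to use the labels of the split being evaluated.

```python
from typing import Callable, Mapping, Optional, Sequence, Union

# Assumption: a split is a (labels, scores) pair with binary 0/1 labels.
Split = tuple[Sequence[float], Sequence[float]]


def midpoint(labels: Sequence[float]) -> float:
    """Fallback threshold: half-way between min(labels) and max(labels)."""
    return (min(labels) + max(labels)) / 2


def best_accuracy_threshold(labels: Sequence[float], scores: Sequence[float]) -> float:
    """Placeholder calibration: the observed score that maximises accuracy."""

    def correct(t: float) -> int:
        return sum((s >= t) == bool(l) for l, s in zip(labels, scores))

    return max(sorted(set(scores)), key=correct)


def resolve_thresholds(
    splits: Mapping[str, Split],
    user_choice: Optional[Union[float, str]] = None,
    calibrate: Callable[[Sequence[float], Sequence[float]], float] = best_accuracy_threshold,
) -> dict[str, float]:
    """Return one threshold per split, following the rules in this issue."""
    # A number is applied to all splits as-is.
    if isinstance(user_choice, (int, float)):
        return {name: float(user_choice) for name in splits}

    # A split name: calibrate on that split, then reuse the value everywhere.
    if isinstance(user_choice, str):
        threshold = calibrate(*splits[user_choice])
        return {name: threshold for name in splits}

    # No input: train and validation are each calibrated on themselves.
    thresholds = {
        name: calibrate(*splits[name])
        for name in ("train", "validation")
        if name in splits
    }
    # Every other split reuses the validation threshold, or falls back to
    # the midpoint of its own labels when there is no validation split.
    for name, (labels, _scores) in splits.items():
        if name not in thresholds:
            thresholds[name] = (
                thresholds["validation"] if "validation" in splits else midpoint(labels)
            )
    return thresholds


splits = {
    "train": ([0, 0, 1, 1], [0.1, 0.4, 0.6, 0.9]),
    "validation": ([0, 1, 0, 1], [0.2, 0.7, 0.3, 0.8]),
    "test": ([0, 1], [0.3, 0.6]),
}
print(resolve_thresholds(splits))           # no user input
print(resolve_thresholds(splits, 0.5))      # fixed number
print(resolve_thresholds(splits, "train"))  # calibrate on a named split
```

Passing the calibration function as a parameter keeps the split-selection logic separate from the question of how a threshold is actually computed, which this issue leaves open.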