Saliency map interpretability via IoU and IoDA uses a single bounding box and could be improved
If using proportional energy resolves this, we should then just drop the calculation of IoU and IoDA!
Here are the notes of discussions with @ogueler:
I'm looking at saliencymap_evaluator.py, specifically at how IoU and IoDA are computed. In the function process_target_class(), you "calculate bounding-boxES for the largest connected component". By the end of the process, the function comes out with a single, overall bounding box for the largest connected component. The question: why does the function say "bounding-boxES" when it computes a single box? Moreover, at this point, you know the number N of ground-truth bounding boxes available. So why not pick the N largest connected components instead of focusing on only one?
With a single (large) connected component, we end up biasing the IoU and the IoDA, right?
I mean, instead of multiple.
Answer:
Yes, at first it would make more sense to do so. I had just an initial implementation and I didn't pursue IoU and IoDA further, and it remained that way.
I also thought that using N such components would introduce even more imprecise artifacts or noise, which wouldn't make for a good metric overall. I thought focussing on the most important region would be a good trade-off between this noise and a somewhat usable metric. That is why I focussed on just 1 ground truth bounding box, but this approach is also a little bit "dishonest", or not representing the whole "picture" (metaphorically).
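For reference, here is a minimal sketch of what "pick the N largest connected components" could look like. It is not the code in saliencymap_evaluator.py: the function name `largest_component_boxes`, the fixed `threshold`, the 4-connectivity, and the `(x_min, y_min, x_max, y_max)` box layout are all assumptions made for illustration.

```python
from collections import deque

import numpy as np


def largest_component_boxes(saliency, threshold, n_boxes):
    """Return up to n_boxes boxes (x_min, y_min, x_max, y_max), one per
    4-connected component of the thresholded map, largest component first.

    Hypothetical helper: name, threshold rule and box layout are assumptions.
    """
    mask = saliency >= threshold
    visited = np.zeros(mask.shape, dtype=bool)
    height, width = mask.shape
    components = []  # list of (pixel_count, box)

    for y0, x0 in zip(*np.nonzero(mask)):
        if visited[y0, x0]:
            continue
        # Breadth-first flood fill over one connected component.
        visited[y0, x0] = True
        queue = deque([(y0, x0)])
        size = 0
        x_min = x_max = x0
        y_min = y_max = y0
        while queue:
            y, x = queue.popleft()
            size += 1
            x_min, x_max = min(x_min, x), max(x_max, x)
            y_min, y_max = min(y_min, y), max(y_max, y)
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < height and 0 <= nx < width
                if inside and mask[ny, nx] and not visited[ny, nx]:
                    visited[ny, nx] = True
                    queue.append((ny, nx))
        box = (int(x_min), int(y_min), int(x_max), int(y_max))
        components.append((size, box))

    # Keep the n_boxes largest components, e.g. n_boxes = number of GT boxes.
    components.sort(key=lambda item: item[0], reverse=True)
    return [box for _, box in components[:n_boxes]]


# Toy map: a 3x3 blob, a 2x2 blob and one isolated pixel.
toy_saliency = np.zeros((6, 6))
toy_saliency[3:6, 3:6] = 0.8
toy_saliency[0:2, 0:2] = 1.0
toy_saliency[0, 5] = 0.9

single_box = largest_component_boxes(toy_saliency, 0.5, 1)
two_boxes = largest_component_boxes(toy_saliency, 0.5, 2)
```

With `n_boxes=1` only the 3x3 blob survives, which is the single-box behaviour discussed above. With `n_boxes=2` the 2x2 blob is kept as well, while the isolated pixel is still dropped, so the noise concern is partly bounded by N.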
Then I continued:
And then, of course, there is the question: is there a benefit to using bounding boxes (of the connected components) instead of the actual connected components for estimating the IoU and IoDA? We may again be introducing bias?
Answer:
Yes, that is why I shifted towards proportional energy. In my opinion, it does what a version of IoU or IoDA would do without the detected boxes, but in a better way.
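To make the comparison concrete, here is a rough sketch of both ideas. It assumes that proportional energy means the share of total (non-negative) saliency that falls inside the ground-truth boxes; the evaluator may define it differently. The helper names `boxes_to_mask`, `proportional_energy` and `mask_iou` are hypothetical, and `mask_iou` is IoU computed directly on the thresholded pixels instead of on a detected box.

```python
import numpy as np


def boxes_to_mask(boxes, shape):
    """Rasterise inclusive (x_min, y_min, x_max, y_max) boxes into a boolean mask."""
    mask = np.zeros(shape, dtype=bool)
    for x_min, y_min, x_max, y_max in boxes:
        mask[y_min:y_max + 1, x_min:x_max + 1] = True
    return mask


def proportional_energy(saliency, gt_mask):
    """Fraction of total saliency inside the ground truth (assumed definition).

    Assumes a non-negative saliency map; no threshold or detected box is needed.
    """
    total = float(saliency.sum())
    if total == 0.0:
        return 0.0
    return float(saliency[gt_mask].sum()) / total


def mask_iou(pred_mask, gt_mask):
    """IoU between two boolean masks, with no bounding box around the prediction."""
    union = int(np.logical_or(pred_mask, gt_mask).sum())
    if union == 0:
        return 0.0
    return int(np.logical_and(pred_mask, gt_mask).sum()) / union


# Toy case: half of the saliency mass sits on one pixel outside the GT box.
toy_map = np.zeros((4, 4))
toy_map[0:2, 0:2] = 1.0   # 4 units of saliency inside the GT box
toy_map[3, 3] = 4.0       # 4 units on a single outlier pixel

gt = boxes_to_mask([(0, 0, 1, 1)], toy_map.shape)
energy = proportional_energy(toy_map, gt)   # 4 / 8
pixel_iou = mask_iou(toy_map > 0.5, gt)     # 4 / 5
```

In this toy case the thresholded mask IoU stays at 0.8 because the outlier is a single pixel, while proportional energy drops to 0.5 because it weighs pixels by their saliency. That difference is the kind of behaviour a box-free metric would expose.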