Take a closer look at the performance of pretrained models (densenet, alexnet) w.r.t. input size
ImageNet-pretrained weights for these models are typically trained on 224x224 inputs, whereas we feed in images of size 512x512 (probably to match Pasa's model).
It would be worth verifying whether this larger input size (512x512) degrades fine-tuning and subsequent classification performance.
One possible path for these tests would be to fine-tune several densenet or alexnet networks at different input sizes:
- 512x512
- 384x384
- 256x256
- 224x224 (original)
and check whether the input size makes a measurable difference.