Take a closer look at the performance of pretrained models (densenet, alexnet) w.r.t. input size
ImageNet-pretrained weights for these models are typically trained on 224x224 inputs, whereas we feed in images of size 512x512 (probably to match Pasa's model).
It would be worth verifying whether this larger input size (512x512) degrades fine-tuning and subsequent classification performance.
One possible path for these tests would be to fine-tune several densenet or alexnet networks at different input sizes:
- 512x512
- 384x384
- 256x256
- 224x224 (original)
and check whether the input size makes a measurable difference.