Computing resources are not being used efficiently
Depending on the dataset used, GPU usage can be much lower than expected. The resource usage plots (CPU/GPU %) are noisy, with large fluctuations at each epoch, and GPU usage peaks at only around 10%.
It is also odd that the Montgomery-Shenzhen-Indian dataset shows high GPU usage, whereas the Montgomery dataset alone mostly uses the CPU.
Here are some ideas about where things could be going wrong:
- Issue with monitoring code
- Monitoring interval not appropriate
- Concatenated DataModules behaving differently than single DataModules
- Lightning configurations not being specified in some places, resulting in different "auto" behaviours depending on the number of samples in the DataModule
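
To illustrate the monitoring-interval idea, here is a minimal sketch of how a fixed sampling interval can misreport a bursty workload. It is a toy simulation, not the project's monitoring code: the function `sample_gpu_usage`, the 100 ms step period and the 20 ms busy window are all hypothetical values chosen for the example, and it assumes the monitor reads an instantaneous busy/idle state, which real tools may not do.

```python
from statistics import mean


def sample_gpu_usage(period_ms, busy_ms, interval_ms, duration_ms, start_ms=0):
    """Sample a periodic busy/idle signal at a fixed interval.

    The simulated device is busy (100 %) for the first `busy_ms` of every
    `period_ms`, and idle (0 %) for the rest. All values are hypothetical.
    """
    return [
        100.0 if (t % period_ms) < busy_ms else 0.0
        for t in range(start_ms, duration_ms, interval_ms)
    ]


# Hypothetical workload: busy for 20 ms out of every 100 ms, so the true
# average usage is 20 %.
fine = sample_gpu_usage(100, 20, interval_ms=5, duration_ms=10_000)

# A 500 ms interval is a multiple of the 100 ms period, so every sample
# lands on the same phase of the cycle.
coarse_aligned = sample_gpu_usage(100, 20, interval_ms=500, duration_ms=10_000)
coarse_offset = sample_gpu_usage(
    100, 20, interval_ms=500, duration_ms=10_000, start_ms=50
)

print(f"5 ms interval:              {mean(fine):.1f} %")
print(f"500 ms interval:            {mean(coarse_aligned):.1f} %")
print(f"500 ms interval, +50 ms:    {mean(coarse_offset):.1f} %")
```

With these assumed numbers, the fine-grained sampling recovers the true average, while the coarse interval reports either full or zero usage depending only on when sampling started. If the real monitor behaves similarly, comparing plots recorded at two different intervals on the same run would be a cheap way to rule this hypothesis in or out.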