Computing resources are not being used efficiently

Depending on the dataset used, GPU usage can be much lower than expected. The resource usage plots (CPU/GPU %) are noisy, with large fluctuations at each epoch and GPU usage peaking at only around 10%.

It is also strange that the combined Montgomery-Shenzhen-Indian dataset shows high GPU usage, while the Montgomery dataset alone mostly uses the CPU.

Here are some ideas about where things could be going wrong:

  • A bug in the monitoring code
  • Monitoring interval not suited to the workload (for example, too long to capture short bursts of GPU activity)
  • Concatenated DataModules behaving differently from single DataModules
  • Lightning settings left unspecified in some places, so that their "auto" defaults resolve differently depending on the number of samples in the DataModule
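
To make the monitoring-interval hypothesis concrete, here is a minimal sketch using only the Python standard library. It does not use our monitoring code or real measurements: the trace is synthetic, and the function names, burst length, epoch length and utilisation values are all assumptions chosen for illustration. It only shows how a sampling interval that is long relative to the GPU bursts could hide the peaks entirely.

```python
def simulated_gpu_trace(n_seconds=600, epoch_len=60, burst_len=5):
    """Synthetic per-second GPU utilisation (%), NOT measured data.

    Assumption: the GPU is busy (90%) for `burst_len` seconds at the start
    of each epoch and nearly idle (2%) for the rest of the epoch.
    """
    return [90.0 if (t % epoch_len) < burst_len else 2.0 for t in range(n_seconds)]


def sample(trace, interval, offset=0):
    """Take one instantaneous reading every `interval` seconds."""
    return trace[offset::interval]


def summarise(samples):
    """Peak and mean of the sampled readings."""
    return {"peak": max(samples), "mean": sum(samples) / len(samples)}


trace = simulated_gpu_trace()

# Fine-grained monitoring sees every burst.
fine = summarise(sample(trace, interval=1))

# Coarse monitoring whose readings always fall between bursts sees none.
coarse = summarise(sample(trace, interval=60, offset=30))

print("1 s interval :", fine)
print("60 s interval:", coarse)
```

If the real monitor behaves like the coarse sampler here, reducing its interval (or logging at a fixed point inside the training step) would be one cheap way to check whether the low readings reflect actual GPU usage or just the sampling.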