Improve logging hook to report CPU and GPU memory and utilisation
Following issues with ssh/fork behaviour at Idiap GPU hosts, I changed my logging hook (largely based on this package's) to automatically report in-process measurements of CPU and GPU memory usage, the GPU model, and utilisation. I strongly recommend doing the same in this package (via copy/paste or similar):
https://gitlab.idiap.ch/bob/bob.ip.hed/blob/master/bob/ip/hed/hooks.py
I tested it, and it produces something like this:
bob.ip.hed.hooks@2018-10-10 07:38:38,999 -- INFO: training 50, loss = 0.67 (0.186 ops/sec, cpu = [98.3, 43.2] %, cpumem = 2.6/29.5 GB, gpu = 100 % (Tesla K80), gpumem = 10940 MiB/11439 MiB)
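For readers who want a feel for what collecting these numbers involves, here is a minimal sketch. It is not the code from the linked `hooks.py`: the function names `cpu_stats`, `gpu_stats` and `format_stats` are hypothetical, and it assumes the third-party `psutil` package for the CPU side and the `nvidia-smi` command-line tool for the GPU side. Both probes return `None` when their dependency is missing, so the hook keeps working on CPU-only hosts.

```python
import shutil
import subprocess

try:
    import psutil  # third-party package; assumed to be installed
except ImportError:
    psutil = None


def cpu_stats():
    """Return (per-core utilisation in %, used GB, total GB), or None without psutil."""
    if psutil is None:
        return None
    mem = psutil.virtual_memory()
    gb = 1024.0 ** 3
    # Note: the very first non-blocking cpu_percent() call returns zeros;
    # later calls report utilisation since the previous call.
    return psutil.cpu_percent(percpu=True), mem.used / gb, mem.total / gb


def gpu_stats():
    """Return (name, utilisation %, used MiB, total MiB) of the first GPU, or None."""
    if shutil.which("nvidia-smi") is None:
        return None
    query = "name,utilization.gpu,memory.used,memory.total"
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=" + query,
             "--format=csv,noheader,nounits"],
            universal_newlines=True,
        )
        lines = out.strip().splitlines()
        name, util, used, total = [f.strip() for f in lines[0].split(",")]
        return name, int(util), int(used), int(total)
    except (OSError, subprocess.CalledProcessError, IndexError, ValueError):
        # Tool failed, no GPU listed, or a field was not numeric
        return None


def format_stats(cpu, gpu):
    """Format the probes in the style of the log line above."""
    parts = []
    if cpu is not None:
        percent, used_gb, total_gb = cpu
        parts.append("cpu = %s %%" % ([round(p, 1) for p in percent],))
        parts.append("cpumem = %.1f/%.1f GB" % (used_gb, total_gb))
    if gpu is not None:
        name, util, used, total = gpu
        parts.append("gpu = %d %% (%s)" % (util, name))
        parts.append("gpumem = %d MiB/%d MiB" % (used, total))
    return ", ".join(parts)
```

A logging hook could then append `format_stats(cpu_stats(), gpu_stats())` to its usual "training N, loss = ..." message. Shelling out to `nvidia-smi` on every log call has a cost, so it is probably best done only at the logging interval, not at every step.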
I hope this saves you from having to ssh into the host to dig out those numbers.