Allows GPU logging in multi-GPU systems; Fixes DRIVE download link
This MR includes the following changes:
- GPU information logging will now work in single or multi-GPU systems (closes #18)
- The DRIVE download link is fixed (closes #17)
assigned to @andre.anjos
- Resolved by André Anjos
@Txiao: Thanks for this MR!
I think the correct download link for DRIVE is https://medicine.uiowa.edu/eye/rite-dataset (the RITE dataset). Could you please check and update the link you provided? I'm not comfortable with a GitHub link without a proper licensing gateway.
I'll check your GPU patch in a while.
added 1 commit
- 62f555e1 - [utils.resources] Also logs the total number of GPUs in the system
@Txiao: The patch you provided indeed helps parsing `nvidia-smi` in case there are multiple GPUs in the system. What is less clear to me is how you plot the `trainload.pdf` considering the stats from the GPU you are actually using. In the present patch, this is not included. I modified the MR slightly, so that the number of GPUs in the system is captured as a constant at the beginning of training. Concerning what I just described, can you please show:
- An example output of `nvidia-smi` for your machine
- What does the file `model/trainlog.csv` look like after your patch?
- How are these changes reflected in `bob/ip/binseg/scripts/train_analysis.py`? That is, how do you know to which GPU your memory/resource utilisation refers?
Edited by André Anjos
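For context on the question above, here is a minimal sketch (not the code in this MR) of how per-GPU statistics could be read from `nvidia-smi` on a multi-GPU machine. The query fields and CSV flags are standard `nvidia-smi` options; the function name `gpu_stats` and the idea of selecting a single device by its index are assumptions made for illustration only.

```python
# Hypothetical sketch: read per-GPU statistics from nvidia-smi and keep only
# the device actually used for training.  The query fields and CSV flags are
# standard nvidia-smi options; everything else here is illustrative only.
import subprocess


def gpu_stats(device_index=0):
    """Returns (memory used [MB], memory total [MB], utilisation [%]) for one GPU."""
    output = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.used,memory.total,utilization.gpu",
            "--format=csv,noheader,nounits",
        ],
        universal_newlines=True,
    )
    for line in output.strip().splitlines():
        index, mem_used, mem_total, util = (v.strip() for v in line.split(","))
        if int(index) == device_index:
            return int(mem_used), int(mem_total), int(util)
    raise ValueError("no GPU with index %d found" % device_index)
```

Recording the device index alongside the logged values would make it unambiguous which GPU the plotted memory/utilisation figures refer to.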
Another option would be to change completely to `torch`'s native GPU query as documented here: https://pytorch.org/docs/stable/cuda.html. I am not sure, though, if all the required functionality is provided there.
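A hedged sketch of that alternative, using only `torch.cuda` calls available in current PyTorch (`device_count`, `get_device_properties`, `memory_allocated`, `memory_reserved`; the last one is `memory_cached` in older releases). Note that `torch` reports memory handled by the current process, not the device-wide usage that `nvidia-smi` shows:

```python
# Sketch only: query GPU information through torch's native CUDA API.
# torch.cuda reports memory allocated/reserved by the current process, which
# is not the same as the device-wide numbers nvidia-smi shows.
import torch


def torch_gpu_info():
    """Returns one dict of basic statistics per visible CUDA device."""
    info = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        info.append(
            {
                "index": i,
                "name": props.name,
                "total-memory-MB": props.total_memory / (1024 ** 2),
                "allocated-MB": torch.cuda.memory_allocated(i) / (1024 ** 2),
                "reserved-MB": torch.cuda.memory_reserved(i) / (1024 ** 2),
            }
        )
    return info
```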
- Resolved by Yannick DAYER
I can run the code on the machine rolf and get good results, but the pipelines here do not pass. Does that mean there are bugs in my code?
Edited by Tan Xiao
added 2 commits