Allows GPU logging in multi-GPU systems; Fixes DRIVE download link
This MR includes the following changes:
- GPU information logging will now work in single or multi-GPU systems (closes #18)
- The DRIVE download link is fixed (closes #17)
assigned to @andre.anjos
- Resolved by André Anjos
@Txiao: Thanks for this MR!
I think the correct download link for DRIVE is https://medicine.uiowa.edu/eye/rite-dataset (the RITE dataset). Could you please check and update the link you provided? I'm not comfortable with a GitHub link without a proper licensing gateway.
I'll check your GPU patch in a while.
added 1 commit
- 62f555e1 - [utils.resources] Also logs the total number of GPUs in the system
@Txiao: The patch you provided indeed helps parsing `nvidia-smi` in case there are multiple GPUs in the system. What is less clear to me is how you plot the `trainload.pdf` considering the stats from the GPU you are actually using. In the present patch, this is not included. I modified the MR slightly, so that the number of GPUs in the system is captured as a constant at the beginning of training. Concerning what I just described, can you please show:
- An example output of `nvidia-smi` for your machine
- What does the file `model/trainlog.csv` look like after your patch?
- How are these changes reflected in `bob/ip/binseg/scripts/train_analysis.py`? That is, how do you know to which GPU your memory/resource utilisation refers?
Edited by André Anjos
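For context on the question above, here is a minimal sketch (not the code in this MR) of how per-GPU statistics could be read from `nvidia-smi` on a multi-GPU machine. The query fields and CSV flags are standard `nvidia-smi` options; the function name `gpu_stats` and the idea of selecting a single device by its index are assumptions made for illustration only.

```python
# Hypothetical sketch: read per-GPU statistics from nvidia-smi and keep only
# the device actually used for training.  The query fields and CSV flags are
# standard nvidia-smi options; everything else here is illustrative only.
import subprocess


def gpu_stats(device_index=0):
    """Returns (memory used [MB], memory total [MB], utilisation [%]) for one GPU."""
    output = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.used,memory.total,utilization.gpu",
            "--format=csv,noheader,nounits",
        ],
        universal_newlines=True,
    )
    for line in output.strip().splitlines():
        index, mem_used, mem_total, util = (v.strip() for v in line.split(","))
        if int(index) == device_index:
            return int(mem_used), int(mem_total), int(util)
    raise ValueError("no GPU with index %d found" % device_index)
```

Recording the device index alongside the logged values would make it unambiguous which GPU the plotted memory/utilisation figures refer to.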
Another option would be to change completely to `torch`'s native GPU query as documented here: https://pytorch.org/docs/stable/cuda.html. I am not sure, though, if all the required functionality is provided there.
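A hedged sketch of that alternative, using only `torch.cuda` calls available in current PyTorch (`device_count`, `get_device_properties`, `memory_allocated`, `memory_reserved`; the last one is `memory_cached` in older releases). Note that `torch` reports memory handled by the current process, not the device-wide usage that `nvidia-smi` shows:

```python
# Sketch only: query GPU information through torch's native CUDA API.
# torch.cuda reports memory allocated/reserved by the current process, which
# is not the same as the device-wide numbers nvidia-smi shows.
import torch


def torch_gpu_info():
    """Returns one dict of basic statistics per visible CUDA device."""
    info = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        info.append(
            {
                "index": i,
                "name": props.name,
                "total-memory-MB": props.total_memory / (1024 ** 2),
                "allocated-MB": torch.cuda.memory_allocated(i) / (1024 ** 2),
                "reserved-MB": torch.cuda.memory_reserved(i) / (1024 ** 2),
            }
        )
    return info
```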
- Resolved by Yannick DAYER
I can run the code on the machine rolf and get good results, but the pipelines here do not pass. Does that mean there are bugs in my code?
Edited by Tan Xiao
added 2 commits