This adds further improvements to the Bob 9 FRGC implementation:
- Set `memory_demanding` to `True` for FRGC
- Implement a hash trick for the checkpointing, since some folders in the FRGC Idiap resource contain more than 10k files; this makes the runs faster overall
- Add a `listing.csv` file to the tarfile, containing the full list of files in the database along with the metadata for each file. This makes it easy to create new protocols, or to read metadata for files that are not in the currently implemented protocols, by simply loading this listing with Pandas. N.B.:
  - This listing does NOT include the 3D files contained in the database.
  - It is quite hard to find full and explicit documentation on the content of FRGC 2.0, so I am not 100% sure that I got every file; I mostly had to explore the available XML files. In particular, there are some JPG files for which I was completely unable to find annotations, so those are not included in the listing. At least, this listing now contains annotations for files used in MIPGAN that were not yet used in the implemented protocols (which is what I needed in the first place).
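Reading the listing as described above could look like the minimal sketch below. It is not part of this change: it assumes that `listing.csv` sits at the root of the tarfile and is UTF-8 encoded, and the helpers `load_listing` and `select` are hypothetical names. It uses only the standard library; with Pandas, `pandas.read_csv(tar.extractfile("listing.csv"))` gives the same content as a DataFrame.

```python
import csv
import io
import tarfile
from typing import Dict, List

Row = Dict[str, str]


def load_listing(tar: tarfile.TarFile, member: str = "listing.csv") -> List[Row]:
    """Read the CSV listing stored in an already opened tarfile.

    Assumption: the listing is stored under `member` (here the archive root)
    and is UTF-8 encoded. Every value is returned as a string, as
    csv.DictReader does.
    """
    fileobj = tar.extractfile(member)  # raises KeyError if the member is missing
    if fileobj is None:  # the member exists but is not a regular file
        raise ValueError(f"{member} is not a regular file")
    with io.TextIOWrapper(fileobj, encoding="utf-8", newline="") as text:
        return list(csv.DictReader(text))


def select(rows: List[Row], **criteria: str) -> List[Row]:
    """Keep the rows whose columns match every criterion (e.g. a new protocol subset)."""
    return [
        row
        for row in rows
        if all(row.get(column) == value for column, value in criteria.items())
    ]
```

Typical use would be to open the archive with `tarfile.open(...)`, call `load_listing(tar)`, then filter with `select(rows, some_column="some_value")`. The column names of the real listing are not given here, so check its header before filtering.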
What this does NOT fix
- Legacy baselines still get stuck at the `write_scores` stage (it takes forever). This might still be linked to overcrowded folders: even when a `hash_fn` is added to `FRGCDatabase`, it currently does not affect the checkpointing behaviour of the legacy `BioAlgorithm`. Do we want to try to fix that, or should we consider that it is not very meaningful to run legacy baselines on FRGC?
- Running Inception-Resnet pipelines on FRGC still leads to a `MemoryError`. This can be solved by running on the `sgpu` queue, though. Is that enough for us?
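Both the checkpointing hash trick and the `hash_fn` question come down to spreading checkpoint files over many sub-folders so that no single directory holds tens of thousands of entries. The sketch below illustrates that idea only; it is not the code in this change. The function name `hashed_checkpoint_path`, the bucket count and the `.h5` extension are hypothetical, and the exact signature that `bob.pipelines` expects for `hash_fn` may differ.

```python
import hashlib
import os
from collections import Counter


def hashed_checkpoint_path(root, key, n_buckets=1000, ext=".h5"):
    """Map a sample key to root/<bucket>/<key><ext>, with bucket in [0, n_buckets).

    Hypothetical helper: md5 is used only as a stable, well-spread hash,
    not for security. Python's built-in hash() is salted per process for
    strings, so it would pick different folders on every run.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % n_buckets
    width = len(str(n_buckets - 1))  # zero-pad, e.g. "007" with 1000 buckets
    return os.path.join(root, f"{bucket:0{width}d}", key + ext)


if __name__ == "__main__":
    # 10k fake keys, the order of magnitude of the overcrowded FRGC folders
    keys = [f"subject_{i:05d}" for i in range(10_000)]
    per_folder = Counter(
        os.path.dirname(hashed_checkpoint_path("ckpt", k)) for k in keys
    )
    print("folders used:", len(per_folder))
    print("largest folder:", max(per_folder.values()))
```

The design choice that matters is determinism: the same key must always land in the same sub-folder, otherwise existing checkpoints would never be found again on a later run.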