Cannot cache dataset in parallel under Linux
Under Linux, running `mednet train` with the `--cache-samples` command-line option seems to trigger some sort of blocking behaviour while a `mednet.data.datamodule._CachedDataset` is being instantiated. The cause is unclear and hard to reproduce under test conditions, but the problem occurs at least with our stock Python 3.12 environment.
Switching the multiprocessing start method to `spawn` or `forkserver` (instead of `fork`, the default on Linux) mitigates the issue, and the blocking behaviour no longer occurs. However, the process is then quickly flooded with `torch_shm` file descriptors (visible under `/proc/<proc-id>/fd/*`), which triggers a second issue related to resource limits on the machine. The current setting allows a maximum of 1024 open file descriptors per process. This limit can be raised manually with `ulimit -n 4096`; however, we then quickly reach other OS limits, as space (on a protected filesystem?) seems to run out.
When the process needs to open more file descriptors but cannot, it raises the following exception: `RuntimeError: received 0 items of ancdata`. Recipes found online for this error point to the way PyTorch pickles tensors to share them between processes through shared memory. Setting `torch.multiprocessing.set_sharing_strategy("file_system")` does not seem to solve this issue.
The current mitigation is to set `parallel = -1` whenever a `_CachedDataset` is used under Linux. This is done automatically for the user, and a WARNING pointing to this ticket is issued. In addition, we now use the `spawn` start method on all platforms, including Linux. This follows the upstream direction: [from Python 3.14, `fork` is no longer the default start method on POSIX platforms](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods) (it is replaced by `forkserver`), due to problems supporting `fork` in multi-threaded applications.
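The mitigation can be sketched roughly as follows. This is not the actual mednet code: `effective_parallel`, `mp_context` and `ISSUE_URL` are hypothetical names, and the sketch assumes, as the mitigation implies, that `parallel = -1` disables multiprocessing while caching samples.

```python
import logging
import multiprocessing
import sys

logger = logging.getLogger(__name__)

# Hypothetical placeholder: the real WARNING should point to this ticket.
ISSUE_URL = "<link to this issue>"


def effective_parallel(parallel: int, cache_samples: bool,
                       platform_name: str = sys.platform) -> int:
    """Return the `parallel` value to use, forcing -1 for cached datasets on Linux."""
    if cache_samples and platform_name.startswith("linux") and parallel != -1:
        logger.warning(
            "Caching samples in parallel (parallel=%d) blocks on Linux; "
            "falling back to parallel=-1 (see %s).",
            parallel,
            ISSUE_URL,
        )
        return -1
    return parallel


def mp_context() -> multiprocessing.context.BaseContext:
    """Return a `spawn` context, used on all platforms instead of the default."""
    return multiprocessing.get_context("spawn")
```

Using an explicit context object, rather than calling `multiprocessing.set_start_method()` globally, keeps the choice local to the code that creates worker processes; the returned context can, for instance, be passed as the `multiprocessing_context` argument of `torch.utils.data.DataLoader`.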
A test suite was created in `tests/test_dataset.py`; however, the issue does not reproduce in this standalone environment (all tests pass).