[LGBPHS] wrong tempfiles path when running on the grid
Hi, I have an issue when running the LGBPHS baseline, e.g.
bob bio vanilla-biometrics pipeline mobio-male lgbphs -vv -l sge
where it fails with the following traceback:
Traceback (most recent call last):
File "./bin/bob", line 47, in <module>
sys.exit(bob.extension.scripts.main_cli())
File "/idiap/temp/lcolbois/miniconda3/envs/bob_tf2/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/idiap/temp/lcolbois/miniconda3/envs/bob_tf2/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/idiap/temp/lcolbois/miniconda3/envs/bob_tf2/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/idiap/temp/lcolbois/miniconda3/envs/bob_tf2/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/idiap/temp/lcolbois/miniconda3/envs/bob_tf2/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/idiap/temp/lcolbois/miniconda3/envs/bob_tf2/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/idiap/temp/lcolbois/miniconda3/envs/bob_tf2/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/remote/idiap.svm/temp.biometric03/lcolbois/bob.bio.face/src/bob.bio.base/bob/bio/base/script/vanilla_biometrics.py", line 215, in vanilla_biometrics
**kwargs,
File "/remote/idiap.svm/temp.biometric03/lcolbois/bob.bio.face/src/bob.bio.base/bob/bio/base/pipelines/vanilla_biometrics/vanilla_biometrics.py", line 143, in execute_vanilla_biometrics
_ = compute_scores(post_processed_scores, dask_client)
File "/remote/idiap.svm/temp.biometric03/lcolbois/bob.bio.face/src/bob.bio.base/bob/bio/base/pipelines/vanilla_biometrics/vanilla_biometrics.py", line 23, in compute_scores
result = result.compute(scheduler=dask_client)
File "/remote/idiap.svm/temp.biometric03/lcolbois/bob.bio.face/eggs/dask-2.30.0-py3.7.egg/dask/base.py", line 167, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/remote/idiap.svm/temp.biometric03/lcolbois/bob.bio.face/eggs/dask-2.30.0-py3.7.egg/dask/base.py", line 452, in compute
results = schedule(dsk, keys, **kwargs)
File "/remote/idiap.svm/temp.biometric03/lcolbois/bob.bio.face/eggs/distributed-2.30.1-py3.7.egg/distributed/client.py", line 2725, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/remote/idiap.svm/temp.biometric03/lcolbois/bob.bio.face/eggs/distributed-2.30.1-py3.7.egg/distributed/client.py", line 1992, in gather
asynchronous=asynchronous,
File "/remote/idiap.svm/temp.biometric03/lcolbois/bob.bio.face/eggs/distributed-2.30.1-py3.7.egg/distributed/client.py", line 833, in sync
self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
File "/remote/idiap.svm/temp.biometric03/lcolbois/bob.bio.face/eggs/distributed-2.30.1-py3.7.egg/distributed/utils.py", line 340, in sync
raise exc.with_traceback(tb)
File "/remote/idiap.svm/temp.biometric03/lcolbois/bob.bio.face/eggs/distributed-2.30.1-py3.7.egg/distributed/utils.py", line 324, in f
result[0] = yield future
File "/idiap/temp/lcolbois/miniconda3/envs/bob_tf2/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
value = future.result()
File "/remote/idiap.svm/temp.biometric03/lcolbois/bob.bio.face/eggs/distributed-2.30.1-py3.7.egg/distributed/client.py", line 1851, in _gather
raise exception.with_traceback(traceback)
File "/remote/idiap.svm/temp.biometric03/lcolbois/bob.bio.face/src/bob.bio.base/bob/bio/base/pipelines/vanilla_biometrics/pipelines.py", line 175, in write_scores
return self.score_writer.write(scores)
File "/remote/idiap.svm/temp.biometric03/lcolbois/bob.bio.face/src/bob.bio.base/bob/bio/base/pipelines/vanilla_biometrics/score_writers.py", line 56, in write
return _write(probe_sampleset)
File "/remote/idiap.svm/temp.biometric03/lcolbois/bob.bio.face/src/bob.bio.base/bob/bio/base/pipelines/vanilla_biometrics/score_writers.py", line 35, in _write
if isinstance(probe[0], DelayedSample):
File "/remote/idiap.svm/temp.biometric03/lcolbois/bob.bio.face/src/bob.pipelines/bob/pipelines/sample.py", line 165, in __getitem__
return self.samples.__getitem__(item)
File "/remote/idiap.svm/temp.biometric03/lcolbois/bob.bio.face/src/bob.pipelines/bob/pipelines/sample.py", line 187, in samples
return self._load()
File "/remote/idiap.svm/temp.biometric03/lcolbois/bob.bio.face/src/bob.bio.base/bob/bio/base/pipelines/vanilla_biometrics/legacy.py", line 364, in _load
return joblib.load("/tmp/" + path)
File "/remote/idiap.svm/temp.biometric03/lcolbois/bob.bio.face/eggs/joblib-0.17.0-py3.7.egg/joblib/numpy_pickle.py", line 577, in load
with open(filename, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp//tmp/tmp1dt8ulg0/scores/uman/m103/02_mobile/m103_02_f12_i0_0uman/m103/01_mobile/m103_01_p01_i0_0_uman/m104/01_mobile/m104_01_p01_i0_0_uman/m106/01_mobile/m106_01_p01_i0_0.joblib'
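It looks like the path of some temporary files is computed incorrectly: the path in the error has a doubled /tmp/ prefix, and several sample keys appear to be concatenated into a single file name. Note that the issue: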
- Is specific to LGBPHS (Gabor graph for example works flawlessly)
- Does not happen when running locally
- Does not happen when using checkpointing (-c)
I am unsure how to start tracking down the root cause.
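As a possible starting point, the failing path can be taken apart in plain Python. This is only a sketch based on the traceback above, not a confirmed diagnosis: the variable names are mine, and the only facts taken from the code are the `"/tmp/" + path` concatenation in legacy.py and the path string from the FileNotFoundError.

```python
import posixpath

# Path copied verbatim from the FileNotFoundError above.
failing = (
    "/tmp//tmp/tmp1dt8ulg0/scores/uman/m103/02_mobile/m103_02_f12_i0_0"
    "uman/m103/01_mobile/m103_01_p01_i0_0_"
    "uman/m104/01_mobile/m104_01_p01_i0_0_"
    "uman/m106/01_mobile/m106_01_p01_i0_0.joblib"
)

# legacy.py (line 364 in the traceback) calls joblib.load("/tmp/" + path),
# so the `path` it received is whatever follows that prefix.
prefix = "/tmp/"
path = failing[len(prefix):]

# Observation 1: `path` is already absolute and already under /tmp/,
# so prepending "/tmp/" by string concatenation doubles the prefix.
print(posixpath.isabs(path), path.startswith(prefix))

# For comparison, posixpath.join drops the first component when the
# second one is absolute, so it would not produce the doubled prefix.
print(posixpath.join(prefix, path) == path)

# Observation 2: the key prefix "uman/" occurs several times, which
# suggests that several sample keys were joined into one file name.
print(path.count("uman/"))
```

If this reading is right, there may be two separate things to check: where the temporary directory gets prepended a second time, and why the file name is built from more than one sample key.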
ping @tiago.pereira @ydayer