Memory issues with big dataset
When running a GMM experiment with a large dataset (Nist-SRE04to16), the pipeline fails with a memory error on a worker.
For example, on the branch of !55 (merged):
bob bio pipeline simple -vvv -d nist-sre04to16 -p gmm-voxforge -o results\~/gmm_nist -l sge
I ran with the default Dask sge client as well as the sge-io-big-non-adaptive client, asking for 128 nodes (but got only ~60 while running).
The failure seems to happen before the k-means initialization is reached, which may hint at a problem in the wrapping of Dask bags into arrays.
I also tried running the experiment with a lower Dask memory limit for each node, forcing the workers to spill their memory to disk early so that they would not hit the hard cap. This failed too: the workers did spill to disk, but still failed with a memory error.
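For reference, Dask's worker memory management acts at fractions of the per-worker memory limit: by default, spilling starts at 60% (target) and 70% (spill), the worker pauses at 80%, and the nanny terminates it at 95%. Spilling only moves stored task results to disk; memory held by a task that is still running cannot be spilled, which might be why lowering the limit did not help. The sketch below only does the arithmetic. The 4 GiB limit is a hypothetical value, not the one used in these runs, and `memory_thresholds` is an illustrative helper, not a Dask function.

```python
from typing import Dict

# Default fractions from Dask's distributed configuration
# (keys under distributed.worker.memory).
DEFAULT_FRACTIONS: Dict[str, float] = {
    "target": 0.60,     # start spilling, based on managed memory
    "spill": 0.70,      # start spilling, based on process memory
    "pause": 0.80,      # stop accepting new tasks
    "terminate": 0.95,  # the nanny restarts the worker
}

GiB = 2**30


def memory_thresholds(memory_limit_bytes: int,
                      fractions: Dict[str, float] = DEFAULT_FRACTIONS) -> Dict[str, int]:
    """Convert fractional thresholds into absolute byte counts."""
    return {name: int(memory_limit_bytes * frac) for name, frac in fractions.items()}


limit = 4 * GiB  # hypothetical per-worker limit, for illustration only
thresholds = memory_thresholds(limit)

for name, value in thresholds.items():
    print(f"{name:>9}: {value / GiB:.2f} GiB")
```

Lowering the limit moves all four thresholds down together, so a task that accumulates a lot of unmanaged memory reaches the pause and terminate levels sooner rather than later.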
Local output and traceback:
[...]
bob.pipelines.wrappers@2022-05-16 13:04:59,280 -- DEBUG: ToDaskBag(npartitions=128).transform
bob.pipelines.wrappers@2022-05-16 13:04:59,926 -- DEBUG: Dask|Checkpoint|Sample|Energy_.transform
bob.pipelines.wrappers@2022-05-16 13:04:59,927 -- DEBUG: Dask|Checkpoint|Sample|Cepstra.transform
bob.pipelines.wrappers@2022-05-16 13:04:59,929 -- DEBUG: Dask|Checkpoint|Sample|GMM(con.fit
bob.pipelines.wrappers@2022-05-16 13:04:59,941 -- DEBUG: Preparing data as dask arrays for fit
Traceback (most recent call last):
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/bin/bob", line 10, in <module>
sys.exit(main_cli())
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.bio.base/bob/bio/base/script/pipeline_simple.py", line 276, in pipeline_simple
execute_pipeline_simple(
File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.bio.base/bob/bio/base/pipelines/entry_points.py", line 225, in execute_pipeline_simple
result = pipeline(
File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.bio.base/bob/bio/base/pipelines/pipelines.py", line 109, in __call__
self.transformer = self.train_background_model(background_model_samples)
File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.bio.base/bob/bio/base/pipelines/pipelines.py", line 144, in train_background_model
return self.transformer.fit(background_model_samples)
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/sklearn/pipeline.py", line 394, in fit
self._final_estimator.fit(Xt, y, **fit_params_last_step)
File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.pipelines/bob/pipelines/wrappers.py", line 881, in fit
return self._fit_on_dask_array(X, y, **fit_params)
File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.pipelines/bob/pipelines/wrappers.py", line 835, in _fit_on_dask_array
X, fit_params = self._get_fit_params_from_sample_bags(bags)
File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.pipelines/bob/pipelines/wrappers.py", line 816, in _get_fit_params_from_sample_bags
X = _array_from_sample_bags(bags, input_attribute, ndim=2)
File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.pipelines/bob/pipelines/wrappers.py", line 693, in _array_from_sample_bags
lengths, shapes = dask.compute(lengths, shapes)
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/dask/base.py", line 573, in compute
results = schedule(dsk, keys, **kwargs)
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/distributed/client.py", line 3010, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/distributed/client.py", line 2162, in gather
return self.sync(
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/distributed/utils.py", line 311, in sync
return sync(
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/distributed/utils.py", line 378, in sync
raise exc.with_traceback(tb)
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/distributed/utils.py", line 351, in f
result = yield future
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
value = future.result()
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/distributed/client.py", line 2025, in _gather
raise exception.with_traceback(traceback)
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/dask/utils.py", line 39, in apply
return func(*args, **kwargs)
File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.pipelines/bob/pipelines/wrappers.py", line 664, in _sample_attribute
return [getattr(s, attribute) for s in samples]
File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.pipelines/bob/pipelines/wrappers.py", line 664, in <listcomp>
return [getattr(s, attribute) for s in samples]
File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.pipelines/bob/pipelines/sample.py", line 170, in __getattribute__
return super().__getattribute__(name)
File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.pipelines/bob/pipelines/sample.py", line 188, in data
return self._load()
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/bob/io/base/__init__.py", line 191, in load
return open_file(inputs)
File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/bob/io/base/__init__.py", line 101, in open_file
return np.array(f[key])
numpy.core._exceptions.MemoryError: Unable to allocate 7.89 MiB for an array with shape (17226, 60) and data type float64
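Two observations on the traceback above, as a rough sketch rather than a confirmed diagnosis. First, the allocation that fails is small (7.89 MiB for one sample), so the worker was presumably already close to its limit. Second, the failing frame, `_sample_attribute`, builds a list with the loaded `data` of every sample in a partition, while the caller only asks for `lengths` and `shapes`. If only the shapes are needed at that point, each array could be dropped right after it is measured. In the code below, `LazySample`, `shape_2d` and `partition_shapes` are hypothetical names, and nested lists stand in for the feature arrays so that the example has no dependencies.

```python
from typing import Any, Callable, Iterable, List, Tuple


def array_nbytes(shape: Tuple[int, ...], itemsize: int) -> int:
    """Bytes needed for a dense array of the given shape and item size."""
    n = itemsize
    for dim in shape:
        n *= dim
    return n


# The failing allocation from the traceback: shape (17226, 60), float64 (8 bytes).
failing_mib = array_nbytes((17226, 60), 8) / 2**20  # about 7.89 MiB


class LazySample:
    """Hypothetical stand-in for a delayed sample: `data` is loaded on each access."""

    def __init__(self, load: Callable[[], Any]):
        self._load = load

    @property
    def data(self) -> Any:
        return self._load()


def shape_2d(data: Any) -> Tuple[int, ...]:
    """Shape of a 2D array-like; uses `.shape` when present (e.g. NumPy arrays)."""
    shape = getattr(data, "shape", None)
    if shape is not None:
        return tuple(shape)
    return (len(data), len(data[0]) if len(data) else 0)


def partition_shapes(samples: Iterable[Any],
                     attribute: str = "data") -> List[Tuple[int, ...]]:
    """Return one shape per sample, holding at most one loaded array at a time."""
    shapes = []
    for sample in samples:
        value = getattr(sample, attribute)  # triggers the load
        shapes.append(shape_2d(value))
        del value  # drop the reference before loading the next sample
    return shapes


# Toy partition: nested lists stand in for (n_frames, 60) feature arrays.
partition = [LazySample(lambda n=n: [[0.0] * 60 for _ in range(n)]) for n in (3, 5, 2)]
shapes = partition_shapes(partition)
total_frames = sum(rows for rows, _ in shapes)
```

A function like `partition_shapes` could be applied per partition with `Bag.map_partitions`, so that only small tuples are kept as task results. Whether this fits `_array_from_sample_bags` would need checking against the actual code.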