Memory issues with big dataset

When running a GMM experiment on a big dataset (Nist-SRE04to16), the pipeline fails with a memory error on a worker.

For example, on the branch of !55 (merged):

bob bio pipeline simple -vvv -d nist-sre04to16 -p gmm-voxforge -o results\~/gmm_nist -l sge

I ran it with the default Dask SGE client as well as with sge-io-big-non-adaptive, requesting 128 nodes (only about 60 were actually allocated during the run).

The failure seems to happen before the k-means initialization is reached, which may point to a problem in the wrapping of Dask bags into arrays.

I also tried running the experiment with a lower Dask memory limit for each node, to force the workers to spill to disk earlier and so avoid hitting the hard memory cap. This failed too: the workers did spill to disk, but still failed with a memory error.
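To make the effect of lowering the limit concrete, here is a small sketch of how the per-worker thresholds scale with the memory limit. The fractions are the defaults listed in the Dask documentation for `distributed.worker.memory.*` (target, spill, pause, terminate); the limits of 8 GiB and 4 GiB are made-up values for illustration, not the ones used in this experiment, and `memory_thresholds` is a hypothetical helper.

```python
# Fractions of the per-worker memory limit at which a Dask worker acts.
# Assumed to be the documented defaults of distributed.worker.memory.*;
# a site configuration may override them.
DEFAULT_FRACTIONS = {
    "target": 0.60,     # start spilling managed data to disk
    "spill": 0.70,      # spill based on total process memory
    "pause": 0.80,      # stop starting new tasks
    "terminate": 0.95,  # the nanny restarts the worker
}


def memory_thresholds(limit_gib, fractions=DEFAULT_FRACTIONS):
    """Return the absolute thresholds (in GiB) for a given memory limit."""
    return {name: limit_gib * frac for name, frac in fractions.items()}


# Hypothetical limits: halving the limit halves every threshold,
# so spilling starts earlier in absolute terms.
for limit in (8, 4):
    print(limit, memory_thresholds(limit))
```

Note that spilling only moves data that Dask manages (stored task results) to disk. Memory allocated inside a task that is still running cannot be spilled, which might explain why lowering the limit did not help here.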

Local output and traceback:
[...]
bob.pipelines.wrappers@2022-05-16 13:04:59,280 -- DEBUG: ToDaskBag(npartitions=128).transform                                                                                                                                                  
bob.pipelines.wrappers@2022-05-16 13:04:59,926 -- DEBUG: Dask|Checkpoint|Sample|Energy_.transform                                                                                                                                              
bob.pipelines.wrappers@2022-05-16 13:04:59,927 -- DEBUG: Dask|Checkpoint|Sample|Cepstra.transform                                                                                                                                              
bob.pipelines.wrappers@2022-05-16 13:04:59,929 -- DEBUG: Dask|Checkpoint|Sample|GMM(con.fit                                                                                                                                                    
bob.pipelines.wrappers@2022-05-16 13:04:59,941 -- DEBUG: Preparing data as dask arrays for fit                                                                                                                                                 
Traceback (most recent call last):                                                                                                                                                                                                             
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/bin/bob", line 10, in <module>                                                                                                                                                          
    sys.exit(main_cli())                                                                                                                                                                                                                       
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/click/core.py", line 1128, in __call__                                                                                                                      
    return self.main(*args, **kwargs)                                                                                                                                                                                                          
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/click/core.py", line 1053, in main                                                                                                                          
    rv = self.invoke(ctx)
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/click/core.py", line 1659, in invoke 
    return _process_result(sub_ctx.command.invoke(sub_ctx)) 
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/click/core.py", line 1659, in invoke 
    return _process_result(sub_ctx.command.invoke(sub_ctx)) 
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/click/core.py", line 1659, in invoke 
    return _process_result(sub_ctx.command.invoke(sub_ctx)) 
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/click/core.py", line 1395, in invoke 
    return ctx.invoke(self.callback, **ctx.params)
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.bio.base/bob/bio/base/script/pipeline_simple.py", line 276, in pipeline_simple
    execute_pipeline_simple(
  File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.bio.base/bob/bio/base/pipelines/entry_points.py", line 225, in execute_pipeline_simple
    result = pipeline(
  File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.bio.base/bob/bio/base/pipelines/pipelines.py", line 109, in __call__
    self.transformer = self.train_background_model(background_model_samples)
  File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.bio.base/bob/bio/base/pipelines/pipelines.py", line 144, in train_background_model
    return self.transformer.fit(background_model_samples)
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/sklearn/pipeline.py", line 394, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.pipelines/bob/pipelines/wrappers.py", line 881, in fit
    return self._fit_on_dask_array(X, y, **fit_params)
  File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.pipelines/bob/pipelines/wrappers.py", line 835, in _fit_on_dask_array
    X, fit_params = self._get_fit_params_from_sample_bags(bags)
  File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.pipelines/bob/pipelines/wrappers.py", line 816, in _get_fit_params_from_sample_bags
    X = _array_from_sample_bags(bags, input_attribute, ndim=2)
  File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.pipelines/bob/pipelines/wrappers.py", line 693, in _array_from_sample_bags
    lengths, shapes = dask.compute(lengths, shapes)
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/dask/base.py", line 573, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/distributed/client.py", line 3010, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/distributed/client.py", line 2162, in gather
    return self.sync(
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/distributed/utils.py", line 311, in sync
    return sync(
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/distributed/utils.py", line 378, in sync
    raise exc.with_traceback(tb)
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/distributed/utils.py", line 351, in f
    result = yield future
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/distributed/client.py", line 2025, in _gather
    raise exception.with_traceback(traceback)
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/dask/utils.py", line 39, in apply
    return func(*args, **kwargs)
  File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.pipelines/bob/pipelines/wrappers.py", line 664, in _sample_attribute
    return [getattr(s, attribute) for s in samples]
  File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.pipelines/bob/pipelines/wrappers.py", line 664, in <listcomp>
    return [getattr(s, attribute) for s in samples]
  File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.pipelines/bob/pipelines/sample.py", line 170, in __getattribute__
    return super().__getattribute__(name)
  File "/remote/idiap.svm/temp.devel01/ydayer/spear_develop/bob.pipelines/bob/pipelines/sample.py", line 188, in data
    return self._load()
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/bob/io/base/__init__.py", line 191, in load
    return open_file(inputs)
  File "/idiap/home/ydayer/miniconda3/envs/spear_bob10/lib/python3.8/site-packages/bob/io/base/__init__.py", line 101, in open_file
    return np.array(f[key])
numpy.core._exceptions.MemoryError: Unable to allocate 7.89 MiB for an array with shape (17226, 60) and data type float64
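The last frames of the traceback show the error being raised while `_sample_attribute` builds a list of loaded arrays inside a single task, during the `dask.compute(lengths, shapes)` call. The failing allocation is small (17226 × 60 × 8 bytes ≈ 7.89 MiB), which suggests the worker was already close to its limit because of previously loaded arrays rather than because of this one. The sketch below illustrates that reading; it is not the actual `bob.pipelines` code. `LazySample`, `attribute_list` and `shapes_only` are hypothetical names, and the sample sizes other than 17226 are invented.

```python
import numpy as np


class LazySample:
    """Hypothetical stand-in for a delayed sample: data is loaded on access."""

    def __init__(self, n_frames, n_features=60):
        self._shape = (n_frames, n_features)

    @property
    def data(self):
        # Stands in for reading a checkpointed feature file from disk.
        return np.zeros(self._shape, dtype=np.float64)


def attribute_list(samples, attribute="data"):
    # Same pattern as the list comprehension in the traceback:
    # every loaded array stays referenced until the list is dropped.
    return [getattr(s, attribute) for s in samples]


def shapes_only(samples, attribute="data"):
    # Load one array at a time and keep only its shape, so each
    # array can be freed before the next one is loaded.
    return [getattr(s, attribute).shape for s in samples]


samples = [LazySample(n) for n in (17226, 9000, 12000)]

held = attribute_list(samples)
bytes_held = sum(a.nbytes for a in held)  # grows with the partition size
shapes = shapes_only(samples)

print(f"{bytes_held / 2**20:.2f} MiB held at once for {len(held)} samples")
print(shapes)
```

If the lengths and shapes are the only thing needed at that step, something like `shapes_only` would keep the peak memory of the task near the size of a single sample. Whether this applies to `_array_from_sample_bags` depends on how the loaded arrays are reused afterwards, which the traceback alone does not show.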