bob.pipelines issues
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues
Updated: 2020-12-13T11:51:16Z

https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/30
CSVBaseSampleLoader does not support delayed metadata
Updated: 2020-12-13T11:51:16Z
Author: Amir MOHAMMADI

Since DelayedSample supports delayed metadata as well, I think it's a good idea that CSVBaseSampleLoader delays the metadata loading too.
This is really important because when we query the database, we may want to load the annotations in a delayed manner: they might not exist, and annotations might not be used. See https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/e2459dc5784045261ccc25df204a852bb527239e/bob/pipelines/datasets/sample_loaders.py#L60

Milestone: Bob 9.0.0
Assignee: Tiago de Freitas Pereira

https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/29
jman (gridtk) like interface for submitting dask jobs
Updated: 2024-03-21T10:21:20Z
Author: Amir MOHAMMADI

We need:
1. A command that automatically creates a dask client for us to be used for SGE submission.
2. A history of the commands that were executed.
3. An automatic tracking of dask logs.

Milestone: Bob 9.0.0

https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/27
The `DelayedSampleCall` makes pipelines memory greedy.
Updated: 2020-11-26T18:06:00Z
Author: Tiago de Freitas Pereira

The way we delay transformer calls (see https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/master/bob/pipelines/wrappers.py#L132) makes our pipeline super memory greedy.
I'm running a simple experiment **LOCALLY**, no dask involved, on `bob.bio.base`, wrapping everything with the `CheckpointWrapper`, and my experiment blows through 32GB of my RAM plus my swap without writing a single file.
Do you have any thoughts on this, @amohammadi?
Do you think the `DelayedSampleCall` is a good call?
Thanks
ping @ydayer

Milestone: Bob 9.0.0

https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/26
DelayedSamples with arbitrary delayed attributes
Updated: 2020-11-23T10:27:22Z
Author: Amir MOHAMMADI

I think it is often required that we load some attributes of a sample in a lazy manner.
We do this using our DelayedSample class but the problem with that is that it can only delay loading of `data`.
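To make the request concrete, here is a minimal sketch of a sample whose attributes are all loaded on first access. The class name `LazySample`, the `delayed_attributes` argument, and the `load_annotations` helper are hypothetical names chosen for illustration; this is not the bob.pipelines implementation.

```python
from functools import partial


class LazySample:
    """Hypothetical sample whose attributes can all be loaded lazily."""

    def __init__(self, load, delayed_attributes=None, **kwargs):
        self._load = load
        # Maps an attribute name to a zero-argument callable that loads it.
        self._delayed = dict(delayed_attributes or {})
        # Plain (non-delayed) metadata is stored as ordinary attributes.
        for key, value in kwargs.items():
            setattr(self, key, value)

    @property
    def data(self):
        # Loaded on every access; nothing is cached in this sketch.
        return self._load()

    def __getattr__(self, name):
        # Only invoked when normal attribute lookup fails, so it never
        # shadows attributes that were set in __init__.
        delayed = self.__dict__.get("_delayed", {})
        if name in delayed:
            return delayed[name]()
        raise AttributeError(name)


calls = []


def load_annotations(path):
    # Stand-in for reading an annotation file from disk.
    calls.append(path)
    return {"leye": (10, 20), "reye": (10, 40)}


sample = LazySample(
    load=lambda: [1, 2, 3],
    delayed_attributes={"annotations": partial(load_annotations, "a.json")},
    key="subject-1",
)
```

With this sketch, `load_annotations` runs only when `sample.annotations` is read, so a query that never touches the annotations never needs the annotation file to exist.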
We need a generic implementation that delays the loading of any attribute, such as `sample.annotations`.

Milestone: Bob 9.0.0
Assignee: Amir MOHAMMADI

https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/24
Do not cache data in DelayedSample
Updated: 2020-11-23T10:27:21Z
Author: Amir MOHAMMADI

This is important as loading DelayedSamples and stacking them in SampleBatch
will lead to the data being kept in memory twice.
For example, see:
```python
import bob.pipelines as mario
import numpy as np
from functools import partial
a = np.zeros((1000, 1000))
def load(i):
    # normally we load an array from disk
    return a[i]
samples = [mario.DelayedSample(partial(load, i=i)) for i in range(len(a))]
samples[:2]
# [DelayedSample(load=functools.partial(<function load at 0x7fb1c90250d0>, i=0)),
# DelayedSample(load=functools.partial(<function load at 0x7fb1c90250d0>, i=1))]
a2 = np.array(mario.SampleBatch(samples))
np.shares_memory(a, a2)
# False
```
So you can see that SampleBatch always leads to a copy of the data, and caching data
in delayed samples always leads to double memory usage.

Milestone: Bob 9.0.0
Assignee: Amir MOHAMMADI

https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/23
What is the purpose of sge_gpu.py
Updated: 2020-12-03T19:06:47Z
Author: Amir MOHAMMADI

I thought the whole idea of our pipelines was to use resource tags to properly allocate jobs to the correct worker.
But, now I see that we have 2 config files: `sge_default` and `sge_gpu`, why is that?
Is this because resource tags are not known? I think this issue is also relevant to https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/145

Milestone: Bob 9.0.0

https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/22
Provide mechanism for reading database lists from inside a zip file and a mechanism to download them
Updated: 2020-12-04T18:26:20Z
Author: Amir MOHAMMADI

The filelist database interfaces are excellent but I think we're lacking two features:
* [x] Reading the filelists from inside a zip file (to save space).
* [x] Automatic downloading of these filelists and saving them in e.g. `~/.bob` for convenience.
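As an illustration of the first item, a file list can be read straight out of a zip archive with the standard library. This is only a sketch: the function name `read_filelist`, the assumption that the lists are CSV files with a header row, and the archive layout are hypothetical and not taken from bob.pipelines.

```python
import csv
import io
import zipfile


def read_filelist(zip_path, member):
    """Return the rows of a CSV file list stored inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # ZipFile.open yields bytes, so wrap it to get text for csv.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

A call such as `read_filelist("lists.zip", "protocol/train.csv")` would return one dictionary per row without extracting the archive to disk.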
I don't think these file lists should be checked into the source code and I think they should be managed
the same way as we handle deep learning models.

Milestone: Bob 9.0.0
Assignee: Amir MOHAMMADI

https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/20
Problem while using `sge_default` dask client
Updated: 2020-10-12T11:25:50Z
Author: Victor BROS

For some reason `bob.pipelines.distributed.sge.SGEIdiapJob` is requiring a class variable `config_name`.
I've patched it myself to make it work, but this needs a proper fix.
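The error reported in this issue asks for a `config_name` class variable on the job class. The sketch below mimics that kind of check with a stand-in base class so it runs without dask-jobqueue installed; `Job`, `BrokenJob`, `PatchedJob`, and the value `"sge"` are all hypothetical, and this is neither the dask-jobqueue code nor the patch that was applied.

```python
class Job:
    """Stand-in base class; NOT the real dask-jobqueue implementation."""

    config_name = None  # subclasses are expected to override this

    @classmethod
    def default_config_name(cls):
        # Refuse to continue when the subclass did not declare a name.
        if cls.config_name is None:
            raise ValueError(
                "The class {} is required to have a 'config_name' "
                "class variable.".format(cls)
            )
        return cls.config_name


class BrokenJob(Job):
    pass  # no config_name, so default_config_name() raises ValueError


class PatchedJob(Job):
    config_name = "sge"  # hypothetical value, chosen for illustration
```

Under these assumptions, `PatchedJob.default_config_name()` returns the declared name while `BrokenJob.default_config_name()` raises a `ValueError`.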
```
bob@2020-10-09 14:36:37,427 -- DEBUG: Logging of the `bob' logger was set to 3
bob.extension.config@2020-10-09 14:36:37,430 -- DEBUG: Loading configuration file `./experiments/vera-finger/veradb.py'...
bob.extension.config@2020-10-09 14:36:38,765 -- DEBUG: Loading configuration file `./experiments/vera-finger/vera_miura.py'...
bob.bio.base@2020-10-09 14:36:39,001 -- INFO: Using `bob.bio.base` legacy algorithm <class 'bob.bio.vein.algorithm.MiuraMatch'>(ch=80, cw=90, multiple_model_scoring='average', multiple_probe_scoring='average')
bob.extension.config@2020-10-09 14:36:39,002 -- DEBUG: Loading configuration file `./src/bob.pipelines/bob/pipelines/config/distributed/sge_default.py'...
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7f92cb83db50>>, <Task finished coro=<SpecCluster._correct_state_internal() done, defined at /remote/idiap.svm/temp.biometric01/vbros/bob_vein/bob.bio.vein/eggs/distributed-2.30.0-py3.7.egg/distributed/deploy/spec.py:320> exception=ValueError("The class <class 'bob.pipelines.distributed.sge.SGEIdiapJob'> is required to have a 'config_name' class variable.\nIf you have created this class, please add a 'config_name' class variable.\nIf not this may be a bug, feel free to create an issue at: https://github.com/dask/dask-jobqueue/issues/new")>)
Traceback (most recent call last):
File "/idiap/temp/vbros/miniconda3/envs/bob.bio.vein/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
ret = callback()
File "/idiap/temp/vbros/miniconda3/envs/bob.bio.vein/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
future.result()
File "/remote/idiap.svm/temp.biometric01/vbros/bob_vein/bob.bio.vein/eggs/distributed-2.30.0-py3.7.egg/distributed/deploy/spec.py", line 348, in _correct_state_internal
worker = cls(self.scheduler.address, **opts)
File "/remote/idiap.svm/temp.biometric01/vbros/bob_vein/bob.bio.vein/src/bob.pipelines/bob/pipelines/distributed/sge.py", line 56, in __init__
super().__init__(*args, config_name=config_name, death_timeout=10000, **kwargs)
File "/remote/idiap.svm/temp.biometric01/vbros/bob_vein/bob.bio.vein/eggs/dask_jobqueue-0.7.1-py3.7.egg/dask_jobqueue/core.py", line 156, in __init__
default_config_name = self.default_config_name()
File "/remote/idiap.svm/temp.biometric01/vbros/bob_vein/bob.bio.vein/eggs/dask_jobqueue-0.7.1-py3.7.egg/dask_jobqueue/core.py", line 260, in default_config_name
"https://github.com/dask/dask-jobqueue/issues/new".format(cls)
ValueError: The class <class 'bob.pipelines.distributed.sge.SGEIdiapJob'> is required to have a 'config_name' class variable.
```

Milestone: Bob 9.0.0
Assignee: Tiago de Freitas Pereira

https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/19
Dask Client as python resources
Updated: 2020-10-12T14:19:51Z
Author: Tiago de Freitas Pereira

We should put the Dask Clients from here: https://gitlab.idiap.ch/bob/bob.pipelines/-/tree/master/bob/pipelines/config/distributed
as python resources.

Milestone: Bob 9.0.0
Assignee: Yannick DAYER

https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/13
Do not propagate _ variables when config chain loading
Updated: 2020-10-16T14:14:36Z
Author: Amir MOHAMMADI

This is a reminder for when we move config chain loading from bob.extension to here.

Milestone: Bob 9.0.0
Assignee: Amir MOHAMMADI
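
The behaviour asked for in issue 13 could be sketched as follows, assuming that chain loading means executing several config sources in order, with each one seeing the variables defined by the previous ones. The function `load_config_chain` is hypothetical and takes source strings rather than file paths to stay self-contained; it is not the bob.extension implementation.

```python
def load_config_chain(sources):
    """Execute config sources in order, passing on public names only."""
    context = {}
    for source in sources:
        namespace = dict(context)
        exec(source, namespace)
        # Keep public names and drop anything starting with "_"
        # (this also drops the __builtins__ entry that exec adds).
        context = {
            name: value
            for name, value in namespace.items()
            if not name.startswith("_")
        }
    return context


first = "_tmp = 2\nvalue = _tmp * 21"
second = "seen = '_tmp' in globals()\nresult = value + 1"
config = load_config_chain([first, second])
```

Here `_tmp` is usable inside the first source but is neither visible to the second source nor present in the final `config`.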