bob.pipelines issues
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues
2022-05-09T14:20:33Z
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/34
Annoying warnings dask-jobqueue
2022-05-09T14:20:33Z
Tiago de Freitas Pereira
Since `distributed>2021.x.x`, we have been getting some annoying warnings from `dask_jobqueue`:
```
...lib/python3.8/site-packages/dask_jobqueue/core.py:321: FutureWarning: ignoring was deprecated in version 2021.06.1 and will be removed in a future release. Please use contextlib.suppress from the standard library instead.
with ignoring(RuntimeError): # deleting job when job already gone
```
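The replacement recommended by the warning is close to a drop-in change. A minimal sketch of the pattern, where `release_job` and the `jobs` dictionary are hypothetical stand-ins and not part of `dask_jobqueue`:

```python
from contextlib import suppress


def release_job(jobs, job_id):
    """Remove a job from a registry, ignoring jobs that are already gone."""
    # contextlib.suppress silently swallows the listed exception types,
    # which is what the warning recommends instead of `ignoring`.
    with suppress(KeyError):
        del jobs[job_id]
    return jobs
```

Until the upstream package is updated, the warning could presumably also be silenced on our side with `warnings.filterwarnings`, at the cost of hiding other `FutureWarning`s that match the same filter.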
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/25
Better handling of file structures with CheckpointWrapper
2021-08-13T11:13:09Z
Tiago de Freitas Pereira
Some databases have very flat file structures (with more than 30k files in a single directory).
This is not good for the Idiap file structure and can make our I/O super slow.
We should implement a hash function in `CheckpointWrapper.make_path` that generates directory names from `sample.key`, to limit the number of files in a directory to 1000.
ping @lcolbois (this touches experiments with IJB-C)
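One way this could look, as a rough sketch only: `hashed_bucket` and `hashed_path` are hypothetical helpers, not the existing `CheckpointWrapper.make_path`, and the `depth`, `width` and `extension` parameters are assumptions. Hashing spreads files roughly evenly over the buckets, but it does not enforce a hard cap of 1000 files per directory.

```python
import hashlib
import os


def hashed_bucket(key, depth=2, width=2):
    """Derive `depth` directory names of `width` hex characters from a key."""
    digest = hashlib.sha1(str(key).encode("utf-8")).hexdigest()
    return [digest[i * width:(i + 1) * width] for i in range(depth)]


def hashed_path(key, base_dir, extension=".h5", depth=2, width=2):
    """Build a checkpoint path nested under hash-derived subdirectories."""
    # The same key always maps to the same subdirectories, so a sample
    # checkpointed in one run can be found again in the next one.
    return os.path.join(base_dir, *hashed_bucket(key, depth, width), f"{key}{extension}")
```

With the defaults there are 256 * 256 buckets, so even a few million flat keys would average well under 1000 files per directory.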
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/33
Breakdown samplesets
2021-10-29T15:34:56Z
Tiago de Freitas Pereira
We should have a function in `bob.pipelines` that takes as input a `SampleSet` with `N` samples and outputs `N` `SampleSet`s with one `Sample` each.
ping @hotroshi
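A minimal sketch of the requested behaviour. The `Sample` and `SampleSet` classes below are simplified stand-ins written for this example (they are not the `bob.pipelines` classes), and `breakdown` is a hypothetical name:

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Sample:  # simplified stand-in, not bob.pipelines.Sample
    data: Any
    key: str


@dataclass
class SampleSet:  # simplified stand-in, not bob.pipelines.SampleSet
    samples: List[Sample] = field(default_factory=list)
    key: str = ""


def breakdown(sample_set):
    """Split a SampleSet with N samples into N SampleSets with one Sample each."""
    # Each new set reuses the key of the sample it holds (an assumption).
    return [SampleSet(samples=[s], key=s.key) for s in sample_set.samples]
```

A real implementation would also have to decide which metadata of the parent set is copied to each new set.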
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/44
check_parameters_for_validity does not always return the same type
2022-12-06T11:13:39Z
Yannick DAYER
Currently, `bob.pipelines.utils.check_parameters_for_validity` can return ["a list or tuple"](https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/master/src/bob/pipelines/utils.py#L117).
Returning a list **or** a tuple seems odd, and somewhere down the line we actually expect a list (with a `remove` method).
Could you ensure that this returns a `list` in all cases (and edit the docstring to reflect that)?
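For illustration, a sketch of a validator that always returns a list. This is not the current code in `bob.pipelines.utils`; `check_parameters_as_list` and its argument names are hypothetical:

```python
def check_parameters_as_list(parameters, description, valid, default=None):
    """Validate `parameters` against `valid` and always return a new list."""
    if parameters is None:
        parameters = valid if default is None else default
    # A single value (for example a string) is treated as a one-element list.
    if not isinstance(parameters, (list, tuple, set)):
        parameters = [parameters]
    for parameter in parameters:
        if parameter not in valid:
            raise ValueError(
                f"Invalid {description} {parameter!r}; valid values are {list(valid)}"
            )
    # list() copies, so callers can safely call .remove() on the result.
    return list(parameters)
```

Returning a copy also means that a later `remove` call cannot modify the caller's `valid` or `default` collections.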
André MAYORAZ
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/45
CheckpointWrapper on annotator, saving the original dataset images as well as the annotations - waste of space!
2023-01-27T17:11:33Z
Alain KOMATY
Hello,
When choosing to checkpoint in the pipeline, the annotator folder will contain the original images of the dataset instead of the annotations (face landmarks, for example). One solution is to wrap a CheckpointWrapper around the annotator. This will save the annotations in the annotator folder, but it will also save the original images, because now it is wrapped twice!
This problem comes from the [_wrap](https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/master/src/bob/pipelines/wrappers.py#L1014) function in the [wrappers](https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/master/src/bob/pipelines/wrappers.py) module.
Thanks to @cecabert, who pointed out that this function does not check whether the `estimator` is already an instance of CheckpointWrapper! One possible solution could be as follows (I tested it and it works for my pipelines):
```python
def _wrap(estimator, **kwargs):
    # wrap the object and pass the kwargs
    for w_class in bases:
        valid_params = w_class._get_param_names()
        params = {k: kwargs.pop(k) for k in valid_params if k in kwargs}
        if estimator is None:
            estimator = w_class(**params)
        else:
            if not isinstance(estimator, w_class):
                estimator = w_class(estimator, **params)
    return estimator, kwargs
```
Yannick DAYER
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/31
Checkpointwrapper sometimes fails due to temporary disk issues
2021-10-29T15:34:56Z
Amir MOHAMMADI
It's a good idea to retry a couple of times when saving and loading
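A minimal sketch of such a retry loop. `retry` is a hypothetical helper, and retrying only on `OSError` with a growing delay is an assumption about which failures are transient:

```python
import time


def retry(func, attempts=3, delay=0.1, exceptions=(OSError,)):
    """Call `func()` and retry on the given exceptions, re-raising after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # give up: let the caller see the original error
            time.sleep(delay * attempt)  # wait a little longer after each failure
```

The save and load calls of the checkpointing code could then be passed in as `func`, for example through `functools.partial`.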
Tiago de Freitas Pereira
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/10
Conflicts ad eternum
2020-04-17T12:31:42Z
Tiago de Freitas Pereira
Hi guys,
I'm facing some problems with our CI and could use some help.
For a while now I have been getting some enigmatic conflicts; here is one example:
https://gitlab.idiap.ch/bob/bob.pipelines/-/jobs/195496/raw
`bdt build` doesn't work at all on the CI or on my computers.
I'm wondering if it's something to do with NumPy, because of this message in the logs:
```sh
Attempting to finalize metadata for bob.pipelines
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
Adding .* to spec 'numpy 1.16.6' to ensure satisfiability. Please consider putting {{ var_name }}.* or some relational operator (>/</>=/<=) on this spec in meta.yaml, or if req is also a build req, using {{ pin_compatible() }} jinja2 function instead. See https://conda.io/docs/user-guide/tasks/build-packages/variants.html#pinning-at-the-variant-level
WARNING conda_build.utils:ensure_valid_spec(1749): Adding .* to spec 'numpy 1.16.6' to ensure satisfiability. Please consider putting {{ var_name }}.* or some relational operator (>/</>=/<=) on this spec in meta.yaml, or if req is also a build req, using {{ pin_compatible() }} jinja2 function instead. See https://conda.io/docs/user-guide/tasks/build-packages/variants.html#pinning-at-the-variant-level
INFO:bob.devtools.scripts.build@2020-04-17 09:24:21,641: Building bob.pipelines-0.0.1b0-py37 (build: 12) for linux-64
INFO:bob.devtools.bootstrap@2020-04-17 09:24:21,641: environ["BOB_BUILD_NUMBER"] = 12
/scratch/builds/bob/bob.pipelines/miniconda/lib/python3.7/site-packages/conda_build/environ.py:427: UserWarning: The environment variable 'DOCSERVER' is being passed through with value 'http://www.idiap.ch'. If you are splitting build and test phases with --no-test, please ensure that this value is also set similarly at test time.
UserWarning
/scratch/builds/bob/bob.pipelines/miniconda/lib/python3.7/site-packages/conda_build/environ.py:427: UserWarning: The environment variable 'NOSE_EVAL_ATTR' is being passed through with value ''. If you are splitting build and test phases with --no-test, please ensure that this value is also set similarly at test time.
UserWarning
BUILD START: ['bob.pipelines-0.0.1b0-py37h9f5372d_12.conda']
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
```
However, if I try to install the `bob.pipelines` setup, everything works fine.
```sh
conda install bob-devel==2020.03.30 dask dask-jobqueue bob.extension bob.io.base -c https://www.idiap.ch/software/bob/conda/label/beta/ --dry-run
```
Can someone shed some light on this?
Thanks
ping @andre.anjos @amohammadi
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/30
CSVBaseSampleLoader does not support delayed metadata
2020-12-13T11:51:16Z
Amir MOHAMMADI
Since DelayedSample supports delayed metadata as well, I think CSVBaseSampleLoader should also delay the metadata loading.
This is really important because, when we query the database, we may want to load the annotations in a delayed manner: they might not exist, and they might not be used. See https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/e2459dc5784045261ccc25df204a852bb527239e/bob/pipelines/datasets/sample_loaders.py#L60
Bob 9.0.0
Tiago de Freitas Pereira
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/3
Dask Adaptive scheduling
2020-04-17T11:45:21Z
Tiago de Freitas Pereira
https://distributed.dask.org/en/latest/_modules/distributed/deploy/adaptive.html
ping @andre.anjos @amohammadi
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/19
Dask Client as python resources
2020-10-12T14:19:51Z
Tiago de Freitas Pereira
We should expose the Dask Clients from here: https://gitlab.idiap.ch/bob/bob.pipelines/-/tree/master/bob/pipelines/config/distributed
as Python resources.
Bob 9.0.0
Yannick DAYER
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/46
Dask Client configuration not available in installed package
2023-03-28T12:52:29Z
Yannick DAYER
When using the `bob bio pipeline simple` command, the `dask.client` entry points are not available.
Running `bob bio pipeline simple -H conf.py` writes the following to `conf.py`:
``` python
# ----------8<----------
# dask_client = single-threaded
"""Optional parameter: dask_client (--dask-client, -l) [default: single-threaded]
Dask client for the execution of the pipeline. Can be a `dask.client' entry point, a module name, or a path to a Python file which contains a variable named `dask_client'.
Registered entries are: []"""
# ----------8<----------
```
I tried with the package installed from conda beta and also with `pip install -e`.
Entry points in bob.bio.base and bob.bio.face are working. So I presume it's an issue with how we do it in this package (maybe a wrong name for the entry-point group?).
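To check what is actually registered in the current environment, independently of the `bob` command line, the entry-point group can be queried directly. A small sketch using only the standard library; `list_entry_point_names` is a hypothetical helper, and `dask.client` is the group name taken from the help text above:

```python
from importlib.metadata import entry_points


def list_entry_point_names(group):
    """Return the sorted names of all installed entry points in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry points
        selected = eps.select(group=group)
    else:
        # Python 3.8 / 3.9: a plain dict mapping group names to entry points
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


print(list_entry_point_names("dask.client"))
```

An empty list here would suggest that the group is missing or misnamed in the installed package metadata rather than in the command-line code.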
Yannick DAYER
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/37
dask_jobqueue > 0.7.2 changed the API...
2021-11-30T18:25:54Z
Tiago de Freitas Pereira
... and this is breaking `SGEIdiapJob`.
`**kwargs` was removed from here:
https://github.com/dask/dask-jobqueue/blob/0.7.2/dask_jobqueue/core.py#L132
Tiago de Freitas Pereira
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/5
Dask Mixin classes for pipelines
2020-03-23T17:21:51Z
Amir MOHAMMADI
It would be a good idea to have mixin classes that make transformers Dask-aware.
This was proposed by @andre.anjos. @tiago.pereira and I have discussed this; see the comments for what we came up with.
Amir MOHAMMADI
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/26
DelayedSamples with arbitrary delayed attributes
2020-11-23T10:27:22Z
Amir MOHAMMADI
I think we often need to load some attributes of a sample lazily.
We do this with our DelayedSample class, but the problem is that it can only delay the loading of `data`.
We need a generic implementation that can delay the loading of any attribute, such as `sample.annotations`.
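One possible shape for this, as a sketch only: `LazySample` is a hypothetical stand-in written for this example (not the existing `DelayedSample`), and passing the loaders through a `delayed_attributes` dictionary is an assumption about the interface.

```python
class LazySample:
    """Sample whose data and selected attributes are loaded on access."""

    def __init__(self, load, delayed_attributes=None, **kwargs):
        self._load = load
        # Maps attribute names to zero-argument callables that load them.
        self._delayed = dict(delayed_attributes or {})
        for name, value in kwargs.items():
            setattr(self, name, value)

    @property
    def data(self):
        return self._load()

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        delayed = self.__dict__.get("_delayed", {})
        if name in delayed:
            return delayed[name]()  # loaded on every access, nothing is cached
        raise AttributeError(name)
```

For example, `LazySample(load=read_image, delayed_attributes={"annotations": read_annotations}, key="a")` would not touch the annotation file until `sample.annotations` is accessed (`read_image` and `read_annotations` being user-supplied callables).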
Bob 9.0.0
Amir MOHAMMADI
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/35
doctests fail with the new version of xarray
2021-10-29T15:34:56Z
Amir MOHAMMADI
We build with `xarray 0.18.0` and the doctests pass there, but when tested with `xarray 0.19.0`, which is the latest in the defaults channel, the doctests fail.
Since I cannot write a doctest that works with both of them, I suggest pinning xarray up to the next minor version (instead of the next major version).
See: https://gitlab.idiap.ch/bob/bob.pipelines/-/jobs/243826
Amir MOHAMMADI
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/24
Do not cache data in DelayedSample
2020-11-23T10:27:21Z
Amir MOHAMMADI
This is important because loading DelayedSamples and stacking them in a SampleBatch
will keep the data in memory twice.
For example:
```python
import bob.pipelines as mario
import numpy as np
from functools import partial

a = np.zeros((1000, 1000))


def load(i):
    # normally we load an array from disk
    return a[i]


samples = [mario.DelayedSample(partial(load, i=i)) for i in range(len(a))]
samples[:2]
# [DelayedSample(load=functools.partial(<function load at 0x7fb1c90250d0>, i=0)),
#  DelayedSample(load=functools.partial(<function load at 0x7fb1c90250d0>, i=1))]
a2 = np.array(mario.SampleBatch(samples))
np.shares_memory(a, a2)
# False
```
So you can see that SampleBatch always makes a copy of the data, and caching data
in delayed samples therefore always leads to double memory usage.
Bob 9.0.0
Amir MOHAMMADI
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/13
do not propagate _ variables when config chain loading
2020-10-16T14:14:36Z
Amir MOHAMMADI
This is a reminder for when we move config chain loading from bob.extension to here.
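To make the intended behaviour concrete, here is a rough sketch. It is not the bob.extension implementation; `load_chain` is a hypothetical helper that takes the config sources as strings rather than file paths:

```python
def load_chain(sources):
    """Execute config sources in order, passing on only public variables."""
    context = {}
    for source in sources:
        namespace = dict(context)
        exec(source, namespace)
        # Drop every name starting with "_" (this also removes __builtins__),
        # so private helpers of one config never reach the next one.
        context = {k: v for k, v in namespace.items() if not k.startswith("_")}
    return context
```

For example, a first config that defines `_tmp = 2` and `a = _tmp * 3` hands only `a` to the second config in the chain.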
Bob 9.0.0
Amir MOHAMMADI
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/41
Expanding samples on the fly
2022-05-31T13:53:53Z
Anjith GEORGE
anjith.george@idiap.ch
I have a use case where I want to expand one image into multiple images in the `pipelinesimple`. Essentially, I can create
a transformer that operates directly on samples: it takes in a `SampleSet` with one image and returns a `SampleSet` with `n` samples.
I managed to make it work when everything is in memory (with the `-m` option), but checkpointing is a problem since there are new samples. I can create new keys on the fly; what do you think is the best way to go about this?
@amohammadi @tiago.pereira @ydayer
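As a starting point for the discussion, a sketch of creating the new keys deterministically, so that the same expanded sample gets the same key on every run. `Sample` is a simplified stand-in and `expand` is a hypothetical helper; whether checkpointing would accept such derived keys is not verified here:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Sample:  # simplified stand-in, not bob.pipelines.Sample
    data: Any
    key: str


def expand(sample, transforms):
    """Return one new sample per transform, each with a key derived from the parent."""
    # The suffix depends only on the position of the transform, so the keys
    # stay stable across runs as long as the list of transforms is unchanged.
    return [
        Sample(data=transform(sample.data), key=f"{sample.key}_{index}")
        for index, transform in enumerate(transforms)
    ]
```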
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/8
Follow-up from "WIP: Make scikit operations daskable"
2020-05-05T07:39:23Z
Tiago de Freitas Pereira
The following discussion from !5 should be addressed:
- [ ] @amohammadi started a [discussion](https://gitlab.idiap.ch/bob/bob.pipelines/merge_requests/5#note_50104): (+3 comments)
> All these dynamic object creation is going to make debugging a hell, wouldn't it? Could you print a traceback here when something fails in sklearn estimator?
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/7
Follow-up from "WIP: Make scikit operations daskable"
2020-05-05T07:39:10Z
Tiago de Freitas Pereira
The following discussion from !5 should be addressed:
- [ ] @amohammadi started a [discussion](https://gitlab.idiap.ch/bob/bob.pipelines/merge_requests/5#note_50099): (+2 comments)
> `features_dir` was optional, please revert this.
Tiago de Freitas Pereira