bob.pipelines issues
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues

---

**Issue #30: CSVBaseSampleLoader does not support delayed metadata**
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/30 (Amir MOHAMMADI, updated 2020-12-13)

Since DelayedSample supports delayed metadata as well, I think it's a good idea that CSVBaseSampleLoader delays the metadata loading as well.
This is really important: when we query the database, we may want to load the annotations in a delayed manner, because they might not exist and the annotations might not be used. See https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/e2459dc5784045261ccc25df204a852bb527239e/bob/pipelines/datasets/sample_loaders.py#L60

Milestone: Bob 9.0.0. Assignee: Tiago de Freitas Pereira.

---

**Issue #29: jman (gridtk) like interface for submitting dask jobs**
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/29 (Amir MOHAMMADI, updated 2024-03-21)

We need:
1. A command that automatically creates a dask client for us to be used for SGE submission.
2. A history of the commands that were executed.
3. An automatic tracking of dask logs.

Milestone: Bob 9.0.0.

---

**Issue #28: Logging level does not propagate to workers**
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/28 (Amir MOHAMMADI, updated 2020-11-27)

When I run a script in verbose mode with dask, the debug logs are not printed in the worker logs.

Milestone: Bob 9.0.0. Assignee: Tiago de Freitas Pereira.

---

**Issue #27: The `DelayedSampleCall` makes pipelines memory greedy.**
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/27 (Tiago de Freitas Pereira, updated 2020-11-26)

The way we delay transformer calls (see https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/master/bob/pipelines/wrappers.py#L132) makes our pipeline super memory greedy.
I'm running a simple experiment **LOCALLY**, no dask involved, on `bob.bio.base`, wrapping everything with the `CheckpointWrapper`; and my experiment blows through 32GB of RAM plus my swap without writing a single file.
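For what it's worth, this kind of retention is easy to reproduce with a toy chain of cached delayed calls: every stage stays reachable (and therefore in memory) through its parent reference. This is a made-up sketch, not the actual wrapper code:

```python
from functools import partial

import numpy as np

class Delayed:
    """Toy delayed node that caches its result once computed
    (a sketch, NOT the real DelayedSample/DelayedSampleCall code)."""

    def __init__(self, fn):
        self.fn = fn
        self._cache = None

    def __call__(self):
        if self._cache is None:
            self._cache = self.fn()
        return self._cache

def step(parent):
    # stand-in for one wrapped transformer stage
    return parent() * 2.0

# Build a 6-node chain; each node holds a reference to its parent.
root = Delayed(lambda: np.ones((1000, 1000)))  # ~8 MB
nodes = [root]
for _ in range(5):
    nodes.append(Delayed(partial(step, nodes[-1])))

nodes[-1]()  # computing the head fills every cache along the chain
alive = sum(n._cache is not None for n in nodes)
print(alive)  # all 6 intermediate arrays stay alive -> 6
```

If the real wrappers keep similar per-sample references, the whole preprocessing history of each sample is pinned until the last stage is written out.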
Do you have any thoughts on this, @amohammadi?
Do you think the `DelayedSampleCall` is a good call?
Thanks
ping @ydayer

Milestone: Bob 9.0.0.

---

**Issue #26: DelayedSamples with arbitrary delayed attributes**
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/26 (Amir MOHAMMADI, updated 2020-11-23)

I think it is often required that we load some attributes of a sample in a lazy manner.
We do this using our DelayedSample class, but the problem is that it can only delay the loading of `data`.
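To make the idea concrete, a generic version might look something like the following; the class name and the `delayed_attributes` argument are hypothetical, not the current API:

```python
from functools import partial

class DelayedAttrSample:
    """Sketch of a sample whose attributes load lazily on first access.

    `delayed_attributes` maps attribute names to zero-argument callables;
    this name and API are made up for illustration.
    """

    def __init__(self, load, delayed_attributes=None, **kwargs):
        self._load = load
        self._delayed = dict(delayed_attributes or {})
        self.__dict__.update(kwargs)

    @property
    def data(self):
        return self._load()

    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        if name.startswith("_") or name not in self._delayed:
            raise AttributeError(name)
        value = self._delayed[name]()  # load on first access
        setattr(self, name, value)     # cache for subsequent accesses
        return value

def load_annotations(path):
    # stand-in for reading an annotation file from disk
    return {"topleft": (0, 0), "bottomright": (10, 10)}

s = DelayedAttrSample(
    load=lambda: [1, 2, 3],
    delayed_attributes={"annotations": partial(load_annotations, "a.json")},
    key="sample-1",
)
print(s.annotations["topleft"])  # annotations are loaded only here -> (0, 0)
```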
We need a generic implementation that delays the loading of everything, e.g. `sample.annotations`.

Milestone: Bob 9.0.0. Assignee: Amir MOHAMMADI.

---

**Issue #24: Do not cache data in DelayedSample**
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/24 (Amir MOHAMMADI, updated 2020-11-23)

This is important as loading DelayedSamples and stacking them in SampleBatch
will lead to the data being kept in the memory twice.
For example, see:
```python
import bob.pipelines as mario
import numpy as np
from functools import partial
a = np.zeros((1000, 1000))
def load(i):
    # normally we load an array from disk
    return a[i]
samples = [mario.DelayedSample(partial(load, i=i)) for i in range(len(a))]
samples[:2]
# [DelayedSample(load=functools.partial(<function load at 0x7fb1c90250d0>, i=0)),
# DelayedSample(load=functools.partial(<function load at 0x7fb1c90250d0>, i=1))]
a2 = np.array(mario.SampleBatch(samples))
np.shares_memory(a, a2)
# False
```
So you can see that SampleBatch always leads to a copy of the data, and caching data in delayed samples always leads to double memory usage.

Milestone: Bob 9.0.0. Assignee: Amir MOHAMMADI.

---

**Issue #23: What is the purpose of sge_gpu.py**
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/23 (Amir MOHAMMADI, updated 2020-12-03)

I thought the whole idea of our pipelines was to use resource tags to properly allocate jobs to the correct worker.
But, now I see that we have 2 config files: `sge_default` and `sge_gpu`, why is that?
Is this because resource tags are not known? I think this issue is also relevant to https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/145

Milestone: Bob 9.0.0.

---

**Issue #22: Provide mechanism for reading database lists from inside a zip file and a mechanism to download them**
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/22 (Amir MOHAMMADI, updated 2020-12-04)

The filelist databases interfaces are excellent, but I think we're lacking two features:
* [x] Reading the filelists from inside a zip file (to save space).
* [x] Automatic downloading of these filelists and saving them in e.g. `~/.bob` for convenience.
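For the zip part, the standard library already covers reading a CSV file list without extracting it; a minimal sketch (the file names inside the archive are made up):

```python
import csv
import io
import tempfile
import zipfile

# Build a tiny zip containing a file list (a stand-in for a downloaded
# database package), then read the CSV directly from the archive.
with tempfile.NamedTemporaryFile(suffix=".zip", delete=False) as tmp:
    zpath = tmp.name
with zipfile.ZipFile(zpath, "w") as z:
    z.writestr("dev/for_models.csv", "PATH,REFERENCE_ID\nimg_001,1\n")

with zipfile.ZipFile(zpath) as z:
    with z.open("dev/for_models.csv") as f:
        # z.open() yields bytes; TextIOWrapper decodes for the csv module
        rows = list(csv.DictReader(io.TextIOWrapper(f, encoding="utf-8")))

print(rows)  # [{'PATH': 'img_001', 'REFERENCE_ID': '1'}]
```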
I don't think these file lists should be checked into the source code; they should be managed the same way as we handle deep learning models.

Milestone: Bob 9.0.0. Assignee: Amir MOHAMMADI.

---

**Issue #20: Problem while using `sge_default` dask client**
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/20 (Victor BROS, updated 2020-10-12)

For some reason, `bob.pipelines.distributed.sge.SGEIdiapJob` requires a class variable `config_name`.
I've patched myself to make it work, but this needs a proper fix
```
bob@2020-10-09 14:36:37,427 -- DEBUG: Logging of the `bob' logger was set to 3
bob.extension.config@2020-10-09 14:36:37,430 -- DEBUG: Loading configuration file `./experiments/vera-finger/veradb.py'...
bob.extension.config@2020-10-09 14:36:38,765 -- DEBUG: Loading configuration file `./experiments/vera-finger/vera_miura.py'...
bob.bio.base@2020-10-09 14:36:39,001 -- INFO: Using `bob.bio.base` legacy algorithm <class 'bob.bio.vein.algorithm.MiuraMatch'>(ch=80, cw=90, multiple_model_scoring='average', multiple_probe_scoring='average')
bob.extension.config@2020-10-09 14:36:39,002 -- DEBUG: Loading configuration file `./src/bob.pipelines/bob/pipelines/config/distributed/sge_default.py'...
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7f92cb83db50>>, <Task finished coro=<SpecCluster._correct_state_internal() done, defined at /remote/idiap.svm/temp.biometric01/vbros/bob_vein/bob.bio.vein/eggs/distributed-2.30.0-py3.7.egg/distributed/deploy/spec.py:320> exception=ValueError("The class <class 'bob.pipelines.distributed.sge.SGEIdiapJob'> is required to have a 'config_name' class variable.\nIf you have created this class, please add a 'config_name' class variable.\nIf not this may be a bug, feel free to create an issue at: https://github.com/dask/dask-jobqueue/issues/new")>)
Traceback (most recent call last):
File "/idiap/temp/vbros/miniconda3/envs/bob.bio.vein/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
ret = callback()
File "/idiap/temp/vbros/miniconda3/envs/bob.bio.vein/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
future.result()
File "/remote/idiap.svm/temp.biometric01/vbros/bob_vein/bob.bio.vein/eggs/distributed-2.30.0-py3.7.egg/distributed/deploy/spec.py", line 348, in _correct_state_internal
worker = cls(self.scheduler.address, **opts)
File "/remote/idiap.svm/temp.biometric01/vbros/bob_vein/bob.bio.vein/src/bob.pipelines/bob/pipelines/distributed/sge.py", line 56, in __init__
super().__init__(*args, config_name=config_name, death_timeout=10000, **kwargs)
File "/remote/idiap.svm/temp.biometric01/vbros/bob_vein/bob.bio.vein/eggs/dask_jobqueue-0.7.1-py3.7.egg/dask_jobqueue/core.py", line 156, in __init__
default_config_name = self.default_config_name()
File "/remote/idiap.svm/temp.biometric01/vbros/bob_vein/bob.bio.vein/eggs/dask_jobqueue-0.7.1-py3.7.egg/dask_jobqueue/core.py", line 260, in default_config_name
"https://github.com/dask/dask-jobqueue/issues/new".format(cls)
ValueError: The class <class 'bob.pipelines.distributed.sge.SGEIdiapJob'> is required to have a 'config_name' class variable.
```

Milestone: Bob 9.0.0. Assignee: Tiago de Freitas Pereira.

---

**Issue #19: Dask Client as python resources**
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/19 (Tiago de Freitas Pereira, updated 2020-10-12)

We should put the Dask Clients from here: https://gitlab.idiap.ch/bob/bob.pipelines/-/tree/master/bob/pipelines/config/distributed
as python resources.

Milestone: Bob 9.0.0. Assignee: Yannick DAYER.

---

**Issue #13: do not propagate _ variables when config chain loading**
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/13 (Amir MOHAMMADI, updated 2020-10-16)

This is to remind me, for when we move config chain loading from bob.extension to here.

Milestone: Bob 9.0.0. Assignee: Amir MOHAMMADI.

---

**Issue #43: Remove Bob Extension as dependency**
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/43 (André MAYORAZ, updated 2024-01-08)

Bob extension has to be removed. It is used at three places in this package:
- [x] bob.pipelines/doc/conf.py
  - To load the association list of packages for intersphinx. We have to see whether it is better to map each package to its URL directly, or find another solution.
- [x] bob.pipelines/src/bob/pipelines/distributed/sge.py
- To load the rc configuration. This can be replaced by the [exposed](https://gitlab.idiap.ch/bob/exposed) package
- [x] bob.pipelines/src/bob/pipelines/datasets.py
  - To list the files and folders inside a folder or a tarball, and to search for files either in a file structure or in a tarball.

Milestone: Roadmap to the major version of Bob 12. Assignee: André MAYORAZ.

---

**Issue #42: Switch to new CI/CD configuration**
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/42 (Yannick DAYER, updated 2023-01-02)

We need to adapt this package to the new CI/CD and package format using citools:
- [x] Modify `pyproject.toml`:
- [x] Add information from `setup.py`,
- [x] Add version from `version.txt`,
- [x] Add requirements from `requirements.txt` and `conda/meta.yaml`,
- [x] Empty `setup.py`:
- Leave the call to `setup()` for compatibility,
- [x] Remove `version.txt`,
- [x] Remove `requirements.txt`,
- [x] Modify `conda/meta.yaml`,
- [x] Import data from `pyproject.toml` (`name`, `version`, ...),
- [x] Add the `source.path` field with value `..`,
- [x] Add the `build.noarch` field with value `python`,
- [x] Edit the `build.script` to only contain `"{{ PYTHON }} -m pip install {{ SRC_DIR }} -vv"`,
- [x] Remove test and documentation commands and comments,
- [x] Modify `.gitlab-ci.yml` to point to citools' `python.yml`,
- Use the fields format instead of the URL,
- [x] Move files to follow the `src` layout:
- [x] the whole `bob` folder to `src/bob/`,
- [x] all the tests in `tests/`,
- [x] the test data files in `tests/data`,
- [x] Edit the tests to load the data correctly, either with `os.path.join(os.path.dirname(__file__), "data/xxx.txt")` or `pkg_resources.resource_filename(__name__, "data/xxx.txt")`,
- [x] Activate the `packages` option in `settings -> general -> visibility` in the Gitlab project,
- [x] Edit the latest doc badges to point to the `sphinx` directory in `doc/[...]/master`:
- [x] in README.md,
- [x] in the GitLab project settings,
- [x] Edit the coverage badges to point to the doc's coverage directory:
- [x] in README.md,
- [x] in the GitLab project settings,
- [x] Ensure the CI pipeline passes.
You can look at [bob.learn.em](https://gitlab.idiap.ch/bob/bob.learn.em) for an example of a ported package.

Milestone: Roadmap to the major version of Bob 12.

---

**Issue #36: Job Failed #247368**
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/36 (Amir MOHAMMADI, updated 2021-10-18)

Job [#247368](https://gitlab.idiap.ch/bob/bob.pipelines/-/jobs/247368) failed for 3dab034b96d96a9f95435639b9c892171fb50ac4:
```
+ sphinx-build -aEW /scratch/builds/bob/bob.pipelines/conda/../doc /scratch/builds/bob/bob.pipelines/conda/../sphinx
Running Sphinx v4.2.0
Adding intersphinx source for `python': https://docs.python.org/3.8/
Adding intersphinx source for `numpy': https://numpy.org/doc/1.21/
Adding intersphinx source for `setuptools': https://setuptools.readthedocs.io/en/latest/
Adding intersphinx source for `scikit-learn': https://scikit-learn.org/stable/
Adding intersphinx source for `dask': https://docs.dask.org/en/latest/
Adding intersphinx source for `dask-jobqueue': https://jobqueue.dask.org/en/latest/
Adding intersphinx source for `distributed': https://distributed.dask.org/en/latest/
Adding intersphinx source for `xarray': https://xarray.pydata.org/en/stable/
Found documentation for bob.extension on http://www.idiap.ch/software/bob/docs/bob/bob.extension/master/; adding intersphinx source
Found documentation for bob.io.base on http://www.idiap.ch/software/bob/docs/bob/bob.io.base/master/; adding intersphinx source
Found documentation for bob.db.base on http://www.idiap.ch/software/bob/docs/bob/bob.db.base/master/; adding intersphinx source
[autosummary] generating autosummary for: checkpoint.rst, dask.rst, index.rst, py_api.rst, sample.rst, xarray.rst
loading intersphinx inventory from https://docs.python.org/3.8/objects.inv...
loading intersphinx inventory from https://numpy.org/doc/1.21/objects.inv...
loading intersphinx inventory from https://setuptools.readthedocs.io/en/latest/objects.inv...
loading intersphinx inventory from https://scikit-learn.org/stable/objects.inv...
loading intersphinx inventory from https://docs.dask.org/en/latest/objects.inv...
loading intersphinx inventory from https://jobqueue.dask.org/en/latest/objects.inv...
loading intersphinx inventory from https://distributed.dask.org/en/latest/objects.inv...
loading intersphinx inventory from https://xarray.pydata.org/en/stable/objects.inv...
loading intersphinx inventory from http://www.idiap.ch/software/bob/docs/bob/bob.extension/master/objects.inv...
loading intersphinx inventory from http://www.idiap.ch/software/bob/docs/bob/bob.io.base/master/objects.inv...
loading intersphinx inventory from http://www.idiap.ch/software/bob/docs/bob/bob.db.base/master/objects.inv...
intersphinx inventory has moved: https://setuptools.readthedocs.io/en/latest/objects.inv -> https://setuptools.pypa.io/en/latest/objects.inv
building [mo]: all of 0 po files
building [html]: all source files
updating environment: [new config] 6 added, 0 changed, 0 removed
reading sources... [ 16%] checkpoint
reading sources... [ 33%] dask
reading sources... [ 50%] index
reading sources... [ 66%] py_api
reading sources... [ 83%] sample
reading sources... [100%] xarray
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [ 16%] checkpoint
writing output... [ 33%] dask
writing output... [ 50%] index
writing output... [ 66%] py_api
writing output... [ 83%] sample
writing output... [100%] xarray
Warning, treated as error:
/scratch/builds/bob/bob.pipelines/doc/dask.rst:101:unknown document: dask:setup/adaptive
```

Milestone: Conda-forge migration. Assignee: Amir MOHAMMADI.

---

**Issue #46: Dask Client configuration not available in installed package**
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/46 (Yannick DAYER, updated 2023-03-28)

When using the `bob bio simple` commands, the `dask.client` entry-points are not available.
Doing `bob bio pipeline simple -H conf.py` outputs in `conf.py`:
``` python
# ----------8<----------
# dask_client = single-threaded
"""Optional parameter: dask_client (--dask-client, -l) [default: single-threaded]
Dask client for the execution of the pipeline. Can be a `dask.client' entry point, a module name, or a path to a Python file which contains a variable named `dask_client'.
Registered entries are: []"""
# ----------8<----------
```
Tried with the package installed from conda beta; also tried with `pip install -e`.
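One way to narrow this down is to list what the current environment actually registers under the group name; `dask.client` here is just the group we expect (a stdlib-only sketch):

```python
from importlib.metadata import entry_points

def registered(group):
    """Return the entry points visible for a given group name."""
    eps = entry_points()
    if hasattr(eps, "select"):       # Python 3.10+ API
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # older dict-style API

# If this prints [], the group was never registered (or the name is wrong),
# which would match the empty "Registered entries" list above.
print(registered("dask.client"))
```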
Entry points in bob.bio.base and bob.bio.face are working, so I presume it's an issue with how we do it in this package (maybe a wrong name for the entry-point group?).

Assignee: Yannick DAYER.

---

**Issue #45: CheckpointWrapper on annotator, saving the original dataset images as well as the annotations - waste of space!**
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/45 (Alain KOMATY, updated 2023-01-27)

Hello,
When choosing to checkpoint in the pipeline, the annotator folder will contain the original images of the dataset instead of the annotations (face landmarks, for example). One solution is to wrap a CheckpointWrapper around the annotator. This will save the annotations in the annotator folder, but it will also save the original images, because now it is wrapped twice!
This problem comes from the [_wrap](https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/master/src/bob/pipelines/wrappers.py#L1014) function in the [wrappers](https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/master/src/bob/pipelines/wrappers.py) module.
Thanks to @cecabert, who pointed out that in this function there is no test for whether the `estimator` is already an instance of CheckpointWrapper. One possible solution could be as follows (I tested it and it works for my pipelines):
```python
def _wrap(estimator, **kwargs):
    # wrap the object and pass the kwargs
    for w_class in bases:
        valid_params = w_class._get_param_names()
        params = {k: kwargs.pop(k) for k in valid_params if k in kwargs}
        if estimator is None:
            estimator = w_class(**params)
        else:
            if not isinstance(estimator, w_class):
                estimator = w_class(estimator, **params)
    return estimator, kwargs
```

Assignee: Yannick DAYER.

---

**Issue #44: check_parameters_for_validity does not always return the same type**
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/44 (Yannick DAYER, updated 2022-12-06)

Currently, `bob.pipelines.utils.check_parameters_for_validity` can return ["a list or tuple"](https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/master/src/bob/pipelines/utils.py#L117).
It seems weird to return a list **or** a tuple, and somewhere down the line we actually expect a list (with a `remove` method).
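A normalization along these lines would give callers a `list` unconditionally (a sketch of the intent only; the real signature lives in `bob.pipelines.utils`):

```python
def check_parameters_for_validity(
    parameters, parameter_description, valid_parameters, default_parameters=None
):
    """Sketch: whatever shape comes in, always hand back a fresh list."""
    if parameters is None:
        parameters = (
            default_parameters if default_parameters is not None else valid_parameters
        )
    if isinstance(parameters, str):
        parameters = (parameters,)
    invalid = [p for p in parameters if p not in valid_parameters]
    if invalid:
        raise ValueError(
            f"Invalid {parameter_description} {invalid}; "
            f"valid values are {list(valid_parameters)}."
        )
    return list(parameters)  # always a list, so callers may .remove() safely

print(check_parameters_for_validity("dev", "group", ("dev", "eval")))  # ['dev']
```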
Could you ensure that this returns a `list` in all cases (and edit the docstring to reflect that)?

Assignee: André MAYORAZ.

---

**Issue #41: Expanding samples on the fly**
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/41 (Anjith GEORGE, updated 2022-05-31)

I have a use case where I want to expand one image into multiple images in the `pipelinesimple`. Essentially, I can create
a transformer which operates directly on samples: it takes in a `SampleSet` with one image and returns a `SampleSet` with `n` samples.
I managed to make it work when everything is in memory (with the `-m` option), but checkpointing is a problem since there are new samples. I can create new keys on the fly; what do you think is the best way to go about this?
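On the key question: one option is to derive deterministic keys from the parent sample's key, so each expanded sample maps to a stable checkpoint file. A toy sketch with plain dicts standing in for `SampleSet`s:

```python
class ExpandSamples:
    """Toy transformer: each SampleSet with one sample becomes a SampleSet
    with n samples, each carrying a deterministic, unique key derived from
    its parent (so a checkpoint wrapper could write one file per sample)."""

    def __init__(self, n=3):
        self.n = n

    def transform(self, samplesets):
        out = []
        for sset in samplesets:
            parent = sset["samples"][0]
            expanded = [
                {"data": parent["data"], "key": f"{parent['key']}-{i}"}
                for i in range(self.n)
            ]
            out.append({"key": sset["key"], "samples": expanded})
        return out

sets = [{"key": "subj1", "samples": [{"key": "subj1/img1", "data": "IMG"}]}]
result = ExpandSamples(n=2).transform(sets)
print([s["key"] for s in result[0]["samples"]])  # ['subj1/img1-0', 'subj1/img1-1']
```

Because the derived keys depend only on the parent key and the index, re-running the pipeline reuses the same checkpoint files instead of inventing new ones.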
@amohammadi @tiago.pereira @ydayer

---

**Issue #40: Nightlies are failing because of this package**
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/40 (Tiago de Freitas Pereira, updated 2021-11-30)

Check here
https://gitlab.idiap.ch/bob/nightlies/-/jobs/250661
and
https://gitlab.idiap.ch/bob/bob.pipelines/-/jobs/250818
This is blocking the development of the upper stack.
```
=================================== FAILURES ===================================
______________________ test_dataset_pipeline_with_dask_ml ______________________
def test_dataset_pipeline_with_dask_ml():
    scaler = dask_ml.preprocessing.StandardScaler()
    pca = dask_ml.decomposition.PCA(n_components=3, random_state=0)
    clf = SGDClassifier(random_state=0, loss="log", penalty="l2", tol=1e-3)
    clf = dask_ml.wrappers.Incremental(clf, scoring="accuracy")
    iris_ds = _build_iris_dataset(shuffle=True)
    estimator = mario.xr.DatasetPipeline(
        [
            dict(
                estimator=scaler,
                output_dims=[("feature", None)],
                input_dask_array=True,
            ),
            dict(
                estimator=pca,
                output_dims=[("pca_features", 3)],
                input_dask_array=True,
            ),
            dict(
                estimator=clf,
                fit_input=["data", "target"],
                output_dims=[],
                input_dask_array=True,
                fit_kwargs=dict(classes=range(3)),
            ),
        ]
    )
    with dask.config.set(scheduler="synchronous"):
>       estimator = estimator.fit(iris_ds)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/bob/pipelines/tests/test_xarray.py:260:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/bob/pipelines/xarray.py:551: in fit
self._transform(ds, do_fit=True)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/bob/pipelines/xarray.py:510: in _transform
block.estimator_ = _fit(*args, block=block)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/bob/pipelines/xarray.py:243: in _fit
block.estimator.fit(*args, **block.fit_kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask_ml/wrappers.py:495: in fit
self._fit_for_estimator(estimator, X, y, **fit_kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask_ml/wrappers.py:479: in _fit_for_estimator
result = fit(
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask_ml/_partial.py:139: in fit
return value.compute()
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:553: in get_sync
return get_async(
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(*a) for a in it]
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:234: in <listcomp>
return [execute_task(*a) for a in it]
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask_ml/_partial.py:17: in _partial_fit
model.partial_fit(x, y, **kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:841: in partial_fit
return self._partial_fit(
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:572: in _partial_fit
X, y = self._validate_data(
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/sklearn/base.py:576: in _validate_data
X, y = check_X_y(X, y, **check_params)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/sklearn/utils/validation.py:956: in check_X_y
X = check_array(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
array = ('pca.transform-98eb05bfe3c4e482e6896d5f42ca3d48', 1, 0)
accept_sparse = 'csr'
def check_array(
    array,
    accept_sparse=False,
    *,
    accept_large_sparse=True,
    dtype="numeric",
    order=None,
    copy=False,
    force_all_finite=True,
    ensure_2d=True,
    allow_nd=False,
    ensure_min_samples=1,
    ensure_min_features=1,
    estimator=None,
):
    """Input validation on an array, list, sparse matrix or similar.
    By default, the input is checked to be a non-empty 2D array containing
    only finite values. If the dtype of the array is object, attempt
    converting to float, raising on failure.
    Parameters
    ----------
    array : object
        Input object to check / convert.
    accept_sparse : str, bool or list/tuple of str, default=False
        String[s] representing allowed sparse matrix formats, such as 'csc',
        'csr', etc. If the input is sparse but not in the allowed format,
        it will be converted to the first listed format. True allows the input
        to be any format. False means that a sparse matrix input will
        raise an error.
    accept_large_sparse : bool, default=True
        If a CSR, CSC, COO or BSR sparse matrix is supplied and accepted by
        accept_sparse, accept_large_sparse=False will cause it to be accepted
        only if its indices are stored with a 32-bit dtype.
        .. versionadded:: 0.20
    dtype : 'numeric', type, list of type or None, default='numeric'
        Data type of result. If None, the dtype of the input is preserved.
        If "numeric", dtype is preserved unless array.dtype is object.
        If dtype is a list of types, conversion on the first type is only
        performed if the dtype of the input is not in the list.
    order : {'F', 'C'} or None, default=None
        Whether an array will be forced to be fortran or c-style.
        When order is None (default), then if copy=False, nothing is ensured
        about the memory layout of the output array; otherwise (copy=True)
        the memory layout of the returned array is kept as close as possible
        to the original array.
    copy : bool, default=False
        Whether a forced copy will be triggered. If copy=False, a copy might
        be triggered by a conversion.
    force_all_finite : bool or 'allow-nan', default=True
        Whether to raise an error on np.inf, np.nan, pd.NA in array. The
        possibilities are:
        - True: Force all values of array to be finite.
        - False: accepts np.inf, np.nan, pd.NA in array.
        - 'allow-nan': accepts only np.nan and pd.NA values in array. Values
          cannot be infinite.
        .. versionadded:: 0.20
           ``force_all_finite`` accepts the string ``'allow-nan'``.
        .. versionchanged:: 0.23
           Accepts `pd.NA` and converts it into `np.nan`
    ensure_2d : bool, default=True
        Whether to raise a value error if array is not 2D.
    allow_nd : bool, default=False
        Whether to allow array.ndim > 2.
    ensure_min_samples : int, default=1
        Make sure that the array has a minimum number of samples in its first
        axis (rows for a 2D array). Setting to 0 disables this check.
    ensure_min_features : int, default=1
        Make sure that the 2D array has some minimum number of features
        (columns). The default value of 1 rejects empty datasets.
This check is only enforced when the input data has effectively 2
dimensions or is originally 1D and ``ensure_2d`` is True. Setting to 0
disables this check.
estimator : str or estimator instance, default=None
If passed, include the name of the estimator in warning messages.
Returns
-------
array_converted : object
The converted and validated array.
"""
if isinstance(array, np.matrix):
warnings.warn(
"np.matrix usage is deprecated in 1.0 and will raise a TypeError "
"in 1.2. Please convert to a numpy array with np.asarray. For "
"more information see: "
"https://numpy.org/doc/stable/reference/generated/numpy.matrix.html", # noqa
FutureWarning,
)
# store reference to original array to check if copy is needed when
# function returns
array_orig = array
# store whether originally we wanted numeric dtype
dtype_numeric = isinstance(dtype, str) and dtype == "numeric"
dtype_orig = getattr(array, "dtype", None)
if not hasattr(dtype_orig, "kind"):
# not a data type (e.g. a column named dtype in a pandas DataFrame)
dtype_orig = None
# check if the object contains several dtypes (typically a pandas
# DataFrame), and store them. If not, store None.
dtypes_orig = None
has_pd_integer_array = False
if hasattr(array, "dtypes") and hasattr(array.dtypes, "__array__"):
# throw warning if columns are sparse. If all columns are sparse, then
# array.sparse exists and sparsity will be preserved (later).
with suppress(ImportError):
from pandas.api.types import is_sparse
if not hasattr(array, "sparse") and array.dtypes.apply(is_sparse).any():
warnings.warn(
"pandas.DataFrame with sparse columns found."
"It will be converted to a dense numpy array."
)
dtypes_orig = list(array.dtypes)
# pandas boolean dtype __array__ interface coerces bools to objects
for i, dtype_iter in enumerate(dtypes_orig):
if dtype_iter.kind == "b":
dtypes_orig[i] = np.dtype(object)
elif dtype_iter.name.startswith(("Int", "UInt")):
# name looks like an Integer Extension Array, now check for
# the dtype
with suppress(ImportError):
from pandas import (
Int8Dtype,
Int16Dtype,
Int32Dtype,
Int64Dtype,
UInt8Dtype,
UInt16Dtype,
UInt32Dtype,
UInt64Dtype,
)
if isinstance(
dtype_iter,
(
Int8Dtype,
Int16Dtype,
Int32Dtype,
Int64Dtype,
UInt8Dtype,
UInt16Dtype,
UInt32Dtype,
UInt64Dtype,
),
):
has_pd_integer_array = True
if all(isinstance(dtype, np.dtype) for dtype in dtypes_orig):
dtype_orig = np.result_type(*dtypes_orig)
if dtype_numeric:
if dtype_orig is not None and dtype_orig.kind == "O":
# if input is object, convert to float.
dtype = np.float64
else:
dtype = None
if isinstance(dtype, (list, tuple)):
if dtype_orig is not None and dtype_orig in dtype:
# no dtype conversion required
dtype = None
else:
# dtype conversion required. Let's select the first element of the
# list of accepted types.
dtype = dtype[0]
if has_pd_integer_array:
# If there are any pandas integer extension arrays,
array = array.astype(dtype)
if force_all_finite not in (True, False, "allow-nan"):
raise ValueError(
'force_all_finite should be a bool or "allow-nan". Got {!r} instead'.format(
force_all_finite
)
)
if estimator is not None:
if isinstance(estimator, str):
estimator_name = estimator
else:
estimator_name = estimator.__class__.__name__
else:
estimator_name = "Estimator"
context = " by %s" % estimator_name if estimator is not None else ""
# When all dataframe columns are sparse, convert to a sparse array
if hasattr(array, "sparse") and array.ndim > 1:
# DataFrame.sparse only supports `to_coo`
array = array.sparse.to_coo()
if array.dtype == np.dtype("object"):
unique_dtypes = set([dt.subtype.name for dt in array_orig.dtypes])
if len(unique_dtypes) > 1:
raise ValueError(
"Pandas DataFrame with mixed sparse extension arrays "
"generated a sparse matrix with object dtype which "
"can not be converted to a scipy sparse matrix."
"Sparse extension arrays should all have the same "
"numeric type."
)
if sp.issparse(array):
_ensure_no_complex_data(array)
array = _ensure_sparse_format(
array,
accept_sparse=accept_sparse,
dtype=dtype,
copy=copy,
force_all_finite=force_all_finite,
accept_large_sparse=accept_large_sparse,
)
else:
# If np.array(..) gives ComplexWarning, then we convert the warning
# to an error. This is needed because specifying a non complex
# dtype to the function converts complex to real dtype,
# thereby passing the test made in the lines following the scope
# of warnings context manager.
with warnings.catch_warnings():
try:
warnings.simplefilter("error", ComplexWarning)
if dtype is not None and np.dtype(dtype).kind in "iu":
# Conversion float -> int should not contain NaN or
# inf (numpy#14412). We cannot use casting='safe' because
# then conversion float -> int would be disallowed.
array = np.asarray(array, order=order)
if array.dtype.kind == "f":
_assert_all_finite(array, allow_nan=False, msg_dtype=dtype)
array = array.astype(dtype, casting="unsafe", copy=False)
else:
> array = np.asarray(array, order=order, dtype=dtype)
E ValueError: could not convert string to float: 'pca.transform-98eb05bfe3c4e482e6896d5f42ca3d48'
.../lib/python3.8/site-packages/sklearn/utils/validation.py:738: ValueError
```

https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/39
Passing "resources" to dask_jobqueue.core.Job raises an exception
2021-11-29T17:13:12Z, Manuel Günther (siebenkopf@googlemail.com)

When loading the resource `sge`, the following error is thrown:
```
File ".../bob/pipelines/distributed/sge.py", line 57, in __init__
super().__init__(
TypeError: __init__() got an unexpected keyword argument 'resources'
```
Tracing down the error, it seems that `resources` is passed here: https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/d8162ffc4fa072a14a8a4d7ac3b558de464a56ef/bob/pipelines/distributed/sge.py#L347
as part of the `kwargs` of `__init__`, which are simply forwarded to the base class constructor:
https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/d8162ffc4fa072a14a8a4d7ac3b558de464a56ef/bob/pipelines/distributed/sge.py#L58
I would recommend making `resources` a regular parameter of `__init__` so that it is not passed on to the base class constructor.
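A minimal sketch of that recommendation, with hypothetical stand-in classes (`BaseJob` mimics `dask_jobqueue.core.Job`, which rejects unknown keyword arguments; `SGEJob` stands in for the subclass in `sge.py`): declaring `resources` as a named parameter consumes it before `**kwargs` is forwarded to the base class.

```python
class BaseJob:
    """Stand-in for dask_jobqueue.core.Job: rejects unknown keyword arguments."""

    def __init__(self, queue=None, memory=None, **kwargs):
        if kwargs:
            raise TypeError(
                "__init__() got an unexpected keyword argument "
                f"{next(iter(kwargs))!r}"
            )
        self.queue = queue
        self.memory = memory


class SGEJob(BaseJob):
    """Consumes `resources` explicitly instead of letting it ride in **kwargs."""

    def __init__(self, *args, resources=None, **kwargs):
        self.resources = resources or {}
        # Only the keyword arguments the base class understands are forwarded.
        super().__init__(**kwargs)


job = SGEJob(resources={"GPU": 1}, queue="q1d")
print(job.resources)  # {'GPU': 1}
```

With the original `**kwargs`-only signature, `resources` would reach `BaseJob.__init__` and trigger exactly the `TypeError` reported above.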