# bob issues
https://gitlab.idiap.ch/groups/bob/-/issues (updated 2022-01-17)

## bob/bob#272: Nightlies failing because of this one
https://gitlab.idiap.ch/bob/bob/-/issues/272 (updated 2022-01-17, Tiago de Freitas Pereira)
https://gitlab.idiap.ch/bob/nightlies/-/jobs/253709/
```sh
with channels:
- http://www.idiap.ch/software/bob/conda/label/beta
- conda-forge
The reported errors are:
Encountered problems while solving:
- cannot install both psutil-5.9.0-py38h497a2fe_0 and psutil-5.8.0-py310h6acc77f_2
Traceback (most recent call last):
File "/scratch/builds/bob/nightlies/miniconda/bin/bdt", line 11, in <module>
sys.exit(main())
File "/scratch/builds/bob/nightlies/miniconda/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/scratch/builds/bob/nightlies/miniconda/lib/python3.9/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/scratch/builds/bob/nightlies/miniconda/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/scratch/builds/bob/nightlies/miniconda/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/scratch/builds/bob/nightlies/miniconda/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/scratch/builds/bob/nightlies/miniconda/lib/python3.9/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/scratch/builds/bob/nightlies/miniconda/lib/python3.9/site-packages/bob/devtools/scripts/bdt.py", line 43, in _decorator
value = view_func(*args, **kwargs)
File "/scratch/builds/bob/nightlies/miniconda/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/scratch/builds/bob/nightlies/miniconda/lib/python3.9/site-packages/bob/devtools/scripts/ci.py", line 739, in nightlies
ctx.invoke(
File "/scratch/builds/bob/nightlies/miniconda/lib/python3.9/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/scratch/builds/bob/nightlies/miniconda/lib/python3.9/site-packages/bob/devtools/scripts/bdt.py", line 43, in _decorator
value = view_func(*args, **kwargs)
File "/scratch/builds/bob/nightlies/miniconda/lib/python3.9/site-packages/bob/devtools/scripts/build.py", line 305, in build
paths = conda_build.api.build(
File "/scratch/builds/bob/nightlies/miniconda/lib/python3.9/site-packages/conda_build/api.py", line 186, in build
return build_tree(
File "/scratch/builds/bob/nightlies/miniconda/lib/python3.9/site-packages/conda_build/build.py", line 3083, in build_tree
packages_from_this = build(metadata, stats,
File "/scratch/builds/bob/nightlies/miniconda/lib/python3.9/site-packages/conda_build/build.py", line 2123, in build
create_build_envs(top_level_pkg, notest)
File "/scratch/builds/bob/nightlies/miniconda/lib/python3.9/site-packages/conda_build/build.py", line 1980, in create_build_envs
environ.get_install_actions(m.config.test_prefix,
File "/scratch/builds/bob/nightlies/miniconda/lib/python3.9/site-packages/boa/cli/mambabuild.py", line 70, in mamba_get_install_actions
solution = solver.solve_for_action(_specs, prefix)
File "/scratch/builds/bob/nightlies/miniconda/lib/python3.9/site-packages/boa/core/solver.py", line 214, in solve_for_action
t = self.solve(specs)
File "/scratch/builds/bob/nightlies/miniconda/lib/python3.9/site-packages/boa/core/solver.py", line 200, in solve
raise RuntimeError("Solver could not find solution.")
RuntimeError: Solver could not find solution
```

Assignee: Amir MOHAMMADI

## bob/bob.bio.face#75: Resources for arface dataset are mixed up
https://gitlab.idiap.ch/bob/bob.bio.face/-/issues/75 (updated 2022-01-14, Manuel Günther)

In the `setup.py`, the entries of two ARface protocols are exchanged: https://gitlab.idiap.ch/bob/bob.bio.face/-/blob/bc421ecae8908299ef5b879e965741a25dca6567/setup.py#L242

Assignee: Manuel Günther

## bob/bob.bio.face#54: Databases to port
https://gitlab.idiap.ch/bob/bob.bio.face/-/issues/54 (updated 2022-01-14, Tiago de Freitas Pereira)
Hi guys,
Which face databases would you like to see ported to the new API? I'll add some below; please add more if necessary.
- [x] LFW
- [x] GBU
- [x] rfw (https://gitlab.idiap.ch/bob/bob.bio.face/-/merge_requests/127)
- [ ] MegaFace
- [ ] CALFW - Cross-Age LFW (CALFW) Database
- [x] Youtube faces
ping @lcolbois @ageorge @hotroshi @amohammadi
thanks

## bob/bob.bio.base#173: Create an option --force in the VanillaBiometrics CLI command....
https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/173 (updated 2022-01-14, Tiago de Freitas Pereira)
That way, checkpoints will be regenerated even if they already exist.
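A minimal sketch of how such a flag could be wired with click; the option name comes from the issue title, everything else is illustrative:

```python
import click

@click.command()
@click.option(
    "--force",
    is_flag=True,
    default=False,
    help="Regenerate checkpoints even if they already exist.",
)
def vanilla_biometrics(force):
    # Illustrative behavior only: with --force, stale checkpoints would be
    # removed before the pipeline runs, so they get regenerated.
    if force:
        click.echo("Removing existing checkpoints before running")
```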
Related to #152.

## bob/bob.bio.base#166: Score normalization pipeline needs some redesign
https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/166 (updated 2022-01-13, Tiago de Freitas Pereira)
`bob.bio.base` implements a pipeline that does several types of score normalization in one shot:
- Z-Norm
- T-Norm
- S-Norm
- ZT-Norm
- Some variations of the adaptive norm.
Although logical (they are all variations of the same idea), this structure doesn't seem to scale to datasets where the number of comparisons explodes into the millions.
I often face `MemoryError` issues (dask memory errors) that are super tough to track down.
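For reference, a minimal sketch of how these normalizations relate to each other; the names are illustrative: `z_cohort` holds impostor scores of the model against a cohort, `t_cohort` holds scores of the probe against cohort models:

```python
import numpy as np

def z_norm(raw, z_cohort):
    # Normalize by the statistics of the model's cohort impostor scores.
    return (raw - np.mean(z_cohort)) / np.std(z_cohort)

def t_norm(raw, t_cohort):
    # Normalize by the statistics of the probe's scores against cohort models.
    return (raw - np.mean(t_cohort)) / np.std(t_cohort)

def s_norm(raw, z_cohort, t_cohort):
    # S-norm: the average of the Z- and T-normalized scores.
    return 0.5 * (z_norm(raw, z_cohort) + t_norm(raw, t_cohort))
```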
Furthermore, the code is a bit convoluted. I think we need to break this down into small pieces.

Assignee: Tiago de Freitas Pereira

## bob/bob.bio.base#169: CLI plotting commands inconsistent
https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/169 (updated 2021-12-15, Manuel Günther)
When using the plotting commands for bob, some of the parameters are expected to be separated by space, and some by comma. For example, the following command does not work:
```
bob bio dir scores-1 scores-2 --legends label-1 label-2
```
This raises the error:
```
Usage: bob bio dir [OPTIONS] [SCORES]...
Try 'bob bio dir -?' for help.
Error: Invalid value: Number of legends must be >= to the number of systems
```
In fact, the score files must be separated by spaces and the legends by commas for the command to work:
```
bob bio dir scores-1 scores-2 --legends label-1,label-2
```
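Presumably the inconsistency comes from how the two parameters are declared in click: the score files are a variadic positional argument, which the shell splits on spaces, while `--legends` is a single string that the command itself splits on commas. A hypothetical sketch of the two declaration styles (`dir_cmd` and the option wiring are illustrative, not the actual source):

```python
import click

@click.command()
@click.argument("scores", nargs=-1)       # variadic: the shell splits on spaces
@click.option("--legends", default=None)  # one string: split on commas in the command
def dir_cmd(scores, legends):
    labels = legends.split(",") if legends else []
    click.echo(f"{len(scores)} score files, {len(labels)} legends")
```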
Is there any particular reason for this behavior, i.e., is this expected?

## bob/bob.bio.base#172: There is no algorithm available to compute average features
https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/172 (updated 2021-12-15, Manuel Günther)
Related to bob/bob.bio.face#73
The current best way of handling several deep features for enrollment or probing is to compute their average. Currently, this is not implemented. This issue is used to keep track of the implementation of that feature.

Assignee: Manuel Günther

## bob/bob.bio.base#152: `bob bio pipelines vanilla-biometrics` should have an option `--memory` instead of `--checkpoint`
https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/152 (updated 2021-12-15, Tiago de Freitas Pereira)
`bob bio pipelines vanilla-biometrics` should checkpoint by default instead of running everything in memory, not the other way around.
If someone is not aware of what this command does and runs it on a big dataset, the program might crash because of OOM.

Assignee: Tiago de Freitas Pereira

## bob/bob.bio.base#164: Temporary files and caches are written into the result directory
https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/164 (updated 2021-12-14, Manuel Günther)

In the old version, we had two different directories to store elements: the `result_directory` and the `temp_directory`. These two directories were there for a purpose: anything inside of `temp` could be easily removed after the experiments have finished, while important results were stored in the `results` directory. This separation also allowed us to have the temporary files on a local disk -- with much faster access and without backup -- and only the result files in a directory with backups.
Unfortunately, this split has gone in the new version -- for no obvious reason other than laziness, IMHO. Is there any possibility of a mechanism that places the files in `tmp`, and the cached files in `sampleswrapper`, `biometric_references` and `scores`, in a different directory than the `--output`? Or is there any particular reason for having only a single directory for all output that I am overlooking here?

## bob/bob.bio.face#72: VGG16 preprocessing buggy?
https://gitlab.idiap.ch/bob/bob.bio.face/-/issues/72 (updated 2021-12-14, Manuel Günther)
When using the VGG16 network, we need to subtract the RGB mean from the channels. As the images are in bob format (`NxCxHxW`), we would need to subtract the mean from `[:,i,:,:]`. Instead, we subtract it from `[:,:,:,i]`:
https://gitlab.idiap.ch/bob/bob.bio.face/-/blob/3567e990d0e523ceb5d3f9598054d8a27d7f7000/bob/bio/face/embeddings/opencv.py#L140
This is most certainly incorrect, especially since we use the correct dimension later on to convert RGB to BGR:
https://gitlab.idiap.ch/bob/bob.bio.face/-/blob/3567e990d0e523ceb5d3f9598054d8a27d7f7000/bob/bio/face/embeddings/opencv.py#L146
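For illustration, a sketch of the intended per-channel subtraction on the `NxCxHxW` layout; the mean values below are placeholders, not the ones the model actually uses:

```python
import numpy as np

rgb_mean = np.array([129.2, 104.8, 93.6])  # placeholder per-channel means

def subtract_channel_mean(images):
    # images: float array of shape (N, C, H, W) with C == 3 (RGB)
    out = images.astype(np.float64, copy=True)
    for i in range(3):
        out[:, i, :, :] -= rgb_mean[i]  # the channel axis is 1, not 3
    return out
```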
Finally, in the pipeline, we define an MTCNN annotator with particular parameters: https://gitlab.idiap.ch/bob/bob.bio.face/-/blob/3567e990d0e523ceb5d3f9598054d8a27d7f7000/bob/bio/face/embeddings/opencv.py#L203
but this is ignored since the pipeline uses `"mtcnn"`.

## bob/bob.bio.face#73: Implementation of Distance algorithm for deep feature extractors not optimal
https://gitlab.idiap.ch/bob/bob.bio.face/-/issues/73 (updated 2021-12-14, Manuel Günther)
There are two different concepts that have emerged lately in face recognition with deep features, which have been shown to improve performance considerably:
1. The best way to handle several samples for enrollment or probing is to compute the average of the features.
2. When comparing deep features, use the cosine similarity.
Unfortunately, neither of the two concepts is used in our baselines when we simply use the `Distance` implementation from `bob.bio.base`, whose default behavior is:
1. When having several features for enrollment or probing, compute the pairwise distances and then use the average of the scores. This is tricky to see since this is hidden in the base class constructor: https://gitlab.idiap.ch/bob/bob.bio.base/-/blob/a43b31fd50acc27540ee29924357b8e2301bbe47/bob/bio/base/algorithm/Algorithm.py#L83
which will then be translated to computing **average scores** (not the score between averaged features): https://gitlab.idiap.ch/bob/bob.bio.base/-/blob/a43b31fd50acc27540ee29924357b8e2301bbe47/bob/bio/base/utils/__init__.py#L27
2. The default comparison function in `Distance` is the Euclidean distance: https://gitlab.idiap.ch/bob/bob.bio.base/-/blob/a43b31fd50acc27540ee29924357b8e2301bbe47/bob/bio/base/algorithm/Distance.py#L34
So, when we simply use the default constructor, as done here: https://gitlab.idiap.ch/bob/bob.bio.face/-/blob/f494d6cb9ca23d4809e08498d046f2120cb21df3/bob/bio/face/embeddings/pytorch.py#L417
and most probably also in all other implementations, we will get Euclidean instead of cosine distance.
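A minimal sketch of the behavior proposed here, assuming plain 1D embeddings (function names are illustrative, not the actual API):

```python
import numpy as np

def enroll(features):
    # Concept 1: average the embeddings into a single template.
    return np.mean(features, axis=0)

def cosine_score(model, probe):
    # Concept 2: compare with cosine similarity instead of Euclidean distance.
    return np.dot(model, probe) / (np.linalg.norm(model) * np.linalg.norm(probe))
```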
Tasks:
- [ ] Implement the averaging of features both for the enrollment and the probes (in case there are multiple). This can either be done by adapting the existing `Distance` function through adding a different `multiple_model_scoring` or `multiple_probe_scoring` parameter, or by implementing a completely separate Algorithm class for that.
- [ ] Change the default in all of the baselines to use the new behavior, but at least to select the cosine distance instead of Euclidean.

Assignee: Manuel Günther

## bob/bob.bio.face#70: BoundingBoxAnnotatorCrop ignores the selected bounding boxes
https://gitlab.idiap.ch/bob/bob.bio.face/-/issues/70 (updated 2021-12-14, Manuel Günther)

The `BoundingBoxAnnotatorCrop` is nice in the sense that it can even work when no face was detected, by simply using the bounding box. While the `"topleft"` and `"bottomright"` are required to be specified, they are however ignored in the constructor:
https://gitlab.idiap.ch/bob/bob.bio.face/-/blob/ead8c069bafb4024dc15c5df7fdc878aec8bd5f0/bob/bio/face/preprocessor/FaceCrop.py#L542
Instead, the face is cropped and simply scaled to the required size:
https://gitlab.idiap.ch/bob/bob.bio.face/-/blob/ead8c069bafb4024dc15c5df7fdc878aec8bd5f0/bob/bio/face/preprocessor/FaceCrop.py#L616
Finally, the `fixed_positions` are not respected by the annotator, so it is impossible to use it with a dataset that does not provide `topleft` and `bottomright` annotations.

Assignee: Manuel Günther

## bob/bob.io.audio#9: package installed with conda is broken
https://gitlab.idiap.ch/bob/bob.io.audio/-/issues/9 (updated 2021-12-13, Amir MOHAMMADI)
```
bob10/lib/python3.8/site-packages/bob/io/audio/__init__.py", line 4, in <module>
from ._library import *
ImportError: libsox.so.3: cannot open shared object file: No such file or directory
```
Also I see that pytest never ran bob.io.audio tests in https://gitlab.idiap.ch/bob/bob/-/jobs/249166/raw

Assignee: Amir MOHAMMADI

## bob/bob.bio.base#167: Algorithms with training that requires split by class don't seem to work
https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/167 (updated 2021-12-13, Manuel Günther)
When running a small baseline algorithm, such as `lda`, it seems that the required classes for the training samples are not forwarded to the training algorithm:
```
$ bob bio pipelines vanilla-biometrics -vv atnt lda
...
File ".../bob.bio.base/bob/bio/base/transformers/algorithm.py", line 62, in fit
training_data = split_X_by_y(X, y)
File ".../bob.bio.base/bob/bio/base/transformers/__init__.py", line 6, in split_X_by_y
for x1, y1 in zip(X, y):
TypeError: 'NoneType' object is not iterable
```
I have checked what is going on, and it seems that `y=None` in: https://gitlab.idiap.ch/bob/bob.bio.base/-/blob/1c3f542ee4d77592146ddc54aa8a51194a853745/bob/bio/base/transformers/__init__.py#L4
called by:
https://gitlab.idiap.ch/bob/bob.bio.base/-/blob/1c3f542ee4d77592146ddc54aa8a51194a853745/bob/bio/base/transformers/algorithm.py#L61
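For context, a rough reconstruction of what `split_X_by_y` presumably does, inferred from the traceback rather than the exact source; with `y=None`, the `zip` call raises exactly this `TypeError`:

```python
def split_X_by_y(X, y):
    # Group the samples in X by their class label in y.
    training_data = {}
    for x1, y1 in zip(X, y):  # fails with TypeError when y is None
        training_data.setdefault(y1, []).append(x1)
    return list(training_data.values())
```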
Unfortunately, I cannot trace the issue back further since my experience in debugging `dask` is very limited.
Maybe we should allow running the pipeline without `dask` -- as far as I understand, the dask pipeline is only a wrapper around the whole pipeline. Is it possible to skip the `dask` wrapper and run everything locally in a single thread? This would make debugging much easier.
Actually, I wanted to try out the above pipeline to debug my `dask` setup, which does not work.

Assignee: Tiago de Freitas Pereira

## bob/bob.pipelines#38: local-parallel queue is not setup well
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/38 (updated 2021-12-06, Manuel Günther)
The setup of the current `local-parallel` configuration does not work as expected, for several reasons:
https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/d8162ffc4fa072a14a8a4d7ac3b558de464a56ef/bob/pipelines/config/distributed/local_parallel.py#L10
1. When we set `processes=False`, we will only use the python threading module, which will effectively limit the CPU usage to around 100% (i.e., one core), no matter how many cores we use. Only with `processes=True`, we will get real parallelization.
2. Selecting all possible CPUs via `cpu_count()` by default does not work well. I have a machine with 128 CPU cores, so setting up all 128 cores takes longer than an experiment -- especially when using `processes=False` above, I commonly get a timeout error.
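For comparison, a minimal sketch of a local setup with a fixed number of worker processes (the worker count is illustrative):

```python
from dask.distributed import Client, LocalCluster

# Four worker processes (not threads), so CPU-bound work actually runs in
# parallel and startup does not scale with the machine's full core count.
cluster = LocalCluster(n_workers=4, processes=True, threads_per_worker=1)
client = Client(cluster)
```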
Before, we had something like `local-p4` with 4 parallel cores, and the like. I think it would be a good idea to incorporate several of these here. Are there any objections?

## bob/bob.pipelines#37: dask_jobqueue > 0.7.2 changed the API...
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/37 (updated 2021-11-30, Tiago de Freitas Pereira)
... and this is breaking `SGEIdiapJob`.
`**kwargs` was removed from here
https://github.com/dask/dask-jobqueue/blob/0.7.2/dask_jobqueue/core.py#L132

Assignee: Tiago de Freitas Pereira

## bob/bob.bio.base#168: resources.py does not list dask clients
https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/168 (updated 2021-11-30, Manuel Günther)

While all other parts of the pipeline can be listed through `resources.py`, this is not the case for registered `dask` clients. When running `bob bio pipelines vanilla-biometrics -h` we can see the option `-l, --dask-client`, but currently there is no simple way of listing which clients are registered.

Assignee: Manuel Günther

## bob/bob.devtools#92: Test for yum install does not work if builder tag **contains** `docker`
https://gitlab.idiap.ch/bob/bob.devtools/-/issues/92 (updated 2021-11-30, André Anjos)
The test verifies the following:
```python
if "docker" in os.environ.get("CI_RUNNER_TAGS", "") and os.path.exists(
yum_requirements_file
):
...
```
The `CI_RUNNER_TAGS` for the shell builder says `docker-build`, and therefore this test also succeeds there, whereas it shouldn't.
Failing pipeline: https://gitlab.idiap.ch/beat/beat.nightlies/-/jobs/250244
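For illustration, the same check with the tags split and compared exactly, assuming `CI_RUNNER_TAGS` holds a comma-separated list of tags:

```python
import os

# Exact tag match instead of substring containment;
# yum_requirements_file is defined as in the original snippet.
tags = [t.strip() for t in os.environ.get("CI_RUNNER_TAGS", "").split(",")]
if "docker" in tags and os.path.exists(yum_requirements_file):
    ...
```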
That is, the solution would be to split the tags and compare for exact equality with `docker`, not just check containment.

Assignee: Amir MOHAMMADI

## bob/bob.pipelines#40: Nightlies are failing because of this package
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/40 (updated 2021-11-30, Tiago de Freitas Pereira)
Check here
https://gitlab.idiap.ch/bob/nightlies/-/jobs/250661
and
https://gitlab.idiap.ch/bob/bob.pipelines/-/jobs/250818
This is blocking the development of the upper stack.
```
=================================== FAILURES ===================================
______________________ test_dataset_pipeline_with_dask_ml ______________________
def test_dataset_pipeline_with_dask_ml():
scaler = dask_ml.preprocessing.StandardScaler()
pca = dask_ml.decomposition.PCA(n_components=3, random_state=0)
clf = SGDClassifier(random_state=0, loss="log", penalty="l2", tol=1e-3)
clf = dask_ml.wrappers.Incremental(clf, scoring="accuracy")
iris_ds = _build_iris_dataset(shuffle=True)
estimator = mario.xr.DatasetPipeline(
[
dict(
estimator=scaler,
output_dims=[("feature", None)],
input_dask_array=True,
),
dict(
estimator=pca,
output_dims=[("pca_features", 3)],
input_dask_array=True,
),
dict(
estimator=clf,
fit_input=["data", "target"],
output_dims=[],
input_dask_array=True,
fit_kwargs=dict(classes=range(3)),
),
]
)
with dask.config.set(scheduler="synchronous"):
> estimator = estimator.fit(iris_ds)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/bob/pipelines/tests/test_xarray.py:260:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/bob/pipelines/xarray.py:551: in fit
self._transform(ds, do_fit=True)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/bob/pipelines/xarray.py:510: in _transform
block.estimator_ = _fit(*args, block=block)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/bob/pipelines/xarray.py:243: in _fit
block.estimator.fit(*args, **block.fit_kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask_ml/wrappers.py:495: in fit
self._fit_for_estimator(estimator, X, y, **fit_kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask_ml/wrappers.py:479: in _fit_for_estimator
result = fit(
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask_ml/_partial.py:139: in fit
return value.compute()
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:553: in get_sync
return get_async(
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(*a) for a in it]
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:234: in <listcomp>
return [execute_task(*a) for a in it]
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask_ml/_partial.py:17: in _partial_fit
model.partial_fit(x, y, **kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:841: in partial_fit
return self._partial_fit(
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:572: in _partial_fit
X, y = self._validate_data(
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/sklearn/base.py:576: in _validate_data
X, y = check_X_y(X, y, **check_params)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/sklearn/utils/validation.py:956: in check_X_y
X = check_array(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
array = ('pca.transform-98eb05bfe3c4e482e6896d5f42ca3d48', 1, 0)
accept_sparse = 'csr'
def check_array(
array,
accept_sparse=False,
*,
accept_large_sparse=True,
dtype="numeric",
order=None,
copy=False,
force_all_finite=True,
ensure_2d=True,
allow_nd=False,
ensure_min_samples=1,
ensure_min_features=1,
estimator=None,
):
"""Input validation on an array, list, sparse matrix or similar.
By default, the input is checked to be a non-empty 2D array containing
only finite values. If the dtype of the array is object, attempt
converting to float, raising on failure.
Parameters
----------
array : object
Input object to check / convert.
accept_sparse : str, bool or list/tuple of str, default=False
String[s] representing allowed sparse matrix formats, such as 'csc',
'csr', etc. If the input is sparse but not in the allowed format,
it will be converted to the first listed format. True allows the input
to be any format. False means that a sparse matrix input will
raise an error.
accept_large_sparse : bool, default=True
If a CSR, CSC, COO or BSR sparse matrix is supplied and accepted by
accept_sparse, accept_large_sparse=False will cause it to be accepted
only if its indices are stored with a 32-bit dtype.
.. versionadded:: 0.20
dtype : 'numeric', type, list of type or None, default='numeric'
Data type of result. If None, the dtype of the input is preserved.
If "numeric", dtype is preserved unless array.dtype is object.
If dtype is a list of types, conversion on the first type is only
performed if the dtype of the input is not in the list.
order : {'F', 'C'} or None, default=None
Whether an array will be forced to be fortran or c-style.
When order is None (default), then if copy=False, nothing is ensured
about the memory layout of the output array; otherwise (copy=True)
the memory layout of the returned array is kept as close as possible
to the original array.
copy : bool, default=False
Whether a forced copy will be triggered. If copy=False, a copy might
be triggered by a conversion.
force_all_finite : bool or 'allow-nan', default=True
Whether to raise an error on np.inf, np.nan, pd.NA in array. The
possibilities are:
- True: Force all values of array to be finite.
- False: accepts np.inf, np.nan, pd.NA in array.
- 'allow-nan': accepts only np.nan and pd.NA values in array. Values
cannot be infinite.
.. versionadded:: 0.20
``force_all_finite`` accepts the string ``'allow-nan'``.
.. versionchanged:: 0.23
Accepts `pd.NA` and converts it into `np.nan`
ensure_2d : bool, default=True
Whether to raise a value error if array is not 2D.
allow_nd : bool, default=False
Whether to allow array.ndim > 2.
ensure_min_samples : int, default=1
Make sure that the array has a minimum number of samples in its first
axis (rows for a 2D array). Setting to 0 disables this check.
ensure_min_features : int, default=1
Make sure that the 2D array has some minimum number of features
(columns). The default value of 1 rejects empty datasets.
This check is only enforced when the input data has effectively 2
dimensions or is originally 1D and ``ensure_2d`` is True. Setting to 0
disables this check.
estimator : str or estimator instance, default=None
If passed, include the name of the estimator in warning messages.
Returns
-------
array_converted : object
The converted and validated array.
"""
if isinstance(array, np.matrix):
warnings.warn(
"np.matrix usage is deprecated in 1.0 and will raise a TypeError "
"in 1.2. Please convert to a numpy array with np.asarray. For "
"more information see: "
"https://numpy.org/doc/stable/reference/generated/numpy.matrix.html", # noqa
FutureWarning,
)
# store reference to original array to check if copy is needed when
# function returns
array_orig = array
# store whether originally we wanted numeric dtype
dtype_numeric = isinstance(dtype, str) and dtype == "numeric"
dtype_orig = getattr(array, "dtype", None)
if not hasattr(dtype_orig, "kind"):
# not a data type (e.g. a column named dtype in a pandas DataFrame)
dtype_orig = None
# check if the object contains several dtypes (typically a pandas
# DataFrame), and store them. If not, store None.
dtypes_orig = None
has_pd_integer_array = False
if hasattr(array, "dtypes") and hasattr(array.dtypes, "__array__"):
# throw warning if columns are sparse. If all columns are sparse, then
# array.sparse exists and sparsity will be preserved (later).
with suppress(ImportError):
from pandas.api.types import is_sparse
if not hasattr(array, "sparse") and array.dtypes.apply(is_sparse).any():
warnings.warn(
"pandas.DataFrame with sparse columns found."
"It will be converted to a dense numpy array."
)
dtypes_orig = list(array.dtypes)
# pandas boolean dtype __array__ interface coerces bools to objects
for i, dtype_iter in enumerate(dtypes_orig):
if dtype_iter.kind == "b":
dtypes_orig[i] = np.dtype(object)
elif dtype_iter.name.startswith(("Int", "UInt")):
# name looks like an Integer Extension Array, now check for
# the dtype
with suppress(ImportError):
from pandas import (
Int8Dtype,
Int16Dtype,
Int32Dtype,
Int64Dtype,
UInt8Dtype,
UInt16Dtype,
UInt32Dtype,
UInt64Dtype,
)
if isinstance(
dtype_iter,
(
Int8Dtype,
Int16Dtype,
Int32Dtype,
Int64Dtype,
UInt8Dtype,
UInt16Dtype,
UInt32Dtype,
UInt64Dtype,
),
):
has_pd_integer_array = True
if all(isinstance(dtype, np.dtype) for dtype in dtypes_orig):
dtype_orig = np.result_type(*dtypes_orig)
if dtype_numeric:
if dtype_orig is not None and dtype_orig.kind == "O":
# if input is object, convert to float.
dtype = np.float64
else:
dtype = None
if isinstance(dtype, (list, tuple)):
if dtype_orig is not None and dtype_orig in dtype:
# no dtype conversion required
dtype = None
else:
# dtype conversion required. Let's select the first element of the
# list of accepted types.
dtype = dtype[0]
if has_pd_integer_array:
# If there are any pandas integer extension arrays,
array = array.astype(dtype)
if force_all_finite not in (True, False, "allow-nan"):
raise ValueError(
'force_all_finite should be a bool or "allow-nan". Got {!r} instead'.format(
force_all_finite
)
)
if estimator is not None:
if isinstance(estimator, str):
estimator_name = estimator
else:
estimator_name = estimator.__class__.__name__
else:
estimator_name = "Estimator"
context = " by %s" % estimator_name if estimator is not None else ""
# When all dataframe columns are sparse, convert to a sparse array
if hasattr(array, "sparse") and array.ndim > 1:
# DataFrame.sparse only supports `to_coo`
array = array.sparse.to_coo()
if array.dtype == np.dtype("object"):
unique_dtypes = set([dt.subtype.name for dt in array_orig.dtypes])
if len(unique_dtypes) > 1:
raise ValueError(
"Pandas DataFrame with mixed sparse extension arrays "
"generated a sparse matrix with object dtype which "
"can not be converted to a scipy sparse matrix."
"Sparse extension arrays should all have the same "
"numeric type."
)
if sp.issparse(array):
_ensure_no_complex_data(array)
array = _ensure_sparse_format(
array,
accept_sparse=accept_sparse,
dtype=dtype,
copy=copy,
force_all_finite=force_all_finite,
accept_large_sparse=accept_large_sparse,
)
else:
# If np.array(..) gives ComplexWarning, then we convert the warning
# to an error. This is needed because specifying a non complex
# dtype to the function converts complex to real dtype,
# thereby passing the test made in the lines following the scope
# of warnings context manager.
with warnings.catch_warnings():
try:
warnings.simplefilter("error", ComplexWarning)
if dtype is not None and np.dtype(dtype).kind in "iu":
# Conversion float -> int should not contain NaN or
# inf (numpy#14412). We cannot use casting='safe' because
# then conversion float -> int would be disallowed.
array = np.asarray(array, order=order)
if array.dtype.kind == "f":
_assert_all_finite(array, allow_nan=False, msg_dtype=dtype)
array = array.astype(dtype, casting="unsafe", copy=False)
else:
> array = np.asarray(array, order=order, dtype=dtype)
E ValueError: could not convert string to float: 'pca.transform-98eb05bfe3c4e482e6896d5f42ca3d48'
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/sklearn/utils/validation.py:738: ValueError
```

## bob/bob.pipelines#39: Passing "resources" to dask_jobqueue.core.Job raises an exception
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/39 (updated 2021-11-29, Manuel Günther)
When loading the resource `sge`, the following error is thrown:
```
File ".../bob/pipelines/distributed/sge.py", line 57, in __init__
super().__init__(
TypeError: __init__() got an unexpected keyword argument 'resources'
```
Tracing down the error, it seems that you are passing the `resources`: https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/d8162ffc4fa072a14a8a4d7ac3b558de464a56ef/bob/pipelines/distributed/sge.py#L347
as part of `kwargs` to `__init__`, which is simply passed on to the base class constructor:
https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/d8162ffc4fa072a14a8a4d7ac3b558de464a56ef/bob/pipelines/distributed/sge.py#L58
I would recommend making `resources` a regular parameter of `__init__` so that it is not passed on to the base class constructor.
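A minimal sketch of that recommendation, with a simplified signature and assuming the rest of the constructor stays unchanged:

```python
from dask_jobqueue.core import Job

class SGEIdiapJob(Job):
    def __init__(self, *args, resources=None, **kwargs):
        # Consume `resources` here instead of forwarding it to Job.__init__,
        # which does not accept it.
        self.resources = resources
        super().__init__(*args, **kwargs)
```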