# bob.pipelines issues
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues · last updated 2024-03-21

## Issue #29: jman (gridtk) like interface for submitting dask jobs
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/29 · Amir MOHAMMADI · updated 2024-03-21
Milestone: Bob 9.0.0

We need:
1. A command that automatically creates a dask client for us, to be used for SGE submission.
2. A history of the commands that were executed.
3. An automatic tracking of dask logs.
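For reference, a minimal sketch of what such a command could wrap, using the existing `dask_jobqueue` API (the function name, queue name, and resource values are placeholders, not the package's actual configuration):

```python
# hypothetical helper; the dask_jobqueue/distributed calls are real API,
# but the queue name and resource values are placeholders
from dask.distributed import Client
from dask_jobqueue import SGECluster

def make_sge_client(workers=8):
    cluster = SGECluster(
        queue="all.q",   # placeholder SGE queue name
        cores=1,         # cores per SGE job
        memory="4GB",    # memory per SGE job
    )
    cluster.scale(workers)  # submit `workers` jobs to the grid
    return Client(cluster)
```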
## Issue #21: Who is Mario?
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/21 · Tiago de Freitas Pereira · updated 2024-01-08
Assignee: Tiago de Freitas Pereira

There are several places in the code where we alias `bob.pipelines` as Mario.
Well, it was a good joke at the beginning, but now we need to decide whether we want to keep it.

## Issue #43: Remove Bob Extension as dependency
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/43 · André MAYORAZ · updated 2024-01-08
Milestone: Roadmap to the major version of Bob 12
Assignee: André MAYORAZ

Bob extension has to be removed. It is used in three places in this package:
- [x] bob.pipelines/doc/conf.py
  - To load the association list for the packages for intersphinx. We have to see whether it is better to map each package to its URL directly, or to find another solution.
- [x] bob.pipelines/src/bob/pipelines/distributed/sge.py
  - To load the rc configuration. This can be replaced by the [exposed](https://gitlab.idiap.ch/bob/exposed) package.
- [x] bob.pipelines/src/bob/pipelines/datasets.py
  - To list the files and folders inside a directory or a tarball, and to search for files in either a file structure or a tarball.
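For the first item, a minimal sketch of what a hand-written package-to-URL association in `doc/conf.py` could look like (the mapping entries are illustrative, not the list currently generated through bob.extension):

```python
# doc/conf.py -- illustrative intersphinx mapping written by hand,
# replacing the association list previously loaded via bob.extension
intersphinx_mapping = {
    "python": ("https://docs.python.org/3/", None),
    "numpy": ("https://numpy.org/doc/stable/", None),
    "scikit-learn": ("https://scikit-learn.org/stable/", None),
    "dask": ("https://docs.dask.org/en/latest/", None),
}
```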
## Issue #46: Dask Client configuration not available in installed package
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/46 · Yannick DAYER · updated 2023-03-28
Assignee: Yannick DAYER

When using the `bob bio simple` commands, the `dask.client` entry-points are not available.
Doing `bob bio pipeline simple -H conf.py` outputs in `conf.py`:
``` python
# ----------8<----------
# dask_client = single-threaded
"""Optional parameter: dask_client (--dask-client, -l) [default: single-threaded]
Dask client for the execution of the pipeline. Can be a `dask.client' entry point, a module name, or a path to a Python file which contains a variable named `dask_client'.
Registered entries are: []"""
# ----------8<----------
```
Tried with the package installed from conda beta; also tried with `pip install -e`.
Entry points in bob.bio.base and bob.bio.face are working. So I presume it's an issue with how we do it in this package (maybe a wrong name for the entry-point group?).
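A quick way to verify that hypothesis, assuming the group is meant to be registered under the name `dask.client` (a sketch, not the package's own diagnostic):

```python
# list whatever is registered under the "dask.client" entry-point group
import pkg_resources

entries = [ep.name for ep in pkg_resources.iter_entry_points("dask.client")]
print(entries)  # an empty list here would confirm the registration problem
```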
## Issue #45: CheckpointWrapper on annotator, saving the original dataset images as well as the annotations - waste of space!
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/45 · Alain KOMATY · updated 2023-01-27
Assignee: Yannick DAYER

Hello,
When choosing to checkpoint in the pipeline, the annotator folder will contain the original images of the dataset instead of the annotations (face landmarks, for example). One solution is to wrap a CheckpointWrapper around the annotator. This will save the annotations in the annotator folder, but it will also save the original images, because the annotator is now wrapped twice!
This problem comes from the [_wrap](https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/master/src/bob/pipelines/wrappers.py#L1014) function in the [wrappers](https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/master/src/bob/pipelines/wrappers.py) module.
Thanks to @cecabert, who pointed out that in this function there is no test of whether the `estimator` is already an instance of CheckpointWrapper! One possible solution could be as follows (I tested it and it works for my pipelines):
```python
def _wrap(estimator, **kwargs):
    # note: `bases` comes from the enclosing wrap() function in wrappers.py
    # wrap the object and pass the kwargs
    for w_class in bases:
        valid_params = w_class._get_param_names()
        params = {k: kwargs.pop(k) for k in valid_params if k in kwargs}
        if estimator is None:
            estimator = w_class(**params)
        # skip wrapping if the estimator is already an instance of w_class
        elif not isinstance(estimator, w_class):
            estimator = w_class(estimator, **params)
    return estimator, kwargs
```

## Issue #42: Switch to new CI/CD configuration
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/42 · Yannick DAYER · updated 2023-01-02
Milestone: Roadmap to the major version of Bob 12

We need to adapt this package to the new CI/CD and package format using citools:
- [x] Modify `pyproject.toml`:
  - [x] Add information from `setup.py`,
  - [x] Add version from `version.txt`,
  - [x] Add requirements from `requirements.txt` and `conda/meta.yaml`,
- [x] Empty `setup.py`:
  - Leave the call to `setup()` for compatibility,
- [x] Remove `version.txt`,
- [x] Remove `requirements.txt`,
- [x] Modify `conda/meta.yaml`:
  - [x] Import data from `pyproject.toml` (`name`, `version`, ...),
  - [x] Add the `source.path` field with value `..`,
  - [x] Add the `build.noarch` field with value `python`,
  - [x] Edit the `build.script` to only contain `"{{ PYTHON }} -m pip install {{ SRC_DIR }} -vv"`,
  - [x] Remove test and documentation commands and comments,
- [x] Modify `.gitlab-ci.yml` to point to citools' `python.yml`:
  - Use the fields format instead of the URL,
- [x] Move files to follow the `src` layout:
  - [x] the whole `bob` folder to `src/bob/`,
  - [x] all the tests to `tests/`,
  - [x] the test data files to `tests/data`,
- [x] Edit the tests to load the data correctly (see the sketch after this list), either with `os.path.join(os.path.dirname(__file__), "data/xxx.txt")` or `pkg_resources.resource_filename(__name__, "data/xxx.txt")`,
- [x] Activate the `packages` option in `settings -> general -> visibility` in the GitLab project,
- [x] Edit the latest doc badges to point to the `sphinx` directory in `doc/[...]/master`:
  - [x] in README.md,
  - [x] in the GitLab project settings,
- [x] Edit the coverage badges to point to the doc's coverage directory:
  - [x] in README.md,
  - [x] in the GitLab project settings,
- [x] Ensure the CI pipeline passes.
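A sketch of the two data-loading options mentioned in the list, assuming a test module living at `tests/test_something.py` with its data in `tests/data/` (the file names are placeholders):

```python
# inside tests/test_something.py -- both lines resolve tests/data/xxx.txt
import os
import pkg_resources

data_file = os.path.join(os.path.dirname(__file__), "data", "xxx.txt")
data_file = pkg_resources.resource_filename(__name__, "data/xxx.txt")
```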
You can look at [bob.learn.em](https://gitlab.idiap.ch/bob/bob.learn.em) for an example of a ported package.
## Issue #44: check_parameters_for_validity does not always return the same type
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/44 · Yannick DAYER · updated 2022-12-06
Assignee: André MAYORAZ

Currently, `bob.pipelines.utils.check_parameters_for_validity` can return ["a list or tuple"](https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/master/src/bob/pipelines/utils.py#L117).
It seems weird to return a list **or** a tuple, and somewhere down the line we actually expect a list (with a `remove` method).
Could you ensure that this returns a `list` in all cases (and edit the docstring to reflect that)?
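A minimal sketch of the requested normalization (a hypothetical helper, not the current implementation):

```python
# hypothetical: coerce whatever the validation produced into a fresh list,
# so downstream code can call list-only methods such as .remove() safely
def _ensure_list(parameters):
    return list(parameters)
```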
## Issue #41: Expanding samples on the fly
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/41 · Anjith GEORGE (anjith.george@idiap.ch) · updated 2022-05-31

I have a use case where I want to expand one image into multiple images in the `pipelinesimple`. Essentially, I can create
a transformer which operates directly on samples, taking in a `SampleSet` with one image and returning a `SampleSet` with `n` samples.
I managed to make it work when everything is in memory (with the `-m` option), but checkpointing is a problem since there are new samples. I can create new keys on the fly; what do you think is the best way to go about this?
@amohammadi @tiago.pereira @ydayer
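A sketch of such a sample-expanding transformer, assuming the `bob.pipelines` `Sample`/`SampleSet` API and that each sample carries a `key` attribute; the class name and key-derivation scheme are hypothetical:

```python
# hypothetical transformer: each SampleSet of one sample becomes a SampleSet
# of n samples, with new unique keys derived from the parent's key
from sklearn.base import BaseEstimator, TransformerMixin
from bob.pipelines import Sample, SampleSet

class ExpandSamples(BaseEstimator, TransformerMixin):
    def __init__(self, n=5):
        self.n = n

    def fit(self, X, y=None):
        return self  # stateless

    def transform(self, samplesets):
        out = []
        for sset in samplesets:
            samples = [
                Sample(
                    data=self._augment(sample.data, i),
                    parent=sample,             # inherit the metadata
                    key=f"{sample.key}_{i}",   # new key per generated sample
                )
                for sample in sset
                for i in range(self.n)
            ]
            out.append(SampleSet(samples, parent=sset))
        return out

    def _augment(self, data, i):
        return data  # placeholder for the real expansion logic
```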
## Issue #34: Annoying warnings dask-jobqueue
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/34 · Tiago de Freitas Pereira · updated 2022-05-09

Since `distributed>2021.x.x` we have been getting some annoying warnings coming from `dask_jobqueue`:
```
...lib/python3.8/site-packages/dask_jobqueue/core.py:321: FutureWarning: ignoring was deprecated in version 2021.06.1 and will be removed in a future release. Please use contextlib.suppress from the standard library instead.
with ignoring(RuntimeError): # deleting job when job already gone
```
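Until dask-jobqueue stops using the deprecated helper, a possible workaround on our side (a sketch, not something the package currently does) would be to silence that specific warning:

```python
# suppress the FutureWarning emitted by dask_jobqueue's use of `ignoring`
import warnings

warnings.filterwarnings(
    "ignore",
    category=FutureWarning,
    module="dask_jobqueue.core",
    message="ignoring was deprecated",
)
```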
## Issue #38: local-parallel queue is not setup well
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/38 · Manuel Günther (siebenkopf@googlemail.com) · updated 2021-12-06

The setup of the current `local-parallel` configuration does not work as expected, for several reasons:
https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/d8162ffc4fa072a14a8a4d7ac3b558de464a56ef/bob/pipelines/config/distributed/local_parallel.py#L10
1. When we set `processes=False`, we only use the Python threading module, which effectively limits the CPU usage to around 100% (i.e., one core), no matter how many cores we have. Only with `processes=True` do we get real parallelization.
2. Selecting all possible CPUs via `cpu_count()` by default does not work well. I have a machine with 128 CPU cores, so setting up all 128 cores takes longer than an experiment -- especially when using `processes=False` above, I commonly get a timeout error.
Before, we had something like `local-p4` with 4 parallel cores, and the like. I think it would be a good idea to incorporate several of these here. Are there any objections?
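For reference, a minimal sketch of a `local-p4`-style configuration using the real `dask.distributed` API (the variable name and worker count are illustrative):

```python
# a "local-p4"-like client: 4 worker processes with one thread each, so we
# get real parallelism instead of thread-bound ~100% CPU usage
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=4, processes=True, threads_per_worker=1)
dask_client = Client(cluster)
```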
## Issue #37: dask_jobqueue > 0.7.2 changed the API...
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/37 · Tiago de Freitas Pereira · updated 2021-11-30
Assignee: Tiago de Freitas Pereira

... and this is breaking `SGEIdiapJob`.
`**kwargs` was removed from here
https://github.com/dask/dask-jobqueue/blob/0.7.2/dask_jobqueue/core.py#L132

## Issue #40: Nightlies are failing because of this package
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/40 · Tiago de Freitas Pereira · updated 2021-11-30

Check here
https://gitlab.idiap.ch/bob/nightlies/-/jobs/250661
and
https://gitlab.idiap.ch/bob/bob.pipelines/-/jobs/250818
This is blocking the development of the upper stack.
```
=================================== FAILURES ===================================
______________________ test_dataset_pipeline_with_dask_ml ______________________
def test_dataset_pipeline_with_dask_ml():
scaler = dask_ml.preprocessing.StandardScaler()
pca = dask_ml.decomposition.PCA(n_components=3, random_state=0)
clf = SGDClassifier(random_state=0, loss="log", penalty="l2", tol=1e-3)
clf = dask_ml.wrappers.Incremental(clf, scoring="accuracy")
iris_ds = _build_iris_dataset(shuffle=True)
estimator = mario.xr.DatasetPipeline(
[
dict(
estimator=scaler,
output_dims=[("feature", None)],
input_dask_array=True,
),
dict(
estimator=pca,
output_dims=[("pca_features", 3)],
input_dask_array=True,
),
dict(
estimator=clf,
fit_input=["data", "target"],
output_dims=[],
input_dask_array=True,
fit_kwargs=dict(classes=range(3)),
),
]
)
with dask.config.set(scheduler="synchronous"):
> estimator = estimator.fit(iris_ds)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/bob/pipelines/tests/test_xarray.py:260:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/bob/pipelines/xarray.py:551: in fit
self._transform(ds, do_fit=True)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/bob/pipelines/xarray.py:510: in _transform
block.estimator_ = _fit(*args, block=block)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/bob/pipelines/xarray.py:243: in _fit
block.estimator.fit(*args, **block.fit_kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask_ml/wrappers.py:495: in fit
self._fit_for_estimator(estimator, X, y, **fit_kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask_ml/wrappers.py:479: in _fit_for_estimator
result = fit(
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask_ml/_partial.py:139: in fit
return value.compute()
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:553: in get_sync
return get_async(
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(*a) for a in it]
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:234: in <listcomp>
return [execute_task(*a) for a in it]
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask_ml/_partial.py:17: in _partial_fit
model.partial_fit(x, y, **kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:841: in partial_fit
return self._partial_fit(
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:572: in _partial_fit
X, y = self._validate_data(
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/sklearn/base.py:576: in _validate_data
X, y = check_X_y(X, y, **check_params)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/sklearn/utils/validation.py:956: in check_X_y
X = check_array(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
array = ('pca.transform-98eb05bfe3c4e482e6896d5f42ca3d48', 1, 0)
accept_sparse = 'csr'
def check_array(
array,
accept_sparse=False,
*,
accept_large_sparse=True,
dtype="numeric",
order=None,
copy=False,
force_all_finite=True,
ensure_2d=True,
allow_nd=False,
ensure_min_samples=1,
ensure_min_features=1,
estimator=None,
):
"""Input validation on an array, list, sparse matrix or similar.
By default, the input is checked to be a non-empty 2D array containing
only finite values. If the dtype of the array is object, attempt
converting to float, raising on failure.
Parameters
----------
array : object
Input object to check / convert.
accept_sparse : str, bool or list/tuple of str, default=False
String[s] representing allowed sparse matrix formats, such as 'csc',
'csr', etc. If the input is sparse but not in the allowed format,
it will be converted to the first listed format. True allows the input
to be any format. False means that a sparse matrix input will
raise an error.
accept_large_sparse : bool, default=True
If a CSR, CSC, COO or BSR sparse matrix is supplied and accepted by
accept_sparse, accept_large_sparse=False will cause it to be accepted
only if its indices are stored with a 32-bit dtype.
.. versionadded:: 0.20
dtype : 'numeric', type, list of type or None, default='numeric'
Data type of result. If None, the dtype of the input is preserved.
If "numeric", dtype is preserved unless array.dtype is object.
If dtype is a list of types, conversion on the first type is only
performed if the dtype of the input is not in the list.
order : {'F', 'C'} or None, default=None
Whether an array will be forced to be fortran or c-style.
When order is None (default), then if copy=False, nothing is ensured
about the memory layout of the output array; otherwise (copy=True)
the memory layout of the returned array is kept as close as possible
to the original array.
copy : bool, default=False
Whether a forced copy will be triggered. If copy=False, a copy might
be triggered by a conversion.
force_all_finite : bool or 'allow-nan', default=True
Whether to raise an error on np.inf, np.nan, pd.NA in array. The
possibilities are:
- True: Force all values of array to be finite.
- False: accepts np.inf, np.nan, pd.NA in array.
- 'allow-nan': accepts only np.nan and pd.NA values in array. Values
cannot be infinite.
.. versionadded:: 0.20
``force_all_finite`` accepts the string ``'allow-nan'``.
.. versionchanged:: 0.23
Accepts `pd.NA` and converts it into `np.nan`
ensure_2d : bool, default=True
Whether to raise a value error if array is not 2D.
allow_nd : bool, default=False
Whether to allow array.ndim > 2.
ensure_min_samples : int, default=1
Make sure that the array has a minimum number of samples in its first
axis (rows for a 2D array). Setting to 0 disables this check.
ensure_min_features : int, default=1
Make sure that the 2D array has some minimum number of features
(columns). The default value of 1 rejects empty datasets.
This check is only enforced when the input data has effectively 2
dimensions or is originally 1D and ``ensure_2d`` is True. Setting to 0
disables this check.
estimator : str or estimator instance, default=None
If passed, include the name of the estimator in warning messages.
Returns
-------
array_converted : object
The converted and validated array.
"""
if isinstance(array, np.matrix):
warnings.warn(
"np.matrix usage is deprecated in 1.0 and will raise a TypeError "
"in 1.2. Please convert to a numpy array with np.asarray. For "
"more information see: "
"https://numpy.org/doc/stable/reference/generated/numpy.matrix.html", # noqa
FutureWarning,
)
# store reference to original array to check if copy is needed when
# function returns
array_orig = array
# store whether originally we wanted numeric dtype
dtype_numeric = isinstance(dtype, str) and dtype == "numeric"
dtype_orig = getattr(array, "dtype", None)
if not hasattr(dtype_orig, "kind"):
# not a data type (e.g. a column named dtype in a pandas DataFrame)
dtype_orig = None
# check if the object contains several dtypes (typically a pandas
# DataFrame), and store them. If not, store None.
dtypes_orig = None
has_pd_integer_array = False
if hasattr(array, "dtypes") and hasattr(array.dtypes, "__array__"):
# throw warning if columns are sparse. If all columns are sparse, then
# array.sparse exists and sparsity will be preserved (later).
with suppress(ImportError):
from pandas.api.types import is_sparse
if not hasattr(array, "sparse") and array.dtypes.apply(is_sparse).any():
warnings.warn(
"pandas.DataFrame with sparse columns found."
"It will be converted to a dense numpy array."
)
dtypes_orig = list(array.dtypes)
# pandas boolean dtype __array__ interface coerces bools to objects
for i, dtype_iter in enumerate(dtypes_orig):
if dtype_iter.kind == "b":
dtypes_orig[i] = np.dtype(object)
elif dtype_iter.name.startswith(("Int", "UInt")):
# name looks like an Integer Extension Array, now check for
# the dtype
with suppress(ImportError):
from pandas import (
Int8Dtype,
Int16Dtype,
Int32Dtype,
Int64Dtype,
UInt8Dtype,
UInt16Dtype,
UInt32Dtype,
UInt64Dtype,
)
if isinstance(
dtype_iter,
(
Int8Dtype,
Int16Dtype,
Int32Dtype,
Int64Dtype,
UInt8Dtype,
UInt16Dtype,
UInt32Dtype,
UInt64Dtype,
),
):
has_pd_integer_array = True
if all(isinstance(dtype, np.dtype) for dtype in dtypes_orig):
dtype_orig = np.result_type(*dtypes_orig)
if dtype_numeric:
if dtype_orig is not None and dtype_orig.kind == "O":
# if input is object, convert to float.
dtype = np.float64
else:
dtype = None
if isinstance(dtype, (list, tuple)):
if dtype_orig is not None and dtype_orig in dtype:
# no dtype conversion required
dtype = None
else:
# dtype conversion required. Let's select the first element of the
# list of accepted types.
dtype = dtype[0]
if has_pd_integer_array:
# If there are any pandas integer extension arrays,
array = array.astype(dtype)
if force_all_finite not in (True, False, "allow-nan"):
raise ValueError(
'force_all_finite should be a bool or "allow-nan". Got {!r} instead'.format(
force_all_finite
)
)
if estimator is not None:
if isinstance(estimator, str):
estimator_name = estimator
else:
estimator_name = estimator.__class__.__name__
else:
estimator_name = "Estimator"
context = " by %s" % estimator_name if estimator is not None else ""
# When all dataframe columns are sparse, convert to a sparse array
if hasattr(array, "sparse") and array.ndim > 1:
# DataFrame.sparse only supports `to_coo`
array = array.sparse.to_coo()
if array.dtype == np.dtype("object"):
unique_dtypes = set([dt.subtype.name for dt in array_orig.dtypes])
if len(unique_dtypes) > 1:
raise ValueError(
"Pandas DataFrame with mixed sparse extension arrays "
"generated a sparse matrix with object dtype which "
"can not be converted to a scipy sparse matrix."
"Sparse extension arrays should all have the same "
"numeric type."
)
if sp.issparse(array):
_ensure_no_complex_data(array)
array = _ensure_sparse_format(
array,
accept_sparse=accept_sparse,
dtype=dtype,
copy=copy,
force_all_finite=force_all_finite,
accept_large_sparse=accept_large_sparse,
)
else:
# If np.array(..) gives ComplexWarning, then we convert the warning
# to an error. This is needed because specifying a non complex
# dtype to the function converts complex to real dtype,
# thereby passing the test made in the lines following the scope
# of warnings context manager.
with warnings.catch_warnings():
try:
warnings.simplefilter("error", ComplexWarning)
if dtype is not None and np.dtype(dtype).kind in "iu":
# Conversion float -> int should not contain NaN or
# inf (numpy#14412). We cannot use casting='safe' because
# then conversion float -> int would be disallowed.
array = np.asarray(array, order=order)
if array.dtype.kind == "f":
_assert_all_finite(array, allow_nan=False, msg_dtype=dtype)
array = array.astype(dtype, casting="unsafe", copy=False)
else:
> array = np.asarray(array, order=order, dtype=dtype)
E ValueError: could not convert string to float: 'pca.transform-98eb05bfe3c4e482e6896d5f42ca3d48'
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/sklearn/utils/validation.py:738: ValueError
```

## Issue #39: Passing "resources" to dask_jobqueue.core.Job raises an exception
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/39 · Manuel Günther (siebenkopf@googlemail.com) · updated 2021-11-29

When loading the resource `sge`, the following error is thrown:
```
File ".../bob/pipelines/distributed/sge.py", line 57, in __init__
super().__init__(
TypeError: __init__() got an unexpected keyword argument 'resources'
```
Tracing down the error, it seems that you are passing the `resources`: https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/d8162ffc4fa072a14a8a4d7ac3b558de464a56ef/bob/pipelines/distributed/sge.py#L347
as `kwargs` to `__init__`, which are simply passed on to the base class constructor:
https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/d8162ffc4fa072a14a8a4d7ac3b558de464a56ef/bob/pipelines/distributed/sge.py#L58
I would recommend making `resources` a regular parameter of `__init__` so that it is not passed on to the base class constructor.
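A sketch of that suggestion (hypothetical code, not the actual `SGEIdiapJob` implementation):

```python
# hypothetical sketch: accept `resources` explicitly so it never reaches the
# base class constructor, which now rejects unknown keyword arguments
from dask_jobqueue.core import Job

class SGEIdiapJob(Job):
    def __init__(self, *args, resources=None, **kwargs):
        self.custom_resources = resources  # keep for our own scheduling logic
        super().__init__(*args, **kwargs)  # `resources` no longer leaks through
```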
## Issue #35: doctests fail with the new version of xarray
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/35 · Amir MOHAMMADI · updated 2021-10-29
Assignee: Amir MOHAMMADI

We build with `xarray 0.18.0` and doctests pass there, but when tested with `xarray 0.19.0`, which is the latest in the defaults channel, the doctests fail.
Since I cannot create a doctest that works with both of them, I suggest pinning xarray until the next minor version (instead of the next major version).
See: https://gitlab.idiap.ch/bob/bob.pipelines/-/jobs/243826

## Issue #33: Breakdown samplesets
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/33 · Tiago de Freitas Pereira · updated 2021-10-29

We should have a function in `bob.pipelines` that takes as input a `SampleSet` with `N` samples and outputs `N` `SampleSet`s with 1 `Sample` each.
ping @hotroshi
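A minimal sketch of such a helper, assuming the `SampleSet` API (the function name is hypothetical):

```python
# hypothetical helper: one SampleSet of N samples -> N SampleSets of 1 sample,
# each keeping the original set's metadata through `parent`
from bob.pipelines import SampleSet

def breakdown_sampleset(sampleset):
    return [SampleSet([sample], parent=sampleset) for sample in sampleset]
```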
What is the best way to emulate the `--allow-missing-files` flag in the previous `spoof.py` in bob9?.
This is a blocking issue in port...Currently, I couldn't find a clean way to handle preprocessor or extractor failures in the pipelines.
What is the best way to emulate the `--allow-missing-files` flag of the previous `spoof.py` in bob9?
This is a blocking issue in porting some of the examples from the previous bob version.

## Issue #31: Checkpointwrapper sometimes fails due to temporary disk issues
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/31 · Amir MOHAMMADI · updated 2021-10-29
Assignee: Tiago de Freitas Pereira

It's a good idea to retry a couple of times when saving and loading.
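A sketch of what such retry logic could look like (a hypothetical helper, not the current `CheckpointWrapper` code):

```python
# hypothetical retry wrapper for flaky filesystem operations
import time

def retry_io(operation, attempts=3, wait=1.0):
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            time.sleep(wait)  # give the filesystem a moment to recover

# usage (illustrative): retry_io(lambda: save(sample, path))
```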
## Issue #36: Job Failed #247368
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/36 · Amir MOHAMMADI · updated 2021-10-18
Milestone: Conda-forge migration
Assignee: Amir MOHAMMADI

Job [#247368](https://gitlab.idiap.ch/bob/bob.pipelines/-/jobs/247368) failed for 3dab034b96d96a9f95435639b9c892171fb50ac4:
```
+ sphinx-build -aEW /scratch/builds/bob/bob.pipelines/conda/../doc /scratch/builds/bob/bob.pipelines/conda/../sphinx
Running Sphinx v4.2.0
Adding intersphinx source for `python': https://docs.python.org/3.8/
Adding intersphinx source for `numpy': https://numpy.org/doc/1.21/
Adding intersphinx source for `setuptools': https://setuptools.readthedocs.io/en/latest/
Adding intersphinx source for `scikit-learn': https://scikit-learn.org/stable/
Adding intersphinx source for `dask': https://docs.dask.org/en/latest/
Adding intersphinx source for `dask-jobqueue': https://jobqueue.dask.org/en/latest/
Adding intersphinx source for `distributed': https://distributed.dask.org/en/latest/
Adding intersphinx source for `xarray': https://xarray.pydata.org/en/stable/
Found documentation for bob.extension on http://www.idiap.ch/software/bob/docs/bob/bob.extension/master/; adding intersphinx source
Found documentation for bob.io.base on http://www.idiap.ch/software/bob/docs/bob/bob.io.base/master/; adding intersphinx source
Found documentation for bob.db.base on http://www.idiap.ch/software/bob/docs/bob/bob.db.base/master/; adding intersphinx source
[autosummary] generating autosummary for: checkpoint.rst, dask.rst, index.rst, py_api.rst, sample.rst, xarray.rst
loading intersphinx inventory from https://docs.python.org/3.8/objects.inv...
loading intersphinx inventory from https://numpy.org/doc/1.21/objects.inv...
loading intersphinx inventory from https://setuptools.readthedocs.io/en/latest/objects.inv...
loading intersphinx inventory from https://scikit-learn.org/stable/objects.inv...
loading intersphinx inventory from https://docs.dask.org/en/latest/objects.inv...
loading intersphinx inventory from https://jobqueue.dask.org/en/latest/objects.inv...
loading intersphinx inventory from https://distributed.dask.org/en/latest/objects.inv...
loading intersphinx inventory from https://xarray.pydata.org/en/stable/objects.inv...
loading intersphinx inventory from http://www.idiap.ch/software/bob/docs/bob/bob.extension/master/objects.inv...
loading intersphinx inventory from http://www.idiap.ch/software/bob/docs/bob/bob.io.base/master/objects.inv...
loading intersphinx inventory from http://www.idiap.ch/software/bob/docs/bob/bob.db.base/master/objects.inv...
intersphinx inventory has moved: https://setuptools.readthedocs.io/en/latest/objects.inv -> https://setuptools.pypa.io/en/latest/objects.inv
building [mo]: all of 0 po files
building [html]: all source files
updating environment: [new config] 6 added, 0 changed, 0 removed
reading sources... [ 16%] checkpoint
reading sources... [ 33%] dask
reading sources... [ 50%] index
reading sources... [ 66%] py_api
reading sources... [ 83%] sample
reading sources... [100%] xarray
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [ 16%] checkpoint
writing output... [ 33%] dask
writing output... [ 50%] index
writing output... [ 66%] py_api
writing output... [ 83%] sample
writing output... [100%] xarray
Warning, treated as error:
/scratch/builds/bob/bob.pipelines/doc/dask.rst:101:unknown document: dask:setup/adaptive
```

## Issue #25: Better handling of file structures with CheckpointWrapper
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/25 · Tiago de Freitas Pereira · updated 2021-08-13

Some databases have very flat file structures (with more than 30k files in a directory).
This is not good for the Idiap file structure and can make our I/O very slow.
We should implement a hash function in `CheckpointWrapper.make_path` that generates directory names from `sample.key`, limiting the number of files per directory to 1000.
ping @lcolbois (this touches experiments with IJB-C)
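A minimal sketch of such key-based sharding (a hypothetical helper; the hash choice and shard width are placeholders):

```python
# hypothetical: derive a small subdirectory from the sample key so no single
# directory accumulates tens of thousands of files
import hashlib
import os

def sharded_path(root, key, extension=".h5"):
    shard = hashlib.sha256(key.encode("utf-8")).hexdigest()[:3]  # 4096 buckets
    return os.path.join(root, shard, key + extension)
```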
## Issue #30: CSVBaseSampleLoader does not support delayed metadata
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/30 · Amir MOHAMMADI · updated 2020-12-13
Milestone: Bob 9.0.0
Assignee: Tiago de Freitas Pereira

Since DelayedSample supports delayed metadata as well, I think it's a good idea for CSVBaseSampleLoader to delay the metadata loading too.
This is really important: when we query the database, we may want to load the annotations in a delayed manner, because they might not exist and the annotations might not be used. See https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/e2459dc5784045261ccc25df204a852bb527239e/bob/pipelines/datasets/sample_loaders.py#L60
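A sketch of what the delayed loading could look like using `DelayedSample`'s `delayed_attributes` (the loader functions and sample-building helper are placeholders):

```python
# hypothetical: annotations are only read from disk if some transformer
# actually accesses sample.annotations
from functools import partial
from bob.pipelines import DelayedSample

def load_image(path):        # placeholder data loader
    ...

def load_annotations(path):  # placeholder annotation loader
    ...

def make_sample(path, annotation_path):
    return DelayedSample(
        partial(load_image, path),
        delayed_attributes=dict(
            annotations=partial(load_annotations, annotation_path)
        ),
        key=path,
    )
```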