# bob issues
https://gitlab.idiap.ch/groups/bob/-/issues

## Create an option --force in the VanillaBiometrics CLI command....
https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/173 (Tiago de Freitas Pereira, 2022-01-14)

In that way, checkpoints will be regenerated even if they already exist.

Related to #152.
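For illustration, a hedged sketch of the requested semantics (hypothetical names; the real command is click-based, stdlib `argparse` is used here only to keep the sketch self-contained): recompute a checkpoint when it is missing or when `--force` is given, otherwise reuse it.

```python
# Sketch only: names and checkpoint logic are illustrative, not the real
# vanilla-biometrics implementation.
import argparse
import os
import tempfile


def run(argv, checkpoint):
    parser = argparse.ArgumentParser(prog="vanilla-biometrics")
    parser.add_argument(
        "--force", action="store_true",
        help="regenerate checkpoints even if they already exist",
    )
    args = parser.parse_args(argv)
    if args.force or not os.path.exists(checkpoint):
        with open(checkpoint, "w") as f:  # stand-in for the real computation
            f.write("features")
        return "computed"
    return "reused"


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "features.h5")
    print(run([], ckpt))           # first run: computed
    print(run([], ckpt))           # checkpoint exists: reused
    print(run(["--force"], ckpt))  # forced: computed again
```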
## Pytest compatibility
https://gitlab.idiap.ch/bob/bob/-/issues/271 (Amir MOHAMMADI, 2022-03-03)

When we use pytest to test our packages, some tests do not run because they are not found by pytest: it ignores `test.py` files and does not consider them tests!
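One possible fix on the pytest side (a sketch, not a decision) would be to widen pytest's collection patterns so that plain `test.py` modules are also collected, e.g. in each package's `setup.cfg`:

```ini
[tool:pytest]
# pytest's default is "test_*.py"; adding "test.py" makes it also collect
# the legacy module names listed below
python_files = test_*.py *_test.py test.py
```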
Looking at the checkout of some bob packages that I have, these packages have a `test.py` file:
```
../bob.pad.base/bob/pad/base/test/test.py
../bob.db.mnist/bob/db/mnist/test.py
../bob.io.image/bob/io/image/test.py
../bob.db.atnt/bob/db/atnt/test.py
../bob.bio.vein/bob/bio/vein/tests/test.py
../bob.blitz/bob/blitz/examples/bob.example.extension/bob/example/extension/test.py
../bob.blitz/bob/blitz/examples/bob.example.library/bob/example/library/test.py
../bob.blitz/bob/blitz/examples/bob.example.project/bob/example/project/test.py
../bob.blitz/bob/blitz/test.py
../bob.devtools/bob/devtools/scripts/test.py
../bob.learn.activation/bob/learn/activation/test.py
../bob.io.stream/bob/io/stream/test/test.py
../bob.ip.stereo/bob/ip/stereo/test/test.py
../bob.io.audio/bob/io/audio/test.py
../bob.io.video/bob/io/video/test.py
../bob.ip.color/bob/ip/color/test.py
../bob.ip.facedetect/bob/ip/flandmark/test.py
../bob.ip.gabor/bob/ip/gabor/test.py
../bob.ip.qualitymeasure/bob/ip/qualitymeasure/test.py
../bob.learn.linear/bob/learn/linear/test.py
../bob.learn.pytorch/bob/learn/pytorch/test/test.py
```

## There is no algorithm available to compute average features
https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/172 (Manuel Günther, 2021-12-15)
Related to bob/bob.bio.face#73.
The current best way of handling several deep features for enrollment or probing is to compute their average. Currently, this is not implemented. This issue is used to keep track of the implementation of that feature.

## In the Algorithm, the model_fusion_function is ignored
https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/171 (Manuel Günther, 2022-05-19)

While the constructor of the `Algorithm` class has two parameters dealing with score fusion (https://gitlab.idiap.ch/bob/bob.bio.base/-/blob/a43b31fd50acc27540ee29924357b8e2301bbe47/bob/bio/base/algorithm/Algorithm.py#L83), one of them is ignored. In `score_for_multiple_models`, the `model_fusion_function` should be used, but instead the `probe_fusion_function` is used: https://gitlab.idiap.ch/bob/bob.bio.base/-/blob/a43b31fd50acc27540ee29924357b8e2301bbe47/bob/bio/base/algorithm/Algorithm.py#L218
So far, there is no problem because both have the same default value. But in case someone wants to change only one of them, this is currently not possible.

## Implementation of Distance algorithm for deep feature extractors not optimal
https://gitlab.idiap.ch/bob/bob.bio.face/-/issues/73 (Manuel Günther, 2021-12-14)

Two different concepts have emerged lately in face recognition with deep features, which have been shown to improve performance considerably:
1. The best way to handle several samples for enrollment or probing is to compute the average of the features.
2. When comparing deep features, use the cosine similarity.
Unfortunately, neither of the two concepts is used in our baselines, when we simply use the `Distance` implementation from `bob.bio.base`, where the default behavior is:
1. When having several features for enrollment or probing, compute the pairwise distances and then use the average of the scores. This is tricky to see since this is hidden in the base class constructor: https://gitlab.idiap.ch/bob/bob.bio.base/-/blob/a43b31fd50acc27540ee29924357b8e2301bbe47/bob/bio/base/algorithm/Algorithm.py#L83
which will then be translated to computing **average scores** (not the score between averaged features): https://gitlab.idiap.ch/bob/bob.bio.base/-/blob/a43b31fd50acc27540ee29924357b8e2301bbe47/bob/bio/base/utils/__init__.py#L27
2. The default comparison function in `Distance` is the Euclidean distance: https://gitlab.idiap.ch/bob/bob.bio.base/-/blob/a43b31fd50acc27540ee29924357b8e2301bbe47/bob/bio/base/algorithm/Distance.py#L34
So, when we simply use the default constructor as in here: https://gitlab.idiap.ch/bob/bob.bio.face/-/blob/f494d6cb9ca23d4809e08498d046f2120cb21df3/bob/bio/face/embeddings/pytorch.py#L417
and most probably also in all other implementations, we will get Euclidean instead of cosine distance.
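To make the difference concrete, here is a hedged numpy sketch (illustrative helper names, not bob.bio code) contrasting the current default (pairwise Euclidean distances, then score averaging) with the proposed behavior (average the features, then one cosine comparison):

```python
import numpy as np


def euclidean_of_pairs(model_feats, probe_feats):
    # current default: score every (model, probe) feature pair with the
    # (negative) Euclidean distance, then average the scores
    scores = [-np.linalg.norm(m - p)
              for m in model_feats for p in probe_feats]
    return float(np.mean(scores))


def cosine_of_averages(model_feats, probe_feats):
    # proposed behavior: average the enrollment and probe features first,
    # then compare the two averages with cosine similarity
    m = np.mean(model_feats, axis=0)
    p = np.mean(probe_feats, axis=0)
    return float(m @ p / (np.linalg.norm(m) * np.linalg.norm(p)))
```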
Tasks:
- [ ] Implement the averaging of features both for the enrollment and the probes (in case there are multiple). This can either be done by adapting the existing `Distance` function through adding a different `multiple_model_scoring` or `multiple_probe_scoring` parameter, or by implementing a completely separate Algorithm class for that.
- [ ] Change the default in all of the baselines to use the new behavior, or at least to select the cosine distance instead of Euclidean.

## We have no clear documentation on how to handle supervised training.....
https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/170 (Tiago de Freitas Pereira, 2022-04-25)

... while using `vanilla-biometrics`.
Internally, we know that we need to add this kwarg to the transformer that does the fit `fit_extra_arguments = (("y", "subject_id"),)` linking the `subject_id` from the sample with the `y` parameter of the `BaseEstimator`.
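To make the mechanism concrete until that documentation exists, here is a hedged, self-contained sketch (simplified stand-ins, not the real bob.pipelines classes) of what `fit_extra_arguments = (("y", "subject_id"),)` does: for each named fit argument, the wrapper collects the corresponding attribute from the samples and forwards it to the estimator's `fit`.

```python
# Sketch only: Sample, SampleWrapperSketch and RecordingEstimator are
# simplified stand-ins for the real bob.pipelines machinery.
class Sample:
    def __init__(self, data, subject_id):
        self.data = data
        self.subject_id = subject_id


class SampleWrapperSketch:
    def __init__(self, estimator, fit_extra_arguments=()):
        self.estimator = estimator
        self.fit_extra_arguments = fit_extra_arguments

    def fit(self, samples):
        # map each fit kwarg name to the per-sample attribute values
        kwargs = {
            arg: [getattr(s, attr) for s in samples]
            for arg, attr in self.fit_extra_arguments
        }
        X = [s.data for s in samples]
        self.estimator.fit(X, **kwargs)
        return self


class RecordingEstimator:
    # records what arrived in fit, standing in for a sklearn BaseEstimator
    def fit(self, X, y=None):
        self.seen_y = y
        return self
```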
However, this is obscure in the documentation.
We need proper documentation and a simple example.

## `test_video_like_container` fails on ARM
https://gitlab.idiap.ch/bob/bob.bio.video/-/issues/22 (Tiago de Freitas Pereira, 2022-03-04)

Look at https://gitlab.idiap.ch/bob/bob.bio.video/-/jobs/251433#L2788
https://gitlab.idiap.ch/bob/bob.bio.video/-/jobs/251433

## CLI plotting commands inconsistent
https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/169 (Manuel Günther, 2021-12-15)

When using the plotting commands for bob, some of the parameters are expected to be separated by space, and some by comma. For example, the following command does not work:
```
bob bio dir scores-1 scores-2 --legends label-1 label-2
```
This raises the error:
```
Usage: bob bio dir [OPTIONS] [SCORES]...
Try 'bob bio dir -?' for help.
Error: Invalid value: Number of legends must be >= to the number of systems
```
In fact, the score files must be separated by space, and the legends by comma, in order to work:
```
bob bio dir scores-1 scores-2 --legends label-1,label-2
```
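A plausible explanation, sketched below with hypothetical names (stdlib `argparse` is used here only to keep the sketch runnable; the real CLI is click-based): score files are variadic positional arguments, so they are space-separated, while `--legends` is an option that receives a single string and splits it on commas. A variadic option following a variadic positional can be ambiguous to parse, which may be the reason for the comma convention.

```python
import argparse


def parse(argv):
    parser = argparse.ArgumentParser(prog="bob bio dir")
    # positional arguments can be variadic, so score files are
    # space-separated on the command line
    parser.add_argument("scores", nargs="+")
    # an option takes a single string, so the list of legends is packed
    # into one comma-separated value and split afterwards
    parser.add_argument("--legends", type=lambda v: v.split(","))
    return parser.parse_args(argv)
```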
Is there any particular reason for this behavior, i.e., is this expected?

## VGG16 preprocessing buggy?
https://gitlab.idiap.ch/bob/bob.bio.face/-/issues/72 (Manuel Günther, 2021-12-14)

When using the VGG16 network, we need to subtract the RGB mean from the channels. As the images are in bob format (`NxCxHxW`), we would need to subtract the mean from `[:,i,:,:]`. Instead, we subtract it from `[:,:,:,i]`:
https://gitlab.idiap.ch/bob/bob.bio.face/-/blob/3567e990d0e523ceb5d3f9598054d8a27d7f7000/bob/bio/face/embeddings/opencv.py#L140
This is most certainly incorrect, especially since we use the correct dimension later on to convert RGB to BGR:
https://gitlab.idiap.ch/bob/bob.bio.face/-/blob/3567e990d0e523ceb5d3f9598054d8a27d7f7000/bob/bio/face/embeddings/opencv.py#L146
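A small numpy sketch of the difference (illustrative mean values, not the actual VGG16 ones): for `NxCxHxW` arrays the per-channel mean must be subtracted along axis 1, not the last axis.

```python
import numpy as np

# batch of 2 RGB images in bob format N x C x H x W; W deliberately equals
# C (= 3) here so that both subtractions broadcast and can be compared
images = np.arange(2 * 3 * 5 * 3, dtype=float).reshape(2, 3, 5, 3)
mean = np.array([0.5, 0.4, 0.3])  # illustrative per-channel means

# correct: subtract mean[i] from channel i, i.e. along axis 1 ([:, i, :, :])
correct = images - mean[None, :, None, None]

# buggy: subtracting along the last axis ([:, :, :, i]) mixes the channel
# means into the image columns instead of the channels
buggy = images - mean[None, None, None, :]
```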
Finally, in the pipeline, we define an MTCNN annotator with particular parameters: https://gitlab.idiap.ch/bob/bob.bio.face/-/blob/3567e990d0e523ceb5d3f9598054d8a27d7f7000/bob/bio/face/embeddings/opencv.py#L203
but this is ignored since the pipeline uses `"mtcnn"`.

## Deprecation in favour of imageio (and imageio-ffmpeg)
https://gitlab.idiap.ch/bob/bob.io.video/-/issues/19 (André Anjos, 2022-01-26)

@bob: I think it is time to deprecate this package. We have been maintaining it for a long time, and it very often requires effort to keep it up to date.
The package [imageio](https://github.com/imageio/imageio) seems well documented and maintained, and contains a plugin for [reading video files through ffmpeg, frame by frame](https://github.com/imageio/imageio-ffmpeg), as we do. It also supports all the video formats we do, via the same or similar libraries.
I propose we just move the effort to maintaining the conda-forge feedstocks (https://github.com/conda-forge/imageio-feedstock/, https://github.com/conda-forge/imageio-ffmpeg-feedstock/) instead.
Please comment.

## Nightlies are failing because of this package
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/40 (Tiago de Freitas Pereira, 2021-11-30)

Check here:
https://gitlab.idiap.ch/bob/nightlies/-/jobs/250661
and
https://gitlab.idiap.ch/bob/bob.pipelines/-/jobs/250818
This is blocking the development of the upper stack.
```
=================================== FAILURES ===================================
______________________ test_dataset_pipeline_with_dask_ml ______________________
def test_dataset_pipeline_with_dask_ml():
scaler = dask_ml.preprocessing.StandardScaler()
pca = dask_ml.decomposition.PCA(n_components=3, random_state=0)
clf = SGDClassifier(random_state=0, loss="log", penalty="l2", tol=1e-3)
clf = dask_ml.wrappers.Incremental(clf, scoring="accuracy")
iris_ds = _build_iris_dataset(shuffle=True)
estimator = mario.xr.DatasetPipeline(
[
dict(
estimator=scaler,
output_dims=[("feature", None)],
input_dask_array=True,
),
dict(
estimator=pca,
output_dims=[("pca_features", 3)],
input_dask_array=True,
),
dict(
estimator=clf,
fit_input=["data", "target"],
output_dims=[],
input_dask_array=True,
fit_kwargs=dict(classes=range(3)),
),
]
)
with dask.config.set(scheduler="synchronous"):
> estimator = estimator.fit(iris_ds)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/bob/pipelines/tests/test_xarray.py:260:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/bob/pipelines/xarray.py:551: in fit
self._transform(ds, do_fit=True)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/bob/pipelines/xarray.py:510: in _transform
block.estimator_ = _fit(*args, block=block)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/bob/pipelines/xarray.py:243: in _fit
block.estimator.fit(*args, **block.fit_kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask_ml/wrappers.py:495: in fit
self._fit_for_estimator(estimator, X, y, **fit_kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask_ml/wrappers.py:479: in _fit_for_estimator
result = fit(
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask_ml/_partial.py:139: in fit
return value.compute()
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:553: in get_sync
return get_async(
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(*a) for a in it]
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:234: in <listcomp>
return [execute_task(*a) for a in it]
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/dask_ml/_partial.py:17: in _partial_fit
model.partial_fit(x, y, **kwargs)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:841: in partial_fit
return self._partial_fit(
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:572: in _partial_fit
X, y = self._validate_data(
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/sklearn/base.py:576: in _validate_data
X, y = check_X_y(X, y, **check_params)
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/sklearn/utils/validation.py:956: in check_X_y
X = check_array(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
array = ('pca.transform-98eb05bfe3c4e482e6896d5f42ca3d48', 1, 0)
accept_sparse = 'csr'
def check_array(
array,
accept_sparse=False,
*,
accept_large_sparse=True,
dtype="numeric",
order=None,
copy=False,
force_all_finite=True,
ensure_2d=True,
allow_nd=False,
ensure_min_samples=1,
ensure_min_features=1,
estimator=None,
):
"""Input validation on an array, list, sparse matrix or similar.
By default, the input is checked to be a non-empty 2D array containing
only finite values. If the dtype of the array is object, attempt
converting to float, raising on failure.
Parameters
----------
array : object
Input object to check / convert.
accept_sparse : str, bool or list/tuple of str, default=False
String[s] representing allowed sparse matrix formats, such as 'csc',
'csr', etc. If the input is sparse but not in the allowed format,
it will be converted to the first listed format. True allows the input
to be any format. False means that a sparse matrix input will
raise an error.
accept_large_sparse : bool, default=True
If a CSR, CSC, COO or BSR sparse matrix is supplied and accepted by
accept_sparse, accept_large_sparse=False will cause it to be accepted
only if its indices are stored with a 32-bit dtype.
.. versionadded:: 0.20
dtype : 'numeric', type, list of type or None, default='numeric'
Data type of result. If None, the dtype of the input is preserved.
If "numeric", dtype is preserved unless array.dtype is object.
If dtype is a list of types, conversion on the first type is only
performed if the dtype of the input is not in the list.
order : {'F', 'C'} or None, default=None
Whether an array will be forced to be fortran or c-style.
When order is None (default), then if copy=False, nothing is ensured
about the memory layout of the output array; otherwise (copy=True)
the memory layout of the returned array is kept as close as possible
to the original array.
copy : bool, default=False
Whether a forced copy will be triggered. If copy=False, a copy might
be triggered by a conversion.
force_all_finite : bool or 'allow-nan', default=True
Whether to raise an error on np.inf, np.nan, pd.NA in array. The
possibilities are:
- True: Force all values of array to be finite.
- False: accepts np.inf, np.nan, pd.NA in array.
- 'allow-nan': accepts only np.nan and pd.NA values in array. Values
cannot be infinite.
.. versionadded:: 0.20
``force_all_finite`` accepts the string ``'allow-nan'``.
.. versionchanged:: 0.23
Accepts `pd.NA` and converts it into `np.nan`
ensure_2d : bool, default=True
Whether to raise a value error if array is not 2D.
allow_nd : bool, default=False
Whether to allow array.ndim > 2.
ensure_min_samples : int, default=1
Make sure that the array has a minimum number of samples in its first
axis (rows for a 2D array). Setting to 0 disables this check.
ensure_min_features : int, default=1
Make sure that the 2D array has some minimum number of features
(columns). The default value of 1 rejects empty datasets.
This check is only enforced when the input data has effectively 2
dimensions or is originally 1D and ``ensure_2d`` is True. Setting to 0
disables this check.
estimator : str or estimator instance, default=None
If passed, include the name of the estimator in warning messages.
Returns
-------
array_converted : object
The converted and validated array.
"""
if isinstance(array, np.matrix):
warnings.warn(
"np.matrix usage is deprecated in 1.0 and will raise a TypeError "
"in 1.2. Please convert to a numpy array with np.asarray. For "
"more information see: "
"https://numpy.org/doc/stable/reference/generated/numpy.matrix.html", # noqa
FutureWarning,
)
# store reference to original array to check if copy is needed when
# function returns
array_orig = array
# store whether originally we wanted numeric dtype
dtype_numeric = isinstance(dtype, str) and dtype == "numeric"
dtype_orig = getattr(array, "dtype", None)
if not hasattr(dtype_orig, "kind"):
# not a data type (e.g. a column named dtype in a pandas DataFrame)
dtype_orig = None
# check if the object contains several dtypes (typically a pandas
# DataFrame), and store them. If not, store None.
dtypes_orig = None
has_pd_integer_array = False
if hasattr(array, "dtypes") and hasattr(array.dtypes, "__array__"):
# throw warning if columns are sparse. If all columns are sparse, then
# array.sparse exists and sparsity will be preserved (later).
with suppress(ImportError):
from pandas.api.types import is_sparse
if not hasattr(array, "sparse") and array.dtypes.apply(is_sparse).any():
warnings.warn(
"pandas.DataFrame with sparse columns found."
"It will be converted to a dense numpy array."
)
dtypes_orig = list(array.dtypes)
# pandas boolean dtype __array__ interface coerces bools to objects
for i, dtype_iter in enumerate(dtypes_orig):
if dtype_iter.kind == "b":
dtypes_orig[i] = np.dtype(object)
elif dtype_iter.name.startswith(("Int", "UInt")):
# name looks like an Integer Extension Array, now check for
# the dtype
with suppress(ImportError):
from pandas import (
Int8Dtype,
Int16Dtype,
Int32Dtype,
Int64Dtype,
UInt8Dtype,
UInt16Dtype,
UInt32Dtype,
UInt64Dtype,
)
if isinstance(
dtype_iter,
(
Int8Dtype,
Int16Dtype,
Int32Dtype,
Int64Dtype,
UInt8Dtype,
UInt16Dtype,
UInt32Dtype,
UInt64Dtype,
),
):
has_pd_integer_array = True
if all(isinstance(dtype, np.dtype) for dtype in dtypes_orig):
dtype_orig = np.result_type(*dtypes_orig)
if dtype_numeric:
if dtype_orig is not None and dtype_orig.kind == "O":
# if input is object, convert to float.
dtype = np.float64
else:
dtype = None
if isinstance(dtype, (list, tuple)):
if dtype_orig is not None and dtype_orig in dtype:
# no dtype conversion required
dtype = None
else:
# dtype conversion required. Let's select the first element of the
# list of accepted types.
dtype = dtype[0]
if has_pd_integer_array:
# If there are any pandas integer extension arrays,
array = array.astype(dtype)
if force_all_finite not in (True, False, "allow-nan"):
raise ValueError(
'force_all_finite should be a bool or "allow-nan". Got {!r} instead'.format(
force_all_finite
)
)
if estimator is not None:
if isinstance(estimator, str):
estimator_name = estimator
else:
estimator_name = estimator.__class__.__name__
else:
estimator_name = "Estimator"
context = " by %s" % estimator_name if estimator is not None else ""
# When all dataframe columns are sparse, convert to a sparse array
if hasattr(array, "sparse") and array.ndim > 1:
# DataFrame.sparse only supports `to_coo`
array = array.sparse.to_coo()
if array.dtype == np.dtype("object"):
unique_dtypes = set([dt.subtype.name for dt in array_orig.dtypes])
if len(unique_dtypes) > 1:
raise ValueError(
"Pandas DataFrame with mixed sparse extension arrays "
"generated a sparse matrix with object dtype which "
"can not be converted to a scipy sparse matrix."
"Sparse extension arrays should all have the same "
"numeric type."
)
if sp.issparse(array):
_ensure_no_complex_data(array)
array = _ensure_sparse_format(
array,
accept_sparse=accept_sparse,
dtype=dtype,
copy=copy,
force_all_finite=force_all_finite,
accept_large_sparse=accept_large_sparse,
)
else:
# If np.array(..) gives ComplexWarning, then we convert the warning
# to an error. This is needed because specifying a non complex
# dtype to the function converts complex to real dtype,
# thereby passing the test made in the lines following the scope
# of warnings context manager.
with warnings.catch_warnings():
try:
warnings.simplefilter("error", ComplexWarning)
if dtype is not None and np.dtype(dtype).kind in "iu":
# Conversion float -> int should not contain NaN or
# inf (numpy#14412). We cannot use casting='safe' because
# then conversion float -> int would be disallowed.
array = np.asarray(array, order=order)
if array.dtype.kind == "f":
_assert_all_finite(array, allow_nan=False, msg_dtype=dtype)
array = array.astype(dtype, casting="unsafe", copy=False)
else:
> array = np.asarray(array, order=order, dtype=dtype)
E ValueError: could not convert string to float: 'pca.transform-98eb05bfe3c4e482e6896d5f42ca3d48'
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/python3.8/site-packages/sklearn/utils/validation.py:738: ValueError
```

## Passing "resources" to dask_jobqueue.core.Job raises an exception
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/39 (Manuel Günther, 2021-11-29)

When loading the resource `sge`, the following error is thrown:
```
File ".../bob/pipelines/distributed/sge.py", line 57, in __init__
super().__init__(
TypeError: __init__() got an unexpected keyword argument 'resources'
```
Tracing down the error, it seems that you are passing the `resources`: https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/d8162ffc4fa072a14a8a4d7ac3b558de464a56ef/bob/pipelines/distributed/sge.py#L347
as a `kwargs` to `__init__`, which are simply passed on to the base class constructor:
https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/d8162ffc4fa072a14a8a4d7ac3b558de464a56ef/bob/pipelines/distributed/sge.py#L58
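The suggested fix could look like the following hedged sketch (stand-in classes, not the actual `bob.pipelines` code): consume `resources` explicitly in `__init__` so it never reaches the base-class constructor via `**kwargs`.

```python
class BaseJob:
    # stand-in for dask_jobqueue.core.Job, which does not accept `resources`
    def __init__(self, queue=None):
        self.queue = queue


class SGEJobSketch(BaseJob):
    def __init__(self, *args, resources=None, **kwargs):
        # `resources` is a named parameter, so it is kept out of **kwargs
        # and the base constructor never sees it
        self.resources = resources
        super().__init__(*args, **kwargs)
```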
I would recommend having `resources` as a regular parameter in `__init__`, so that it is not passed on to the base class constructor.

## resources.py does not list dask clients
https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/168 (Manuel Günther, 2021-11-30)

While all other parts of the pipeline can be listed through `resources.py`, this is not the case for registered `dask` clients. When running `bob bio pipelines vanilla-biometrics -h`, we can see the option `-l, --dask-client`, but currently there is no simple way of listing which clients are registered.

## local-parallel queue is not setup well
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/38 (Manuel Günther, 2021-12-06)

The setup of the current `local-parallel` configuration does not work as expected, for several reasons:
https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/d8162ffc4fa072a14a8a4d7ac3b558de464a56ef/bob/pipelines/config/distributed/local_parallel.py#L10
1. When we set `processes=False`, we will only use the python threading module, which will effectively limit the CPU usage to around 100% (i.e., one core), no matter how many cores we use. Only with `processes=True`, we will get real parallelization.
2. Selecting all possible CPUs via `cpu_count()` by default does not work well. I have a machine with 128 CPU cores, so setting up all 128 cores takes longer than an experiment -- especially when using `processes=False` above, I commonly get a timeout error.
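Such a configuration could look like the following sketch (a hypothetical `local-p4` config module, assuming the `dask.distributed` API; untested here and only meant to illustrate the shape):

```python
# hypothetical local-p4 configuration sketch: four real worker processes
# instead of threads, so the pipeline can use more than one core
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(
    processes=True,        # real processes -> true parallelism, not threads
    n_workers=4,           # fixed small pool instead of cpu_count()
    threads_per_worker=1,
)
dask_client = Client(cluster)
```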
Before, we had something like `local-p4` with 4 parallel cores, and the like. I think it would be a good idea to incorporate several of these here. Are there any objections?

## Algorithms with training that requires split by class don't seem to work
https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/167 (Manuel Günther, 2021-12-13)

When running a small baseline algorithm, such as `lda`, it seems that the required classes for the training samples are not forwarded to the training algorithm:
```
$ bob bio pipelines vanilla-biometrics -vv atnt lda
...
File ".../bob.bio.base/bob/bio/base/transformers/algorithm.py", line 62, in fit
training_data = split_X_by_y(X, y)
File ".../bob.bio.base/bob/bio/base/transformers/__init__.py", line 6, in split_X_by_y
for x1, y1 in zip(X, y):
TypeError: 'NoneType' object is not iterable
```
I have checked what is going on, and it seems that `y=None` in: https://gitlab.idiap.ch/bob/bob.bio.base/-/blob/1c3f542ee4d77592146ddc54aa8a51194a853745/bob/bio/base/transformers/__init__.py#L4
called by:
https://gitlab.idiap.ch/bob/bob.bio.base/-/blob/1c3f542ee4d77592146ddc54aa8a51194a853745/bob/bio/base/transformers/algorithm.py#L61
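From the traceback, `split_X_by_y` appears to group the training samples by their class label, which only works when `y` is an actual sequence. Below is a hypothetical reconstruction (not the actual bob.bio.base code) with an explicit guard for the failing `y=None` case, which would at least turn the cryptic `TypeError` into an actionable message:

```python
def split_X_by_y(X, y):
    """Group samples X into one list per class label in y (sketch)."""
    if y is None:
        # Exactly the situation from the traceback above:
        # the class labels were never forwarded to the trainer.
        raise ValueError("y is None: class labels were not forwarded")
    training_data = {}
    for x1, y1 in zip(X, y):
        training_data.setdefault(y1, []).append(x1)
    return list(training_data.values())

# split_X_by_y(["a", "b", "c", "d"], [0, 1, 0, 1]) -> [["a", "c"], ["b", "d"]]
```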
Unfortunately, I cannot trace the issue back further since my experience in debugging `dask` is very limited.
Maybe we should allow running the pipeline without `dask` -- as far as I understood, the dask pipeline is only a wrapper around the whole pipeline. Is it possible to skip the `dask` wrapper and run everything locally in a single thread? This would make debugging much easier.
Actually, I wanted to try out the above pipeline to debug my `dask` setup, which does not work.

https://gitlab.idiap.ch/bob/bob.bio.demographics/-/issues/2
A lot of work to be done in this package
2021-11-26T20:18:30Z, Tiago de Freitas Pereira

This package is intended to be the main backbone for fairness in biometrics.
However, several bits are still missing here.
A TODO list follows below:
- [ ] Proper user guide and documentation
- [ ] Implementation of a pipeline that does Platt scaling and calibration by group
- [ ] Port the regularization strategies that are implemented in TF to PyTorch.

https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/166
Score normalization pipeline needs some redesign
2022-01-13T10:03:34Z, Tiago de Freitas Pereira

`bob.bio.base` implements a pipeline that does several types of score normalization in one shot:
- Z-Norm
- T-Norm
- S-Norm
- ZT-Norm
- Some variations of the adaptive norm.
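For reference, the simplest of these fit in a few lines. The sketch below uses the standard textbook definitions of Z-Norm and S-Norm, not bob.bio.base's actual implementation:

```python
import numpy as np

def znorm(raw_scores, cohort_scores):
    """Z-Norm: center and scale scores by the model's cohort statistics."""
    mu, sigma = np.mean(cohort_scores), np.std(cohort_scores)
    return (np.asarray(raw_scores) - mu) / sigma

def snorm(z_scores, t_scores):
    """S-Norm: average of the Z-normalized and T-normalized scores."""
    return 0.5 * (np.asarray(z_scores) + np.asarray(t_scores))
```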
Although logical (they are all variations of the same thing), this structure doesn't seem to scale to datasets where the number of comparisons explodes into the millions.
I often face `MemoryError` issues that are super tough to track down (dask memory error).
Furthermore, the code is a bit convoluted. I think we need to break it down into smaller pieces.

https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/165
Move some features from bob.measure to bob.bio.base
2021-11-26T17:45:12Z, Tiago de Freitas Pereira

As we discussed in the last Bob meeting, we decided to move some biometric-specific features from `bob.measure` to `bob.bio.base`.
The candidate features are the following (including their plot extensions):
- bob.measure.cmc
- bob.measure.dir
- bob.measure.epc
- ANYTHING ELSE?
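To illustrate what would move, here is a generic CMC computation in the usual (negatives, positives)-per-probe form; this is the textbook definition of the curve, not `bob.measure.cmc`'s actual code or API:

```python
import numpy as np

def cmc(probes, max_rank):
    """Cumulative Match Characteristic over a list of probes, where each
    probe is a (negatives, positives) pair of score lists."""
    hits = np.zeros(max_rank)
    for negatives, positives in probes:
        best_positive = max(positives)
        # rank = 1 + number of non-matching models scoring higher
        rank = 1 + sum(1 for n in negatives if n > best_positive)
        if rank <= max_rank:
            hits[rank - 1:] += 1  # a rank-r hit counts for all ranks >= r
    return hits / len(probes)
```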
Depends on: https://gitlab.idiap.ch/bob/bob.measure/-/merge_requests/103
ping @amohammadi @lcolbois @ydayer
Thanks

https://gitlab.idiap.ch/bob/bob.measure/-/issues/66
Issue with ROC curve
2021-11-29T11:24:22Z, Tiago de Freitas Pereira

Hi @andre.anjos,
You've mentioned in our last Bob meeting that there's an issue with the ROC plot in some special cases.
Do you have any file containing the true values and prediction scores that triggers the issue?
Thanks

https://gitlab.idiap.ch/bob/bob.bio.base/-/issues/164
Temporary files and caches are written into the result directory
2021-12-14T17:58:32Z, Manuel Günther (siebenkopf@googlemail.com)

In the old version, we had two different directories to store elements: the `result_directory` and the `temp_directory`. These two directories were there for a purpose: anything inside `temp` could easily be removed after the experiments had finished, while important results were stored in the `results` directory. This separation also allowed keeping the temporary files on a local disk -- with much faster access and without backup -- and only the result files in a directory with backups.
Unfortunately, this split is gone in the new version -- for no obvious reason other than laziness, IMHO. Is there any possibility of having some mechanism that places the files in `tmp` and the cached files in `sampleswrapper`, `biometric_references` and `scores` in a different directory than `--output`? Or is there a particular reason, which I am overlooking, why you want only a single directory for all output?
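Until such a mechanism exists, one crude workaround (a sketch only; the cache directory names are taken from the issue text and the paths are illustrative stand-ins) is to symlink the cache sub-directories to a scratch disk before launching the experiment:

```shell
# Stand-ins for the backed-up --output directory and a fast local scratch
# disk without backup; in practice these would be real paths.
OUTPUT=$(mktemp -d)
SCRATCH=$(mktemp -d)

# Redirect the cache sub-directories named in the issue to the scratch disk,
# so anything written under $OUTPUT/<cache> actually lands on scratch.
for d in sampleswrapper biometric_references scores; do
    mkdir -p "$SCRATCH/$d"
    ln -s "$SCRATCH/$d" "$OUTPUT/$d"
done
```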