Algorithms with training that requires split by class don't seem to work

When running a small baseline algorithm such as lda, it seems that the class labels required for the training samples are not forwarded to the training algorithm:

$ bob bio pipelines vanilla-biometrics -vv atnt  lda

...

  File ".../bob.bio.base/bob/bio/base/transformers/algorithm.py", line 62, in fit
    training_data = split_X_by_y(X, y)
  File ".../bob.bio.base/bob/bio/base/transformers/__init__.py", line 6, in split_X_by_y
    for x1, y1 in zip(X, y):
TypeError: 'NoneType' object is not iterable

I have checked what is going on, and it seems that y is None when split_X_by_y is entered: https://gitlab.idiap.ch/bob/bob.bio.base/-/blob/1c3f542ee4d77592146ddc54aa8a51194a853745/bob/bio/base/transformers/__init__.py#L4

It is called from fit here: https://gitlab.idiap.ch/bob/bob.bio.base/-/blob/1c3f542ee4d77592146ddc54aa8a51194a853745/bob/bio/base/transformers/algorithm.py#L61
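
To illustrate what I think is happening, here is a minimal sketch outside of the pipeline. The function body is my own stand-in and not the actual code in bob.bio.base; only the `for x1, y1 in zip(X, y)` loop is taken from the traceback, and the sample data is made up.

```python
from collections import defaultdict


def split_X_by_y(X, y):
    """Hypothetical stand-in: group the samples in X by their class label in y."""
    groups = defaultdict(list)
    for x1, y1 in zip(X, y):  # this is the loop shown in the traceback
        groups[y1].append(x1)
    return list(groups.values())


# Made-up training samples, for illustration only.
X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

# With labels, the samples are grouped per class.
by_class = split_X_by_y(X, ["a", "b", "a"])

# Without labels (y=None), zip() cannot iterate over y and raises a TypeError,
# which is the same exception type as in the traceback.
try:
    split_X_by_y(X, None)
    raised_type_error = False
except TypeError:
    raised_type_error = True
```

So whichever step calls fit would need to pass the class labels along as y, or fit would need to reject y=None with a clearer error message.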

Unfortunately, I cannot trace the issue back further since my experience in debugging dask is very limited.

Maybe we should allow running the pipeline without dask. As far as I understand, the dask pipeline is only a wrapper around the whole pipeline. Is it possible to skip the dask wrapper and run everything locally in a single thread? This would make debugging much easier.

Actually, I wanted to try out the above pipeline to debug my dask setup, which does not work.