bob.pipelines issues: https://gitlab.idiap.ch/bob/bob.pipelines/-/issues

Issue #12: Relative paths in filelist database cause issues in temp directory
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/12
Manuel Günther (siebenkopf@googlemail.com), 2020-07-22

As reported here: https://groups.google.com/forum/#!topic/bob-devel/ESl9AWyJmbA, when using a relative path including `..` in the file lists, there seems to be an issue with the temporary files. Indeed, `biofile.make_path` (see: https://gitlab.idiap.ch/bob/bob.db.base/blob/master/bob/db/base/file.py#L65) simply merges paths, which may end up producing wrong paths when `self.path` includes `..`.
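The failure mode can be illustrated with plain `os.path` calls; the directory and file names below are made up for the example:

```python
import os.path

# A make_path-style helper typically just joins directory + path + extension.
# Directory and file names here are hypothetical.
directory = "/tmp/preprocessed"
path = "../other_db/images/0001"  # relative path from a file list, containing ".."
extension = ".hdf5"

merged = os.path.join(directory, path + extension)
print(merged)                    # /tmp/preprocessed/../other_db/images/0001.hdf5
print(os.path.normpath(merged))  # /tmp/other_db/images/0001.hdf5
```

Whether normalization is the right fix depends on the intent; the point is only that a plain join silently produces a path pointing outside the intended temporary directory.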
I am not quite sure how to tackle this issue.

Issue #1: Possible changing plan
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/1
Tiago de Freitas Pereira, 2020-03-13

I will keep this issue open to keep track of a possible change plan for Bob.
For the moment, the possible candidates that we could archive are listed below.
We have 116 packages in total:
- [ ] bob.buildout
- [ ] bob.extension
- [ ] bob.blitz
- [ ] bob.core
- [ ] bob.io.base
- [ ] bob.math
- [ ] bob.measure
- [ ] bob.io.image
- [ ] bob.db.base
- [ ] bob.io.video
- [x] bob.io.matlab
- [ ] bob.io.audio
- [ ] bob.sp
- [ ] bob.ap
- [ ] bob.ip.base
- [ ] bob.ip.color
- [x] bob.ip.draw
- [ ] bob.ip.gabor
- [ ] bob.learn.activation
- [x] bob.learn.libsvm
- [ ] bob.learn.linear
- [x] bob.learn.mlp
- [x] bob.learn.boosting
- [ ] bob.db.iris
- [ ] bob.learn.em
- [x] bob.db.wine
- [ ] bob.db.mnist
- [ ] bob.db.atnt
- [ ] bob.ip.facedetect
- [ ] bob.ip.optflow.hornschunck
- [ ] bob.ip.optflow.liu
- [ ] bob.ip.flandmark
- [ ] gridtk
- [ ] bob.ip.qualitymeasure
- [ ] bob.ip.skincolorfilter
- [x] bob.ip.facelandmarks
- [ ] bob.ip.dlib
- [ ] bob.db.arface
- [ ] bob.db.asvspoof
- [ ] bob.db.asvspoof2017
- [ ] bob.db.atvskeystroke
- [ ] bob.db.avspoof
- [x] bob.db.banca
- [x] bob.db.biosecure
- [x] bob.db.biosecurid.face
- [ ] bob.db.casia_fasd
- [x] bob.db.casme2
- [x] bob.db.caspeal
- [x] bob.db.cohface
- [x] bob.db.frgc
- [x] bob.db.gbu
- [x] bob.db.hci_tagging
- [x] bob.db.ijba
- [ ] bob.db.ijbc
- [ ] bob.db.kboc16
- [ ] bob.db.lfw
- [ ] bob.db.livdet2013
- [ ] bob.db.mobio
- [ ] bob.db.msu_mfsd_mod
- [ ] bob.db.multipie
- [ ] bob.db.nist_sre12
- [ ] bob.db.putvein
- [ ] bob.db.replay
- [ ] bob.db.replaymobile
- [x] bob.db.scface
- [ ] bob.db.utfvp
- [ ] bob.db.verafinger
- [ ] bob.db.fv3d
- [ ] bob.db.hkpu
- [ ] bob.db.thufvdt
- [ ] bob.db.mmcbnu6k
- [ ] bob.db.hmtvein
- [ ] bob.db.voicepa
- [x] bob.db.xm2vts
- [ ] bob.db.youtube
- [x] bob.db.pericrosseye
- [ ] bob.db.maskattack
- [ ] bob.db.casiasurf
- [ ] bob.db.fargo
- [ ] bob.bio.base
- [ ] bob.bio.gmm
- [ ] bob.bio.face
- [ ] bob.bio.spear
- [ ] bob.bio.video
- [ ] bob.bio.vein
- [ ] bob.db.voxforge
- [ ] bob.rppg.base
- [ ] bob.pad.base
- [ ] bob.pad.face
- [ ] bob.pad.voice
- [ ] bob.pad.vein
- [ ] bob.fusion.base
- [ ] bob.db.oulunpu
- [ ] bob.db.uvad
- [ ] bob.db.swan
- [ ] bob.db.cuhk_cufs
- [ ] bob.db.cbsr_nir_vis_2
- [ ] bob.db.nivl
- [ ] bob.db.pola_thermal
- [ ] bob.db.cuhk_cufsf
- [ ] bob.db.ldhf
- [ ] bob.ip.tensorflow_extractor
- [ ] bob.learn.tensorflow
- [ ] bob.learn.pytorch
- [ ] bob.bio.face_ongoing
- [ ] bob.bio.htface
- [ ] bob.db.drive
- [ ] bob.db.stare
- [ ] bob.db.chasedb1
- [ ] bob.db.iostar
- [ ] bob.db.hrf
- [ ] bob.db.rimoner3
- [ ] bob.db.drionsdb
- [ ] bob.db.refuge
- [ ] bob.db.drishtigs1
- [ ] bob.ip.binseg

Issue #2: Using scikit-learn pipelines with Dask
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/2
Tiago de Freitas Pereira, 2020-03-02

Opening this issue just as a note for posterity.
Today I did an exercise using scikit-learn pipelines and Dask: https://ml.dask.org/compose.html
We could leverage the scikit-learn API and benefit from its caching mechanism too.
You can check in the small snippet below how I use it (I made an adaptor to turn our algorithms into scikit-learn estimators).
Two things should be observed. First, I couldn't use the cache (`Pipeline(..., memory=cache_dir)`), since most of our stuff is C++ based (not picklable).
And since things are not picklable, I did a rather hacky job with the adaptor in order to integrate it with Dask, as you can see in the code.
To have the Bob objects instantiated on the worker (like @andre.anjos is doing with the SampleLoader, probably for the same reason, among others), the `fit` method creates the Bob object and returns `self`.
I think the current design is cleaner, so I will give up on this one.
ping @amohammadi
```python
from sklearn.pipeline import Pipeline

# Local client
import dask.bag
from dask.distributed import Client, LocalCluster

import bob.bio.base
import bob.bio.face
import numpy

cache_dir = "./cache"

from sklearn.base import BaseEstimator


class Scikit2BobEstimator(BaseEstimator):
    """
    Base class to adapt bob algorithms to scikit estimators.
    Check here for more info:
    https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html
    """

    def __init__(self, bob_object):
        self.bob_class = bob_object

    def fit(self, X, y, **kwargs):
        self.bob_object = self.bob_class(**kwargs)
        return self

    def transform(self, X, **kwargs):
        """
        Here `X` can be our samples, where the annotations can be shipped.
        """
        annotations = {"leye": (10, 10), "reye": (20, 10)}
        return [self.bob_object(x, annotations=annotations) for x in X]


### Starting the client
cluster = LocalCluster(nanny=False, processes=False, n_workers=1, threads_per_worker=1)
cluster.scale_up(1)
client = Client(cluster)

####### PREPROCESSOR #########
# Using face crop
CROPPED_IMAGE_HEIGHT = 80
CROPPED_IMAGE_WIDTH = CROPPED_IMAGE_HEIGHT * 4 // 5

## eye positions for frontal images
RIGHT_EYE_POS = (CROPPED_IMAGE_HEIGHT // 5, CROPPED_IMAGE_WIDTH // 4 - 1)
LEFT_EYE_POS = (CROPPED_IMAGE_HEIGHT // 5, CROPPED_IMAGE_WIDTH // 4 * 3)

import functools

preprocessor = Scikit2BobEstimator(
    functools.partial(
        bob.bio.face.preprocessor.FaceCrop,
        cropped_image_size=(CROPPED_IMAGE_HEIGHT, CROPPED_IMAGE_WIDTH),
        cropped_positions={"leye": LEFT_EYE_POS, "reye": RIGHT_EYE_POS},
    )
)

### EXTRACTOR #######
extractor = Scikit2BobEstimator(bob.bio.base.extractor.Linearize)

estimators = [("preprocess", preprocessor), ("extractor", extractor)]

#### HERE I COULD CACHE IT #####
# pipeline = Pipeline(estimators, memory=cache_dir)
pipeline = Pipeline(estimators)

X = [numpy.random.rand(3, 100, 100) for _ in range(100)]
db = dask.bag.from_sequence(X)
db = db.map_partitions(pipeline.fit_transform)
print(db.compute(scheduler=client))

client.shutdown()
```

Issue #3: Dask Adaptive scheduling
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/3
Tiago de Freitas Pereira, 2020-04-17

https://distributed.dask.org/en/latest/_modules/distributed/deploy/adaptive.html
ping @andre.anjos @amohammadi

Issue #4: Implement checkpointable processors
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/4
Amir MOHAMMADI, 2020-03-09

Checkpointable processors can be integrated with the Sample class and automatically cache/save their results.

Issue #5: Dask Mixin classes for pipelines
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/5
Amir MOHAMMADI, 2020-03-23

It would be a good idea to have mixin classes to turn transformers dask-aware.
This was proposed by @andre.anjos. @tiago.pereira and I have discussed this; see the comments for what we came up with.

Assignee: Amir MOHAMMADI

Issue #6: Follow-up from "WIP: Make scikit operations daskable"
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/6
Tiago de Freitas Pereira, 2020-05-05

The following discussion from !5 should be addressed:
- [ ] @amohammadi started a [discussion](https://gitlab.idiap.ch/bob/bob.pipelines/merge_requests/5#note_50170): (+2 comments)

  > Is it possible to put this inside a scikit.pipeline?

Assignee: Tiago de Freitas Pereira

Issue #7: Follow-up from "WIP: Make scikit operations daskable"
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/7
Tiago de Freitas Pereira, 2020-05-05

The following discussion from !5 should be addressed:
- [ ] @amohammadi started a [discussion](https://gitlab.idiap.ch/bob/bob.pipelines/merge_requests/5#note_50099): (+2 comments)

  > `features_dir` was optional, please revert this.

Assignee: Tiago de Freitas Pereira

Issue #8: Follow-up from "WIP: Make scikit operations daskable"
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/8
Tiago de Freitas Pereira, 2020-05-05

The following discussion from !5 should be addressed:
- [ ] @amohammadi started a [discussion](https://gitlab.idiap.ch/bob/bob.pipelines/merge_requests/5#note_50104): (+3 comments)

  > All this dynamic object creation is going to make debugging hell, wouldn't it? Could you print a traceback here when something fails in an sklearn estimator?

Issue #9: SampleSet.insert not accepting DelayedSample objects as item
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/9
Yannick DAYER, 2020-04-06

In [`bob/pipelines/sample.py`](https://gitlab.idiap.ch/bob/bob.pipelines/blob/master/bob/pipelines/sample.py#L78), in the `SampleSet` class, `insert(self, index, item)` does not accept `DelayedSample` objects.
(Because of the test: `if not isinstance(item, Sample):`.)
To solve this, either:
- make `DelayedSample` and `Sample` inherit from a common `SampleBase` class, and
- change the check in `SampleSet.insert` to accept that base class, and thus all Sample-like classes;
or:
- change the check in `SampleSet.insert` to accept both `Sample` and `DelayedSample`.
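A minimal sketch of the second option; the class bodies below are simplified stand-ins for the ones in `bob/pipelines/sample.py`, not the real implementations:

```python
class Sample:
    """Simplified stand-in for the eager sample."""
    def __init__(self, data, **kwargs):
        self.data = data

class DelayedSample:
    """Simplified stand-in for the lazily-loaded sample."""
    def __init__(self, load, **kwargs):
        self.load = load

class SampleSet:
    def __init__(self, samples):
        self.samples = list(samples)

    def insert(self, index, item):
        # Accept both eager and delayed samples, instead of the current
        # `if not isinstance(item, Sample): ...` check.
        if not isinstance(item, (Sample, DelayedSample)):
            raise ValueError(f"item must be a Sample or DelayedSample, not {type(item)}")
        self.samples.insert(index, item)

sample_set = SampleSet([Sample(data=1)])
sample_set.insert(0, DelayedSample(load=lambda: 2))
print(len(sample_set.samples))  # 2
```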
(The same could apply to `__setitem__`.)

Assignee: Tiago de Freitas Pereira

Issue #10: Conflicts ad eternum
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/10
Tiago de Freitas Pereira, 2020-04-17

Hi guys,
I'm facing some problems with our CI and I need some light.
For a while now I have been getting some enigmatic conflicts; here is one example:
https://gitlab.idiap.ch/bob/bob.pipelines/-/jobs/195496/raw
`bdt build` doesn't work at all, neither on the CI nor on my computers.
I'm wondering if it's something with Numpy, because of this message in the logs:
```sh
Attempting to finalize metadata for bob.pipelines
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
Adding .* to spec 'numpy 1.16.6' to ensure satisfiability. Please consider putting {{ var_name }}.* or some relational operator (>/</>=/<=) on this spec in meta.yaml, or if req is also a build req, using {{ pin_compatible() }} jinja2 function instead. See https://conda.io/docs/user-guide/tasks/build-packages/variants.html#pinning-at-the-variant-level
WARNING conda_build.utils:ensure_valid_spec(1749): Adding .* to spec 'numpy 1.16.6' to ensure satisfiability. Please consider putting {{ var_name }}.* or some relational operator (>/</>=/<=) on this spec in meta.yaml, or if req is also a build req, using {{ pin_compatible() }} jinja2 function instead. See https://conda.io/docs/user-guide/tasks/build-packages/variants.html#pinning-at-the-variant-level
INFO:bob.devtools.scripts.build@2020-04-17 09:24:21,641: Building bob.pipelines-0.0.1b0-py37 (build: 12) for linux-64
INFO:bob.devtools.bootstrap@2020-04-17 09:24:21,641: environ["BOB_BUILD_NUMBER"] = 12
/scratch/builds/bob/bob.pipelines/miniconda/lib/python3.7/site-packages/conda_build/environ.py:427: UserWarning: The environment variable 'DOCSERVER' is being passed through with value 'http://www.idiap.ch'. If you are splitting build and test phases with --no-test, please ensure that this value is also set similarly at test time.
UserWarning
/scratch/builds/bob/bob.pipelines/miniconda/lib/python3.7/site-packages/conda_build/environ.py:427: UserWarning: The environment variable 'NOSE_EVAL_ATTR' is being passed through with value ''. If you are splitting build and test phases with --no-test, please ensure that this value is also set similarly at test time.
UserWarning
BUILD START: ['bob.pipelines-0.0.1b0-py37h9f5372d_12.conda']
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
```
However, if I try to install the `bob.pipelines` setup, everything works fine.
```sh
conda install bob-devel==2020.03.30 dask dask-jobqueue bob.extension bob.io.base -c https://www.idiap.ch/software/bob/conda/label/beta/ --dry-run
```
Can I have some light?
Thanks
ping @andre.anjos @amohammadi

Issue #11: Mixin classes for sklearn estimators should not have an __init__ method
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/11
Amir MOHAMMADI, 2020-04-29

I was thinking that we could get away with this, but apparently we cannot. Because of the way BaseEstimator handles params, providing an extra `__init__` method in mixins breaks the estimator.
Here is an example:
```python
In [2]: from sklearn.svm import SVC
...: from bob.pipelines.mixins import CheckpointMixin, SampleMixin
...: class CheckpointSampleSVC(CheckpointMixin, SampleMixin, SVC):
...: pass
...:
In [8]: original_estimator = SVC()
In [9]: original_estimator
Out[9]:
SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
In [10]: original_estimator.set_params(C=2)
Out[10]:
SVC(C=2, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
In [11]: checkpointing_sample_estimator = CheckpointSampleSVC()
In [12]: checkpointing_sample_estimator
Out[12]:
CheckpointSampleSVC(extension='.h5', features_dir=None,
load_func=<function load at 0x7f1ce85e5290>,
model_path=None,
save_func=<function save at 0x7f1ce85e53b0>)
In [13]: checkpointing_sample_estimator.set_params(C=2)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-13-bbed69696a06> in <module>
----> 1 checkpointing_sample_estimator.set_params(C=2)
conda/envs/dask/lib/python3.7/site-packages/sklearn/base.py in set_params(self, **params)
234 'Check the list of available parameters '
235 'with `estimator.get_params().keys()`.' %
--> 236 (key, self))
237
238 if delim:
ValueError: Invalid parameter C for estimator CheckpointSampleSVC(extension='.h5', features_dir=None,
load_func=<function load at 0x7f1ce85e5290>,
model_path=None,
save_func=<function save at 0x7f1ce85e53b0>). Check the list of available parameters with `estimator.get_params().keys()`.
```
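The root cause can be reproduced without bob at all. `BaseEstimatorLike` below is a simplified, invented stand-in that mimics how `sklearn.base.BaseEstimator` discovers parameters (it inspects the signature of the most derived `__init__`); the estimator and mixin names are made up for the illustration:

```python
import inspect

class BaseEstimatorLike:
    """Stand-in: mimics BaseEstimator._get_param_names, which inspects
    the signature of the most derived __init__."""

    @classmethod
    def _get_param_names(cls):
        sig = inspect.signature(cls.__init__)
        return sorted(name for name in sig.parameters if name != "self")

class MyEstimator(BaseEstimatorLike):
    def __init__(self, C=1.0):
        self.C = C

class NoisyMixin:
    def __init__(self, noise=0.1):  # shadows MyEstimator.__init__ in the MRO
        self.noise = noise

class CleanMixin:  # no __init__, so the estimator's signature survives
    pass

class Broken(NoisyMixin, MyEstimator):
    pass

class Works(CleanMixin, MyEstimator):
    pass

print(Broken._get_param_names())  # ['noise'] -- 'C' is invisible, so set_params(C=...) fails
print(Works._get_param_names())   # ['C']
```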
`set_params` is important because it is used in classes like https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV

Assignee: Amir MOHAMMADI

Issue #13: do not propagate _ variables when config chain loading
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/13
Amir MOHAMMADI, 2020-10-16 (milestone: Bob 9.0.0)

This is to remind me to do this when we move config chain loading from bob.extension to here.

Issue #14: Stacking raw data
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/14
Tiago de Freitas Pereira, 2020-05-05

Hi,
I hadn't noticed this before merging, but the way `DelayedSamplesCall` handles data is not very convenient in terms of memory usage, don't you think? https://gitlab.idiap.ch/bob/bob.pipelines/-/blob/master/bob/pipelines/wrappers.py#L57
Data are stacked at the very beginning of the pipeline, where they can be huge in size (e.g. raw images/videos).
The way we had it before https://gitlab.idiap.ch/bob/bob.pipelines/-/merge_requests/26 was more convenient:
the stacking was done only when necessary (once the data is "more preprocessed").
Is there any reason for it to be like this?
Thanks

Issue #15: Sample-based pipelines inefficiencies
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/15
Amir MOHAMMADI, 2020-06-02

This is a generic issue that I am raising because I believe we will face it moving forward.
The biggest issue that I have found with our sample-based approach is when you have to concatenate samples to make a big array for processing steps such as `.fit` methods.
The reason for this is that we are looking at samples individually, **even though they might have come from a bigger array**.
Let me demonstrate this with an example:
[sample_stacking_issue.html](/uploads/689b77611dd7d51685b43862be9c2686/sample_stacking_issue.html)
or [sample_stacking_issue.ipynb](/uploads/d2139734a2dae8833104c46b0022b2eb/sample_stacking_issue.ipynb)

Issue #17: Memory error during serialization of large objects
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/17
Tiago de Freitas Pereira, 2020-05-25

This is an issue that I've been facing for a while.
Now that we are running our pipelines in large-scale experiments (several thousands of images), the lists of SampleSets that we generate during `pipeline.transform` are getting BIG (>1GB), and this raises MemoryError exceptions during serialization (even when we have enough memory).
This is very annoying; basically, I can't work with large datasets.
I managed to generate a very simple example describing this issue here: https://github.com/dask/distributed/issues/3806
I know we can change the serializer `dask-distributed` uses (https://distributed.dask.org/en/latest/serialization.html#use), but I'm not sure that is the real problem.
However, I would like to propose a workaround that will slow down the execution of experiments a bit, but at least the code will not crash.
I would like to change the serialization behavior of `DelayedSample` to this:
```python
class DelayedSample(_ReprMixin):
    def __init__(self, load, parent=None, **kwargs):
        self.load = load
        if parent is not None:
            _copy_attributes(self, parent.__dict__)
        _copy_attributes(self, kwargs)
        self._data = None

    @property
    def data(self):
        """Loads the data from the disk file."""
        if self._data is None:
            self._data = self.load()
        return self._data

    def __getstate__(self):
        self._data = None
        d = dict(self.__dict__)
        return d
```
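The effect of this `__getstate__` can be checked with a self-contained stand-in (`_ReprMixin` and `_copy_attributes` are left out; the class below keeps only the lazy-loading and serialization parts, and `expensive_load` is an invented placeholder for loading a large file from disk):

```python
import pickle

class DelayedSample:
    def __init__(self, load):
        self.load = load
        self._data = None

    @property
    def data(self):
        """Loads the data only on first access."""
        if self._data is None:
            self._data = self.load()
        return self._data

    def __getstate__(self):
        # Drop the cached payload before pickling, so it is never serialized.
        self._data = None
        return dict(self.__dict__)

def expensive_load():
    return list(range(1000))

s = DelayedSample(expensive_load)
_ = s.data  # payload is now materialized in memory
restored = pickle.loads(pickle.dumps(s))
print(restored._data is None)  # True: the payload was not shipped
print(len(restored.data))      # 1000: it is re-loaded lazily on access
```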
What do you think? ping @andre.anjos @amohammadi
ping @ydayer
thanks

Issue #18: SampleBatch design issues
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/18
Tiago de Freitas Pereira, 2020-07-22

Hi,
Although `SampleBatch` brings convenience and efficiency, it forces us to develop transformers that are compatible with it.
Imagine the simple transformer below:
```python
from sklearn.base import BaseEstimator, TransformerMixin

class FakeTransformer(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X + 1

    def _more_tags(self):
        return {"stateless": True, "requires_fit": False}
```
I can easily use it with numpy arrays as input.
```python
transformer = FakeTransformer()
X = np.zeros(shape=(3, 160, 160))
transformed_X = transformer.transform(X)
```
However, I run into problems once I wrap it with the sample wrapper:
```python
sample = Sample(X)
transformer_sample = wrap(["sample"], transformer)
my_beautiful_sample = [s.data for s in transformer_sample.transform([sample])]
# THIS DOESN'T WORK
```
With this wrap, the input `X` of `FakeTransformer.transform` will be a `SampleBatch` and not a numpy array.
Hence, I can't do `X + 1`.
I can approach this issue in my transformer by doing this:
```python
def transform(self, X):
    X = np.asarray(X)
    return X + 1
```
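An alternative would be to push the conversion into the wrapper instead of into every transformer, so unmodified third-party estimators keep receiving plain arrays. A sketch under that assumption; `SampleBatch` and `MaterializingWrapper` below are simplified, invented stand-ins, not the real bob.pipelines classes:

```python
import numpy as np

class SampleBatch:
    """Stand-in: a lazy view over a list of sample payloads."""
    def __init__(self, payloads):
        self.payloads = payloads

    def __array__(self, dtype=None):
        data = np.stack(self.payloads)
        return data if dtype is None else data.astype(dtype)

class MaterializingWrapper:
    """Hypothetical wrapper: materialize the batch into an ndarray
    before delegating, so transform() sees a plain numpy array."""
    def __init__(self, estimator):
        self.estimator = estimator

    def transform(self, payloads):
        return self.estimator.transform(np.asarray(SampleBatch(payloads)))

class FakeTransformer:
    def transform(self, X):
        return X + 1  # works only on real arrays

wrapped = MaterializingWrapper(FakeTransformer())
out = wrapped.transform([np.zeros((3, 160, 160)) for _ in range(2)])
print(out.shape)  # (2, 3, 160, 160)
```

The trade-off is the one raised in issue #14: materializing at the wrapper boundary still stacks possibly-large raw data early.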
However, this is a blocker if we want to use estimators developed by other people outside of our circle.
Do you think it is sensible to have `X` wrapped as a `SampleBatch` once SampleTransform is used?
It breaks encapsulation.
Thanks

Issue #19: Dask Client as python resources
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/19
Tiago de Freitas Pereira, 2020-10-12

We should put the Dask Clients from here: https://gitlab.idiap.ch/bob/bob.pipelines/-/tree/master/bob/pipelines/config/distributed
as python resources.

Milestone: Bob 9.0.0, Assignee: Yannick DAYER

Issue #20: Problem while using `sge_default` dask client
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/20
Victor BROS, 2020-10-12

For some reason, `bob.pipelines.distributed.sge.SGEIdiapJob` requires a class variable `config_name`.
I've patched it myself to make it work, but this needs a proper fix.
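The error message itself points at the fix: give `SGEIdiapJob` a `config_name` class variable. A toy reproduction of the check, where `JobLike` is an invented stand-in for `dask_jobqueue.core.Job` and the value `"sge"` is only illustrative (the real value must match a `jobqueue.<name>` section in the dask config):

```python
class JobLike:
    """Stand-in that mimics dask_jobqueue's default_config_name check."""

    @classmethod
    def default_config_name(cls):
        config_name = getattr(cls, "config_name", None)
        if config_name is None:
            raise ValueError(
                f"The class {cls} is required to have a 'config_name' class variable."
            )
        return config_name

class BrokenJob(JobLike):
    pass  # no config_name: triggers the ValueError seen in the traceback

class PatchedJob(JobLike):
    config_name = "sge"  # hypothetical value, for illustration only

print(PatchedJob.default_config_name())  # sge
```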
```
bob@2020-10-09 14:36:37,427 -- DEBUG: Logging of the `bob' logger was set to 3
bob.extension.config@2020-10-09 14:36:37,430 -- DEBUG: Loading configuration file `./experiments/vera-finger/veradb.py'...
bob.extension.config@2020-10-09 14:36:38,765 -- DEBUG: Loading configuration file `./experiments/vera-finger/vera_miura.py'...
bob.bio.base@2020-10-09 14:36:39,001 -- INFO: Using `bob.bio.base` legacy algorithm <class 'bob.bio.vein.algorithm.MiuraMatch'>(ch=80, cw=90, multiple_model_scoring='average', multiple_probe_scoring='average')
bob.extension.config@2020-10-09 14:36:39,002 -- DEBUG: Loading configuration file `./src/bob.pipelines/bob/pipelines/config/distributed/sge_default.py'...
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7f92cb83db50>>, <Task finished coro=<SpecCluster._correct_state_internal() done, defined at /remote/idiap.svm/temp.biometric01/vbros/bob_vein/bob.bio.vein/eggs/distributed-2.30.0-py3.7.egg/distributed/deploy/spec.py:320> exception=ValueError("The class <class 'bob.pipelines.distributed.sge.SGEIdiapJob'> is required to have a 'config_name' class variable.\nIf you have created this class, please add a 'config_name' class variable.\nIf not this may be a bug, feel free to create an issue at: https://github.com/dask/dask-jobqueue/issues/new")>)
Traceback (most recent call last):
File "/idiap/temp/vbros/miniconda3/envs/bob.bio.vein/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
ret = callback()
File "/idiap/temp/vbros/miniconda3/envs/bob.bio.vein/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
future.result()
File "/remote/idiap.svm/temp.biometric01/vbros/bob_vein/bob.bio.vein/eggs/distributed-2.30.0-py3.7.egg/distributed/deploy/spec.py", line 348, in _correct_state_internal
worker = cls(self.scheduler.address, **opts)
File "/remote/idiap.svm/temp.biometric01/vbros/bob_vein/bob.bio.vein/src/bob.pipelines/bob/pipelines/distributed/sge.py", line 56, in __init__
super().__init__(*args, config_name=config_name, death_timeout=10000, **kwargs)
File "/remote/idiap.svm/temp.biometric01/vbros/bob_vein/bob.bio.vein/eggs/dask_jobqueue-0.7.1-py3.7.egg/dask_jobqueue/core.py", line 156, in __init__
default_config_name = self.default_config_name()
File "/remote/idiap.svm/temp.biometric01/vbros/bob_vein/bob.bio.vein/eggs/dask_jobqueue-0.7.1-py3.7.egg/dask_jobqueue/core.py", line 260, in default_config_name
"https://github.com/dask/dask-jobqueue/issues/new".format(cls)
ValueError: The class <class 'bob.pipelines.distributed.sge.SGEIdiapJob'> is required to have a 'config_name' class variable.
```

Milestone: Bob 9.0.0, Assignee: Tiago de Freitas Pereira

Issue #21: Who is Mario?
https://gitlab.idiap.ch/bob/bob.pipelines/-/issues/21
Tiago de Freitas Pereira, 2024-01-08

There are several places in the code where we alias `bob.pipelines` as Mario.
Well, it was a good joke at the beginning, but now we need to decide whether we want to keep it.

Assignee: Tiago de Freitas Pereira