  • v1.0.0
    Release v1.0.0
    • !37 Revert "For some reason, the class information is not passed in the sample wrapper": This reverts merge request !36
    • !38 [sge] In dask, some subclassed classes need a config name. Fixes #20
    • !40 Add dask-client configurations as resources: Fixes #19 Removes the sge-demanding configuration as all nodes at Idiap have a fast connection now. Depends on bob.bio.base!201
    • !39 [dask][sge] Added the variables idle_timeout and allowed_failures as part of our .bobrc and added better defaults
    • !41 Added a GPU queue that defaults to short_gpu
    • !43 Allow setting specific attributes of sample: Specify the sample attribute to which the output of an estimator is assigned (instead of 'data') in SampleWrapper. Specify the attribute of the sample to save and load in CheckpointWrapper.
    • !44 Fix sphinx warnings
    • !45 Multiple Changes:
      * When checkpointing, checkpoint all steps in a pipeline
      * Better names in dask graph for FunctionTransformer
      * [xarray] Allow for multi-argument transformers
      * SampleBatch in public API
    • !46 move vstack_features to bob.io.base
    • !48 Improvements on CheckpointWrapper: Added the optional argument hash_fn to the CheckpointWrapper class. When it is set, a hash code is generated from sample.key and used to compose the final path where the sample will be checkpointed. This is optional and generic enough for our purposes; the hash function can be shipped in the database interface. Closes #25 (a hedged usage sketch follows this list)
    • !47 Multiple changes:
      * [DelayedSample] Allow for arbitrary delayed attributes
      * [SampleBatch] Allow other attributes than data
      Fixes #26 #24
    • !49 [DelayedSample] Fix issues when an attribute was set
    • !50 [DelayedSample(Set)] make load and delayed_attributes private: This removes the need for a lot of guessing in downstream packages, since they can simply skip all keys that start with _ when a sample's attributes need to be accessed. (See the DelayedSample sketch after this list.)
    • !51 [dask][sge] Multiqueue updates: In this merge request I:
      - Simplified the way multi-queue is set in our scripts
      - Updated our Dask documentation

      Example: setting the fit method to run on q_short_gpu:

      ```python
      pipeline = mario.wrap(
          ["sample", "checkpoint", "dask"],
          pipeline,
          model_path=model_path,
          fit_tag="q_short_gpu",
      )
      ```

      You have to explicitly set the list of resource tags available:

      ```python
      pipeline.fit_transform(...).compute(
          scheduler=dask_client, resources=cluster.get_sge_resources()
      )
      ```
    • !53 Updates: Implemented two updates in this MR:
      - Removed the random behavior of the hash_string function (I had some problems in large-scale tests).
      - Implemented the DelayedSampleSetCached; I need this behavior to speed up the score computation.
    • !52 [CheckpointWrapper] Allow custom save and load functions through estimator tags
    • !54 Fixed multiqueue: Hi @amohammadi @ydayer, I'm fixing here the issue raised with the multiqueue. I was wrongly setting all tasks to run under one particular resource restriction. Now the problem is fixed. To get it running, you have to wrap your pipeline in the same way as before and fetch the resources like this:

      ```python
      pipeline = bob.pipelines.wrap(
          ["sample", "checkpoint", "dask"],
          pipeline,
          model_path="./",
          transform_extra_arguments=(("metadata", "metadata"),),
          fit_tag="q_short_gpu",
      )

      from bob.pipelines.distributed.sge import get_resource_requirements

      resources = get_resource_requirements(pipeline)
      pipeline.fit_transform(X_as_sample).compute(
          scheduler=client, resources=resources
      )
      ```
    • !56 Two new features:
      - Moved dask_get_partition_size from bob.bio.base to bob.pipelines
      - Updated the target duration of a task to 10s, to be more aggressive when scaling up
    • !58 Moved the CSVBaseSampleLoader from bob.bio.base to bob.pipelines. This is a general function
    • !55 Moved VALID_DASK_CLIENT_STRINGS to bob.pipelines
    • !59 Dask client names
    • !60 CSVSampleLoaders as transformers: Made the CSVSampleLoaders scikit-learn transformers. This is a good idea indeed. I made two classes: CSVToSampleLoader, which converts one CSV line to one sample, and AnnotationsLoader, which aggregates from CSVToSampleLoader to read annotations using bob.db.base.read_anno.... Loading is delayed. I'm already porting this stuff to bob.bio.base; the code is way cleaner. ping @amohammadi @ydayer Closes #30
    • !61 Fixed modules: config files from here were not available after a conda install of bob.pipelines
    • !62 Implement a new simple generic csv-based database interface: Depends on bob.extension!126
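
    A hedged usage sketch for the hash_fn option from !48. It assumes that bob.pipelines.wrap forwards keyword arguments to CheckpointWrapper and that features_dir is the checkpoint root; the hashing scheme (two leading characters of an MD5 over sample.key) is only an illustration, not the library default.

    ```python
    import hashlib

    from sklearn.preprocessing import FunctionTransformer

    import bob.pipelines


    def hash_fn(key):
        # Turn a sample.key such as "subject_01/session_03" into a short
        # sub-directory prefix, so checkpoints are spread over many folders.
        return hashlib.md5(key.encode()).hexdigest()[:2]


    # "sample" makes the estimator work on Sample objects; "checkpoint" saves and
    # loads each transformed sample under features_dir. With hash_fn set, the hash
    # of sample.key is inserted into the checkpoint path.
    pipeline = bob.pipelines.wrap(
        ["sample", "checkpoint"],
        FunctionTransformer(lambda x: x),
        features_dir="./checkpoints",
        hash_fn=hash_fn,
    )
    ```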
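
    A hedged sketch of the DelayedSample behaviour touched by !47 and !50, assuming a DelayedSample(load, delayed_attributes=..., **kwargs) constructor; load_pixels and load_annotations are hypothetical user functions, not part of the library.

    ```python
    from bob.pipelines import DelayedSample


    def load_pixels():
        # Hypothetical expensive read, executed only when sample.data is accessed.
        return [0.0, 1.0, 2.0]


    def load_annotations():
        # Hypothetical delayed attribute, executed only when sample.annotations is accessed.
        return {"topleft": (0, 0), "bottomright": (10, 10)}


    sample = DelayedSample(
        load_pixels,
        delayed_attributes=dict(annotations=load_annotations),
        key="subject_01/sample_0",  # extra kwargs become plain sample attributes
    )

    print(sample.key)          # available immediately
    print(sample.annotations)  # triggers load_annotations() lazily
    print(sample.data)         # triggers load_pixels() lazily
    ```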
  • v0.0.1b0
    First beta [skip-ci]