Release notes
- !109: Add a way to retrieve protocol definition files
  Removes `bob.extension`'s `get_file()`.
- Be more lenient with the dependencies version pinning.
Release notes
- !99: remove sampleloaders and prepare for bob.bio.base!300
  Needed for bob.bio.base!300.
- !102: Ci refactoring
  Refactoring of the CI process. Linked to issue #42.
- !104: Bob extension replacement
  Part of the replacement of bob.extension with exposed and auto-intersphinx. Related to #43.
- !103: Add protocols as classmethod for FileListDatabase
  Allows inheriting classes to retrieve a default protocols definition file and list protocols.
- !105: [utils.py] changed return type in check_parameters_for_validity to ensure that a list is returned
  Closes #44.
- !106: Update deprecated dask-jobqueue names
  Parameters from dask-jobqueue's classes (`Job` and `JobQueueCluster`) will change name soon. This follows those changes (`job_extra` to `job_extra_directives` and `env_extra` to `job_script_prologue`).
  A config option (`jobqueue.sge.job-extra`) became invalid due to the name changes and returned `None`, which was not handled correctly, making the submit commands fail silently and leaving the scheduler waiting for the jobs. The option is now renamed.
- !108: Change UserDefaults calls to match last implementation
  Changed UserDefaults calls to match the latest implementation in the exposed package.
- !109: Add a way to retrieve protocol definition files
  Removes `bob.extension`'s `get_file()`.
- !110: Modifying rc file name to bobrc.toml
- !111: [pyproject.toml] Changing documentation link to master/sphinx
- !112: Replace clapp by clapper.
- !113: meta [entry-points]: Revert dask.client group name
  Switch back to `dask.client` instead of `bob.pipelines.dask.client` for the dask Client entry-points group name in `pyproject.toml`. Fixes #46.
- !114: meta [readme]: Switch the README.rst to markdown
  Renames README.rst to README.md to be supported by the release script.
- !115: meta(deps): add bob as dependency in new structure
  Adapt to the new structure of bob with `bob/bob` on top.
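
The keyword renaming described in !106 can be sketched as a small mapping step. This is only an illustration: `migrate_cluster_kwargs` is a hypothetical helper, not part of bob.pipelines or dask-jobqueue, and the `None` fallback to an empty list is an assumption about how a missing config value could be handled.

```python
# Old dask-jobqueue keyword names mapped to their replacements (from !106).
RENAMED_KWARGS = {
    "job_extra": "job_extra_directives",
    "env_extra": "job_script_prologue",
}


def migrate_cluster_kwargs(kwargs):
    """Return a copy of ``kwargs`` using the new keyword names.

    Hypothetical helper for illustration only.
    """
    migrated = {}
    for name, value in kwargs.items():
        new_name = RENAMED_KWARGS.get(name, name)
        if new_name in migrated:
            raise ValueError(f"{new_name!r} given under both old and new names")
        # Assumption: a value read from a missing config key comes back as
        # None; use an empty list for the renamed (list-valued) options so
        # the problem does not propagate silently into the submit command.
        if new_name in RENAMED_KWARGS.values() and value is None:
            value = []
        migrated[new_name] = value
    return migrated


old_style = {"job_extra": ["-l gpu"], "env_extra": None, "queue": "q_1day"}
new_style = migrate_cluster_kwargs(old_style)
```

Here `new_style` carries `job_extra_directives` and `job_script_prologue` in place of the old names, while unrelated keys such as `queue` pass through unchanged.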
Release notes
- !101 Pin numpy on the minor version: Prevents increment of numpy minor version over the bob.devtools defined pin.
Release notes
- !97 pipeline wrappers tweaks:
  1. `SampleWrapper` to be able to choose the type of output: `Sample` vs `DelayedSample`
  2. `SampleWrapper` to make sure there are no invalid samples when calling `fit`
  3. `DaskWrapper` to avoid calling `fit` multiple times
- !100 Fix the doctest of xarray failing on python 3.8
- !98 DelayedSample tweak:
  1. Make `kwargs` take precedence over parents' `delayed_attributes`. This change is made to follow more closely the implementation of the `Sample` class.
  2. Make sure an attribute is not present in both `delayed_attributes` and `kwargs` of the `__init__` function, which is semantically not sound.
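
The two rules from !98 can be illustrated with a toy class. This is a minimal sketch, not the bob.pipelines `DelayedSample`: the class name `DelayedAttributesSketch` and its internals are hypothetical and only mimic the behavior described above.

```python
class DelayedAttributesSketch:
    """Toy stand-in for a delayed sample; NOT the bob.pipelines class."""

    def __init__(self, load, parent=None, delayed_attributes=None, **kwargs):
        delayed_attributes = dict(delayed_attributes or {})
        # Rule 2: the same name must not be passed both as a delayed
        # attribute and as a plain keyword argument.
        clash = set(delayed_attributes) & set(kwargs)
        if clash:
            raise ValueError(f"Given as delayed and as kwargs: {sorted(clash)}")
        self._load = load
        # Copy the parent's delayed attributes so the child never mutates them.
        self._delayed = dict(getattr(parent, "_delayed", {}))
        self._delayed.update(delayed_attributes)
        # Rule 1: explicit kwargs win over attributes inherited from the parent.
        for name, value in kwargs.items():
            self._delayed.pop(name, None)
            setattr(self, name, value)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for delayed attributes.
        delayed = self.__dict__.get("_delayed", {})
        if name in delayed:
            return delayed[name]()
        raise AttributeError(name)

    @property
    def data(self):
        # The payload is loaded only when it is accessed.
        return self._load()


parent = DelayedAttributesSketch(
    lambda: 1, delayed_attributes={"annotations": lambda: "from parent"}
)
child = DelayedAttributesSketch(lambda: 2, parent=parent, annotations="from kwargs")
```

In this sketch `child.annotations` resolves to the keyword value, while `parent.annotations` is still loaded lazily from the parent's own callable.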
Release notes
- !96 Fix Dask documentation: This MR fixes the issues with the Dask documentation
Release notes
- !79 Fixing compatibility issues with dask_jobqueue=0.7.2: Closes #37. Unfortunately we can't test this on the CI (there's no SGE there).
- !78 pin dask versions more strictly: Fixes #40
- !77 Resolve "local-parallel queue is not setup well": Closes #38
- !80 Implemented a force mechanism: Created a `force` option for the CheckpointWrapper. Related to: bob.bio.base#173
- !81 Fix get_bob_tags to return default tags: When passing `None` as estimator to get_bob_tags, returns the default tags.
- !84 Created a function checking if a Scikit learn pipeline is wrapped: Created the function `is_estimator_wrapped`
- !70 Handle estimator tags in wrapper classes: Allows setting some parameters of the `SampleWrapper` and `CheckpointWrapper` via estimator tags. bob.bio.base#143
- !83 [dask] Convert dask bags to arrays more efficiently: Most inefficiencies were coming from the fact that we were creating a dask array with each sample as a separate chunk.
- !82 breaking: checkpoint the inner estimator only
- !85 Prevent a reference invalidation when wrapped with sample and checkpoint.: Prevents creating a new estimator when loading a sample-wrapped estimator with `CheckpointWrapper` (continuation of !82 which prevented the creation of a new estimator right "below"). This now checks if the estimator is wrapped with `SampleWrapper` and updates the estimator at that level. Fixes bob.bio.gmm#30.
- !86 Fix fit extra parameters: Allowed extra fit parameters to be non-array (e.g. str). Added a tag to prevent stacking of the input array of the fit method if it expects partitioned data.
- !87 Add a non-adaptive io-big queue
- !88 Add support for fitting estimators on dask bags: The estimators that can handle dask bags should set the `bob_fit_supports_dask_bag` tag to True. This commit also includes:
  * Adds a new tag: `bob_fit_supports_dask_bag`
  * Adds a new tag: `bob_checkpoint_features` for when you want to always avoid checkpointing features for a specific estimator.
  * Expose dask_tags, get_bob_tags in the main API
  * The SampleWrapper was modified to support `bob_fit_supports_dask_bag`
  * The CheckpointWrapper now correctly loads estimators without losing references.
- !89 Load checkpointed estimators inside the scheduler: Also adds resilience to loading checkpointed samples
- !90 replace is_estimator_stateless with estimator_requires_fit: The previous code actually checked whether an estimator requires fit, while the function was named is_estimator_stateless.
- !92 better logging overall
- !91 Many API changes: Expose utils API in the root API. Fix the docs API. Remove unused transformers. Fix SGE GPU submissions.
- !93 Add documentation for CSV databases
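
The tag mechanism mentioned in !81 and !88 can be sketched roughly as follows. This is a minimal sketch under stated assumptions: `get_tags_sketch` is a hypothetical stand-in for `get_bob_tags`, the default values in `DEFAULT_TAGS` are guesses made for illustration, and reading tags through a `_more_tags()` method follows the older scikit-learn convention rather than anything confirmed by these notes.

```python
# Assumed defaults, for illustration only; the real defaults may differ.
DEFAULT_TAGS = {
    "bob_fit_supports_dask_bag": False,
    "bob_checkpoint_features": True,
}


def get_tags_sketch(estimator=None):
    """Return the default tags, updated with the estimator's own tags."""
    tags = dict(DEFAULT_TAGS)
    if estimator is None:
        # As described in !81: no estimator means "give me the defaults".
        return tags
    more_tags = getattr(estimator, "_more_tags", None)
    if callable(more_tags):
        tags.update(more_tags())
    return tags


class BagAwareEstimator:
    """Hypothetical estimator that declares it can be fitted on a dask bag."""

    def _more_tags(self):
        return {"bob_fit_supports_dask_bag": True}

    def fit(self, X, y=None):
        return self


defaults = get_tags_sketch(None)
bag_tags = get_tags_sketch(BagAwareEstimator())
```

A wrapper could then branch on `bag_tags["bob_fit_supports_dask_bag"]` to decide whether to hand the bag to `fit` directly.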
Release notes
- !63 Implemented a mechanism in the Checkpoint wrapper that asserts if data was...: Implemented a mechanism in the Checkpoint wrapper that asserts that data was properly written to disk. Closes #31
- !66 Handled failed processing (Failure to Acquire) in the wrappers: Fixes #32
- !67 Some minor updates on the checkpoint wrapper and SGE
- !68 Fix parent's delayed_attributes modified by child: A `DelayedSample` child's `delayed_attributes` is no longer referencing the parent's `delayed_attributes`.
- !69 [SampleSet] Do not load delayed attributes by not copying them over
- !71 [CheckpointWrapper] Use atomic writing when saving features
- !72 + breakdown_SampleSet: As described in #33, a function is added which takes as input a `SampleSet` with `N` samples and outputs `N` `SampleSets` with 1 `Sample` each. ping @tiago.pereira Closes #33
- !73 Remove samples_to_hdf5 methods: These methods were not used anywhere.
- !74 Add a DelayedSample.from_sample classmethod: This method can be used to transparently create new DelayedSamples from either Samples or DelayedSamples without loading delayed attributes and data
- !64 Fix delayed attributes: Delayed attributes are no longer loaded when `_copy_attributes` is called to create a DelayedSample.
- !75 Add worker Time To Live limitation: Hello, I have regularly been annoyed by Dask runs that hang indefinitely because some workers get disconnected from the scheduler. In this case, the scheduler assumes the worker must still be doing its job, so it doesn't reassign the task, leading to a completely blocked run that needs to be interrupted by hand. This typically happens on very heavy experiments, e.g. on IJBC, FRGC. From what I understand this can be handled using the `worker_ttl` parameter of the scheduler, which puts a limit on how long a worker can be unseen by the scheduler before it is killed and its task is reassigned. It is `None` by default. I have been working for a while on a local branch where I set the default to 60s, and it helped quite a lot. I am proposing to merge this change, however I wanted to know what you think of it. My main concern is that it might be hiding some underlying issue (why do the workers actually disconnect?), so I am not 100% sure it's a good change to make. ping @tiago.pereira @amohammadi
- !76 [docs] update docs to match new API of xarray: Fixes #35. Disabled testing Sphinx docs on mac builds.
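
The atomic writing mentioned in !71 is commonly done by writing to a temporary file in the target directory and renaming it into place. The snippet below is a generic sketch of that pattern using only the standard library; `atomic_write_bytes` is a hypothetical name and this is not the code used by the CheckpointWrapper.

```python
import os
import tempfile


def atomic_write_bytes(path, payload):
    """Write ``payload`` to ``path`` so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # The temporary file lives in the same directory, so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        # os.replace overwrites the destination in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temporary file before re-raising.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern, a job that is killed halfway through a write leaves at most a stray temporary file, never a truncated checkpoint that a later run would load as valid.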
Release notes
- !37 Revert "For some reason, the class information is not passed in the sample wrapper": This reverts merge request !36
- !38 [sge] In dask some subclassed classes need a config name. Fixes #20
- !40 Add dask-client configurations as resources: Fixes #19 Removes the sge-demanding configuration as all nodes at Idiap have a fast connection now. Depends on bob.bio.base!201
- !39 [dask][sge] Added the variables `idle_timeout` and `allowed_failures` as part of our `.bobrc` and added better defaults
- !41 Added a GPU queue that defaults to short_gpu
- !43 Allow setting specific attributes of sample: Specify the sample attribute to assign the output of an estimator to, instead of 'data' in SampleWrapper. Specify the attribute of sample to save and load in CheckpointWrapper.
- !44 Fix sphinx warnings
- !45 Multiple Changes:
  * When checkpointing, checkpoint all steps in a pipeline
  * Better names in dask graph for FunctionTransformer
  * [xarray] Allow for multi argument transformers
  * SampleBatch in public API
- !46 move vstack_features to bob.io.base
- !48 Improvements on CheckpointWrapper: Added the optional argument `hash_fn` in the `CheckpointWrapper` class. Once this is set, a hash code is generated from `sample.key` and this hash code is used to compose the final path where `sample` will be checkpointed. This is optional and generic enough for our purposes. This hash function can be shipped in the database interface. Closes #25
- !47 Multiple changes:
  * [DelayedSample] Allow for arbitrary delayed attributes
  * [SampleBatch] Allow other attributes than data
  Fixes #26 #24
- !49 [DelayedSample] Fix issues when an attribute was set
- !50 [DelayedSample(Set)] make load and delayed_attributes private: This removes the need for a lot of guessing in downstream packages as they can start removing all keys that start with `_` when access of the sample's attribute is needed.
- !51 [dask][sge] Multiqueue updates: In this merge request I:
  - Simplified the way multi-queue is set in our scripts
  - Updated our Dask documentation

  Example: setting the `fit` method to run on `q_short_gpu`

  ```python
  # Tag the fit step so that it is scheduled on the GPU queue.
  pipeline = mario.wrap(
      ["sample", "checkpoint", "dask"],
      pipeline,
      model_path=model_path,
      fit_tag="q_short_gpu",
  )
  ```

  You have to explicitly set the list of resource tags available.

  ```python
  # Pass the resource tags known to the SGE cluster to the computation.
  pipeline.fit_transform(...).compute(
      scheduler=dask_client,
      resources=cluster.get_sge_resources(),
  )
  ```
- !53 Updates: Implemented two updates in this MR:
  - Removed the random behavior on the hash_string function (I had some problems in large scale tests).
  - Implemented the `DelayedSampleSetCached`. I need this behavior to speed up the score computation.
- !52 [CheckpointWrapper] Allow custom save and load functions through estimator tags
- !54 Fixed multiqueue: Hi @amohammadi @ydayer I'm fixing here the issue raised with the multiqueue. I was wrongly setting all tasks to run in a particular resource restriction. Now the problem is fixed. To get it running you have to wrap your pipeline in the same way as before and fetch the resources like this:

  ```python
  import bob.pipelines
  from bob.pipelines.distributed.sge import get_resource_requirements

  pipeline = bob.pipelines.wrap(
      ["sample", "checkpoint", "dask"],
      pipeline,
      model_path="./",
      transform_extra_arguments=(("metadata", "metadata"),),
      fit_tag="q_short_gpu",
  )

  # Collect the resource restrictions attached to the tagged tasks.
  resources = get_resource_requirements(pipeline)
  pipeline.fit_transform(X_as_sample).compute(
      scheduler=client, resources=resources
  )
  ```
- !56 Two new features:
  - Moved dask_get_partition_size from bob.bio.base to bob.pipelines
  - Updated the target duration of a task to 10s. Being very aggressive in scale-up
- !58 Moved the CSVBaseSampleLoader from bob.bio.base to bob.pipelines. This is a general function
- !55 Moved VALID_DASK_CLIENT_STRINGS to bob.pipelines
- !59 Dask client names
- !60 CSVSampleLoaders as transformers: Made CSVSampleLoaders as scikit-learn transformers. This is a good idea indeed. I made two classes. The `CSVToSampleLoader` converts one line to one sample; and `AnnotationsLoader` that aggregates from `CSVToSampleLoader` to read annotations using `bob.db.base.read_anno...`. This is delayed. I'm already porting this stuff on `bob.bio.base`. Code is way cleaner. ping @amohammadi @ydayer Closes #30
- !61 Fixed modules: config files from here were not available after running `conda install bob.pipelines`
- !62 Implement a new simple generic csv-based database interface: Depends on bob.extension!126
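
The `hash_fn` idea from !48 (and the removal of random hashing in !53) can be sketched as follows. The helper names `hash_fn_sketch` and `checkpoint_path`, the `.h5` extension, and the two-level directory layout are all assumptions made for illustration; they are not the bob.pipelines implementation. The sketch uses `hashlib`, which gives the same digest in every process, unlike Python's built-in `hash()` on strings, which is randomized per process by default.

```python
import hashlib
import os


def hash_fn_sketch(key, levels=2, width=2):
    """Map a sample key to a short, deterministic relative directory."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(*parts)


def checkpoint_path(features_dir, key, extension=".h5", hash_fn=None):
    """Compose the path where a sample would be checkpointed."""
    if hash_fn is None:
        # Without a hash function, every sample lands in one directory.
        return os.path.join(features_dir, str(key) + extension)
    # With a hash function, samples are spread over sub-directories.
    return os.path.join(features_dir, hash_fn(key), str(key) + extension)


flat = checkpoint_path("features", "subject1_sample3")
hashed = checkpoint_path("features", "subject1_sample3", hash_fn=hash_fn_sketch)
```

Spreading files over hashed sub-directories keeps any single directory from holding too many entries in large-scale experiments, and a deterministic hash ensures that a rerun finds the checkpoints written by a previous run.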