Skip to content
GitLab
Projects Groups Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in
  • bob.bio.base bob.bio.base
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 14
    • Issues 14
    • List
    • Boards
    • Service Desk
    • Milestones
  • Merge requests 0
    • Merge requests 0
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
  • Deployments
    • Deployments
    • Environments
    • Releases
  • Packages and registries
    • Packages and registries
    • Package Registry
    • Infrastructure Registry
  • Monitor
    • Monitor
    • Incidents
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Repository
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • bobbob
  • bob.bio.basebob.bio.base
  • Issues
  • #106
Closed
Open
Issue created Mar 29, 2018 by Jaden DIEFENBAUGH@jdiefenbaughContributor

Using Bob as a library: Don't force HDF5 serialization

There are many, many places in bob.bio.base & the associated ecosystem where it is assumed the user wants to serialize information to an HDF5 file (for example, bob.bio.base.PCA's train_projector() always writes to an HDF5 file). This is an issue when using Bob tools in different use-cases & environments, as there's no guarantee that a user wants to write to an HDF5 file. Sometimes the user can't write to files, such as in BEAT, which is the specific use-case that concerns me.

(Disk) serialization should at least be opt-in, and the data that was previously saved to disk by default should be returned by the function instead. For the above PCA example, this would change train_projector() to return the variances by default, and optionally write them to disk. Changes like this is the bare minimum needed to use these Bob tools in BEAT.

Honestly, though, serialization endpoints (disk, network, whatever) in general should be separated from individual Bob tools. A preprocessor/extractor/algorithm/whatever should have a method for general serialization as well as a method for rehydrating the instance using this data (this is already present in many places, but is just hard-coded to write to an HDF5 file). Some bob.serialization package could handle writing this data to disks/caches/networks/whatever.

What does everyone think?

Assignee
Assign to
Time tracking