bob.bio.base

Closed
Open
Opened Mar 29, 2018 by Jaden DIEFENBAUGH (@jdiefenbaugh)

Using Bob as a library: Don't force HDF5 serialization

Many places in bob.bio.base and the associated ecosystem assume that the user wants to serialize information to an HDF5 file (for example, bob.bio.base.PCA's train_projector() always writes to an HDF5 file). This is a problem when using Bob tools in other use cases and environments, since there is no guarantee that a user wants to write to an HDF5 file. Sometimes the user cannot write to files at all, as in BEAT, which is the specific use case that concerns me.

(Disk) serialization should at least be opt-in, and the data that was previously saved to disk by default should instead be returned by the function. For the PCA example above, train_projector() would return the variances by default and only optionally write them to disk. Changes like this are the bare minimum needed to use these Bob tools in BEAT.
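
To make the opt-in idea concrete, here is a rough sketch of the calling convention I have in mind. It is not the actual bob.bio.base code: the function body is a stand-in that computes per-dimension means and variances instead of a real PCA, the `projector_file` parameter name is an assumption, and it writes JSON instead of HDF5 only to keep the example dependency-free.

```python
import json
import statistics
from typing import Optional, Sequence


def train_projector(training_features: Sequence[Sequence[float]],
                    projector_file: Optional[str] = None) -> dict:
    """Hypothetical trainer: always returns its result, writes to disk only on request."""
    # Stand-in for the real training step: per-dimension mean and variance.
    columns = list(zip(*training_features))
    model = {
        "mean": [statistics.fmean(column) for column in columns],
        "variances": [statistics.pvariance(column) for column in columns],
    }

    # Serialization is opt-in: nothing touches the disk unless a path is given.
    if projector_file is not None:
        with open(projector_file, "w") as handle:
            json.dump(model, handle)

    return model


# No file access needed, which is what an environment like BEAT requires.
model = train_projector([[1.0, 2.0], [3.0, 6.0]])
```

Callers that still want the old behaviour would pass a path explicitly; everyone else just uses the returned value.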

More broadly, though, serialization endpoints (disk, network, and so on) should be separated from the individual Bob tools. A preprocessor, extractor, or algorithm should have one method that produces a general serialized form of its state and another that rehydrates an instance from that data (this already exists in many places, but is hard-coded to write to an HDF5 file). A separate package, say bob.serialization, could then handle writing this data to disks, caches, networks, or anywhere else.
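
A minimal sketch of that separation, under heavy assumptions: none of these names exist in Bob today. `Scaler` is a toy tool, `get_state()` / `from_state()` are hypothetical method names for the serialize and rehydrate pair, and `MemoryBackend` / `FileBackend` stand in for whatever endpoints a bob.serialization package might offer.

```python
import json
from typing import Dict, Protocol


class Backend(Protocol):
    """Anything that can store and retrieve bytes under a key."""
    def write(self, key: str, payload: bytes) -> None: ...
    def read(self, key: str) -> bytes: ...


class MemoryBackend:
    """Keeps payloads in a dict; usable where file access is not allowed."""
    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def write(self, key: str, payload: bytes) -> None:
        self._store[key] = payload

    def read(self, key: str) -> bytes:
        return self._store[key]


class FileBackend:
    """Treats the key as a file path."""
    def write(self, key: str, payload: bytes) -> None:
        with open(key, "wb") as handle:
            handle.write(payload)

    def read(self, key: str) -> bytes:
        with open(key, "rb") as handle:
            return handle.read()


class Scaler:
    """Toy tool that knows its own state but nothing about storage."""
    def __init__(self, factor: float = 1.0) -> None:
        self.factor = factor

    def get_state(self) -> dict:
        return {"factor": self.factor}

    @classmethod
    def from_state(cls, state: dict) -> "Scaler":
        return cls(factor=state["factor"])


def save(tool, backend: Backend, key: str) -> None:
    # The endpoint decides where the bytes go; the tool only supplies state.
    backend.write(key, json.dumps(tool.get_state()).encode("utf-8"))


def load(tool_cls, backend: Backend, key: str):
    return tool_cls.from_state(json.loads(backend.read(key).decode("utf-8")))


backend = MemoryBackend()
save(Scaler(2.5), backend, "scaler")
restored = load(Scaler, backend, "scaler")
```

Swapping `MemoryBackend()` for `FileBackend()` changes where the data lands without touching `Scaler`, which is the whole point of the split.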

What does everyone think?

Milestone: Bob 9.0.0
Labels: enhancement
Reference: bob/bob.bio.base#106