beat.core

Closed
Open
Opened Aug 26, 2014 by André Anjos (@andre.anjos)

Binary data format (migrated from github)

The current data format of choice uses the Python pickling system. Pickled data is tightly coupled to the project's module structure. I'm not sure it is a good idea to keep it around for long.
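To illustrate the coupling between pickled data and module structure, here is a minimal sketch using only the standard library. It uses `fractions.Fraction` as a stand-in for a project class (an assumption for illustration; it is not one of the BEAT data types) and simulates a module rename by patching the module name inside the pickled bytes.

```python
import pickle
from fractions import Fraction

# Pickle stores a reference to the defining module and class name,
# not the class definition itself.
payload = pickle.dumps(Fraction(1, 2))
print(b"fractions" in payload, b"Fraction" in payload)

# Simulate a module rename: swap in a same-length, non-existent module name.
broken = payload.replace(b"fractions", b"fractionz")

try:
    pickle.loads(broken)
    load_failed = False
except ImportError:
    # The data itself is intact, but it can no longer be loaded
    # because the import path recorded in the pickle is gone.
    load_failed = True

print(load_failed)
```

The same failure would be expected for cached files whenever a class in the project is moved or renamed, which is the concern raised above.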

Our group has had a very good experience with HDF5, a binary format that is flexible, fast to read, compact, and supported by most major software suites. I believe we should port the data format system to use HDF5 instead of Python pickle.

To get started, I'd go for adding a dependency on the excellent Python package h5py. It already supports the basic Python/NumPy types and is extensible to other data types.
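As a rough sketch of what storing data through h5py could look like, the snippet below writes two NumPy arrays and reads them back. The dataset names, the file name, and the `format_version` attribute are hypothetical and only hint at where data versioning could live; the in-memory `core` driver is used so that nothing is written to disk.

```python
import h5py
import numpy as np


def roundtrip(features, labels):
    """Write two arrays to an in-memory HDF5 file and read them back."""
    # driver="core" with backing_store=False keeps the file in memory only.
    with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
        # Hypothetical version tag stored as a file-level attribute.
        f.attrs["format_version"] = 1
        f.create_dataset("features", data=features)
        f.create_dataset("labels", data=labels)
        # Indexing with [...] copies the dataset contents into NumPy arrays.
        return int(f.attrs["format_version"]), f["features"][...], f["labels"][...]


features = np.arange(6, dtype=np.float64).reshape(2, 3)
labels = np.array([0, 1])
version, features_back, labels_back = roundtrip(features, labels)
print(version, features_back.shape, labels_back.dtype)
```

Unlike a pickle, the stored file carries only names, types, shapes and values, so reading it back does not depend on any project class being importable.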

At this point, it would also be nice to introduce data versioning somewhere. Maybe on the "dataformat" descriptors (not sure).

Note: Philip/François are not sure about HDF5. Before discarding the possibility of using it, it would be interesting to understand why.

Note: Another useful tool would be an automatic converter that takes HDF5 files and transforms them into JSON descriptors of their format. One could convert HDF5 files to simple data descriptors and, from there, into BEAT's JSON format if necessary.
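A converter along these lines could be sketched as follows. The descriptor layout (a nested dict of `dtype` and `shape` entries) is an assumption made for illustration and is not BEAT's actual JSON format; the dataset names are likewise hypothetical.

```python
import json

import h5py
import numpy as np


def describe(group):
    """Recursively map an HDF5 group to a plain dict of dtype/shape entries."""
    out = {}
    for name, item in group.items():
        if isinstance(item, h5py.Group):
            out[name] = describe(item)  # descend into sub-groups
        else:
            out[name] = {"dtype": str(item.dtype), "shape": list(item.shape)}
    return out


# Build a small in-memory file to run the converter on.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("image", data=np.zeros((4, 4), dtype=np.uint8))
    f.create_group("meta").create_dataset("score", data=np.float64(0.5))
    descriptor = describe(f)

print(json.dumps(descriptor, indent=2, sort_keys=True))
```

The same traversal could also serve as a simple way to look at what a stored file contains while debugging.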

Note: While debugging, it should be possible to inspect the cache. We currently have no tools to do so, but if we address this item with HDF5, then this should come for free.

Milestone: Second BEAT Review
Reference: beat/beat.core#4