Binary data format (migrated from GitHub)
The current data format of choice relies on Python's pickle system. Pickled data is tightly coupled to the project's module structure: loading fails as soon as a pickled class or its module is moved or renamed. I'm not sure it is a good idea to keep it around for long.
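To make the coupling concrete, here is a minimal sketch using only the standard library. The module name `old_layout` and the class `Sample` are hypothetical stand-ins for a project module, not names from our code base.

```python
import pickle
import sys
import types

# Build a throwaway module that stands in for a project module.
# "old_layout" and "Sample" are hypothetical names used for illustration.
mod = types.ModuleType("old_layout")
exec(
    "class Sample:\n"
    "    def __init__(self, value):\n"
    "        self.value = value\n",
    mod.__dict__,
)
sys.modules["old_layout"] = mod

payload = pickle.dumps(mod.Sample(3))

# pickle stores classes by reference: the module path is part of the payload.
module_name_in_payload = b"old_layout" in payload

# Simulate a refactoring in which the module disappears or is renamed.
del sys.modules["old_layout"]

try:
    pickle.loads(payload)
    load_failed = False
except ModuleNotFoundError:
    # The data itself is intact, but it can no longer be loaded.
    load_failed = True
```

The payload only records where the class lives, so any reorganisation of the package can make existing files unreadable.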
Our group has had a very good experience with HDF5. It is a binary format that is flexible, fast to read, compact, and supported by most major software suites and programming languages. I believe we should port the data format system to use HDF5 instead of Python pickle.
To get started, I'd add a dependency on the excellent Python package h5py. It already supports all basic Python/NumPy types and is extensible to other data types.
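As a rough illustration of what this could look like, the sketch below writes a NumPy array and a scalar with h5py and reads them back. The dataset names `scores` and `label` are made up for the example, and the file is kept in memory (`driver="core"`, `backing_store=False`) only so that nothing is written to disk.

```python
import h5py
import numpy as np

scores = np.linspace(0.0, 1.0, 5)

# In-memory file: the name is required by the API but no file is created.
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    # Arrays map directly onto HDF5 datasets.
    f.create_dataset("scores", data=scores)
    # Scalars can be stored through simple item assignment.
    f["label"] = 7

    # Read everything back as NumPy/Python objects.
    restored = f["scores"][...]
    label = int(f["label"][()])
```

Unlike a pickle, the resulting file carries no reference to Python classes or modules, only named, typed arrays.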
At this point, it would also be nice to introduce data versioning somewhere, maybe in the "dataformat" descriptors (not sure).
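One possible place for the version, if we go with HDF5, is a file-level attribute. The sketch below is only an assumption about how that might look: the attribute names `format_version` and `dataformat`, the value `user/example/1`, and the helper `check_version` are all hypothetical.

```python
import h5py
import numpy as np

FORMAT_VERSION = 2  # hypothetical current version of the on-disk layout


def check_version(h5file, supported=(1, 2)):
    """Return the stored format version, or raise if we cannot read it."""
    # Files written before versioning existed are treated as version 0.
    version = int(h5file.attrs.get("format_version", 0))
    if version not in supported:
        raise ValueError("unsupported format version: %d" % version)
    return version


with h5py.File("versioned.h5", "w", driver="core", backing_store=False) as f:
    # Metadata lives next to the data, as HDF5 attributes on the root group.
    f.attrs["format_version"] = FORMAT_VERSION
    f.attrs["dataformat"] = "user/example/1"  # hypothetical descriptor name
    f.create_dataset("value", data=np.arange(3))

    found = check_version(f)
```

Keeping the version inside the file means a reader can decide how to interpret the data before touching any dataset.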
Note: Philip/François are not sure about HDF5. Before discarding the possibility of using it, it would be interesting to understand why.
Note: Another useful tool would be an automatic converter that takes HDF5 files and turns them into JSON descriptors of their format. One could convert HDF5 files to simple data descriptors and, from there, into BEAT's JSON format if necessary.
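A first step of such a converter could be sketched as follows. The descriptor layout (dataset path mapped to `dtype` and `shape`) is an assumption made for this example and is not BEAT's JSON format; the function name `describe` and the dataset names are hypothetical as well.

```python
import json

import h5py
import numpy as np


def describe(h5file):
    """Collect the dtype and shape of every dataset in an open HDF5 file."""
    descriptor = {}

    def visit(name, obj):
        # Groups are skipped; only datasets carry a type and a shape.
        if isinstance(obj, h5py.Dataset):
            descriptor[name] = {
                "dtype": str(obj.dtype),
                "shape": list(obj.shape),
            }

    h5file.visititems(visit)
    return descriptor


with h5py.File("describe.h5", "w", driver="core", backing_store=False) as f:
    # Intermediate groups ("features") are created automatically.
    f.create_dataset("features/mean", data=np.zeros((3, 2), dtype="float64"))
    f.create_dataset("labels", data=np.arange(4, dtype="int32"))

    descriptor = describe(f)

as_json = json.dumps(descriptor, indent=2, sort_keys=True)
```

The same traversal also gives a quick way to list what a file contains, which is the kind of inspection that is hard to do with pickles.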
Note: While debugging, it should be possible to inspect the cache. We currently have no tools to do so, but if we address this item with HDF5, then this should come for free.