beat.core issues
https://gitlab.idiap.ch/beat/beat.core/-/issues
2017-08-06T11:17:04Z

Analyzers with the same content but different versions share the same hash
https://gitlab.idiap.ch/beat/beat.core/-/issues/22
2017-08-06T11:17:04Z
Laurent EL SHAFEY

In our current design, analyzers with the same content but different versions share the same hash.
This is problematic, since the cache files for the analyzers include the 'algorithm_name/version' in their headers. Because the hash is shared, a request for 'algorithm_name/version1' may load a cache file with 'algorithm_name/version2' hardcoded in its header, which then fails the 'dataformat validation'.
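
A minimal sketch of the collision, and of how folding the name and version into the hashed payload would separate the two cache entries. This is not BEAT's actual hashing scheme: `content_hash`, `versioned_hash`, the payload layout and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import json


def content_hash(code, parameters):
    """Hypothetical hash that only looks at the analyzer's content."""
    payload = json.dumps({"code": code, "parameters": parameters}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def versioned_hash(name, version, code, parameters):
    """Hypothetical hash that also covers 'algorithm_name/version'."""
    payload = json.dumps(
        {
            "algorithm": f"{name}/{version}",
            "code": code,
            "parameters": parameters,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two versions of the same analyzer with byte-identical content.
code = "def process(inputs):\n    return inputs\n"
parameters = {"threshold": 0.5}

# The version is not an input, so both versions map to the same cache entry.
hash_v1 = content_hash(code, parameters)
hash_v2 = content_hash(code, parameters)

# With the version in the payload, each version gets its own cache entry.
versioned_v1 = versioned_hash("my_analyzer", 1, code, parameters)
versioned_v2 = versioned_hash("my_analyzer", 2, code, parameters)
```

With a scheme of this kind, a cache file whose header says 'algorithm_name/version2' would never be found under the hash computed for 'algorithm_name/version1'.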
To avoid this problem, one possible solution is to change the way hashes are generated for analyzer blocks.
Second BEAT Review

Reliability of data I/O
https://gitlab.idiap.ch/beat/beat.core/-/issues/20
2017-08-06T11:17:04Z
André Anjos

Similarly to what we did for the indexes, we should also promote some check-summing capabilities for the data which is produced and consumed in our environments. I leave this open for discussion here.
Second BEAT Review
Laurent EL SHAFEY

Binary data format (migrated from github)
https://gitlab.idiap.ch/beat/beat.core/-/issues/4
2018-07-30T08:57:48Z
André Anjos

The current data format of choice uses the Python pickling system. This is highly sensitive to the project module structure. I'm not sure it is a good idea to keep it hanging around for long.
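
The coupling between pickle and the module structure can be reproduced with the standard library alone. In the sketch below, the module name `old_project_layout` and the class `Sample` are hypothetical and created at runtime; removing the module from `sys.modules` stands in for a refactoring that renames or moves it.

```python
import pickle
import sys
import types

# Build a throwaway module holding a class, as if it lived in the project.
module = types.ModuleType("old_project_layout")
exec(
    "class Sample:\n"
    "    def __init__(self, value):\n"
    "        self.value = value\n",
    module.__dict__,
)
sys.modules["old_project_layout"] = module

payload = pickle.dumps(module.Sample(42))

# The pickle stream records the import path of the class, not its definition.
stores_path = b"old_project_layout" in payload

# Simulate a refactoring: the module no longer exists under its old name.
del sys.modules["old_project_layout"]

try:
    pickle.loads(payload)
    load_failed = False
except ModuleNotFoundError:
    # The data is intact, but it cannot be read without the old layout.
    load_failed = True
```

The cached bytes are unchanged, yet they become unreadable as soon as the class can no longer be imported from the path recorded in the stream.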
Our group had a (very good) experience with HDF5. This is a binary format that is quite flexible, fast to read, compact and compatible with all major software suites. I believe we should port the data format system to use HDF5 instead of Python pickle.
To get started, I'd go for adding a dependency on the excellent Python package `h5py`. It already supports all Python/NumPy basic types and is extensible to other data types.
At this point, it would also be nice to introduce data versioning somewhere. Maybe on the "dataformat" descriptors (not sure).
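
As a rough sketch of what this could look like, the snippet below stores NumPy arrays with `h5py` and records a version number as a file attribute. The helper names `write_block` and `read_block`, the attribute name `format_version` and the file layout are assumptions for illustration, not an agreed design; it requires `h5py` and `numpy` to be installed.

```python
import os
import tempfile

import h5py
import numpy as np

FORMAT_VERSION = 1  # hypothetical version number for the stored layout


def write_block(path, arrays, format_version=FORMAT_VERSION):
    """Store each array as a top-level dataset, tagged with a version."""
    with h5py.File(path, "w") as f:
        f.attrs["format_version"] = format_version
        for name, array in arrays.items():
            f.create_dataset(name, data=array)


def read_block(path):
    """Read back the version and all top-level datasets."""
    with h5py.File(path, "r") as f:
        version = int(f.attrs["format_version"])
        arrays = {name: f[name][()] for name in f.keys()}
    return version, arrays


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache.h5")
    original = {
        "scores": np.array([0.1, 0.9, 0.5]),
        "labels": np.array([0, 1, 1], dtype=np.int8),
    }
    write_block(path, original)
    version, restored = read_block(path)
```

Unlike the pickle stream, nothing in the file refers to Python modules or classes, so the data stays readable when the project layout changes.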
> Note: Philip/François are not sure about HDF5. Before discarding the possibility of using it, it would be interesting to
> understand why.
> Note: Another useful tool would be an automatic converter that takes HDF5 files and transforms them into JSON
> descriptors of that format. One can convert HDF5 files to simple data descriptors and, from there, into BEAT's
> JSON format if necessary.
> Note: While debugging, it should be possible to inspect the cache. We currently have no tools to do so,
> but if we address this item with HDF5, then this should come for free.
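
The last two notes could be approached with the same small tool: walk an HDF5 file and emit a JSON description of what it contains. The sketch below assumes `h5py` and `numpy` are installed; the function name `describe` and the descriptor layout (dataset path mapped to `dtype` and `shape`) are hypothetical and do not correspond to BEAT's JSON dataformat.

```python
import json
import os
import tempfile

import h5py
import numpy as np


def describe(path):
    """Return a JSON-serializable summary of every dataset in the file."""
    description = {}

    def visitor(name, obj):
        # visititems calls this for every group and dataset; keep datasets.
        if isinstance(obj, h5py.Dataset):
            description[name] = {
                "dtype": str(obj.dtype),
                "shape": [int(n) for n in obj.shape],
            }

    with h5py.File(path, "r") as f:
        f.visititems(visitor)
    return description


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache.h5")
    with h5py.File(path, "w") as f:
        group = f.create_group("features")
        group.create_dataset("mfcc", data=np.zeros((2, 3), dtype=np.float32))
    summary = describe(path)
    summary_json = json.dumps(summary, sort_keys=True)
```

Printing `summary_json` for a cache file gives a quick view of its contents while debugging, and the same dictionary could serve as the starting point for a converter.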
Second BEAT Review
Laurent EL SHAFEY