Database schema and runtime improvements
Databases were originally designed monolithically, on the assumption that both the environment providing the packages needed to read their contents and the location of the files used for the readout are fixed. Templates were introduced as an afterthought, without proper formalisation. As a consequence, templates with the same name may appear in several database JSON declarations without necessarily being the same. This issue supersedes #25 (closed) and gathers all changes required for a revamp in this area:
- Protocol and set templates should be separated from the database view declaration to avoid repetition and centralise "task-related" declarations. This will effectively simplify the declaration of new datasets.
- The `root_db` parameter (maybe misspelled here) needs to be externalised as a runtime parameter during the run, as is currently the case for other runtime prefixes such as algorithm caches. Currently, anyone who downloads the database view needs to change this parameter by hand.
- The environment required to run the database view that provides the data for the experiment needs to be configured and have an entry in the database JSON declaration. This ensures that any change in the environment (docker image) results in new caches being generated. As of today, it can happen that hashes are not regenerated even if the environment changes completely.
- A default db env docker image (possibly named `beat.env.databases`) should be provided on docker hub. This avoids future conflicts because, when multiple databases are used in one experiment, only one image can be used.
- The prototypes in `beat.core` should come configured to use this image.
- This new JSON entry must be documented in `beat/docs`, and users should be advised to use this image.
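
To make the intent of the `root_db` and environment items concrete, here is a minimal sketch, not the actual `beat.core` implementation. It assumes a hypothetical `environment` entry in the declaration with `name` and `version` keys, and two hypothetical helpers: `declaration_hash`, which hashes the whole declaration so that an environment change yields a new hash, and `resolve_root`, which lets a value supplied at run time take precedence over the declared root.

```python
import hashlib
import json
from typing import Optional


def declaration_hash(declaration: dict) -> str:
    """Hash a database declaration, including its environment entry."""
    # Sorted keys give a deterministic serialisation, so equal declarations
    # always produce the same hash.
    canonical = json.dumps(declaration, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def resolve_root(declaration: dict, runtime_root: Optional[str] = None) -> str:
    """Prefer a root directory given at run time over the declared one."""
    if runtime_root is not None:
        return runtime_root
    return declaration["root_db"]


# Hypothetical declaration: the layout of "environment" is an assumption
# made for illustration, not an existing schema.
declaration = {
    "root_db": "/path/to/data",
    "environment": {"name": "beat.env.databases", "version": "1"},
}

# Same declaration, but pointing at a different version of the image.
upgraded = dict(
    declaration,
    environment={"name": "beat.env.databases", "version": "2"},
)

# The environment is part of the hashed content, so the two hashes differ
# and caches keyed on the hash would be regenerated.
print(declaration_hash(declaration) != declaration_hash(upgraded))

# A root supplied at run time replaces the declared one without editing
# the declaration by hand.
print(resolve_root(declaration, runtime_root="/mnt/datasets"))
```

Whether the data root itself should take part in the hash is a separate design decision that this sketch leaves open.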
Edited by Samuel GAIST