Decoupling data and annotation loading of BioFiles from database interfaces

As originally proposed in bob.db.base#22 (closed), I think it would make sense to separate data loading and annotation loading from the database object and put that functionality in BioFiles. This would also align with our idea of samples in bob.pipelines. Currently I have this:

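import bob.db.base
import bob.io.base
from bob.db.base.annotations import read_annotation_file


# _ReprMixin is assumed to be defined earlier in the same module.
class BioFile(bob.db.base.File, _ReprMixin):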
    """
    A simple base class that defines the basic properties of a File object for
    use in verification experiments.

    Attributes
    ----------
    client_id : str or int
        The id of the client this file belongs to.
        Its type depends on your implementation.
        If you use an SQL database, this should be an SQL type like Integer or
        String.
    path : object
        see :py:class:`bob.db.base.File` constructor
    file_id : object
        see :py:class:`bob.db.base.File` constructor
    original_directory : str or None
        The path to the original directory of the file
    original_extension : str or None
        The extension of the original files. This attribute is deprecated.
        Please try to include the extension in the ``path`` attribute.
    annotation_directory : str or None
        The path to the directory of the annotations
    annotation_extension : str or None
        The extension of annotation files. Default is ``.json``
    annotation_type : str or None
        The type of the annotation file, see
        :any:`bob.db.base.annotations.read_annotation_file`. Default is
        ``json``.
    """

    def __init__(
        self,
        client_id,
        path,
        file_id=None,
        original_directory=None,
        original_extension=None,
        annotation_directory=None,
        annotation_extension=None,
        annotation_type=None,
        **kwargs,
    ):
        super(BioFile, self).__init__(path, file_id, **kwargs)

        # just copy the information
        self.client_id = client_id
        """The id of the client to which this file belongs."""
        self.original_directory = original_directory
        self.original_extension = original_extension
        self.annotation_directory = annotation_directory
        self.annotation_extension = annotation_extension or ".json"
        self.annotation_type = annotation_type or "json"

    def load(self):
        """Loads the data from the specified location using the given extension.
        Override it if you need to load differently.

        Returns
        -------
        object
            The loaded data (normally :py:class:`numpy.ndarray`).
        """
        # get the path
        path = self.make_path(
            self.original_directory or "", self.original_extension or ""
        )
        return bob.io.base.load(path)

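    @property
    def annotations(self):
        """Reads the annotations of this file from the annotation directory.
        Override it if your annotations are stored differently.
        """
        # get the path of the annotation file
        path = self.make_path(
            self.annotation_directory or "", self.annotation_extension or ""
        )
        return read_annotation_file(path, annotation_type=self.annotation_type)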

This requires refactoring our high-level db interfaces (we will not touch the low-level db interfaces).
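
As a rough sketch of what that refactoring could look like: a high-level database would only hand its directories to the files it creates, and any remaining `annotations(file)` method would delegate to the file. `FileStub` and `ExampleDatabase` are hypothetical stand-ins that do not depend on bob, and the stub reads plain JSON instead of calling `read_annotation_file`:

```python
import json
import os
import tempfile


class FileStub:
    """Hypothetical stand-in for BioFile; it does not depend on bob."""

    def __init__(self, client_id, path, annotation_directory=None,
                 annotation_extension=None):
        self.client_id = client_id
        self.path = path
        self.annotation_directory = annotation_directory
        self.annotation_extension = annotation_extension or ".json"

    def make_path(self, directory=None, extension=None):
        # Simplified equivalent of bob.db.base.File.make_path.
        return os.path.join(directory or "", self.path + (extension or ""))

    @property
    def annotations(self):
        # Assumption: annotations are plain JSON files in this sketch.
        path = self.make_path(self.annotation_directory,
                              self.annotation_extension)
        with open(path) as f:
            return json.load(f)


class ExampleDatabase:
    """Hypothetical high-level interface after the refactoring."""

    def __init__(self, annotation_directory):
        self.annotation_directory = annotation_directory

    def objects(self):
        # The database only injects its configuration into each file.
        return [
            FileStub(client_id=1, path="subject1/sample1",
                     annotation_directory=self.annotation_directory)
        ]

    def annotations(self, file):
        # Kept for backward compatibility; the file does the actual work.
        return file.annotations


def demo():
    """Writes one annotation file and reads it back both ways."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "subject1"))
        with open(os.path.join(root, "subject1", "sample1.json"), "w") as f:
            json.dump({"leye": [10, 20], "reye": [10, 40]}, f)
        db = ExampleDatabase(annotation_directory=root)
        sample = db.objects()[0]
        return sample.annotations, db.annotations(sample)
```

With this layout, code that only receives a file object no longer needs a reference to the database to get at the annotations.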

What do you think?

Of course, the load method and the annotations property can be overridden per db.
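
To illustrate the per-db override, here is a minimal sketch. `BaseFileStub` is a hypothetical stand-in for BioFile (no bob dependency), and `TextDatabaseFile` is an invented example of a database that keeps its annotations in `#` header lines of the data file itself:

```python
import os
import tempfile


class BaseFileStub:
    """Hypothetical stand-in for BioFile; it does not depend on bob."""

    def __init__(self, client_id, path, original_directory=None,
                 original_extension=None):
        self.client_id = client_id
        self.path = path
        self.original_directory = original_directory
        self.original_extension = original_extension

    def make_path(self, directory=None, extension=None):
        # Simplified equivalent of bob.db.base.File.make_path.
        return os.path.join(directory or "", self.path + (extension or ""))

    def load(self):
        # Default behaviour of the stub: return the raw bytes.
        # (The real BioFile would call bob.io.base.load here.)
        path = self.make_path(self.original_directory,
                              self.original_extension)
        with open(path, "rb") as f:
            return f.read()

    @property
    def annotations(self):
        return None


class TextDatabaseFile(BaseFileStub):
    """Hypothetical per-db file: integer rows plus '#' annotation lines."""

    def _lines(self):
        return super().load().decode("utf-8").splitlines()

    def load(self):
        # Overridden: parse the data rows instead of returning bytes.
        return [[int(v) for v in line.split()]
                for line in self._lines() if not line.startswith("#")]

    @property
    def annotations(self):
        # Overridden: annotations live in header lines like "# leye 10 20".
        result = {}
        for line in self._lines():
            if line.startswith("#"):
                name, y, x = line[1:].split()
                result[name] = (int(y), int(x))
        return result


def demo():
    """Writes one sample file and loads it through the subclass."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "sample1.txt"), "w") as f:
            f.write("# leye 10 20\n# reye 10 40\n1 2 3\n4 5 6\n")
        sample = TextDatabaseFile(client_id=1, path="sample1",
                                  original_directory=root,
                                  original_extension=".txt")
        return sample.load(), sample.annotations
```

The calling code stays the same for every database: it calls `load()` and reads `annotations`, and only the file class knows how the data is stored.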

Edited Oct 21, 2020 by Amir MOHAMMADI