Decoupling data and annotation loading of BioFiles from database interfaces
As originally proposed in bob.db.base#22, I think it would make sense to separate data loading and annotation loading from the database object and move this functionality into the BioFiles themselves.
This effort would also align with our idea of samples in bob.pipelines. Currently I have this:
```python
import bob.db.base
import bob.io.base
from bob.db.base.annotations import read_annotation_file
from bob.pipelines.sample import _ReprMixin  # assumed location of this repr helper mixin


class BioFile(bob.db.base.File, _ReprMixin):
    """A simple base class that defines the basic properties of a File object
    for use in verification experiments.

    Attributes
    ----------
    client_id : str or int
        The id of the client this file belongs to.
        Its type depends on your implementation.
        If you use an SQL database, this should be an SQL type like Integer or
        String.
    path : object
        See the :py:class:`bob.db.base.File` constructor.
    file_id : object
        See the :py:class:`bob.db.base.File` constructor.
    original_directory : str or None
        The path to the original directory of the file.
    original_extension : str or None
        The extension of the original files. This attribute is deprecated;
        please include the extension in the ``path`` attribute instead.
    annotation_directory : str or None
        The path to the directory of the annotations.
    annotation_extension : str or None
        The extension of the annotation files. Default is ``.json``.
    annotation_type : str or None
        The type of the annotation file, see
        :any:`bob.db.base.annotations.read_annotation_file`. Default is
        ``json``.
    """

    def __init__(
        self,
        client_id,
        path,
        file_id=None,
        original_directory=None,
        original_extension=None,
        annotation_directory=None,
        annotation_extension=None,
        annotation_type=None,
        **kwargs,
    ):
        super().__init__(path, file_id, **kwargs)
        # just copy the information
        self.client_id = client_id
        """The id of the client this file belongs to."""
        self.original_directory = original_directory
        self.original_extension = original_extension
        self.annotation_directory = annotation_directory
        self.annotation_extension = annotation_extension or ".json"
        self.annotation_type = annotation_type or "json"

    def load(self):
        """Loads the data from ``original_directory`` using ``original_extension``.

        Override it if your database needs to load the data differently.

        Returns
        -------
        object
            The loaded data (normally :py:class:`numpy.ndarray`).
        """
        # build the full path: original_directory/path + original_extension
        path = self.make_path(
            self.original_directory or "", self.original_extension or ""
        )
        return bob.io.base.load(path)

    @property
    def annotations(self):
        """The annotations of this file, read from ``annotation_directory``
        using ``annotation_extension`` and ``annotation_type``.

        Override it if your database stores its annotations differently.
        """
        path = self.make_path(
            self.annotation_directory or "", self.annotation_extension or ""
        )
        return read_annotation_file(path, annotation_type=self.annotation_type)
```
This requires refactoring our high-level database interfaces; the low-level database interfaces will not be touched.
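To make the intended split concrete, here is a minimal, dependency-free sketch of what a refactored high-level interface could look like. `SketchBioFile`, `SketchDatabase` and `load_all` are hypothetical names, not part of bob. A plain-text read stands in for `bob.io.base.load` and `json.load` stands in for `read_annotation_file`, so the example runs without bob installed. The point it illustrates is that the database only lists files, and the consuming loop never needs the database's directories or extensions.

```python
import json
import os
import tempfile


class SketchBioFile:
    """Hypothetical, dependency-free stand-in for the BioFile class above."""

    def __init__(self, client_id, path, original_directory=None,
                 original_extension=None, annotation_directory=None,
                 annotation_extension=None):
        self.client_id = client_id
        self.path = path
        self.original_directory = original_directory
        self.original_extension = original_extension
        self.annotation_directory = annotation_directory
        self.annotation_extension = annotation_extension or ".json"

    def make_path(self, directory="", extension=""):
        # Same idea as bob.db.base.File.make_path: directory/path + extension
        return os.path.join(directory, self.path + extension)

    def load(self):
        # A plain-text read stands in for bob.io.base.load
        path = self.make_path(self.original_directory or "",
                              self.original_extension or "")
        with open(path) as f:
            return f.read()

    @property
    def annotations(self):
        # JSON only, standing in for read_annotation_file(path, annotation_type="json")
        path = self.make_path(self.annotation_directory or "",
                              self.annotation_extension)
        with open(path) as f:
            return json.load(f)


class SketchDatabase:
    """Hypothetical high-level interface: it lists files but never loads them."""

    def __init__(self, files):
        self._files = list(files)

    def objects(self):
        return list(self._files)


def load_all(database):
    # No database-level directories or extensions are needed: each file loads itself
    return [(f.client_id, f.load(), f.annotations) for f in database.objects()]


# Usage: build a tiny on-disk "database" in a temporary directory
with tempfile.TemporaryDirectory() as root:
    data_dir = os.path.join(root, "data")
    annotation_dir = os.path.join(root, "annotations")
    os.makedirs(os.path.join(data_dir, "client1"))
    os.makedirs(os.path.join(annotation_dir, "client1"))
    with open(os.path.join(data_dir, "client1", "sample1.txt"), "w") as f:
        f.write("pixels")
    with open(os.path.join(annotation_dir, "client1", "sample1.json"), "w") as f:
        json.dump({"reye": [10, 20], "leye": [10, 40]}, f)

    db = SketchDatabase([
        SketchBioFile("client1", "client1/sample1",
                      original_directory=data_dir, original_extension=".txt",
                      annotation_directory=annotation_dir),
    ])
    results = load_all(db)
```

Because each file carries its own directories and loaders, `load_all` would work unchanged for any database whose `objects()` returns such files.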
What do you think?
Of course, the `load` method and the `annotations` property can still be overridden per database.
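As a sketch of such a per-database override, assume (hypothetically) a database whose raw files are comma-separated floats and whose annotations are all kept in one shared table rather than in one file per sample. `SketchBioFile` and `CsvDatabaseFile` are illustrative names only; the base class is a stripped-down stand-in for `BioFile` that keeps the same two entry points, `load()` and `annotations`.

```python
import os
import tempfile


class SketchBioFile:
    """Hypothetical minimal stand-in for the proposed BioFile (default loaders)."""

    def __init__(self, client_id, path, original_directory=""):
        self.client_id = client_id
        self.path = path
        self.original_directory = original_directory

    def load(self):
        # Default loader: return the raw file content as text
        with open(os.path.join(self.original_directory, self.path)) as f:
            return f.read()

    @property
    def annotations(self):
        # Default: no annotations available
        return None


class CsvDatabaseFile(SketchBioFile):
    """Hypothetical database-specific file: data are comma-separated floats and
    all annotations live in one shared table keyed by file path."""

    def __init__(self, client_id, path, original_directory, annotation_table):
        super().__init__(client_id, path, original_directory)
        self._annotation_table = annotation_table

    def load(self):
        # Override: parse the raw text into a list of floats
        return [float(value) for value in super().load().split(",")]

    @property
    def annotations(self):
        # Override: look this sample up in the shared table
        return self._annotation_table.get(self.path)


# Usage: one data file plus an in-memory annotation table
table = {"s1.csv": {"reye": (1, 2), "leye": (1, 5)}}
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "s1.csv"), "w") as f:
        f.write("0.5,1.5,2.0")
    sample = CsvDatabaseFile("client1", "s1.csv", root, table)
    data = sample.load()
    annotations = sample.annotations
```

Code that consumes the files keeps calling `load()` and `annotations` unchanged; only the file class differs between databases.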