Commit 34a0feff authored by André Anjos

[doc] Improve data-model description

parent 05e4c021
Pipeline #91257 passed
@@ -38,18 +38,27 @@ user A may be under ``/home/user-a/databases/database-1`` and
Sample
------

The in-memory representation of a raw database sample. A ``Sample`` is
specified as a dictionary containing at least the following keys:

* ``image`` (:py:class:`torch.Tensor`): the image to be analysed
* ``target`` (:py:class:`torch.Tensor`): the target for the current task
* ``name`` (:py:class:`str`): a unique name for this sample

Optionally, depending on the task, the following keys may also be present:

* ``mask`` (:py:class:`torch.Tensor`): an inclusion mask for the input image
  and targets. If set, it is used to evaluate errors only within the masked
  area.
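
For illustration, a minimal sketch of what such a dictionary could contain in
practice follows; the tensor shapes, target value and sample name below are
purely hypothetical:

.. code-block:: python

   import torch

   # A hypothetical classification sample following the keys described above.
   sample = {
       "image": torch.rand(1, 512, 512),       # image tensor (C x H x W)
       "target": torch.tensor([1.0]),          # task-dependent target
       "name": "database-1/train/sample-042",  # unique sample identifier
       # "mask": torch.ones(1, 512, 512),      # optional inclusion mask
   }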

RawDataLoader
-------------

A callable object that loads raw data and associated metadata to create an
in-memory ``Sample`` representation. Concrete ``RawDataLoader``\s are
typically database-specific, because raw data and metadata encoding vary quite
a lot across databases. ``RawDataLoader``\s may also embed various
pre-processing transformations to render data readily usable, such as
pre-cropping of black pixel areas, or 16-bit to 8-bit auto-level conversion.
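
As a rough sketch, a database-specific loader could look like the following;
the class, its input format and the grayscale conversion are illustrative
assumptions, not the library's actual API:

.. code-block:: python

   import PIL.Image
   import torch
   import torchvision.transforms.functional

   class HypotheticalLoader:
       """Loads (path, label) pairs from a fictional database into Samples."""

       def __call__(self, raw: tuple[str, int]) -> dict:
           path, label = raw
           image = PIL.Image.open(path).convert("L")  # e.g. force 8-bit grayscale
           return {
               "image": torchvision.transforms.functional.to_tensor(image),
               "target": torch.tensor([float(label)]),
               "name": path,
           }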

@@ -57,27 +66,35 @@ pre-cropping of black pixel areas, or 16-bit to 8-bit auto-level conversion.
TransformSequence
-----------------

A sequence of callables that transforms :py:class:`torch.Tensor` objects into
other :py:class:`torch.Tensor` objects, typically to crop, resize, or convert
color-spaces of raw data. ``TransformSequence``\s are used in two main parts
of this library: to power raw-data loading and the transformations required to
fit data into a model (e.g. ensuring images are grayscale or resized to a
certain size), and to implement data augmentations for training-time usage.
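
For instance, a model-side transform sequence could be assembled from standard
torchvision building blocks; this is only a sketch, as the exact transforms
applied by each model are defined within the library:

.. code-block:: python

   import torch
   from torchvision import transforms

   # Hypothetical model-side sequence: grayscale conversion followed by a
   # resize, applied tensor-to-tensor after raw-data loading.
   transform_sequence = transforms.Compose([
       transforms.Grayscale(num_output_channels=1),
       transforms.Resize((256, 256), antialias=True),
   ])

   raw = torch.rand(3, 512, 384)          # a raw RGB tensor (C x H x W)
   model_ready = transform_sequence(raw)  # -> tensor of shape (1, 256, 256)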

DatabaseSplit
-------------

A dictionary-like object that represents an organization of the available raw
data in the database to perform an evaluation protocol (e.g. train, validation,
test) through datasets (or subsets). It is represented as a dictionary mapping
dataset names to lists of "raw-data" ``Sample`` representations, which vary in
format depending on database metadata availability. ``RawDataLoader``\s
receive these raw representations and convert them to in-memory ``Sample``\s.
:py:class:`mednet.data.split.JSONDatabaseSplit` is a concrete example of a
``DatabaseSplit`` implementation that can read split definitions from JSON
files, and is used throughout the library to represent the various supported
database splits.
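
As an illustration, a split could map subset names to raw-data entries such as
the ones below; the entry format (here, path and label pairs) is an assumption
and depends on the database being described:

.. code-block:: python

   # Hypothetical split: dataset (subset) names map to lists of raw-data
   # entries, which a RawDataLoader later converts into in-memory Samples.
   split = {
       "train": [("images/img-001.png", 0), ("images/img-002.png", 1)],
       "validation": [("images/img-101.png", 0)],
       "test": [("images/img-201.png", 1)],
   }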

ConcatDatabaseSplit
-------------------

An extension of a ``DatabaseSplit``, in which the split can be formed by
reusing various other ``DatabaseSplit``\s to construct a new evaluation
protocol. Examples of this are cross-database tests, or the construction of
multi-database training and validation subsets.
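
Conceptually, and again only as a hedged sketch assuming ``split_database_1``
and ``split_database_2`` are two ``DatabaseSplit``-like dictionaries, such a
split concatenates the subsets of its constituents:

.. code-block:: python

   # Hypothetical construction of a multi-database training protocol by
   # concatenating the "train" subsets of two existing splits.
   concatenated = {
       "train": split_database_1["train"] + split_database_2["train"],
       "validation": split_database_1["validation"],
       "test": split_database_2["test"],  # e.g. a cross-database test
   }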

@@ -85,20 +102,22 @@ multi-Database training and validation subsets.
Dataset
-------

An iterable object over in-memory ``Sample``\s, inheriting from
:py:class:`torch.utils.data.Dataset`. A ``Dataset`` in this framework may be
completely cached in memory, or have the in-memory representation of
``Sample``\s loaded on demand. After data loading, ``Dataset``\s can
optionally apply a ``TransformSequence``, composed of pre-processing steps
defined at a per-model level, before optionally caching the in-memory
``Sample`` representations. The "raw" representation of a ``Dataset``
consists of the split dictionary values (i.e. not the keys).
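
A rough sketch of how these pieces fit together is given below; the class is
an illustrative stand-in and not the library's actual ``Dataset``
implementation:

.. code-block:: python

   import torch.utils.data

   class SketchDataset(torch.utils.data.Dataset):
       """Hypothetical dataset tying a split subset, a loader and transforms."""

       def __init__(self, raw_samples, loader, transforms=()):
           self.raw_samples = raw_samples  # the split dictionary *values*
           self.loader = loader            # a RawDataLoader-like callable
           self.transforms = transforms    # a TransformSequence-like iterable

       def __getitem__(self, index: int) -> dict:
           sample = self.loader(self.raw_samples[index])  # load on demand
           for transform in self.transforms:
               sample["image"] = transform(sample["image"])
           return sample

       def __len__(self) -> int:
           return len(self.raw_samples)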

DataModule
----------

A ``DataModule`` aggregates ``DatabaseSplit``\s and ``RawDataLoader``\s to
provide Lightning with a known interface to the complete evaluation protocol
(train, validation, prediction and testing) required for a full experiment to
take place. It automates control over data-loading parallelisation and caching
inside the framework, providing final access to readily-usable PyTorch
``DataLoader``\s.
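
The sketch below shows how such an object hooks into Lightning; the class and
its eager caching strategy are assumptions for illustration, not the library's
actual implementation:

.. code-block:: python

   import torch.utils.data
   from lightning.pytorch import LightningDataModule

   class SketchDataModule(LightningDataModule):
       """Hypothetical data module exposing the usual Lightning hooks."""

       def __init__(self, split: dict, loader, batch_size: int = 8):
           super().__init__()
           self.split = split    # a DatabaseSplit-like dictionary
           self.loader = loader  # a RawDataLoader-like callable
           self.batch_size = batch_size

       def train_dataloader(self) -> torch.utils.data.DataLoader:
           # Eagerly load (cache) the training samples; an implementation may
           # instead defer loading to a Dataset object.
           samples = [self.loader(item) for item in self.split["train"]]
           return torch.utils.data.DataLoader(
               samples, batch_size=self.batch_size, shuffle=True
           )

       # val_dataloader(), test_dataloader() and predict_dataloader() would be
       # implemented analogously over the remaining protocol subsets.

A Lightning ``Trainer`` can then drive the full protocol through calls such as
``trainer.fit(model, datamodule=...)``.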

doc/img/data-model-dark.png: image replaced (86.4 KiB → 75.2 KiB)

doc/img/data-model-lite.png: image replaced (87.5 KiB → 78.2 KiB)
@@ -39,7 +39,7 @@ digraph G {
]
Sample [
    label = "{Sample (dict)|+ image: torch.Tensor\l+ target: torch.Tensor\l+ name: str\l}"
]
DataLoader [
...