diff --git a/doc/data_model.rst b/doc/data_model.rst
new file mode 100644
index 0000000000000000000000000000000000000000..1d698abd2e8420e822f1254fd686990f23a4b935
--- /dev/null
+++ b/doc/data_model.rst
@@ -0,0 +1,69 @@
+.. Copyright © 2023 Idiap Research Institute <contact@idiap.ch>
+..
+.. SPDX-License-Identifier: GPL-3.0-or-later
+
+.. _mednet.datamodel:
+
+============
+ Data model
+============
+
+The following sections describe the components of our data model. These terms are used throughout this documentation and the codebase.
+
+
+Database
+--------
+Data downloaded from a data provider, containing samples in their raw format.
+A database may contain both data and metadata, and is expected to reside on disk (or any other storage device)
+at an arbitrary, user-configurable location in the user environment.
+For example, databases 1 and 2 for user A may be under ``/home/user-a/databases/database-1`` and ``/home/user-a/databases/database-2``,
+while for user B, they may be located at ``/groups/medical-data/DatabaseOne`` and ``/groups/medical-data/DatabaseTwo``.
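+
+To illustrate why the location is kept out of the data itself, the following minimal sketch resolves files from a per-user mapping of database names to root directories.
+The ``DATABASE_ROOTS`` dictionary and the ``database_path()`` helper are hypothetical and do not describe how this package actually stores its configuration.
+
```python
from pathlib import PurePosixPath

# Hypothetical per-user configuration: database names mapped to root directories.
# User B would keep the same keys but point them at /groups/medical-data/...
DATABASE_ROOTS = {
    "database-1": "/home/user-a/databases/database-1",
    "database-2": "/home/user-a/databases/database-2",
}


def database_path(name: str, relative: str) -> PurePosixPath:
    """Resolve a path inside a database from its configured root (hypothetical helper)."""
    try:
        root = DATABASE_ROOTS[name]
    except KeyError:
        raise KeyError(f"no root directory configured for database {name!r}") from None
    return PurePosixPath(root) / relative


print(database_path("database-1", "images/001.png"))
# /home/user-a/databases/database-1/images/001.png
```
+
+Keeping only relative paths inside the database description means the same description works unchanged for users A and B; only the root mapping differs.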
+
+
+Sample
+------
+The in-memory representation of a raw database sample.
+In this package, it is specified as a two-tuple containing a tensor and its associated metadata (typically the label, name, etc.).
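+
+A minimal sketch of this two-tuple, assuming the metadata is kept in a dictionary.
+A nested list stands in for the ``torch.Tensor`` so the example has no dependencies, and the ``make_sample()`` helper is hypothetical.
+
```python
from typing import Any, Dict, List, Tuple

# Stand-in for torch.Tensor in this sketch: a 2D grid of pixel values.
Image = List[List[float]]

# A Sample pairs the data with its metadata (label, name, ...).
Sample = Tuple[Image, Dict[str, Any]]


def make_sample(data: Image, label: int, name: str) -> Sample:
    """Build a Sample two-tuple (hypothetical helper for illustration)."""
    return data, {"label": label, "name": name}


tensor, metadata = make_sample([[0.0, 0.5], [1.0, 0.25]], label=1, name="patient-001")
print(metadata["label"], metadata["name"])  # 1 patient-001
```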
+
+
+RawDataLoader
+-------------
+A concrete "functor" that loads the raw data and associated metadata to create an in-memory Sample representation.
+RawDataLoaders are typically Database-specific, because the encoding of raw data and metadata varies considerably across databases.
+RawDataLoaders may also embed pre-processing transformations that render data readily usable, such as pre-cropping of black pixel areas
+or 16-bit to 8-bit auto-level conversion.
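+
+The following sketch illustrates the idea under stated assumptions: the raw representation of a sample is a ``[relative_path, label]`` pair,
+each raw file is plain text holding whitespace-separated 16-bit integer values, and the loader embeds a 16-bit to 8-bit auto-level conversion.
+The ``TextRawDataLoader`` class, its ``sample()`` method and the ``autolevel_to_8bit()`` helper are hypothetical; real loaders read image formats and return tensors.
+
```python
import os
import tempfile
from typing import Any, Dict, List, Sequence, Tuple

Sample = Tuple[List[List[int]], Dict[str, Any]]


def autolevel_to_8bit(image: List[List[int]]) -> List[List[int]]:
    """Linearly stretch the observed [min, max] range of a 16-bit image to [0, 255]."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: avoid a division by zero
        return [[0 for _ in row] for row in image]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in image]


class TextRawDataLoader:
    """Hypothetical loader for one database: knows where and how its raw data is stored."""

    def __init__(self, root: str) -> None:
        self.root = root  # user-configured database location

    def sample(self, raw: Sequence[Any]) -> Sample:
        path, label = raw  # database-specific raw representation
        with open(os.path.join(self.root, path)) as f:
            image = [[int(v) for v in line.split()] for line in f if line.strip()]
        # Pre-processing embedded in the loader: 16-bit to 8-bit auto-level.
        return autolevel_to_8bit(image), {"label": label, "name": path}


# Usage: write one tiny raw file, then load it as an in-memory Sample.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "img-001.txt"), "w") as f:
        f.write("1000 2000\n3000 5000\n")
    data, meta = TextRawDataLoader(root).sample(["img-001.txt", 1])

print(data)  # [[0, 64], [128, 255]]
```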
+
+
+TransformSequence
+-----------------
+A sequence of callables that transforms ``torch.Tensor`` objects into other ``torch.Tensor`` objects,
+typically to crop, resize, or convert the color space of raw data, among similar operations.
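+
+Applying a TransformSequence amounts to calling each callable in order on the output of the previous one, similar in spirit to ``torchvision.transforms.Compose``.
+In this dependency-free sketch nested lists stand in for tensors, and ``center_crop()``, ``downsample()`` and ``apply_sequence()`` are hypothetical stand-ins for real crop and resize transforms.
+
```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # stand-in for torch.Tensor
Transform = Callable[[Image], Image]


def center_crop(size: int) -> Transform:
    """Return a transform that keeps the central size x size region."""
    def _crop(img: Image) -> Image:
        top = (len(img) - size) // 2
        left = (len(img[0]) - size) // 2
        return [row[left:left + size] for row in img[top:top + size]]
    return _crop


def downsample(img: Image) -> Image:
    """Halve the resolution by keeping every other row and column."""
    return [row[::2] for row in img[::2]]


def apply_sequence(transforms: Sequence[Transform], img: Image) -> Image:
    """Apply each transform in order, feeding each output into the next."""
    for t in transforms:
        img = t(img)
    return img


# 6x6 image whose pixel value encodes its position: 10 * row + column.
image = [[10.0 * r + c for c in range(6)] for r in range(6)]
result = apply_sequence([center_crop(4), downsample], image)
print(result)  # [[11.0, 13.0], [31.0, 33.0]]
```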
+
+
+DatabaseSplit
+-------------
+A dictionary that organizes the raw data available in a database into datasets (or subsets)
+to implement an evaluation protocol (e.g. train, validation, test).
+It maps dataset names to lists of "raw-data" sample representations, whose format varies
+depending on the metadata available in each Database. RawDataLoaders receive these raw representations and convert them to in-memory Samples.
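+
+For instance, a split may be stored as JSON and loaded into a plain dictionary.
+This sketch assumes each raw sample is a ``[relative_path, label]`` pair; the file names are made up, and other databases may use richer records.
+
```python
import json
from typing import Any, Dict, List

DatabaseSplit = Dict[str, List[Any]]

# Hypothetical split: dataset names map to lists of raw sample representations.
SPLIT_JSON = """
{
  "train": [["images/001.png", 0], ["images/002.png", 1], ["images/003.png", 1]],
  "validation": [["images/004.png", 0]],
  "test": [["images/005.png", 1], ["images/006.png", 0]]
}
"""

split: DatabaseSplit = json.loads(SPLIT_JSON)

for name, raw_samples in split.items():
    # A RawDataLoader would turn each raw entry into an in-memory Sample.
    print(name, len(raw_samples))
# train 3
# validation 1
# test 2
```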
+
+
+ConcatDatabaseSplit
+-------------------
+An extension of a DatabaseSplit in which the split is formed by combining subsets of several other DatabaseSplits to construct a new evaluation protocol.
+Examples include cross-database tests and the construction of multi-database training and validation subsets.
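+
+One possible representation, assumed here for illustration, maps each dataset name to a list of ``(raw_samples, loader)`` pairs,
+so that every borrowed subset stays paired with the RawDataLoader that understands its raw entries.
+The splits, the ``load_a()``/``load_b()`` loaders and the ``load_dataset()`` helper are hypothetical.
+
```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Sample = Tuple[str, Dict[str, Any]]
Loader = Callable[[Sequence[Any]], Sample]  # stand-in for a RawDataLoader
ConcatDatabaseSplit = Dict[str, List[Tuple[Sequence[Any], Loader]]]

# Two hypothetical databases, each with its own DatabaseSplit...
split_a = {"train": [["a/001.png", 0], ["a/002.png", 1]], "test": [["a/003.png", 1]]}
split_b = {"train": [["b/001.png", 1]], "test": [["b/002.png", 0], ["b/003.png", 1]]}


# ...and its own loader, which understands that database's raw format.
def load_a(raw: Sequence[Any]) -> Sample:
    return "pixels from database A", {"name": raw[0], "label": raw[1]}


def load_b(raw: Sequence[Any]) -> Sample:
    return "pixels from database B", {"name": raw[0], "label": raw[1]}


# Multi-database training subset (A + B), and a test subset from database B alone.
concat: ConcatDatabaseSplit = {
    "train": [(split_a["train"], load_a), (split_b["train"], load_b)],
    "test": [(split_b["test"], load_b)],
}


def load_dataset(concat: ConcatDatabaseSplit, name: str) -> List[Sample]:
    """Load every raw sample of a dataset with the loader of its own database."""
    return [loader(raw) for raw_samples, loader in concat[name] for raw in raw_samples]


train = load_dataset(concat, "train")
print([meta["name"] for _, meta in train])  # ['a/001.png', 'a/002.png', 'b/001.png']
```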
+
+
+Dataset
+-------
+An iterable object over in-memory Samples, derived from the PyTorch ``Dataset`` class.
+A dataset in our framework may be fully cached in memory, or may load the in-memory representation of each sample on demand.
+After loading, our datasets can optionally apply a TransformSequence, composed of pre-processing steps defined per model,
+before optionally caching the resulting in-memory Samples. The "raw" representation of a dataset is its value in the split dictionary (i.e., the list of raw samples, not the key).
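+
+The sketch below shows the map-style protocol (``__len__`` and ``__getitem__``) such a dataset implements, with an optional TransformSequence and optional caching.
+To stay dependency-free it does not subclass ``torch.utils.data.Dataset``, as a real implementation would; the ``CachingDataset`` name and its parameters are hypothetical.
+
```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Sample = Tuple[Any, Dict[str, Any]]


class CachingDataset:
    """Map-style dataset over raw samples, either cached up front or loaded on demand."""

    def __init__(
        self,
        raw_samples: Sequence[Any],  # one value of a DatabaseSplit
        load: Callable[[Any], Sample],  # e.g. a RawDataLoader
        transforms: Sequence[Callable[[Any], Any]] = (),
        cache: bool = False,
    ) -> None:
        self.raw_samples = raw_samples
        self.load = load
        self.transforms = list(transforms)
        self._cache: Optional[List[Sample]] = None
        if cache:
            # Load, transform and keep every sample in memory immediately.
            self._cache = [self._load(raw) for raw in raw_samples]

    def _load(self, raw: Any) -> Sample:
        data, metadata = self.load(raw)
        for t in self.transforms:  # model-specific TransformSequence
            data = t(data)
        return data, metadata

    def __len__(self) -> int:
        return len(self.raw_samples)

    def __getitem__(self, index: int) -> Sample:
        if self._cache is not None:
            return self._cache[index]
        return self._load(self.raw_samples[index])  # load on demand


calls = []  # records which raw samples were actually loaded


def fake_load(raw):
    calls.append(raw[0])
    return [raw[1] * 10], {"name": raw[0], "label": raw[1]}


raw = [["x.png", 1], ["y.png", 2]]
lazy = CachingDataset(raw, fake_load, transforms=[lambda d: [v + 1 for v in d]])
eager = CachingDataset(raw, fake_load, cache=True)
print(len(calls))  # 2: only the eager dataset has loaded anything so far
print(lazy[1])  # ([21], {'name': 'y.png', 'label': 2})
```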
+
+
+DataModule
+----------
+A DataModule aggregates splits and RawDataLoaders to provide Lightning with a known interface to the complete evaluation protocol (training, validation, prediction, and testing)
+required to run a full experiment. It automates control over data-loading parallelisation and caching inside our framework,
+and ultimately provides readily usable PyTorch DataLoaders.
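+
+A heavily simplified sketch of the aggregation a DataModule performs: it builds one dataset per subset of a split using a loader, then hands out batched iterables per stage.
+The hook names ``setup()``, ``train_dataloader()``, ``val_dataloader()`` and ``test_dataloader()`` mirror Lightning's ``LightningDataModule``; everything else, including ``SketchDataModule`` and ``batched()``, is hypothetical.
+A real implementation would derive from ``LightningDataModule`` and return ``torch.utils.data.DataLoader`` objects, which also handle shuffling and worker parallelism; here a plain generator stands in for them.
+
```python
from typing import Any, Callable, Dict, Iterator, List, Sequence, Tuple

Sample = Tuple[Any, Dict[str, Any]]


def batched(samples: Sequence[Sample], batch_size: int) -> Iterator[List[Sample]]:
    """Stand-in for torch.utils.data.DataLoader: yield consecutive batches."""
    for start in range(0, len(samples), batch_size):
        yield list(samples[start:start + batch_size])


class SketchDataModule:
    """Aggregates a split and a loader; hook names mirror Lightning's."""

    def __init__(
        self,
        split: Dict[str, List[Any]],
        load: Callable[[Any], Sample],
        batch_size: int = 2,
    ) -> None:
        self.split = split
        self.load = load
        self.batch_size = batch_size
        self.datasets: Dict[str, List[Sample]] = {}

    def setup(self, stage: str) -> None:
        # Build (and, in this sketch, fully cache) one dataset per subset.
        self.datasets = {
            name: [self.load(raw) for raw in raw_samples]
            for name, raw_samples in self.split.items()
        }

    def train_dataloader(self) -> Iterator[List[Sample]]:
        return batched(self.datasets["train"], self.batch_size)

    def val_dataloader(self) -> Iterator[List[Sample]]:
        return batched(self.datasets["validation"], self.batch_size)

    def test_dataloader(self) -> Iterator[List[Sample]]:
        return batched(self.datasets["test"], self.batch_size)


split = {
    "train": [["a.png", 0], ["b.png", 1], ["c.png", 1]],
    "validation": [["d.png", 0]],
    "test": [["e.png", 1]],
}
dm = SketchDataModule(split, lambda raw: (raw[0], {"label": raw[1]}))
dm.setup("fit")
print([len(batch) for batch in dm.train_dataloader()])  # [2, 1]
```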
diff --git a/doc/index.rst b/doc/index.rst
index b0126ac0b4051d09395d31a4eb25a88424f8a832..fdf8b6d0c399321e33d89b565702fbc96ed160cc 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -50,6 +50,7 @@ User Guide
    :maxdepth: 2
 
    install
+   data_model
    usage/index
    results/index
    references