Commit 17cb2882 authored by André Anjos

[data] Improve datamodule documentation across different databases

parent 1ebfb876
1 merge request: !6 Making use of LightningDataModule and simplification of data loading
Pipeline #76698 failed
@@ -25,14 +25,14 @@ class DataModule(CachingDataModule):
 in computer-aided diagnosis of pulmonary diseases with a special
 focus on pulmonary tuberculosis (TB).
-* Original resolution (height x width or width x height): more than 1024 x 1024
+* Original images PNG, 8-bit grayscale, 1024 x 1024 pixels
 * Split reference: [INDIAN-2013]_ with 20% of train set for the validation set
 Data specifications:
 * Raw data input (on disk):
-* PNG images (grayscale, encoded as RGB images with "inverted" grayscale scale)
+* PNG RGB 8-bit depth images with "inverted" grayscale scale
 * Variable width and height
 * Output image:
@@ -41,12 +41,14 @@ class DataModule(CachingDataModule):
 * Load raw PNG with :py:mod:`PIL`
 * Remove black borders
+* Convert to torch tensor
 * Torch center cropping to get square image
 * Final specifications:
-* Grayscale, encoded as a single plane image, 8 bits
-* Square, with varying resolutions, depending on the input image
+* Grayscale, encoded as a single plane tensor, 32-bit floats,
+square, with varying resolutions, depending on the input raw image
+* Labels: 0 (healthy), 1 (active tuberculosis)
 """
 def __init__(self, split_filename: str):
......
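Reviewer note: the transform list in the hunk above (remove black borders, then center-crop to a square) can be illustrated with a dependency-free sketch. This is not the code from this commit, which uses PIL and torch; the function names, the `threshold` parameter, and the nested-list image representation are assumptions made only for illustration, and the tensor conversion step is left out.

```python
def remove_black_borders(image, threshold=0):
    """Crop away outer rows/columns whose pixels are all <= threshold.

    `image` is a list of equally long rows of integer pixel values
    (a hypothetical stand-in for a single-plane image).
    """
    rows = [i for i, row in enumerate(image) if any(p > threshold for p in row)]
    cols = [
        j
        for j in range(len(image[0]))
        if any(row[j] > threshold for row in image)
    ]
    if not rows or not cols:
        return image  # entirely black: leave the image untouched
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1
    return [row[left:right] for row in image[top:bottom]]


def center_crop_square(image):
    """Keep the largest centered square that fits inside the image."""
    height, width = len(image), len(image[0])
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return [row[left:left + side] for row in image[top:top + side]]


raw = [
    [0, 0, 0, 0, 0, 0],
    [0, 10, 20, 30, 40, 0],
    [0, 50, 60, 70, 80, 0],
    [0, 0, 0, 0, 0, 0],
]
cropped = center_crop_square(remove_black_borders(raw))
```

Because the crop side is the shorter dimension left after border removal, the output stays square but its resolution depends on the input, which matches the "varying resolutions" wording in the docstring.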
@@ -120,12 +120,14 @@ class DataModule(CachingDataModule):
 * Load raw PNG with :py:mod:`PIL`
 * Remove black borders
+* Convert to torch tensor
 * Torch center cropping to get square image
 * Final specifications
-* Grayscale, encoded as a single plane image, 8 bits
-* Square (4020x4020 px)
+* Grayscale, encoded as a single plane tensor, 32-bit floats,
+square at 4020 x 4020 pixels
+* Labels: 0 (healthy), 1 (active tuberculosis)
 """
 def __init__(self, split_filename: str):
......
@@ -127,7 +127,11 @@ class DataModule(CachingDataModule):
 CheXNeXt study.
 * Reference: [NIH-CXR14-2017]_
-* Original resolution (height x width): 1024 x 1024
+* Raw data input (on disk):
+* PNG RGB 8-bit depth images
+* Resolution: 1024 x 1024 pixels
 * Labels: [CHEXNEXT-2018]_
 * Split reference: [CHEXNEXT-2018]_
 * Protocol ``default``:
@@ -141,11 +145,26 @@ class DataModule(CachingDataModule):
 * Transforms:
 * Load raw PNG with :py:mod:`PIL`
+* Convert to torch tensor
-* Final specifications
+* Final specifications:
-* RGB, encoded as a 3-plane image, 8 bits
-* Square (1024x1024 px)
+* RGB, encoded as a 3-plane tensor, 32-bit floats, square (1024x1024 px)
+* Labels in order:
+* cardiomegaly
+* emphysema
+* effusion
+* hernia
+* infiltration
+* mass
+* nodule
+* atelectasis
+* pneumothorax
+* pleural thickening
+* pneumonia
+* fibrosis
+* edema
+* consolidation
 """
 def __init__(self, split_filename: str):
......
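Reviewer note: the "Labels in order" list added above fixes the position of each finding in the label vector. A minimal sketch of how such an ordering might be consumed follows, assuming a multi-hot encoding (one 0/1 entry per finding); the docstring only states the order, so the encoding and the `encode_findings` helper are hypothetical.

```python
# Label order copied from the docstring in the hunk above.
LABELS = (
    "cardiomegaly",
    "emphysema",
    "effusion",
    "hernia",
    "infiltration",
    "mass",
    "nodule",
    "atelectasis",
    "pneumothorax",
    "pleural thickening",
    "pneumonia",
    "fibrosis",
    "edema",
    "consolidation",
)


def encode_findings(findings):
    """Return a 14-element 0/1 list, one entry per label, in LABELS order.

    Hypothetical helper: the multi-hot encoding is an assumption, not
    something stated in the commit.
    """
    findings = set(findings)
    unknown = findings - set(LABELS)
    if unknown:
        raise ValueError(f"unknown findings: {sorted(unknown)}")
    return [1 if name in findings else 0 for name in LABELS]


vector = encode_findings({"hernia", "edema"})
```

Here `vector` has a 1 at index 3 (hernia) and index 12 (edema) and 0 elsewhere, so swapping two names in the list would silently change what each output position means.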
@@ -112,14 +112,14 @@ class DataModule(CachingDataModule):
 Philips DR Digital Diagnose systems.
 * Database reference: [MONTGOMERY-SHENZHEN-2014]_
-* Original resolution (height x width or width x height): 3000 x 3000 or less
 Data specifications:
 * Raw data input (on disk):
-* PNG images (grayscale, encoded as RGB images with "inverted" grayscale scale)
-* Variable width and height
+* PNG 8-bit RGB images (grayscale, but encoded as RGB images with
+"inverted" grayscale scale requiring special treatment).
+* Variable width and height of 3000 x 3000 pixels or less
 * Output image:
@@ -131,8 +131,9 @@ class DataModule(CachingDataModule):
 * Final specifications:
-* Grayscale, encoded as a single plane image, 8 bits
-* Square, with varying resolutions, depending on the input image
+* Grayscale, encoded as a single plane tensor, 32-bit floats,
+square with varying resolutions, depending on the input image
+* Labels: 0 (healthy), 1 (active tuberculosis)
 """
 def __init__(self, split_filename: str):
......
@@ -246,17 +246,17 @@ class DataModule(CachingDataModule):
 active TB cases (total samples = 8369):
 - ``train`` dataset samples:
-- Healthy: 4864
+- Healthy, Sick or Latent TB: 4864
 - Active TB only: 377
 - Total: 5241
 - ``validation`` dataset samples:
+- Healthy, Sick or Latent TB: 1239
 - Active TB only: 96
-- Healthy: 1239
 - Total: 1335
 - ``test`` dataset samples:
-- Healthy: 1636
+- Healthy, Sick or Latent TB: 1636
 - Active TB only: 157
 - Total: 1793
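Reviewer note: the per-split counts in the hunk above can be checked against the stated totals with a few lines of arithmetic. The dictionary layout and variable names are illustrative only; the numbers are copied from the docstring.

```python
# (non-active-TB samples, active-TB samples, stated split total)
SPLITS = {
    "train": (4864, 377, 5241),
    "validation": (1239, 96, 1335),
    "test": (1636, 157, 1793),
}
STATED_TOTAL = 8369

# Each split total must equal the sum of its two classes.
split_ok = {
    name: negatives + positives == total
    for name, (negatives, positives, total) in SPLITS.items()
}

# The three split totals must add up to the overall sample count.
grand_total = sum(total for _, _, total in SPLITS.values())
```

All three splits are internally consistent and `grand_total` equals the stated 8369, so the relabelling from "Healthy" to "Healthy, Sick or Latent TB" changes only the class description, not the counts.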
@@ -265,8 +265,7 @@ class DataModule(CachingDataModule):
 * Raw data input (on disk):
-* PNG images 8 bits RGB
-* Resolution: 512x512 pixels
+* PNG images 8 bits RGB, 512 x 512 pixels
 * Output image:
@@ -276,8 +275,10 @@ class DataModule(CachingDataModule):
 * Final specifications:
-* RGB, encoded as a 3-plane image, 8 bits
-* Square (512x512 px)
+* RGB, encoded as a 3-plane tensor using 32-bit floats, square
+(512x512 pixels)
+* Labels: 0 (healthy, latent tb or sick but no tb depending on the
+protocol), 1 (active tuberculosis)
 """
 def __init__(self, split_filename: str):
......