Commit 75cf19d0 authored by André Anjos

[doc] Improve documentation

parent 31c86afa
Pipeline #68969 failed
......@@ -6,3 +6,6 @@
include:
- project: biosignal/software/dev-profile
file: /gitlab/python.yml
tests:
before_script:
......@@ -2,6 +2,6 @@
#
# SPDX-License-Identifier: GPL-3.0-or-later
recursive-include doc *.rst *.ico *.png
recursive-include doc *.rst *.png
recursive-include tests *.py *.png *.csv *.json
recursive-include src/ptbench/data *.json.bz2
.wy-nav-content {
max-width: none;
}
{% include "autosummary/module.rst" %}
.. literalinclude:: ../../../../src/{{ fullname.replace(".", "/") }}.py
:start-at: import
{% extends "!layout.html" %}
{% block extrahead %}
<link href="{{ pathto("_static/style.css", True) }}" rel="stylesheet" type="text/css">
{% endblock %}
......@@ -12,10 +12,104 @@ This section includes information for using the Python API of
``ptbench``.
.. _ptbench.api.data:
Data Methods
------------
Auxiliary classes and methods to define raw dataset iterators.
.. autosummary::
:toctree: api/data
ptbench.data.sample
ptbench.data.dataset
ptbench.data.utils
ptbench.data.loader
ptbench.data.transforms
ptbench.configs.datasets
.. _ptbench.api.data.raw:
Raw Dataset Access
------------------
Direct data-access through iterators.
.. autosummary::
:toctree: api/data/raw
ptbench.data.hivtb_RS
ptbench.data.tbpoc
ptbench.data.montgomery_RS
ptbench.data.padchest
ptbench.data.hivtb
ptbench.data.indian_RS
ptbench.data.shenzhen_RS
ptbench.data.tbpoc_RS
ptbench.data.shenzhen
ptbench.data.montgomery
ptbench.data.indian
ptbench.data.nih_cxr14_re
ptbench.data.padchest_RS
.. _ptbench.api.models:
Models
------
CNN and other implemented models.
.. autosummary::
:toctree: api/models
ptbench.models.alexnet
ptbench.models.densenet
ptbench.models.densenet_rs
ptbench.models.logistic_regression
ptbench.models.normalizer
ptbench.models.pasa
ptbench.models.signs_to_tb
.. _ptbench.api.engines:
Command engines
---------------
Functions that act on the data (training, prediction and evaluation).
.. autosummary::
:toctree: api/engine
ptbench.engine.trainer
ptbench.engine.predictor
ptbench.engine.evaluator
.. _ptbench.api.utils:
Various utilities
-----------------
Reusable auxiliary functions.
.. autosummary::
:toctree: api/utils
ptbench
ptbench.utils.checkpointer
ptbench.utils.download
ptbench.utils.grad_cams
ptbench.utils.measure
ptbench.utils.model_serialization
ptbench.utils.model_zoo
ptbench.utils.plot
ptbench.utils.rc
ptbench.utils.resources
ptbench.utils.summary
ptbench.utils.table
.. include:: links.rst
{
"exposed": {
"versions": {
"stable": "https://www.idiap.ch/software/bob/docs/bob/exposed/stable/sphinx/",
"latest": "https://www.idiap.ch/software/bob/docs/bob/exposed/main/sphinx/"
},
"sources": {}
}
}
.. Copyright © 2022 Idiap Research Institute <contact@idiap.ch>
..
.. SPDX-License-Identifier: GPL-3.0-or-later
.. _ptbench.cli:
========================
Command-line Interface
========================
This section contains an overview of command-line applications shipped with
this package.
.. click:: ptbench.scripts.cli:cli
:prog: ptbench
:nested: full
.. include:: links.rst
......@@ -22,6 +22,11 @@ extensions = [
"sphinx.ext.napoleon",
"sphinx.ext.viewcode",
"sphinx.ext.intersphinx",
"auto_intersphinx",
"sphinx_autodoc_typehints",
"sphinx_copybutton",
"sphinx_inline_tabs",
"sphinx_click",
]
# Be picky about warnings
......@@ -109,6 +114,18 @@ autodoc_default_options = {
"show-inheritance": True,
}
intersphinx_mapping = {
"python": ("https://docs.python.org/3", None),
}
auto_intersphinx_packages = [
"matplotlib",
"numpy",
"pandas",
"pillow",
"psutil",
"torch",
"torchvision",
("exposed", "latest"),
("python", "3"),
]
auto_intersphinx_catalog = "catalog.json"
# Add our private index (for extras and fixes)
intersphinx_mapping = dict(extras=("", "extras.inv"))
.. Copyright © 2022 Idiap Research Institute <contact@idiap.ch>
..
.. SPDX-License-Identifier: GPL-3.0-or-later
.. _ptbench.config:
Preset Configurations
---------------------
This module contains preset configurations for baseline CNN architectures and
datasets.
Models
======
.. autosummary::
:toctree: api/configs/models
:template: config.rst
ptbench.configs.models.alexnet
ptbench.configs.models.alexnet_pretrained
ptbench.configs.models.densenet
ptbench.configs.models.densenet_pretrained
ptbench.configs.models.logistic_regression
ptbench.configs.models.pasa
ptbench.configs.models.signs_to_tb
ptbench.configs.models_datasets.densenet_rs
.. _ptbench.configs.datasets:
Datasets
========
Dataset configurations provide iterative accessors to the raw data
(:ref:`ptbench.setup.datasets`), including data pre-processing and
augmentation where applicable. Use these datasets for training and evaluating
your models.
.. autosummary::
:toctree: api/configs/datasets
:template: config.rst
ptbench.configs.datasets.indian.default
ptbench.configs.datasets.indian.rgb
ptbench.configs.datasets.indian_RS.default
ptbench.configs.datasets.mc_ch.default
ptbench.configs.datasets.mc_ch.rgb
ptbench.configs.datasets.mc_ch_RS.default
ptbench.configs.datasets.mc_ch_in.default
ptbench.configs.datasets.mc_ch_in.rgb
ptbench.configs.datasets.mc_ch_in_RS.default
ptbench.configs.datasets.mc_ch_in_pc.default
ptbench.configs.datasets.mc_ch_in_pc.rgb
ptbench.configs.datasets.mc_ch_in_pc_RS.default
ptbench.configs.datasets.montgomery.default
ptbench.configs.datasets.montgomery.rgb
ptbench.configs.datasets.montgomery_RS.default
ptbench.configs.datasets.nih_cxr14_re.cardiomegaly_idiap
ptbench.configs.datasets.nih_cxr14_re.default
ptbench.configs.datasets.nih_cxr14_re.idiap
ptbench.configs.datasets.nih_cxr14_re_pc.idiap
ptbench.configs.datasets.padchest.cardiomegaly_idiap
ptbench.configs.datasets.padchest.idiap
ptbench.configs.datasets.padchest.no_tb_idiap
ptbench.configs.datasets.padchest.tb_idiap
ptbench.configs.datasets.padchest.tb_idiap_rgb
ptbench.configs.datasets.padchest_RS.tb_idiap
ptbench.configs.datasets.shenzhen.default
ptbench.configs.datasets.shenzhen.rgb
ptbench.configs.datasets.shenzhen_RS.default
.. _ptbench.configs.datasets.folds:
Cross-Validation Datasets
=========================
We support cross-validation with fixed, preset folds. In this section, you
will find the configuration for the first fold (fold-0) of every supported
dataset. Nine additional folds (1 to 9) are available for every configuration,
making up 10 folds per supported dataset.
.. autosummary::
:toctree: api/configs/datasets
:template: config.rst
ptbench.configs.datasets.hivtb.fold_0
ptbench.configs.datasets.hivtb.fold_0_rgb
ptbench.configs.datasets.hivtb_RS.fold_0
ptbench.configs.datasets.indian.fold_0
ptbench.configs.datasets.indian.fold_0_rgb
ptbench.configs.datasets.indian_RS.fold_0
ptbench.configs.datasets.mc_ch.fold_0
ptbench.configs.datasets.mc_ch.fold_0_rgb
ptbench.configs.datasets.mc_ch_RS.fold_0
ptbench.configs.datasets.mc_ch_in.fold_0
ptbench.configs.datasets.mc_ch_in.fold_0_rgb
ptbench.configs.datasets.mc_ch_in_RS.fold_0
ptbench.configs.datasets.montgomery.fold_0
ptbench.configs.datasets.montgomery.fold_0_rgb
ptbench.configs.datasets.montgomery_RS.fold_0
ptbench.configs.datasets.shenzhen.fold_0
ptbench.configs.datasets.shenzhen.fold_0_rgb
ptbench.configs.datasets.shenzhen_RS.fold_0
ptbench.configs.datasets.tbpoc.fold_0
ptbench.configs.datasets.tbpoc.fold_0_rgb
ptbench.configs.datasets.tbpoc_RS.fold_0
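The fold configurations listed above follow a regular naming pattern. Assuming
folds 1 to 9 use the same module layout as ``fold_0`` (inferred from the names
above, not checked against the package), the full set of module names for one
dataset could be enumerated as in this sketch. ``fold_modules`` is a
hypothetical helper, not part of ptbench.

.. code:: python

   def fold_modules(dataset, rgb=False, n_folds=10):
       """Hypothetical helper: list cross-validation config module names.

       Assumes every fold follows the ``fold_<k>`` / ``fold_<k>_rgb`` pattern
       seen for fold 0.
       """
       suffix = "_rgb" if rgb else ""
       return [
           f"ptbench.configs.datasets.{dataset}.fold_{k}{suffix}"
           for k in range(n_folds)
       ]


   # Print the first two RGB fold modules for Montgomery
   for name in fold_modules("montgomery", rgb=True)[:2]:
       print(name)

Resized variants are separate dataset names in the list above (for example
``montgomery_RS``), so they can be passed as the ``dataset`` argument directly.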
.. include:: links.rst
# Sphinx inventory version 2
# Project: extras
# Version: stable
# The remainder of this file is compressed using zlib.
# Sphinx inventory version 2
# Project: extras
# Version: stable
# The remainder of this file is compressed using zlib.
torchvision.transforms py:module 1 https://pytorch.org/vision/stable/transforms.html -
doc/img/direct_vs_indirect.png


......@@ -10,17 +10,49 @@
.. todolist::
.. todo:: write introduction about ptbench here
Benchmarks of convolutional neural network (CNN) architectures applied to
Pulmonary Tuberculosis (TB) detection on chest X-rays (CXR).
Please use the BibTeX references below to cite this work:

.. code:: bibtex
@INPROCEEDINGS{raposo_union_2022,
author = {Raposo, Geoffrey and Trajman, Anete and Anjos, Andr{\'{e}}},
month = 11,
title = {Pulmonary Tuberculosis Screening from Radiological Signs on Chest X-Ray Images Using Deep Models},
booktitle = {Union World Conference on Lung Health},
year = {2022},
date = {2022-11-01},
organization = {The Union},
}
@TECHREPORT{Raposo_Idiap-Com-01-2021,
author = {Raposo, Geoffrey},
keywords = {deep learning, generalization, Interpretability, transfer learning, Tuberculosis Detection},
projects = {Idiap},
month = {7},
title = {Active tuberculosis detection from frontal chest X-ray images},
type = {Idiap-Com},
number = {Idiap-Com-01-2021},
year = {2021},
institution = {Idiap},
url = {https://gitlab.idiap.ch/bob/bob.med.tb},
pdf = {https://publidiap.idiap.ch/downloads/reports/2021/Raposo_Idiap-Com-01-2021.pdf}
}
User Guide
----------
.. toctree::
:maxdepth: 2
install
usage
references
cli
config
api
......
......@@ -8,39 +8,210 @@
Installation
==============
.. todo:: fine-tune installation instructions for ptbench here
We support two installation modes, through pip_ or mamba_ (conda).

.. tab:: pip

   Stable, from PyPI:

   .. code:: sh

      pip install ptbench

   Latest beta, from GitLab package registry:

   .. code:: sh

      pip install --pre --index-url https://gitlab.idiap.ch/api/v4/groups/bob/-/packages/pypi/simple --extra-index-url https://pypi.org/simple ptbench

   .. tip::

      To avoid long command-lines you may configure pip to define the indexes and
      package search priorities as you like.

.. tab:: mamba/conda

   Stable:

   .. code:: sh

      mamba install -c https://www.idiap.ch/software/biosignal/conda -c conda-forge ptbench

   Latest beta:

   .. code:: sh

      mamba install -c https://www.idiap.ch/software/biosignal/conda/label/beta -c conda-forge ptbench
.. _ptbench.setup:
Setup
-----
A configuration file may be useful to set up global options that are reused
often. The location of the configuration file depends on the value of the
environment variable ``$XDG_CONFIG_HOME``, but defaults to
``~/.config/ptbench.toml``. You may edit this file using your preferred
editor.
Here is an example configuration file that may be useful as a starting point:
.. code:: toml

   [datadir]
   indian = "/Users/myself/dbs/tbxpredict"
   montgomery = "/Users/myself/dbs/montgomery-xrayset"
   shenzhen = "/Users/myself/dbs/shenzhen"
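To check that such a file parses as expected, the following sketch loads the
example above with the standard-library ``tomllib`` module (Python 3.11 or
later) and lists the configured data directories. It only illustrates the file
structure and is not how ptbench itself reads the file.

.. code:: python

   import tomllib

   # Same content as the example configuration above
   EXAMPLE = """
   [datadir]
   indian = "/Users/myself/dbs/tbxpredict"
   montgomery = "/Users/myself/dbs/montgomery-xrayset"
   shenzhen = "/Users/myself/dbs/shenzhen"
   """

   config = tomllib.loads(EXAMPLE)

   # Each key under [datadir] maps a dataset name to its raw-data directory
   for name, path in sorted(config["datadir"].items()):
       print(f"{name}: {path}")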
To get a list of valid data directories that can be configured, execute:
.. code:: sh

   ptbench dataset list
You must procure and download datasets by yourself. The raw data is not
included in this package as we are not authorised to redistribute it.
.. _ptbench.setup.datasets:
Supported Datasets
==================
Here is a list of currently supported datasets in this package, alongside
notable properties. Each dataset name is linked to the location where
raw data can be downloaded. The list of images in each split is available
in the source code.
.. _ptbench.setup.datasets.tb:
Tuberculosis datasets
~~~~~~~~~~~~~~~~~~~~~
The following datasets contain only the tuberculosis final diagnosis (0 or 1).
In addition to the splits presented in the following table, 10 randomly
generated folds (for cross-validation) are available for these datasets.
.. list-table::
* - Dataset
- Reference
- H x W
- Samples
- Training
- Validation
- Test
* - Montgomery_
- [MONTGOMERY-SHENZHEN-2014]_
- 4020 x 4892
- 138
- 88
- 22
- 28
* - Shenzhen_
- [MONTGOMERY-SHENZHEN-2014]_
- Varying
- 662
- 422
- 107
- 133
* - Indian_
- [INDIAN-2013]_
- Varying
- 155
- 83
- 20
- 52
.. _ptbench.setup.datasets.tb+signs:
Tuberculosis + radiological findings dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following dataset contains both the tuberculosis final diagnosis (0 or 1)
and radiological findings.
.. list-table::
* - Dataset
- Reference
- H x W
- Samples
- Training
- Test
* - PadChest_
- [PADCHEST-2019]_
- Varying
- 160'861
- 160'861
- 0
.. _ptbench.setup.datasets.signs:
Radiological findings datasets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following dataset contains only the radiological findings without any
information about tuberculosis.
.. note::

   NIH CXR14 labels for the training and validation sets are the relabeled
   versions produced by the authors of the CheXNeXt study [CHEXNEXT-2018]_.

.. list-table::
* - Dataset
- Reference
- H x W
- Samples
- Training
- Validation
- Test
* - NIH_CXR14_re_
- [NIH-CXR14-2017]_
- 1024 x 1024
- 109'041
- 98'637
- 6'350
- 4'054
.. _ptbench.setup.datasets.hiv-tb:
HIV-Tuberculosis datasets
~~~~~~~~~~~~~~~~~~~~~~~~~
The following datasets contain only the tuberculosis final diagnosis (0 or 1)
and come from HIV-infected patients. 10 randomly generated folds (for
cross-validation) are available for these datasets.
Please contact the authors of these datasets to obtain access to the data.
.. list-table::
* - Dataset
- Reference
- H x W
- Samples
* - TB POC
- [TB-POC-2018]_
- 2048 x 2500
- 407
* - HIV TB
- [HIV-TB-2019]_
- 2048 x 2500
- 243
.. include:: links.rst
......@@ -10,3 +10,11 @@
.. _python: http://www.python.org
.. _pip: https://pip.pypa.io/en/stable/
.. _mamba: https://mamba.readthedocs.io/en/latest/index.html
.. _pytorch: https://pytorch.org
.. Raw data websites
.. _montgomery: https://lhncbc.nlm.nih.gov/publication/pub9931
.. _shenzhen: https://lhncbc.nlm.nih.gov/publication/pub9931
.. _indian: https://sourceforge.net/projects/tbxpredict/
.. _NIH_CXR14_re: https://nihcc.app.box.com/v/ChestXray-NIHCC
.. _PadChest: https://bimcv.cipf.es/bimcv-projects/padchest/
py:class torch.nn.modules.loss._Loss
py:class Module
.. coding=utf-8
============
References
============
.. [MONTGOMERY-SHENZHEN-2014] *Jaeger S, Candemir S, Antani S, Wáng YX, Lu PX,
Thoma G.*, **Two public chest X-ray datasets for computer-aided screening of
pulmonary diseases.**, Quant Imaging Med Surg. 2014;4(6):475‐477.
https://dx.doi.org/10.3978%2Fj.issn.2223-4292.2014.11.20
.. [INDIAN-2013] https://sourceforge.net/projects/tbxpredict/
.. [PASA-2019] *Pasa, F., Golkov, V., Pfeiffer, F. et al.*,
**Efficient Deep Network Architectures for Fast Chest X-Ray Tuberculosis
Screening and Visualization.** Sci Rep 9, 6268 (2019).
https://doi.org/10.1038/s41598-019-42557-4
.. [SIMARD-2003] *P. Y. Simard, D. Steinkraus and J. C. Platt*,
**Best practices for convolutional neural networks applied to visual
document analysis**, Seventh International Conference on Document Analysis
and Recognition, 2003. Proceedings., Edinburgh, UK, 2003, pp. 958-963.
https://doi.org/10.1109/ICDAR.2003.1227801
.. [CHEXNEXT-2018] *Pranav Rajpurkar, Jeremy Irvin, Robyn L. Ball, Kaylie Zhu,
Brandon Yang, Hershel Mehta, Tony Duan, et al.*, **Deep Learning for Chest
Radiograph Diagnosis: A Retrospective Comparison of the CheXNeXt Algorithm
to Practicing Radiologists**. PLOS Medicine 15, no. 11 (20 November 2018):
e1002686. https://doi.org/10.1371/journal.pmed.1002686
.. [NIH-CXR14-2017] *Xiaosong Wang et al.*, **ChestX-Ray8: Hospital-Scale
Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification
and Localization of Common Thorax Diseases.** 2017 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR). Honolulu, HI: IEEE,
July 2017, pp. 3462–3471. doi: 10.1109/CVPR.2017.369.
http://ieeexplore.ieee.org/document/8099852/
.. [PADCHEST-2019] *Aurelia Bustos et al.*, **PadChest: A large chest x-ray
image dataset with multi-label annotated reports** Medical Image Analysis,
Volume 66, 2020, 101797, ISSN 1361-8415. doi: 10.1016/j.media.2020.101797.
https://www.sciencedirect.com/science/article/abs/pii/S1361841520301614
.. [GOUTTE-2005] *C. Goutte and E. Gaussier*, **A probabilistic interpretation
of precision, recall and F-score, with implication for evaluation**,
European conference on Advances in Information Retrieval Research, 2005.
https://doi.org/10.1007/978-3-540-31865-1_25
.. [TB-POC-2018] *Griesel, Rulan and Stewart, Annemie and van der Plas, Helen
and Sikhondze, Welile and Rangaka, Molebogeng X and Nicol, Mark P and
Kengne, Andre P and Mendelson, Marc and Maartens, Gary*, **Optimizing
Tuberculosis Diagnosis in Human Immunodeficiency Virus–Infected Inpatients
Meeting the Criteria of Seriously Ill in the World Health Organization
Algorithm**, Clinical Infectious Diseases, 2017.
https://doi.org/10.1093/cid/cix988
.. [HIV-TB-2019] *Van Hoving, D. J. et al.*, **Brief report: real-world
performance and interobserver agreement of urine lipoarabinomannan in
diagnosing HIV-Associated tuberculosis in an emergency center.**,
J. Acquir. Immune Defic. Syndr. 1999 81, e10–e14 (2019).
......@@ -8,8 +8,84 @@
Usage
=======
This package supports a fully reproducible research experimentation cycle for
tuberculosis detection with support for the following activities.
.. todo:: write usage instructions for ptbench
.. figure:: img/direct_vs_indirect.png
.. _ptbench.usage.direct-detection:
Direct detection
----------------
* Training: Images are fed to a Convolutional Neural Network (CNN),
which is trained to detect the presence of tuberculosis
automatically, via error back propagation. The objective of this phase is to
produce a CNN model.
* Inference (prediction): The CNN is used to generate TB predictions.
* Evaluation: Predictions are used to evaluate CNN performance against
provided annotations, and to generate measure files and score tables. Optimal
thresholds are also calculated.
* Comparison: Use prediction results to compare the performance of multiple
systems.
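The evaluation step above mentions that optimal thresholds are calculated,
without naming the criterion. As one common choice, the following pure-Python
sketch picks the threshold that maximises the F1-score over a set of
predictions. ``best_threshold`` is hypothetical, and ptbench's evaluator may use
a different criterion or implementation.

.. code:: python

   def best_threshold(scores, labels):
       """Hypothetical helper: return (threshold, f1) maximising the F1-score.

       A sample is predicted positive when its score is >= threshold.
       """
       best = (0.5, -1.0)
       # Only observed scores can change the confusion matrix
       for t in sorted(set(scores)):
           tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
           fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
           fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
           denom = 2 * tp + fp + fn
           f1 = 2 * tp / denom if denom else 0.0
           if f1 > best[1]:
               best = (t, f1)
       return best


   threshold, f1 = best_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
   print(threshold, f1)

Only thresholds equal to observed scores are tried, since any value between two
consecutive scores yields the same predictions.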
.. _ptbench.usage.indirect-detection:
Indirect detection
------------------
* Training (step 1): Images are fed to a Convolutional Neural Network (CNN),
which is trained to detect the presence of radiological signs
automatically, via error back propagation. The objective of this phase is to
produce a CNN model.
* Inference (prediction): The CNN is used to generate radiological signs
predictions.
* Conversion of the radiological signs predictions into a new dataset.
* Training (step 2): Radiological signs are fed to a shallow network, which is
trained to detect the presence of tuberculosis automatically, via error back
propagation. The objective of this phase is to produce a shallow model.
* Inference (prediction): The shallow model is used to generate TB predictions.
* Evaluation: Predictions are used to evaluate the performance of the complete
(CNN and shallow model) pipeline against provided annotations, and to generate
measure files and score tables.
* Comparison: Use prediction results to compare the performance of multiple
systems.
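The data flow of the indirect pipeline can be sketched in a few lines of pure
Python. The helpers below are hypothetical: ``signs_to_dataset`` pairs each
image's predicted radiological-sign probabilities with its TB label (the
conversion step), and ``shallow_tb_score`` stands in for the shallow model as a
plain logistic regression with made-up, untrained weights. ptbench ships its own
models for this stage (``ptbench.models.signs_to_tb`` and
``ptbench.models.logistic_regression``), which this sketch does not reproduce.

.. code:: python

   import math


   def signs_to_dataset(sign_predictions, tb_labels):
       """Hypothetical conversion step: pair sign probabilities with TB labels."""
       return [(list(signs), label) for signs, label in zip(sign_predictions, tb_labels)]


   def shallow_tb_score(signs, weights, bias):
       """Logistic regression over sign probabilities, returning P(TB)."""
       z = sum(w * s for w, s in zip(weights, signs)) + bias
       return 1.0 / (1.0 + math.exp(-z))


   # Step-1 output for two images, three radiological signs each (made-up values)
   sign_predictions = [[0.9, 0.1, 0.7], [0.05, 0.2, 0.1]]
   dataset = signs_to_dataset(sign_predictions, tb_labels=[1, 0])

   # Step-2 inference with illustrative (untrained) weights
   weights, bias = [2.0, 0.5, 1.5], -1.0
   scores = [shallow_tb_score(x, weights, bias) for x, _ in dataset]
   print(scores)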
We provide :ref:`command-line interfaces (CLI) <ptbench.cli>` that implement
each of the phases above. This interface is configurable using :ref:`exposed's
extensible configuration framework <exposed.config>`. In essence,
each command-line option may be provided as a variable with the same name in a
Python file. Each file may combine any number of variables that are pertinent
to an application.
.. tip::
For reproducibility, we recommend you stick to configuration files when
parameterizing our CLI. Note that some CLI options (e.g. ``--dataset``)
cannot be passed on the command line itself, as they may require complex
Python types that cannot be expressed as a single input parameter.
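To illustrate the idea of providing options as variables in a Python file, here
is a hypothetical configuration file and a minimal loader that executes it with
``runpy`` and collects its top-level variables. The option names
(``batch_size``, ``epochs``) and the loader are illustrative assumptions; the
actual mechanism is provided by exposed's configuration framework and may behave
differently.

.. code:: python

   import pathlib
   import runpy
   import tempfile

   # Hypothetical config file: variable names would mirror CLI option names
   CONFIG_SOURCE = '''
   batch_size = 8   # hypothetical option
   epochs = 10      # hypothetical option
   # A complex Python object, which cannot be typed on the command line
   dataset = {"train": list(range(4)), "validation": list(range(2))}
   '''


   def load_config(path):
       """Execute a Python config file and return its public top-level variables."""
       namespace = runpy.run_path(str(path))
       return {k: v for k, v in namespace.items() if not k.startswith("_")}


   with tempfile.TemporaryDirectory() as tmp:
       cfg_path = pathlib.Path(tmp) / "my_config.py"
       cfg_path.write_text(CONFIG_SOURCE)
       options = load_config(cfg_path)

   print(sorted(options))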
We provide a number of :ref:`preset configuration files <ptbench.config>` that
can be used in one or more of the activities described in this section. Our
command-line framework allows you to refer to these preset configuration files
using special names (a.k.a. "resources"), that procure and load these for you
automatically.
.. _ptbench.usage.commands:
Commands
--------
.. toctree::
:maxdepth: 2
usage/training
usage/evaluation
usage/predtojson
usage/aggregpred
.. include:: links.rst
.. Copyright © 2023 Idiap Research Institute <contact@idiap.ch>
..
.. SPDX-License-Identifier: GPL-3.0-or-later
.. _ptbench.usage.aggregpred:
=======================================================
Aggregate multiple prediction files into a single one
=======================================================
This guide explains how to aggregate multiple prediction files into a single
one. It can be used when doing cross-validation to aggregate the predictions of
k different models before evaluating the aggregated predictions. We input
multiple prediction files (CSV files) and output a single one.
Use the sub-command :ref:`aggregpred <ptbench.cli>` to aggregate your prediction
files together:

.. code:: sh

   ptbench aggregpred -vv path/to/fold0/predictions.csv path/to/fold1/predictions.csv --output-folder=aggregpred
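Conceptually, aggregation amounts to concatenating the rows of the per-fold CSV
files under a single header. The sketch below does this with the standard
``csv`` module, assuming all input files share the same header; the column names
in the demo are hypothetical and need not match ptbench's prediction files. It is
not the implementation behind ``ptbench aggregpred``.

.. code:: python

   import csv
   import pathlib
   import tempfile


   def aggregate_predictions(inputs, output):
       """Concatenate CSV prediction files sharing one header; return row count."""
       header = None
       rows = []
       for path in inputs:
           with open(path, newline="") as f:
               reader = csv.reader(f)
               file_header = next(reader)
               if header is None:
                   header = file_header
               elif file_header != header:
                   raise ValueError(f"{path}: header {file_header} != {header}")
               rows.extend(reader)
       with open(output, "w", newline="") as f:
           writer = csv.writer(f)
           writer.writerow(header)
           writer.writerows(rows)
       return len(rows)


   with tempfile.TemporaryDirectory() as tmp:
       tmp = pathlib.Path(tmp)
       # Hypothetical columns: filename, likelihood, ground_truth
       (tmp / "fold0.csv").write_text(
           "filename,likelihood,ground_truth\na.png,0.9,1\nb.png,0.2,0\n"
       )
       (tmp / "fold1.csv").write_text("filename,likelihood,ground_truth\nc.png,0.7,1\n")
       n_rows = aggregate_predictions(
           [tmp / "fold0.csv", tmp / "fold1.csv"], tmp / "aggregated.csv"
       )
       aggregated_text = (tmp / "aggregated.csv").read_text()

   print(n_rows)

Rejecting files with mismatched headers avoids silently mixing columns from
predictions produced by different configurations.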
.. include:: ../links.rst