
Installation

We support two installation modes: through pip_ or through mamba_ (conda).

Setup

A configuration file may be useful to set up global options that are often reused. The location of the configuration file depends on the value of the environment variable $XDG_CONFIG_HOME; if that variable is not set, it defaults to ~/.config/mednet.toml. You may edit this file with your preferred editor.
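The lookup rule above follows the XDG Base Directory convention. As a rough sketch, assuming the file is named mednet.toml directly under $XDG_CONFIG_HOME whenever that variable is set and non-empty, the expected path can be resolved in Python like this (config_path is a hypothetical helper, not part of the mednet API):

```python
import os
from pathlib import Path


def config_path(environ=None):
    """Return the expected location of mednet.toml (hypothetical helper).

    Assumption: the file sits directly under $XDG_CONFIG_HOME when that
    variable is set and non-empty, and under ~/.config otherwise.
    """
    env = os.environ if environ is None else environ
    base = env.get("XDG_CONFIG_HOME")
    if base:  # an empty value is treated like an unset variable
        return Path(base) / "mednet.toml"
    return Path.home() / ".config" / "mednet.toml"


print(config_path())
```

For example, under this assumption config_path({"XDG_CONFIG_HOME": "/srv/cfg"}) returns /srv/cfg/mednet.toml.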

Here is an example configuration file that may be useful as a starting point:

[datadir]
indian = "/Users/myself/dbs/tbxpredict"
montgomery = "/Users/myself/dbs/montgomery-xrayset"
shenzhen = "/Users/myself/dbs/shenzhen"
nih_cxr14_re = "/Users/myself/dbs/nih-cxr14-re"
tbx11k_simplified = "/Users/myself/dbs/tbx11k-simplified"

[nih_cxr14_re]
idiap_folder_structure = false  # set to `true` if at Idiap

Tip

To get a list of valid data directories that can be configured, execute:

mednet database list

You must obtain and download the databases yourself. The raw data is not included in this package, as we are not authorised to redistribute it.

To check whether your downloaded copy matches the structure expected by this package, run:

mednet database check <database_name>

Supported Databases

Here is a list of the databases currently supported by this package, alongside notable properties. Each database name links to the location where the raw data can be downloaded. The list of images in each split is available in the source code.

Tuberculosis databases

The following databases contain only the tuberculosis final diagnosis (0 or 1). In addition to the splits presented in the following table, 10 randomly generated folds (for cross-validation) are available for these databases.

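=========== =========================== =========== ======= ======== ========== ====
Database    Reference                   H x W       Samples Training Validation Test
=========== =========================== =========== ======= ======== ========== ====
Montgomery_ [MONTGOMERY-SHENZHEN-2014]_ 4020 x 4892 138     88       22         28
Shenzhen_   [MONTGOMERY-SHENZHEN-2014]_ Varying     662     422      107        133
Indian_     [INDIAN-2013]_              Varying     155     83       20         52
=========== =========================== =========== ======= ======== ========== ====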
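As a quick sanity check, the training, validation and test counts in the table add up to the number of samples of each database. The short Python snippet below recomputes those sums and the split proportions from figures copied by hand from the table; the authoritative image lists remain those in the source code.

```python
# Split sizes copied from the table above: (samples, training, validation, test).
SPLITS = {
    "montgomery": (138, 88, 22, 28),
    "shenzhen": (662, 422, 107, 133),
    "indian": (155, 83, 20, 52),
}

for name, (total, train, val, test) in SPLITS.items():
    # Every sample belongs to exactly one of the three splits.
    assert train + val + test == total, name
    print(
        f"{name:10s} train {train / total:.0%}  "
        f"val {val / total:.0%}  test {test / total:.0%}"
    )
```

Montgomery and Shenzhen both use roughly a 64/16/20 percent split, while the Indian database keeps a larger share (about a third) of its images for testing.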

Tuberculosis multilabel databases

The following databases contain the labels healthy, sick & non-TB, active TB, and latent TB. The tbx11k database implemented in this package is based on the simplified version, which is a more compact version of the original. In addition to the splits presented in the following table, 10 randomly generated folds (for cross-validation) are available for these databases.

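================== ========================= ========= ======= ======== ========== =====
Database           Reference                 H x W     Samples Training Validation Test
================== ========================= ========= ======= ======== ========== =====
TBX11K_            [TBX11K-2020]_            512 x 512 11'200  6'600    1'800      2'800
TBX11K_SIMPLIFIED_ [TBX11K-SIMPLIFIED-2020]_ 512 x 512 11'200  6'600    1'800      2'800
================== ========================= ========= ======= ======== ========== =====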
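The 10 cross-validation folds mentioned above are shipped as fixed lists in the source code. Purely to illustrate the idea, a random k-fold partition can be sketched as below; the function names, the seed, and the shuffle-then-slice strategy are assumptions made for this example and will not reproduce the folds distributed with mednet.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Randomly partition sample indices into k disjoint, near-equal folds.

    Hypothetical illustration: shuffle once with a fixed seed, then cut the
    shuffled indices into k contiguous slices whose sizes differ by at most one.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more
        folds.append(indices[start:start + size])
        start += size
    return folds


def fold_split(folds, i):
    """Use fold i as the held-out set and the remaining folds for training."""
    held_out = folds[i]
    training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
    return training, held_out


folds = kfold_indices(138, k=10)  # e.g. the 138 Montgomery images
print([len(f) for f in folds])  # [14, 14, 14, 14, 14, 14, 14, 14, 13, 13]
```

Each index appears in exactly one fold, so calling fold_split for i in range(10) yields ten training/held-out pairs in which every sample is held out exactly once.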

Tuberculosis + radiological findings databases

The following databases contain both the tuberculosis final diagnosis (0 or 1) and radiological findings.

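========= ================ ======= ======= ======== ====
Database  Reference        H x W   Samples Training Test
========= ================ ======= ======= ======== ====
PadChest_ [PADCHEST-2019]_ Varying 160'861 160'861  0
========= ================ ======= ======= ======== ====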

Radiological findings databases

The following database contains only radiological findings, without any information about tuberculosis.

Note

NIH CXR14 labels for the training and validation sets are the relabeled versions produced by the authors of the CheXNeXt study [CHEXNEXT-2018]_.

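============= ================= =========== ======= ======== ========== =====
Database      Reference         H x W       Samples Training Validation Test
============= ================= =========== ======= ======== ========== =====
NIH_CXR14_re_ [NIH-CXR14-2017]_ 1024 x 1024 109'041 98'637   6'350      4'054
============= ================= =========== ======= ======== ========== =====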

HIV-Tuberculosis databases

The following databases contain only the tuberculosis final diagnosis (0 or 1) and come from HIV-infected patients. 10 randomly generated folds (for cross-validation) are available for these databases.

Please contact the authors of these databases to obtain access to the data.

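======== ============== =========== =======
Database Reference      H x W       Samples
======== ============== =========== =======
TB POC   [TB-POC-2018]_ 2048 x 2500 407
HIV TB   [HIV-TB-2019]_ 2048 x 2500 243
======== ============== =========== =======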