
updated documentation with tbx11k · a139fcc3
ogueler@idiap.ch authored 1 year ago

install.rst 5.64 KiB

Installation

We support two installation modes: through pip_ or through mamba_ (conda).

Setup

A configuration file is useful for setting up global options that are reused often. The location of the configuration file depends on the value of the environment variable $XDG_CONFIG_HOME, and defaults to ~/.config/ptbench.toml. You may edit this file using your preferred editor.
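The lookup described above can be sketched in a few lines of Python. This is an illustration only, not the package's own code: the helper name ``config_path`` is hypothetical, and it assumes that when $XDG_CONFIG_HOME is set, the file sits directly inside that directory.

```python
import os
from pathlib import Path


def config_path(environ=None):
    """Return the assumed location of ptbench.toml (hypothetical helper).

    Assumption: when XDG_CONFIG_HOME is set and non-empty, the file lives
    directly inside that directory; otherwise it falls back to ~/.config.
    """
    environ = os.environ if environ is None else environ
    base = environ.get("XDG_CONFIG_HOME")
    if base:
        return Path(base) / "ptbench.toml"
    return Path.home() / ".config" / "ptbench.toml"


print(config_path())
```

Printing the result is a quick way to see which file to edit on a given machine, under the assumption stated above.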

Here is an example configuration file that may be useful as a starting point:

[datadir]
indian = "/Users/myself/dbs/tbxpredict"
montgomery = "/Users/myself/dbs/montgomery-xrayset"
shenzhen = "/Users/myself/dbs/shenzhen"
nih_cxr14_re = "/Users/myself/dbs/nih-cxr14-re"
tbx11k_simplified = "/Users/myself/dbs/tbx11k-simplified"

[nih_cxr14_re]
idiap_folder_structure = false  # set to `true` if at Idiap

Tip

To get a list of valid data directories that can be configured, execute:

ptbench dataset list

You must procure and download the datasets yourself. The raw data is not included in this package, as we are not authorised to redistribute it.

To check whether the downloaded version is consistent with the structure that is expected by this package, run:

ptbench dataset check montgomery

Supported Datasets

Here is a list of currently supported datasets in this package, alongside notable properties. Each dataset name is linked to the location where raw data can be downloaded. The list of images in each split is available in the source code.

Tuberculosis datasets

The following datasets contain only the tuberculosis final diagnosis (0 or 1). In addition to the splits presented in the following table, 10 randomly generated folds (for cross-validation) are available for these datasets.

Dataset	Reference	H x W	Samples	Training	Validation	Test
Montgomery_	[MONTGOMERY-SHENZHEN-2014]_	4020 x 4892	138	88	22	28
Shenzhen_	[MONTGOMERY-SHENZHEN-2014]_	Varying	662	422	107	133
Indian_	[INDIAN-2013]_	Varying	155	83	20	52
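
For readers unfamiliar with cross-validation folds, the following generic sketch shows one way to deal sample indices into 10 random, disjoint folds. It is not the procedure used to build the folds shipped with this package (those lists are in the source code). The function ``make_folds`` and the seed are illustrative assumptions, and no stratification by label is attempted.

```python
import random


def make_folds(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold k takes every n_folds-th shuffled index, so sizes differ by at most one.
    return [indices[k::n_folds] for k in range(n_folds)]


n_samples = 138  # Montgomery sample count, taken from the table above
folds = make_folds(n_samples)
for k, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [i for i in range(n_samples) if i not in held_out]
    print(k, len(train_idx), len(test_idx))
```

Each fold serves once as the held-out set while the remaining samples are used for training.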

Tuberculosis multilabel dataset

The following dataset contains the labels healthy, sick & non-TB, active TB, and latent TB. The tbx11k dataset implemented in this package is based on the simplified version, which is a more compact version of the original. In addition to the splits presented in the following table, 10 randomly generated folds (for cross-validation) are available for this dataset.

Dataset	Reference	H x W	Samples	Training	Validation	Test
TBX11K_	[TBX11K-2020]_	512 x 512	11'200	6'600	1'800	2'800
TBX11K-SIMPLIFIED_	[TBX11K-SIMPLIFIED-2020]_	512 x 512	11'200	6'600	1'800	2'800

Tuberculosis + radiological findings dataset

The following dataset contains both the tuberculosis final diagnosis (0 or 1) and radiological findings.

Dataset	Reference	H x W	Samples	Training	Test
PadChest_	[PADCHEST-2019]_	Varying	160'861	160'861	0

Radiological findings datasets

The following dataset contains only the radiological findings without any information about tuberculosis.

Note

NIH CXR14 labels for the training and validation sets are the relabeled versions produced by the author of the CheXNeXt study [CHEXNEXT-2018]_.

Dataset	Reference	H x W	Samples	Training	Validation	Test
NIH_CXR14_re_	[NIH-CXR14-2017]_	1024 x 1024	109'041	98'637	6'350	4'054

HIV-Tuberculosis datasets

The following datasets contain only the tuberculosis final diagnosis (0 or 1) and come from HIV-infected patients. 10 randomly generated folds (for cross-validation) are available for these datasets.

Please contact the authors of these datasets to obtain access to the data.

Dataset	Reference	H x W	Samples
TB POC	[TB-POC-2018]_	2048 x 2500	407
HIV TB	[HIV-TB-2019]_	2048 x 2500	243