Making use of LightningDataModule and simplification of data loading
This merge request adds LightningDataModule to better organize the code and make better use of Lightning's features. It centralizes common tasks, such as DataLoader creation and the application of transforms, in a base class that dataset-specific modules inherit from.
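As a rough illustration of that base-class pattern, here is a minimal, framework-free sketch. It does not import Lightning. It only borrows the hook names `setup`, `train_dataloader` and `val_dataloader` from LightningDataModule. The names `BaseDataModule`, `ToyShenzhenDataModule` and `load_splits` are hypothetical and are not taken from this merge request, and the "loaders" are plain generators of batches rather than real DataLoader objects.

```python
from typing import Any, Callable, Dict, Iterator, List, Sequence

Transform = Callable[[Any], Any]


class BaseDataModule:
    """Hypothetical base class: owns batching and transform application."""

    def __init__(self, batch_size: int = 2, transforms: Sequence[Transform] = ()):
        self.batch_size = batch_size
        self.transforms = list(transforms)
        self.splits: Dict[str, List[Any]] = {}

    def load_splits(self) -> Dict[str, List[Any]]:
        """The only method a dataset-specific subclass has to provide."""
        raise NotImplementedError

    def setup(self, stage: str = "fit") -> None:
        # Same hook name as in LightningDataModule; here it just loads the splits.
        self.splits = self.load_splits()

    def _apply(self, sample: Any) -> Any:
        # Apply every configured transform, in order.
        for transform in self.transforms:
            sample = transform(sample)
        return sample

    def _loader(self, split: str) -> Iterator[List[Any]]:
        # Shared "loader" logic: yield transformed batches for one split.
        data = self.splits[split]
        for start in range(0, len(data), self.batch_size):
            yield [self._apply(s) for s in data[start:start + self.batch_size]]

    def train_dataloader(self) -> Iterator[List[Any]]:
        return self._loader("train")

    def val_dataloader(self) -> Iterator[List[Any]]:
        return self._loader("validation")


class ToyShenzhenDataModule(BaseDataModule):
    """Dataset-specific subclass: only describes where the samples come from."""

    def load_splits(self) -> Dict[str, List[Any]]:
        return {"train": [1, 2, 3], "validation": [4, 5]}


dm = ToyShenzhenDataModule(batch_size=2, transforms=[lambda x: x * 10])
dm.setup()
train_batches = list(dm.train_dataloader())
val_batches = list(dm.val_dataloader())
print(train_batches)
print(val_batches)
```

In this sketch a new dataset or protocol only needs to override `load_splits`, while batching and transforms live in one place.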
Data loading was also simplified by removing the custom Sample classes and maker functions, and by adding RuntimeDataset and CachedDataset.
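The difference between the two dataset flavours can be sketched as follows. This is an assumption-laden illustration, not the ptbench implementation. The class names `RuntimeDatasetSketch` and `CachedDatasetSketch`, the `loader` callable and the toy data are all hypothetical. The classes only implement `__len__` and `__getitem__`, which is the interface a PyTorch map-style dataset exposes, so no torch import is needed here. Following the commit "Apply transforms during __getitem__ in CachedDataset" in the activity log, the cached variant stores untransformed samples and applies the transform on every access.

```python
import itertools
from typing import Any, Callable, List, Optional, Sequence

Loader = Callable[[Any], Any]      # turns a raw entry (e.g. a file path) into a sample
Transform = Callable[[Any], Any]


class RuntimeDatasetSketch:
    """Loads a sample from its source every time it is requested."""

    def __init__(self, entries: Sequence[Any], loader: Loader,
                 transform: Optional[Transform] = None):
        self.entries = list(entries)
        self.loader = loader
        self.transform = transform

    def __len__(self) -> int:
        return len(self.entries)

    def __getitem__(self, index: int) -> Any:
        sample = self.loader(self.entries[index])
        return self.transform(sample) if self.transform else sample


class CachedDatasetSketch:
    """Loads all samples once, up front, and keeps the raw samples in memory."""

    def __init__(self, entries: Sequence[Any], loader: Loader,
                 transform: Optional[Transform] = None):
        self.cache = [loader(entry) for entry in entries]
        self.transform = transform

    def __len__(self) -> int:
        return len(self.cache)

    def __getitem__(self, index: int) -> Any:
        # The transform runs on every access, not once at caching time.
        sample = self.cache[index]
        return self.transform(sample) if self.transform else sample


load_calls: List[str] = []


def load(entry: str) -> int:
    """Toy loader that records each call and returns the entry length."""
    load_calls.append(entry)
    return len(entry)


counter = itertools.count()


def changing_transform(sample: int) -> int:
    """Stand-in for a random augmentation: its result differs on every call."""
    return sample + next(counter)


entries = ["a", "bb", "ccc"]

runtime = RuntimeDatasetSketch(entries, load)
runtime[0]
runtime[0]
runtime_loads = len(load_calls)   # one load per access

load_calls.clear()
cached = CachedDatasetSketch(entries, load, transform=changing_transform)
first, second = cached[1], cached[1]
cached_loads = len(load_calls)    # one load per entry, all at construction
```

Because the cache holds untransformed samples, a random augmentation is drawn again on each access in both variants. Only the cost of loading differs.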
Remaining tasks:
- (@dcarron) Create a common/default DataModule for the shenzhen dataset that takes protocols and transforms as parameters to avoid copying code for each protocol configuration
- (@dcarron) Add typehints to ShenzenDataModule
- (@dcarron) Investigate issue where training a new model with ElasticDeformation as a transform converges more slowly if the data is not cached.
- (@andre.anjos) Update documentation on ShenzenDataModule
- (@biosignal) Update all datasets, using Shenzhen as a reference
  - (@mdelitroz) Montgomery
  - (@mdelitroz) Hivtb
  - (@andre.anjos) Indian
  - (@andre.anjos) Padchest
  - (@mdelitroz) Tbpoc
  - (@andre.anjos) tbx11_simplified -> renamed as tbx11k, protocol v1 (uses original dataset organisation)
  - (@andre.anjos) tbx11_simplified_v2 -> renamed as tbx11k, protocol v2 (uses original dataset organisation)
  - (@andre.anjos) mc_ch -> renamed as montgomery-shenzhen
  - (@andre.anjos) mc_ch_in -> renamed as montgomery-shenzhen-indian
  - (@andre.anjos) mc_ch_in_11k -> renamed as montgomery-shenzhen-indian-tbx11k-v1
  - (@andre.anjos) mc_ch_in_11k_v2 -> renamed as montgomery-shenzhen-indian-tbx11k-v2
  - (@andre.anjos) mc_ch_in_pc -> renamed as montgomery-shenzhen-indian-padchest
  - (@andre.anjos) nih_cxr14_re -> renamed as nih-cxr14 (n.b.: multi-class dataset, radiological findings)
  - (@andre.anjos) nih_cxr14_re_pc -> renamed as nih-cxr14-padchest
- (@dcarron) Update models
- (@dcarron) Update evaluation scripts
- (@dcarron) Update support for extra_validation datasets
- (@biosignal) Update full documentation
- (@biosignal) Update unit tests
Addresses the following issues:
- Closes #14 (closed)
- Closes #18 (closed)
- Closes #19 (closed)
- Closes #21 (closed)
- Closes #29 (closed)
Activity
added 1 commit
- 2fcec25b - Removed TBDataset, using Runtime or Cached datasets instead
changed milestone to %Adopt new tooling (phase 3)
assigned to @andre.anjos
added 1 commit
- f92fc663 - Created common DataModule for Shenzhen dataset
added 2 commits
added 1 commit
- 6b6196a0 - Apply transforms during __getitem__ in CachedDataset
marked the checklist item (@dcarron) Investigate issue where training a new model with ElasticDeformation as a transform converges more slowly if the data is not cached. as completed
added 30 commits
- 6b6196a0...79187009 - 20 earlier commits
- a69868bf - [doc/references] Add header
- 0827ccca - [ptbench.data.datamodule] Implemented typing, added more logging, implemented...
- dccc1da3 - [ptbench.engine] Simplified, documented and created type hints for the...
- 99f52320 - [ptbench.scripts] Improved docs, adapt changes from weight-balancing strategy
- a0f264f0 - [ptbench.utils.accelerator] Add support for mps backend
- c25e5008 - [ptbench.models.pasa] Define new API for modules
- a518c23d - [ptbench.data.data.module] Pin memory when using MPS as well
- 7b12973c - [ptbench.engine.device] Move AcceleratorProcessor to engine submodule, revamp...
- 7eaac22c - Updated models
- 278a6198 - Merge branch 'add-datamodule-andre' into 'add-datamodule'
marked the checklist item (@dcarron) Create a common/default DataModule for the shenzhen dataset that takes protocols and transforms as parameters to avoid copying code for each protocol configuration as completed
marked the checklist item (@dcarron) Add typehints to ShenzenDataModule as completed
marked the checklist item (@andre.anjos) Update documentation on ShenzenDataModule as completed