Reviewed DataModule design+docs+types

Merged André Anjos requested to merge add-datamodule-andre into add-datamodule

In this MR, I review the use of the data module and push further on simplifying the code as a whole:

  • Slightly streamlined the datamodule approach
  • Added documentation
  • Added type annotations
  • Added some TODOs for further discussion

Closes #7 (closed).

TODO:

  • (@dcarron) Implement, document, and type-hint the normaliser (ensure it is used correctly across all models; integrate it into the train script), and make sure the train CLI works!
  • (@andre.anjos) Deal with the import of reweight_BCEWithLogitsLoss
Edited by André Anjos

Merge request reports

Activity

  • André Anjos added 3 commits

    • 4c65b4fb - [pyproject] Format with taplo
    • 3da537d0 - [data.dataset] Make function private
    • 9e721cb3 - [models] Remove runtime checks; Use torchvision normaliser instead of our own

  • André Anjos resolved all threads

  • @dcarron, I started doing my bit, but I got confused about the current state of the code. I hope you can clarify my doubts.

    If I look at the densenet model, I see that both criterion and criterion_valid are not saved among the lightning-module's hyper-parameters (they are explicitly excluded in the constructor). If I look at the Pasa model, however, this isn't the case: these parameters are saved. The config files for both models pass criteria parameters via the constructor. How are these finally delivered, in the densenet case, to the place where they are needed? Is that just a bug? I clearly don't understand enough about lightning to make sense of all this.
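
    For reference, here is a minimal sketch of the two saving patterns I am contrasting (class names are illustrative, not our actual code):

    from pytorch_lightning import LightningModule

    class DensenetLike(LightningModule):
        def __init__(self, criterion=None, criterion_valid=None):
            super().__init__()
            # criteria are explicitly excluded from the saved hyper-parameters;
            # unless they are also kept as plain attributes (as below), they
            # would simply be lost after construction
            self.save_hyperparameters(ignore=["criterion", "criterion_valid"])
            self.criterion = criterion
            self.criterion_valid = criterion_valid

    class PasaLike(LightningModule):
        def __init__(self, criterion=None, criterion_valid=None):
            super().__init__()
            # everything, including the criteria objects, lands in self.hparams
            self.save_hyperparameters()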

    Optimiser: I see we pass the optimiser as a string to the model constructor - the config file for the Pasa model states Adam, alongside its parameters in a dictionary. Inside the model code there is something like:

    optimizer = getattr(torch.optim, self.hparams.optimizer)(
        self.parameters(), **self.hparams.optimizer_configs
    )

    I'm wondering why we need to mangle this in such a way, instead of passing the instantiated optimiser during the construction of the model. Does the lightning module not like saving torch optimisers? Criteria (losses) are OK, however?
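
    To make the comparison concrete, here is a sketch of both options (class names are illustrative). Note that a torch optimiser cannot really be instantiated before the model exists, since it needs the model's parameters; the closest alternative is therefore passing a pre-bound factory, e.g. a functools.partial:

    import functools

    import torch
    from pytorch_lightning import LightningModule

    class StringStyle(LightningModule):
        """Current approach: optimiser stored as a string plus a config dict."""

        def __init__(self, optimizer="Adam", optimizer_configs=None):
            super().__init__()
            self.save_hyperparameters()

        def configure_optimizers(self):
            # materialise the optimiser from YAML-friendly hyper-parameters
            configs = self.hparams.optimizer_configs or {}
            return getattr(torch.optim, self.hparams.optimizer)(
                self.parameters(), **configs
            )

    class FactoryStyle(LightningModule):
        """Alternative: a pre-bound optimiser factory, passed at construction."""

        def __init__(self, optimizer_factory=None):
            super().__init__()
            # a factory is not YAML-serialisable, so it cannot go through
            # self.save_hyperparameters()
            self.optimizer_factory = optimizer_factory or functools.partial(
                torch.optim.Adam, lr=1e-3
            )

        def configure_optimizers(self):
            # the factory only needs the model parameters to build the optimiser
            return self.optimizer_factory(self.parameters())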

  • After some local tests, I realised lightning actually saves the hyper-parameters in a YAML file, so it is hard to save complex objects: whatever we pass has to be limited to what a YAML file can represent.
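
    A small illustration of that constraint, using plain PyYAML (outside lightning):

    import torch.nn
    import yaml

    # plain scalars and strings serialise fine
    print(yaml.safe_dump({"optimizer": "Adam", "lr": 1e-3}))

    # arbitrary Python objects do not: safe_dump() refuses to represent them
    try:
        yaml.safe_dump({"criterion": torch.nn.BCEWithLogitsLoss()})
    except yaml.representer.RepresenterError as error:
        print("cannot serialise:", error)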

    Our configuration system (based on software/clapper) has some advantages in this respect, like being able to snapshot the input configuration of a training session as a whole, and to support complex objects.

    In other frameworks, we are able to record the command line used to run a training session (reference implementation here: https://gitlab.idiap.ch/biosignal/software/deepdraw/-/blob/master/src/deepdraw/script/common.py#L140). We could copy that here, and slightly extend its functionality to also copy any configuration modules passed as input. We would then "capture" the whole parameter set. In that case, I'd propose we simply not bother with this particular feature of lightning (i.e. we never call self.save_hyperparameters()).

    Edited by André Anjos
  • The code to use this in the train script looks like this:

    # called from the Click main script; save_sh_command() comes from the
    # reference implementation linked above
    import os
    import shutil

    # output_folder is the root folder where we save all info
    command_sh = os.path.join(output_folder, "train.sh")
    if os.path.exists(command_sh):
        # rotate any previous snapshot into a single backup file
        backup = command_sh + "~"
        if os.path.exists(backup):
            os.unlink(backup)
        shutil.move(command_sh, backup)
    save_sh_command(command_sh)

    We can extend save_sh_command() to also save any clapper config files the user provided. Let me check how to do this.
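
    A sketch of what that extension could look like (the config_files parameter is my assumption; the real implementation linked above records more context, such as package versions and host information):

    import os
    import shutil
    import sys
    import time

    def save_sh_command(destfile, config_files=()):
        """Saves the command line; optionally snapshots clapper config files.

        The ``config_files`` argument is the hypothetical extension: paths to
        any configuration modules the user passed, copied next to ``destfile``.
        """
        os.makedirs(os.path.dirname(destfile) or ".", exist_ok=True)
        with open(destfile, "w") as f:
            f.write("#!/usr/bin/env sh\n")
            f.write(f"# date: {time.asctime()}\n")
            f.write(" ".join(sys.argv) + "\n")
        for cfg in config_files:
            shutil.copy(cfg, os.path.dirname(destfile) or ".")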

  • Or, to keep it simple: open an issue for the future, and just save the command line for now...
