
Move reload_dataloaders_every_n_epochs to the DataHooks class #8738

Open
ananthsub opened this issue Aug 5, 2021 · 5 comments
Assignees
Labels
data handling Generic data-related topic deprecation Includes a deprecation design Includes a design discussion feature Is an improvement or enhancement help wanted Open to be worked on let's do it! approved to implement
Milestone

Comments

@ananthsub
Contributor

🚀 Feature

Motivation

We are auditing the Lightning components and APIs to assess opportunities for improvements:

reload_dataloaders_every_n_epochs is an argument to the Trainer constructor. However, it could instead be a property of the DataHooks class, since whether to reload the dataloaders every n epochs should be determined by the actor providing the dataloaders (e.g. the LightningModule or LightningDataModule).
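As a rough sketch of the idea (all names below are illustrative assumptions, not the actual Lightning API), the reload interval would live on the hooks class and the trainer would merely consult it:

```python
class DataHooks:
    # Hypothetical attribute: how often the dataloaders should be reloaded.
    # 0 means "never reload", matching the current Trainer default.
    reload_dataloaders_every_n_epochs: int = 0


class MyLightningModule(DataHooks):
    def __init__(self) -> None:
        # The actor providing the dataloaders decides the reload cadence.
        self.reload_dataloaders_every_n_epochs = 2


def should_reload(module: DataHooks, epoch: int) -> bool:
    """Trainer-side check: reload when the configured interval elapses."""
    n = module.reload_dataloaders_every_n_epochs
    return n > 0 and epoch % n == 0
```

With this shape, `should_reload(MyLightningModule(), 4)` is True while a plain `DataHooks()` (interval 0) never triggers a reload.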

This is very similar to #8733 and how automatic/manual optimization is a property of the LightningModule. That property also started out as a trainer argument before being migrated to the lightning module. Since this pattern keeps occurring, we should separately understand why it's so appealing to add things to the trainer constructor instead of a more specific component.

Moreover, this one setting controls dataloader behavior for both the train and val dataloaders. Do we need more granular control, i.e. two properties, one for training and one for validation? This could make sense, since features like val_check_interval can produce very different epoch counts (training epoch count != val epoch count). The property for validation would only apply during trainer.fit, as trainer.validate makes only a single pass through the data.
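If we went the granular route, a minimal sketch could split the setting into two hypothetical attributes (illustrative names, not an agreed API), each checked against its own counter since the train and val epoch counts can diverge:

```python
class DataHooks:
    # Hypothetical split of the single setting; 0 disables reloading
    # for that dataloader.
    reload_train_dataloader_every_n_epochs: int = 0
    reload_val_dataloader_every_n_epochs: int = 0


def should_reload_train(hooks: DataHooks, train_epoch: int) -> bool:
    n = hooks.reload_train_dataloader_every_n_epochs
    return n > 0 and train_epoch % n == 0


def should_reload_val(hooks: DataHooks, val_epoch: int) -> bool:
    # Only consulted during trainer.fit; trainer.validate makes a single pass.
    n = hooks.reload_val_dataloader_every_n_epochs
    return n > 0 and val_epoch % n == 0
```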

However, note the documentation for test_dataloader: https://github.com/PyTorchLightning/pytorch-lightning/blob/963c26764682fa4cf64c93c5a7572ae0040e9c32/pytorch_lightning/core/hooks.py#L535-L537
Is this a copy/paste issue?

Pitch

  • Add a property to the DataHooks class for this in v1.5
  • Deprecate the Trainer argument for this in v1.5
  • Remove the Trainer argument in v1.7

Benefits:

  • Simplify the Trainer constructor (one fewer argument)
  • Keep the data loader management in one place instead of two (at the DataHooks level)
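The deprecation step in the pitch could look roughly like the following sketch (the warning text and the fallback behavior are assumptions, not the actual implementation):

```python
import warnings
from typing import Optional


class Trainer:
    """Sketch of the proposed v1.5 deprecation path (not the real Trainer)."""

    def __init__(self, reload_dataloaders_every_n_epochs: Optional[int] = None) -> None:
        if reload_dataloaders_every_n_epochs is not None:
            warnings.warn(
                "`Trainer(reload_dataloaders_every_n_epochs=...)` is deprecated "
                "in v1.5 and will be removed in v1.7. Set the property on your "
                "LightningModule or LightningDataModule instead.",
                DeprecationWarning,
            )
        # Fall back to 0 ("never reload") when the deprecated argument is unset.
        self._reload_dataloaders_every_n_epochs = reload_dataloaders_every_n_epochs or 0
```

Passing the argument still works during the deprecation window but emits a DeprecationWarning; omitting it leaves the interval at 0.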

Alternatives

Keep as is?

Additional context



@ananthsub ananthsub added feature Is an improvement or enhancement help wanted Open to be worked on data handling Generic data-related topic design Includes a design discussion labels Aug 5, 2021
@ninginthecloud
Contributor

Hi @ananthsub, can I work on this issue? Thank you~

@ninginthecloud
Contributor

Hi @ananthsub, for this issue I will first migrate reload_dataloaders_every_n_epochs before adding two separate properties (train_*, val_*) to give more granular control.
There is some confusion about how reload_dataloaders_every_n_epochs works (see post), and sometimes people want to reload only the train_dataloader (post). I think it would be great to add these two properties to avoid the confusion and give more control.

@ninginthecloud
Contributor

One thing I'm not sure about is whether we still want to keep _should_reload_dl_epoch() in pytorch_lightning/trainer/properties.py. I'd lean toward moving it into fit_loop and evaluation_loop separately. Let me know what you think.

@ananthsub
Contributor Author

One thing I'm not sure about is whether we still want to keep _should_reload_dl_epoch() in pytorch_lightning/trainer/properties.py. I'd lean toward moving it into fit_loop and evaluation_loop separately. Let me know what you think.

I think it makes sense to move this into the loops directly. The property is already private, and this way we can continue whittling down what's exposed on the Trainer's properties. We could have a mini helper function shared across the fit and evaluation loops if necessary.

@awaelchli @tchaton @carmocca what do you think? it's somewhat related to #8946

@tchaton
Contributor

tchaton commented Aug 21, 2021

Hey @ananthsub,

I think it is fine to move _should_reload_dl_epoch to the loops.

And I believe it would be worth exploring a better scheduling mechanism owned by the DataModule, e.g.:

class DataModule:

    def should_reload_train_dataloader(self, epoch: int, total_batch_idx: int) -> bool:
        # e.g. reload the train dataloader on odd-numbered epochs
        return epoch % 2 == 1
