[fix] Attach train+val dataloaders to trainer in trainer loop #7207
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master    #7207     +/-   ##
=========================================
- Coverage      91%      87%       -4%
=========================================
  Files         199      199
  Lines       12797    12797
=========================================
- Hits        11678    11145      -533
- Misses       1119     1652      +533
Hello @ananthsub! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-04-29 22:07:06 UTC
@awaelchli @justusschock @tchaton what do you think?
Looks good to me.
@tchaton @awaelchli @Borda mind taking a look?
Force-pushed from c4cbff9 to 7831688
What does this PR do?
Fixes #7208.
The issue occurs because the validation dataloader is never attached to the trainer by the time we reach this stage in the training loop: https://github.com/PyTorchLightning/pytorch-lightning/blob/44d775fccfb825561937f6fa03fe258af25c2b83/pytorch_lightning/trainer/training_loop.py#L558-L560
Currently, attaching the dataloaders is conditional on this logic, which runs only once at the start of training: https://github.com/PyTorchLightning/pytorch-lightning/blob/44d775fccfb825561937f6fa03fe258af25c2b83/pytorch_lightning/trainer/trainer.py#L604
Removing the reload_dataloaders_every_epoch check here gives me consistent behavior again. I don't think this is a fully proper fix: we should attach these dataloaders to the trainer more explicitly, and operate on them for reloads later in the training loop.
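To make the failure mode concrete, here is a minimal sketch of the pattern described above. The class and method names are hypothetical and the logic is deliberately simplified; this is not the actual Lightning source or this PR's diff, only an illustration.

```python
# Hypothetical, simplified sketch: attachment is gated on
# reload_dataloaders_every_epoch and evaluated once at the start of training,
# so the val dataloader can still be None when the training loop consults it.


class SketchTrainer:
    def __init__(self, reload_dataloaders_every_epoch: bool = False) -> None:
        self.reload_dataloaders_every_epoch = reload_dataloaders_every_epoch
        self.train_dataloader = None
        self.val_dataloaders = None

    def attach_dataloaders_gated(self, model) -> None:
        # Rough stand-in for the current start-of-training behavior: skip
        # attachment when per-epoch reloading is requested, which leaves
        # self.val_dataloaders unset here.
        if not self.reload_dataloaders_every_epoch:
            self.train_dataloader = model.train_dataloader()
            self.val_dataloaders = model.val_dataloader()

    def attach_dataloaders_always(self, model) -> None:
        # Direction this PR takes: drop the gate so the trainer always holds
        # both loaders; per-epoch reloading can then refresh what is already
        # attached later in the training loop.
        self.train_dataloader = model.train_dataloader()
        self.val_dataloaders = model.val_dataloader()
```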
This is a behavior change introduced by #6075. Before that change, run_evaluation ran first, which populated the dataloader settings on the trainer, and then the checkpoint callback ran. After it, the checkpoint callback runs before run_evaluation, but the check for whether we run evaluation depends on the val dataloader being present, which was only ever set inside run_evaluation.
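The ordering problem can be sketched like this. The function names are hypothetical and the flow is heavily simplified; this is not the actual #6075 diff, only an illustration of the dependency.

```python
def run_evaluation(trainer, model) -> None:
    # Side effect the old ordering relied on: running evaluation populated
    # the val dataloader settings on the trainer.
    if trainer.val_dataloaders is None:
        trainer.val_dataloaders = model.val_dataloader()
    # ... the actual validation loop would run here ...


def should_run_evaluation(trainer) -> bool:
    # Simplified stand-in for the training-loop check: evaluation is only
    # considered runnable when a val dataloader is attached to the trainer.
    return trainer.val_dataloaders is not None


def run_checkpoint_callback(trainer) -> None:
    # Stand-in for the checkpoint callback.
    print("checkpoint callback ran")


def epoch_end_before_6075(trainer, model) -> None:
    run_evaluation(trainer, model)    # attached the val dataloaders first,
    run_checkpoint_callback(trainer)  # then the checkpoint callback ran


def epoch_end_after_6075(trainer, model) -> None:
    run_checkpoint_callback(trainer)    # now runs before evaluation
    if should_run_evaluation(trainer):  # gate reads trainer.val_dataloaders,
        run_evaluation(trainer, model)  # which nothing has attached yet
```

With the train and val dataloaders attached to the trainer up front, as this PR moves toward, the gate no longer depends on run_evaluation's side effect.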
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:
Did you have fun?
Make sure you had fun coding 🙃