
[fix] Attach train+val dataloaders to trainer in trainer loop #7207

Merged Apr 30, 2021 (9 commits)

Conversation

@ananthsub (Contributor) commented Apr 26, 2021

What does this PR do?

Fixes #7208.

The issue occurs because the validation dataloader is never attached to the trainer by the time we reach this stage in the training loop: https://github.com/PyTorchLightning/pytorch-lightning/blob/44d775fccfb825561937f6fa03fe258af25c2b83/pytorch_lightning/trainer/training_loop.py#L558-L560

Currently, attaching the dataloaders is conditional on this logic, which runs once at the start of training: https://github.com/PyTorchLightning/pytorch-lightning/blob/44d775fccfb825561937f6fa03fe258af25c2b83/pytorch_lightning/trainer/trainer.py#L604

Removing the reload_dataloaders_every_epoch check here gives me consistent behavior again.

I don't think this is a complete fix - we should attach these dataloaders to the trainer more explicitly, and operate on them for reloads later in the training loop.

This is a behavior change introduced by #6075. Before that PR, run_evaluation ran first, which populated the dataloader settings on the trainer, and then the checkpoint callback ran. After it, the checkpoint callback runs before run_evaluation. The check for whether we run evaluation depends on the val dataloader being present, which was only set inside run_evaluation.
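To make the ordering problem concrete, here is a minimal, hypothetical sketch of the control flow. None of these names are the real PyTorch Lightning APIs; the toy classes only model why the val-check is silently skipped when the checkpoint callback runs before the val dataloader has been attached:

```python
# Hypothetical, simplified model of the control-flow bug described above.
# ToyTrainer, run_epoch_old_order, and run_epoch_fixed_order are illustrative
# names, not PyTorch Lightning identifiers.

class ToyTrainer:
    def __init__(self):
        self.val_dataloaders = None  # not attached until evaluation runs

    def attach_val_dataloader(self):
        # stand-in for the trainer populating its val dataloader settings
        self.val_dataloaders = ["val_loader"]

    def should_check_val(self):
        # mirrors the training-loop check: no val loader means skip validation
        return self.val_dataloaders is not None

    def run_epoch_old_order(self):
        # post-#6075 order: the checkpoint/val check fires before
        # run_evaluation has attached the val dataloader
        val_would_run = self.should_check_val()
        self.attach_val_dataloader()  # happens later, inside evaluation
        return val_would_run

    def run_epoch_fixed_order(self):
        # this PR's approach: attach train+val dataloaders up front
        # in the training loop, then perform the check
        self.attach_val_dataloader()
        return self.should_check_val()

print(ToyTrainer().run_epoch_old_order())    # False: validation is skipped
print(ToyTrainer().run_epoch_fixed_order())  # True: validation runs
```

Attaching the dataloaders unconditionally at the start of the loop makes the check independent of whether evaluation happened to run earlier, which is the consistency the fix restores.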

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@ananthsub ananthsub changed the title Attach train+val dataloaders to trainer in trainer loop [fix] Attach train+val dataloaders to trainer in trainer loop Apr 26, 2021
codecov bot commented Apr 26, 2021

Codecov Report

Merging #7207 (c97d172) into master (338f5a3) will decrease coverage by 4%.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #7207    +/-   ##
=======================================
- Coverage      91%     87%    -4%     
=======================================
  Files         199     199            
  Lines       12797   12797            
=======================================
- Hits        11678   11145   -533     
- Misses       1119    1652   +533     

@pep8speaks commented Apr 26, 2021

Hello @ananthsub! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-04-29 22:07:06 UTC

@awaelchli awaelchli added the bug Something isn't working label Apr 26, 2021
@awaelchli awaelchli added this to the v1.3 milestone Apr 26, 2021
@edenlightning (Contributor) commented:
@justusschock @awaelchli

@ananthsub (Contributor, Author) commented:
@awaelchli @justusschock @tchaton what do you think?

@justusschock (Member) left a comment:
Looks good to me.

Review thread on tests/trainer/test_dataloaders.py (resolved)
@justusschock justusschock self-requested a review April 28, 2021 06:48
@ananthsub (Contributor, Author) commented:
@tchaton @awaelchli @Borda mind taking a look?

@mergify mergify bot removed the has conflicts label Apr 28, 2021
@ananthsub ananthsub added ready PRs ready to be merged data handling Generic data-related topic labels Apr 29, 2021
@ananthsub ananthsub force-pushed the attach-dataloaders branch from c4cbff9 to 7831688 Compare April 29, 2021 21:50
Review thread on tests/trainer/test_dataloaders.py (outdated, resolved)
@ananthsub (Contributor, Author) commented:
@carmocca @Borda @tchaton mind taking a look?

@ananthsub ananthsub merged commit e407edb into Lightning-AI:master Apr 30, 2021
@ananthsub ananthsub deleted the attach-dataloaders branch April 30, 2021 16:01
Labels: bug (Something isn't working), data handling (Generic data-related topic), ready (PRs ready to be merged)
Projects: none yet. 8 participants.