Remember the eval mode of submodules when switching trainer stages #18951
Merged
Conversation
awaelchli added the labels "fun" (Staff contributions outside working hours - to differentiate from the "community" label), "trainer: validate", and "trainer: fit" on Nov 12, 2023.
awaelchli changed the title from "[WIP] Remember the eval mode of submodules when switching trainer stages" to "Remember the eval mode of submodules when switching trainer stages" on Nov 12, 2023.
awaelchli requested review from williamFalcon, tchaton, carmocca, justusschock and Borda as code owners on November 12, 2023 at 03:28.
Codecov Report

Additional details and impacted files:

@@           Coverage Diff            @@
##           master   #18951    +/-   ##
========================================
- Coverage      76%      48%     -27%
========================================
  Files         450      442       -8
  Lines       36508    36383     -125
========================================
- Hits        27583    17572   -10011
- Misses       8925    18811    +9886
carmocca reviewed on Nov 12, 2023:
Does this PR replace #18826?
Co-authored-by: Carlos Mocholí <[email protected]>
carmocca approved these changes on Nov 16, 2023.
Borda approved these changes on Nov 16, 2023.
Labels: docs (Documentation related), feature (Is an improvement or enhancement), fun (Staff contributions outside working hours), pl (Generic label for PyTorch Lightning package), ready (PRs ready to be merged), trainer: fit, trainer: validate
What does this PR do?
Fixes #18930
Part of #16827
A common issue users face is that the loop calls `train()` on the LightningModule even when the user has frozen certain layers. This leads to a surprise when the user finds out that their batch norm layers have changed statistics, even though they were set explicitly to `eval()` mode. To avoid this, the user has to learn to override the `on_validation_model_eval()` and `on_validation_model_train()` hooks in the module, but this detail is difficult to find in our docs and hard to get right. Most users who face this challenge end up on Slack or GitHub to ask for help.

The PR makes the following changes to automate this for the user:
- The validation loop now captures the `.training` mode of every submodule before calling `.eval()`. When the validation loop ends, and before switching back to training, it restores the `.training` mode on all submodules to what it was before. This ensures that layers the user has chosen to keep in eval mode remain in eval mode!
- The same capture-and-restore applies around the call to `.train()` at the beginning, with the same motivation: the user can now set a subset of their model to `.eval()` mode / freeze it explicitly in the LightningModule's `__init__` without doing acrobatics with hooks, and the Trainer will respect and preserve it (see the added test). Note: this is not a breaking change, because PyTorch's default is to have a model in `.training=True` mode.

📚 Documentation preview 📚: https://pytorch-lightning--18951.org.readthedocs.build/en/18951/
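The capture-and-restore idea described above can be sketched as follows. This is a simplified illustration, not Lightning's actual implementation: the `Module` class is a minimal stand-in for `torch.nn.Module` (only the `.training` flag and submodule traversal matter here), and the helper names are hypothetical.

```python
class Module:
    """Minimal stand-in for torch.nn.Module: a .training flag plus children."""

    def __init__(self):
        self.training = True
        self.children = []

    def named_modules(self, prefix=""):
        # Yield (name, module) for this module and all submodules, like PyTorch
        yield prefix, self
        for i, child in enumerate(self.children):
            yield from child.named_modules(f"{prefix}.{i}")

    def train(self, mode=True):
        # Recursively set the .training flag, like nn.Module.train()
        for _, m in self.named_modules():
            m.training = mode

    def eval(self):
        self.train(False)


def capture_training_modes(root):
    # Remember the .training flag of every submodule before switching stages
    return {name: m.training for name, m in root.named_modules()}


def restore_training_modes(root, modes):
    # Put every submodule back into the mode it was in before
    for name, m in root.named_modules():
        m.training = modes[name]


# A "model" whose second submodule the user froze explicitly (e.g. batch norm)
model = Module()
model.children = [Module(), Module()]
model.children[1].eval()

modes = capture_training_modes(model)   # before validation: remember user's choice
model.eval()                            # validation puts everything in eval mode
restore_training_modes(model, modes)    # back to fit: the frozen layer stays frozen

print(model.children[0].training, model.children[1].training)  # -> True False
```

With the old behavior, switching back to training would have called `train()` on the whole module, silently unfreezing the second submodule; restoring the captured flags preserves the user's explicit `eval()` call.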
cc @Borda @justusschock @awaelchli