Run `on_train_epoch_end` after the LightningModule for callbacks that monitor (#16567)
What does this PR do?
Unblocks #16520.

`EarlyStopping` and `ModelCheckpoint` run their checks in `on_train_epoch_end`. After #16520 lands, anyone who was logging in `LightningModule.training_epoch_end` needs to move that logging to `LightningModule.on_train_epoch_end`. But `Callback`s run their hooks before the `LightningModule`, so these two callbacks fail to monitor a key logged in `on_train_epoch_end`: the monitored key does not exist yet when they run.

This was not a pressing problem before #16520, because `training_epoch_end`, which runs after `on_train_epoch_end`, was a working alternative.
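For context, a minimal sketch of the failure mode (the model, dataset, and metric name below are illustrative placeholders, not taken from this PR):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def on_train_epoch_end(self):
        # Logged here, but the callbacks' `on_train_epoch_end` hooks have
        # already run by the time this executes.
        self.log("my_metric", 1.0)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


data = DataLoader(TensorDataset(torch.randn(8, 4), torch.randn(8, 1)), batch_size=4)
# With no validation loop, EarlyStopping checks "my_metric" in its own
# `on_train_epoch_end` — before the module has logged it — and fails on
# the missing key.
trainer = pl.Trainer(max_epochs=2, callbacks=[EarlyStopping(monitor="my_metric")])
trainer.fit(LitModel(), data)
```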
This PR suggests a hacky workaround: call this hook later, only for these two `Callback` classes.
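In spirit, the change amounts to something like the sketch below. This is not the actual diff: `run_train_epoch_end_hooks` and the loop structure are simplified stand-ins for Trainer internals, though the hook names mirror the real ones.

```python
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

# The two callbacks whose epoch-end checks read monitored keys.
_MONITORING_CALLBACKS = (EarlyStopping, ModelCheckpoint)


def run_train_epoch_end_hooks(trainer, pl_module):
    deferred = []
    for cb in trainer.callbacks:
        if isinstance(cb, _MONITORING_CALLBACKS):
            # Defer the callbacks that need the monitored key.
            deferred.append(cb)
        else:
            cb.on_train_epoch_end(trainer, pl_module)

    # The LightningModule logs its epoch-level metrics here.
    pl_module.on_train_epoch_end()

    # The monitored keys now exist, so these checks can run safely.
    for cb in deferred:
        cb.on_train_epoch_end(trainer, pl_module)
```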
Other suggestions are welcome.
Does your PR introduce any breaking changes? If yes, please list them.
Yes, the hook call order has changed: `EarlyStopping` and `ModelCheckpoint` now run their `on_train_epoch_end` hooks after `LightningModule.on_train_epoch_end`.
cc @awaelchli @Borda @justusschock @carmocca