Refactor load in checkpoint connector #4593
Merged
tarepan requested review from ananyahjha93, awaelchli, Borda, justusschock, nateraw, SeanNaren, tchaton, teddykoker and williamFalcon as code owners on November 9, 2020 20:28
Codecov Report
@@          Coverage Diff          @@
##          master   #4593   +/-   ##
======================================
- Coverage     93%     93%    -0%
======================================
  Files        134     134
  Lines       9909    9907     -2
======================================
- Hits        9208    9205     -3
- Misses       701     702     +1
tchaton
reviewed
Nov 16, 2020
Co-authored-by: chaton <[email protected]>
Hello @tarepan! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-12-13 15:04:53 UTC
justusschock
approved these changes
Nov 30, 2020
@Borda Yes, ready to go.
tchaton
approved these changes
Dec 3, 2020
LGTM !
awaelchli
approved these changes
Dec 3, 2020
Co-authored-by: Adrian Wälchli <[email protected]>
gianscarpe added a commit to gianscarpe/pytorch-lightning that referenced this pull request on Dec 15, 2020
commit f374782c65ff01f9814457c707b214698b0f27c6 Author: gianscarpe <[email protected]> Date: Tue Dec 15 18:28:21 2020 +0100 Rebasing onto master commit 84bb9db Author: Jirka Borovec <[email protected]> Date: Mon Dec 14 22:46:14 2020 +0100 simplify changelog (Lightning-AI#5135) commit 69123af Author: Tadej Svetina <[email protected]> Date: Mon Dec 14 20:13:58 2020 +0100 Fix hanging metrics tests (Lightning-AI#5134) commit eb9cb3c Author: Shachar Mirkin <[email protected]> Date: Mon Dec 14 13:39:29 2020 +0100 Add Google Colab badges (Lightning-AI#5111) * Add colab badges to notebook Add colab badges to notebook to notebooks 4 & 5 * Add colab badges Co-authored-by: chaton <[email protected]> commit 0327f6b Author: Carlos Mocholí <[email protected]> Date: Mon Dec 14 08:38:10 2020 +0100 Do not warn when the name key is used in the lr_scheduler dict (Lightning-AI#5057) * Do not warn when the name key is used * Missing line * Consistency * Update pytorch_lightning/callbacks/lr_monitor.py * Update docs * Update pytorch_lightning/core/lightning.py Co-authored-by: Rohit Gupta <[email protected]> * Update CHANGELOG Co-authored-by: Rohit Gupta <[email protected]> commit 16feb51 Author: tarepan <[email protected]> Date: Mon Dec 14 01:13:50 2020 +0900 Refactor load in checkpoint connector (Lightning-AI#4593) * Refactor load step commentaries * Refactor hpc ckpt suffix acquisition * Refactor restore/hpc_load match * Refactor hpc load trial * Refactor checkpoint dir check * Refactor unneeded function nest * Refactor nested If * Refactor duplicated cache clear * Refactor attempt flow with if/elif * Fix pip8 * Refactor hook commentary Co-authored-by: chaton <[email protected]> * Fix pep8 * Refactor hpc load checkpoint path acquisition * Fix pip8 * Fix doc Co-authored-by: Adrian Wälchli <[email protected]> * Refactor None Union type with Optional Co-authored-by: chaton <[email protected]> Co-authored-by: Adrian Wälchli <[email protected]> Co-authored-by: Jirka Borovec <[email 
protected]> Co-authored-by: Roger Shieh <[email protected]> commit 398f122 Author: Carlos Mocholí <[email protected]> Date: Sun Dec 13 16:04:16 2020 +0100 Improve some tests (Lightning-AI#5049) * Improve some tests * Add TrainerState asserts Co-authored-by: Roger Shieh <[email protected]> commit a49291d Author: Jirka Borovec <[email protected]> Date: Sat Dec 12 17:21:19 2020 +0100 drop unused test with result api (Lightning-AI#5058) Co-authored-by: chaton <[email protected]> Co-authored-by: Rohit Gupta <[email protected]> commit b50ad9e Author: Jirka Borovec <[email protected]> Date: Sat Dec 12 15:55:11 2020 +0100 split tests for deprecated api (Lightning-AI#5071) * imports * imports * flake8 Co-authored-by: Rohit Gupta <[email protected]> commit 3100b78 Author: Rohit Gupta <[email protected]> Date: Sat Dec 12 15:47:03 2020 +0530 Allow any input in to_onnx and to_torchscript (Lightning-AI#4378) * branch merge * sample * update with valid input tensors * pep * pathlib * Updated with BoringModel and added more input types * try fix * pep * skip test with torch < 1.4 * fix test * Apply suggestions from code review * update tests * Allow any input in to_onnx and to_torchscript * Update tests/models/test_torchscript.py Co-authored-by: Adrian Wälchli <[email protected]> * no_grad * try fix random failing test * rm example_input_array * rm example_input_array Co-authored-by: chaton <[email protected]> Co-authored-by: Jeff Yang <[email protected]> Co-authored-by: Roger Shieh <[email protected]> Co-authored-by: Jirka Borovec <[email protected]> Co-authored-by: Adrian Wälchli <[email protected]> Co-authored-by: edenlightning <[email protected]> commit b5a2afd Author: Roger Shieh <[email protected]> Date: Sat Dec 12 15:14:17 2020 +0800 Remove beta arg from F1 class and functional (Lightning-AI#5076) * remove beta from F1 * remove from functional Co-authored-by: Teddy Koker <[email protected]> commit 0de43d1 Author: Roger Shieh <[email protected]> Date: Sat Dec 12 14:23:55 
2020 +0800 Fix docs metrics formatting (Lightning-AI#5077) * fix functional f1 fbeta formatting * Update f_beta.py * remove line breaks * Update f_beta.py add line breaks and pad * pad linea breaks with 2 spaces instead of tab commit d38e4d1 Author: skhiuk <[email protected]> Date: Sat Dec 12 14:00:32 2020 +0900 fix: MNIST minor bug (Lightning-AI#5075) Co-authored-by: Roger Shieh <[email protected]> commit 5f34f2b Author: edenlightning <[email protected]> Date: Fri Dec 11 20:42:04 2020 -0500 Update installation instructions for FairScale (Lightning-AI#5099) Co-authored-by: Jirka Borovec <[email protected]> commit 63fb7f9 Author: Jirka Borovec <[email protected]> Date: Sat Dec 12 00:17:19 2020 +0100 CI: upload report only on failer (Lightning-AI#5086) * CI: upload report only on failer * Apply suggestions from code review Co-authored-by: chaton <[email protected]> * Apply suggestions from code review Co-authored-by: chaton <[email protected]> Co-authored-by: Roger Shieh <[email protected]> Co-authored-by: Rohit Gupta <[email protected]> commit 1e501f0 Author: Jirka Borovec <[email protected]> Date: Fri Dec 11 22:56:19 2020 +0100 add back compatibility for deprecated metrics 2/n (Lightning-AI#5068) * add back compatibility for deprecated metrics * fix * imports * imports commit 4a3f906 Author: Jirka Borovec <[email protected]> Date: Fri Dec 11 22:11:21 2020 +0100 add back compatibility for deprecated metrics 1/n (Lightning-AI#5067) * add back compatibility for metrics * tests * Add deprecated metric utility functions back to functional (Lightning-AI#5062) * add back *deprecated* metric utility functions to functional * pep * pep * suggestions * move Co-authored-by: Jirka Borovec <[email protected]> * more * fix * import * docs * tests * fix Co-authored-by: Teddy Koker <[email protected]> commit ddc3757 Author: chaton <[email protected]> Date: Fri Dec 11 21:21:25 2020 +0100 Pre release (Lightning-AI#5098) * add rc release * update changelog * Update CHANGELOG.md * 
Update CHANGELOG.md Co-authored-by: Rohit Gupta <[email protected]> commit 1a970b2 Author: chaton <[email protected]> Date: Fri Dec 11 20:24:59 2020 +0100 [hotfix] Extend Optimizer + update doc (Lightning-AI#5095) * resolve urgent bug * update pr * update doc * update * remove typo * add defaults * Update pytorch_lightning/__init__.py * Update setup.py * update doc * Update docs/source/optimizers.rst Co-authored-by: Jirka Borovec <[email protected]> * update * resolve doc * debug test * update test * Update docs/source/optimizers.rst Co-authored-by: Adrian Wälchli <[email protected]> * Update docs/source/optimizers.rst Co-authored-by: Adrian Wälchli <[email protected]> * Update docs/source/optimizers.rst Co-authored-by: Adrian Wälchli <[email protected]> * remove useless import * Update docs/source/optimizers.rst Co-authored-by: Jirka Borovec <[email protected]> Co-authored-by: Adrian Wälchli <[email protected]> commit 74171ef Author: Jirka Borovec <[email protected]> Date: Fri Dec 11 18:42:53 2020 +0100 drop duplicate metrics (Lightning-AI#5014) * drop duplicate metrics * keep * fix commit 7755572 Author: chaton <[email protected]> Date: Fri Dec 11 14:51:45 2020 +0100 Check if optimizer supports closure (Lightning-AI#4981) * check if optimizer support closure * cleanup test * resolve tests * resolve flake * update test due to patch limit * update * update dep * Update tests/core/test_lightning_optimizer.py Co-authored-by: Rohit Gupta <[email protected]> * Update tests/core/test_lightning_optimizer.py Co-authored-by: Rohit Gupta <[email protected]> * resolve bug * update test * resolve tests * Update requirements/extra.txt Co-authored-by: Jirka Borovec <[email protected]> * remove bolts dep * remove bolts * add missing bolts dep for tests * remove need for bolts Co-authored-by: Rohit Gupta <[email protected]> Co-authored-by: Jirka Borovec <[email protected]> commit 4e6a871 Author: Rohit Gupta <[email protected]> Date: Fri Dec 11 15:07:32 2020 +0530 Added CHANGELOG 
section (Lightning-AI#5065) commit 7e8673d Author: Alan Du <[email protected]> Date: Thu Dec 10 12:26:02 2020 -0500 Update DDP docs (Lightning-AI#5046) * Fix flake8 error to fix CI * Correct weights-loading to use correct callbacks * Fix dangling links Co-authored-by: chaton <[email protected]> Co-authored-by: Jirka Borovec <[email protected]> commit 2c3d43d Author: chaton <[email protected]> Date: Thu Dec 10 15:24:44 2020 +0100 Initialize trainer with None in DDPAccelerator (Lightning-AI#4915) * Initialize trainer with None * add typing to all accelerators * resolve imports * update * add typing * removed typo * update * Fix formatting and imports in accelerator Co-authored-by: maxjeblick <[email protected]> Co-authored-by: Sean Naren <[email protected]> Co-authored-by: SeanNaren <[email protected]> Co-authored-by: Roger Shieh <[email protected]> commit d5fa02e Author: Jirka Borovec <[email protected]> Date: Thu Dec 10 14:06:13 2020 +0100 simplify accelerator steps (Lightning-AI#5015) * simplify accelerator steps * Apply suggestions from code review Co-authored-by: Rohit Gupta <[email protected]> Co-authored-by: Rohit Gupta <[email protected]> commit 820d5c7 Author: Hemil Desai <[email protected]> Date: Thu Dec 10 16:26:18 2020 +0530 Add a notebook example to reach a quick baseline of ~94% accuracy on CIFAR (Lightning-AI#4818) * Add a notebook example to reach a quick baseline of ~94% accuracy on CIFAR10 using Resnet in Lightning * Remove outputs * PR Feedback * some changes * some more changes Co-authored-by: chaton <[email protected]> Co-authored-by: rohitgr7 <[email protected]> commit 4ebce38 Author: Jirka Borovec <[email protected]> Date: Thu Dec 10 11:01:33 2020 +0100 update usage of deprecated automatic_optimization (Lightning-AI#5011) * drop deprecated usage automatic_optimization * Apply suggestions from code review Co-authored-by: Adrian Wälchli <[email protected]> * Apply suggestions from code review Co-authored-by: Rohit Gupta <[email protected]> 
Co-authored-by: Adrian Wälchli <[email protected]> Co-authored-by: Rohit Gupta <[email protected]> commit 77fb425 Author: Jirka Borovec <[email protected]> Date: Thu Dec 10 08:38:14 2020 +0100 update usage of deprecated profiler (Lightning-AI#5010) * drop deprecated profiler * lut Co-authored-by: Roger Shieh <[email protected]> commit cdbddbe Author: Jirka Borovec <[email protected]> Date: Thu Dec 10 01:52:39 2020 +0100 release 1.1.0 (Lightning-AI#5048) * release 1.1.0 * pep8 commit 05f25f3 Author: Jirka Borovec <[email protected]> Date: Wed Dec 9 20:14:34 2020 +0100 update usage of deprecated checkpoint_callback (Lightning-AI#5006) * drop usage of deprecated checkpoint_callback * fix * fix commit ce91795 Author: Jirka Borovec <[email protected]> Date: Wed Dec 9 20:13:57 2020 +0100 ref: clean config [1/n] add intermediate setters (Lightning-AI#4990) * add intermediate setters * show inputs * fix options * move * fix * less talk * fix * talk less * str * cases * rename Co-authored-by: chaton <[email protected]> commit 068502f Author: Francisco J. H. 
Heras <[email protected]> Date: Wed Dec 9 19:13:13 2020 +0000 Loss format from .3f to .3g in the tqdm (Lightning-AI#4972) Co-authored-by: Jirka Borovec <[email protected]> commit bcbba3b Author: Rohit Gupta <[email protected]> Date: Thu Dec 10 00:42:44 2020 +0530 Simplify GPU and TPU accelerator (Lightning-AI#5024) commit 90d1d9f Author: Jirka Borovec <[email protected]> Date: Wed Dec 9 19:05:12 2020 +0100 drop deprecated reorder from AUC (Lightning-AI#5004) * drop deprecated reorder from AUC * chlog * fix * fix * simple * fix * fix * fix Co-authored-by: Roger Shieh <[email protected]> commit 20b806a Author: chaton <[email protected]> Date: Wed Dec 9 16:31:18 2020 +0000 [feat] 3/n pp (Lightning-AI#5036) * add pp doc * udpate doc * update doc * update doc * Update docs * update doc * udpate * update doc * update doc * Formatting, update sharded zip link * Update docs/source/multi_gpu.rst Co-authored-by: Carlos Mocholí <[email protected]> * Apply suggestions from code review * Reference directly to section Co-authored-by: SeanNaren <[email protected]> Co-authored-by: Carlos Mocholí <[email protected]> Co-authored-by: Jirka Borovec <[email protected]> commit 69725ad Author: Carlos Mocholí <[email protected]> Date: Wed Dec 9 16:48:46 2020 +0100 Add carmocca to core (Lightning-AI#5038) Co-authored-by: Roger Shieh <[email protected]> Co-authored-by: Jirka Borovec <[email protected]> commit cff2489 Author: Jirka Borovec <[email protected]> Date: Wed Dec 9 15:53:49 2020 +0100 fix GH release badges (Lightning-AI#5040) * fix GH release badges * rtd commit ef8ef12 Author: chaton <[email protected]> Date: Wed Dec 9 12:56:51 2020 +0000 [feat] pp 2/n (Lightning-AI#5026) * Added changes for RPC plugin * Add missing kwargs * Fix code format * Loading refactors by introducing is_distributed var, fix optimizer step flow * Add rpc guard * Added docstrings and typing * resolve comments * Add additional rpc hook, refactor name of exit process hook for clarity * remove annotation * 
Modify behaviour to allow optional return, add test for rpc plugin * resolve tests * rename is_ddp_based * update * update for windows * update * resolve test * code smell * Added sequential plugin * resolve bug * update * cleanup * add Exception * resolve docs * Remove ddp support * Revert distributed -> ddp * Update pl_examples/basic_examples/conv_sequential_example.py Co-authored-by: Jirka Borovec <[email protected]> * Update pl_examples/basic_examples/conv_sequential_example.py Co-authored-by: Jirka Borovec <[email protected]> * Update pytorch_lightning/plugins/ddp_sequential_plugin.py Co-authored-by: Jirka Borovec <[email protected]> * Address code review points * Update pytorch_lightning/plugins/ddp_sequential_plugin.py Co-authored-by: Jirka Borovec <[email protected]> * Update pytorch_lightning/plugins/ddp_sequential_plugin.py Co-authored-by: Jirka Borovec <[email protected]> * Add missing return * Fix formatting, add datamodule args * add small comment * resolve comments * resolve comments * update source for fairscale * update extras * remove staticmethod * resolve flake8 * Skip tests that are failing due to bug upstream with multiple optimizers and shard * update * update on comments * clean test * latest comments * remove old comments * add todo * Update version * update * resolve bugs * resolve bugs * update test * remove hanging test * Update pytorch_lightning/plugins/ddp_sequential_plugin.py Co-authored-by: Carlos Mocholí <[email protected]> * resolve on comments * Update pytorch_lightning/plugins/ddp_sequential_plugin.py Co-authored-by: Carlos Mocholí <[email protected]> * resolve on comments * Update pytorch_lightning/plugins/ddp_sequential_plugin.py Co-authored-by: Carlos Mocholí <[email protected]> * Update pytorch_lightning/plugins/ddp_sequential_plugin.py Co-authored-by: Carlos Mocholí <[email protected]> * Update pytorch_lightning/plugins/ddp_sequential_plugin.py Co-authored-by: Carlos Mocholí <[email protected]> * Update 
pytorch_lightning/plugins/ddp_sequential_plugin.py Co-authored-by: Carlos Mocholí <[email protected]> * remove ImportError Co-authored-by: SeanNaren <[email protected]> Co-authored-by: Sean Naren <[email protected]> Co-authored-by: Jirka Borovec <[email protected]> Co-authored-by: Carlos Mocholí <[email protected]> commit 7d9784e Author: Jirka Borovec <[email protected]> Date: Wed Dec 9 12:53:37 2020 +0100 adding missing changelogs (Lightning-AI#5019) * adding missing changelogs * Apply suggestions from code review Co-authored-by: Rohit Gupta <[email protected]> * Apply suggestions from code review Co-authored-by: Adrian Wälchli <[email protected]> Co-authored-by: Rohit Gupta <[email protected]> Co-authored-by: Adrian Wälchli <[email protected]> commit 6a99d95 Author: Jirka Borovec <[email protected]> Date: Wed Dec 9 11:53:22 2020 +0100 fix ci: release (Lightning-AI#5037) commit e2c404b Author: Jirka Borovec <[email protected]> Date: Wed Dec 9 10:59:44 2020 +0100 CI: update badges for release (Lightning-AI#5002) * fix images * not sleep * a0 * path * assets * assets * bitecode * rls * rls * badges * fix * org * drop * clean * codecov * fix * clean commit 53d7c95 Author: Jirka Borovec <[email protected]> Date: Wed Dec 9 09:18:23 2020 +0100 drop usage of deprecated distributed_backend (Lightning-AI#5009) Co-authored-by: chaton <[email protected]> Co-authored-by: Roger Shieh <[email protected]> commit 2c11d96 Author: Jirka Borovec <[email protected]> Date: Wed Dec 9 03:57:11 2020 +0100 replace pyright by mypy (Lightning-AI#5021) * drop pyright & add mypy * detail * name * fix * flake8 * ver Co-authored-by: Sean Naren <[email protected]> commit 127454a Author: Ananya Harsh Jha <[email protected]> Date: Tue Dec 8 18:20:01 2020 -0500 All gatherwith grads (Lightning-AI#5012) * all_gather * ddp * horovod * grad tests * fixed ddp * ddp fixed, removed tpu, horovod for now * changelog * windows fix * windows fix * removed batch from ctx * all_gather * ddp * horovod * grad 
tests * fixed ddp * ddp fixed, removed tpu, horovod for now * changelog * windows fix * windows fix * removed batch from ctx * removed code duplication * merge Co-authored-by: Jirka Borovec <[email protected]> commit ee9b3fe Author: Sean Naren <[email protected]> Date: Tue Dec 8 22:02:10 2020 +0000 [feat] pp 1/n (Lightning-AI#5016) * Added changes for RPC plugin * Add missing kwargs * Fix code format * Loading refactors by introducing is_distributed var, fix optimizer step flow * Add rpc guard * Added docstrings and typing * resolve comments * Add additional rpc hook, refactor name of exit process hook for clarity * remove annotation * Modify behaviour to allow optional return, add test for rpc plugin * resolve tests * rename is_ddp_based * update * update for windows * update * resolve test * code smell * Revert back to init_ddp_connection for backwards compat * Swap to explicit name for property * Add missing speed parity increase for CI variability, fix call counts for child process Co-authored-by: tchaton <[email protected]> commit ddd3eda Author: brett koonce <[email protected]> Date: Tue Dec 8 15:27:43 2020 -0600 docs: minor spelling tweaks (Lightning-AI#5022) commit 6d2aeff Author: Rohit Gupta <[email protected]> Date: Wed Dec 9 01:37:53 2020 +0530 fast_dev_run can be int (Lightning-AI#4629) * fast_dev_run can be int * pep * chlog * add check and update docs * logging with fdr * update docs * suggestions Co-authored-by: Carlos Mocholí <[email protected]> * fdr flush logs * update trainer.fast_dev_run * codefactor and pre-commit isort * tmp Co-authored-by: Carlos Mocholí <[email protected]> Co-authored-by: Roger Shieh <[email protected]> Co-authored-by: edenlightning <[email protected]> commit 79ae66d Author: maxjeblick <[email protected]> Date: Tue Dec 8 18:19:55 2020 +0100 Initialize trainer with None (Lightning-AI#4847) Co-authored-by: Sean Naren <[email protected]> Co-authored-by: chaton <[email protected]> Co-authored-by: edenlightning <[email protected]>
What does this PR do?
Refactor checkpoint load in CheckpointConnector.
This is a refactoring for code health only; there are no functional changes.
Two main refactored points:
- restore (normal load)
- hpc_load (hpc load)
Commits are well separated by refactored part.
If the changes are not apparent in the "Files changed" diff, please check the "Commits" diff.
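For readers skimming the description, here is a rough sketch of the kind of flat if/elif load-attempt flow that the commit list refers to ("Refactor attempt flow with if/elif", "Refactor hpc ckpt suffix acquisition", "Refactor None Union type with Optional"). It is not the code from this PR: the function names `max_hpc_suffix` and `choose_checkpoint`, the `hpc_ckpt_<n>.ckpt` file naming, and the priority order (HPC checkpoint first, then a user-supplied resume path) are all assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import Optional


def max_hpc_suffix(folder: Path, prefix: str = "hpc_ckpt_") -> Optional[int]:
    """Return the largest integer n among files named `<prefix><n>.ckpt`, or None.

    Hypothetical helper: the prefix and file layout are assumptions.
    """
    if not folder.is_dir():
        return None
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)\.ckpt$")
    suffixes = []
    for path in folder.iterdir():
        match = pattern.match(path.name)
        if match is not None:
            suffixes.append(int(match.group(1)))
    return max(suffixes) if suffixes else None


def choose_checkpoint(folder: Path, resume_path: Optional[str]) -> Optional[str]:
    """Pick which checkpoint to load using a flat if/elif chain.

    Assumed order: 1) newest HPC checkpoint in `folder`, 2) `resume_path`,
    3) nothing to load.
    """
    suffix = max_hpc_suffix(folder)
    if suffix is not None:
        # An HPC checkpoint exists, so it takes priority.
        return str(folder / f"hpc_ckpt_{suffix}.ckpt")
    elif resume_path is not None:
        # Fall back to the explicitly requested checkpoint.
        return resume_path
    return None
```

Returning `Optional[str]` instead of a `Union[str, None]` and keeping the attempts in one if/elif chain mirrors the readability goals stated above, without changing which checkpoint would be selected.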
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:
Did you have fun?
Make sure you had fun coding 👍