Simplify optimization Logic #4984
Conversation
Codecov Report

@@           Coverage Diff           @@
##           master   #4984    +/-  ##
=======================================
  Coverage      93%      93%
=======================================
  Files         129      129
  Lines        9372     9397     +25
=======================================
+ Hits         8689     8713     +24
- Misses        683      684      +1
if not self.trainer.train_loop.automatic_optimization:
    trainer.scaler.unscale_(optimizer)
    trainer.call_hook("on_after_backward")
not sure about the hook call here. We may want to find a better place than in the plugin.
Can it not live inside the backward closure?
does it mean that this hook is called only in manual optim?
Hey both.
For automatic_optimization, it is fine: the hook is called in training_step_and_backward.
For manual_optimization, we can't predict when people are going to call .step. Therefore, the hook needs to be called after the latest closure and before .step.
However, I can move this to the precision plugin to reduce code.
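For reference, here is a minimal sketch of the ordering described above for manual optimization with native AMP. Only the unscale_/call_hook pair is taken from the diff; the helper name and the closure wiring around it are illustrative assumptions, not the PR's actual implementation.

def pre_optimizer_step(trainer, optimizer, optimizer_closure):
    # run the user-supplied closure (forward + backward) first
    optimizer_closure()
    if not trainer.train_loop.automatic_optimization:
        # in manual optimization we cannot know when the user will call .step(),
        # so gradients are unscaled and the hook fires after the latest closure
        # and right before the actual optimizer step
        trainer.scaler.unscale_(optimizer)
        trainer.call_hook("on_after_backward")
    # the real step (e.g. trainer.scaler.step(optimizer)) happens after this point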
This will be taken care of in a follow-up PR @awaelchli. I will need to clean out the training_loop for automatic_optimization first, so it uses the same API as the manual one.
lgtm
Hello @tchaton! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-12-07 09:57:09 UTC
Also, can you change this
https://github.com/PyTorchLightning/pytorch-lightning/blob/b00991efd8d6b7d1941d0eb3c1a499f95b4a3eea/pytorch_lightning/accelerators/accelerator.py#L163
to just
if self.trainer.testing
I'll close my #4982 then since these were the only 2 issues it's resolving.
raise MisconfigurationException(
    'When overriding `LightningModule` optimizer_step with `Trainer(automatic_optimization=True, ...)`,'
    ' `accumulate_grad_batches` should be 1. It ensures `optimizer_step` is called on every batch'
why is that? shouldn't I expect Lightning to call optimizer.step at the correct batch if I set accumulate_grad_batches > 1, whether I override it or not?
Yes, it will.
However, when people override optimizer_step, they might not pay attention to the batch_idx indices. They might do something like this:

Trainer(accumulate_grad_batches=2)
...
def optimizer_step(...):
    if batch_idx % 3 == 0:
        optimizer.step()

which will result in an effective accumulation of 6 batches, not the expected 2.
That is why I found it safer to make sure optimizer_step is called on every batch_idx.
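To make the arithmetic concrete, here is a small standalone illustration (not Lightning code): Lightning only reaches optimizer_step on every 2nd batch with accumulate_grad_batches=2, and the override above only steps when batch_idx % 3 == 0, so an actual step happens only when both conditions line up.

# counts which batch indices would actually trigger a step
step_batches = [
    batch_idx
    for batch_idx in range(24)
    if (batch_idx + 1) % 2 == 0  # Lightning's accumulation boundary (accumulate_grad_batches=2)
    and batch_idx % 3 == 0       # the user's extra condition in the override
]
print(step_batches)  # [3, 9, 15, 21] -> one step every 6 batches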
trainer.train_loop.on_before_zero_grad(self)

model.optimizer_zero_grad(
    trainer.current_epoch,
    trainer.batch_idx,
    optimizer,
    self._optimizer_idx
)
it won't be called if enable_pl_optimizer=False with automatic_optimization=True?
When people use automatic_optimization=True and enable_pl_optimizer=False without overriding optimizer_step, we wrap their optimizer only for the duration of the step call, so this zero_grad will still be executed.
However, if they override optimizer_step with Trainer(automatic_optimization=True, enable_pl_optimizer=False), they will get a warning that we can't take care of zero_grad for them.
if not isinstance(optimizer, LightningOptimizer):
    # wrap into LightningOptimizer only for running the step
    optimizer = LightningOptimizer.to_lightning_optimizer(optimizer, self.trainer)
optimizer.step(closure=optimizer_closure, *args, **kwargs)
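As a hedged illustration of the warning case mentioned above (not code from this PR): when optimizer_step is overridden with enable_pl_optimizer=False, Lightning can no longer call optimizer_zero_grad on the user's behalf, so the override has to do it. The class, the collapsed **kwargs, and the exact hook arguments below are assumptions for the sake of the example.

from pytorch_lightning import LightningModule

class MyModel(LightningModule):
    # hypothetical override used with Trainer(automatic_optimization=True, enable_pl_optimizer=False)
    def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx,
                       optimizer_closure, **kwargs):
        optimizer.step(closure=optimizer_closure)
        # zero_grad is not handled by Lightning in this configuration,
        # so it must be called manually here
        optimizer.zero_grad()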
if self.accumulate_grad_batches is None:
    return self._trainer.train_loop._accumulated_batches_reached()
return (self._trainer.batch_idx + 1) % self.accumulate_grad_batches == 0
isn't _accumulate_grad_batches always None here? I can't find where it is set to a value anywhere.
Yes, _accumulate_grad_batches is None by default.
People will be able to set an accumulation value if they want:

optimizer = LightningOptimizer(optimizer, 3)

or

optimizer.accumulate_grad_batches = 3

They could even make .step conditional at random:

optimizer.step(make_optimizer_step=np.random.randint(100) > 90)
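For context, here is a short manual-optimization sketch of how these knobs could be used inside training_step. The LightningOptimizer constructor argument and the make_optimizer_step keyword come from the comment above; the loss helper and the exact manual_backward call are assumptions.

import numpy as np

def training_step(self, batch, batch_idx):
    opt = self.optimizers()          # the LightningOptimizer wrapper
    loss = self.compute_loss(batch)  # hypothetical loss helper
    self.manual_backward(loss, opt)
    # gradients keep accumulating; an actual optimizer step runs ~10% of the time
    opt.step(make_optimizer_step=np.random.randint(100) > 90)
    return loss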
Co-authored-by: Rohit Gupta <[email protected]>
What does this PR do?
This PR attempts to unify the logic between automatic and manual optimization.
It also resolves a bug where the model's optimizer_step wasn't called when using LightningOptimizer.
Fixes # (issue)
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
Make sure you had fun coding 🙃