Simplify optimization Logic #4984

Merged · 56 commits merged into master on Dec 7, 2020
Conversation

@tchaton (Contributor) commented Dec 5, 2020

What does this PR do?

This PR attempts to unify the behaviour of automatic and manual optimization.

It also resolves a bug where the model's optimizer_step was not called when using LightningOptimizer (a minimal sketch of the affected hook follows below).

Fixes # (issue)
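
For context, a hypothetical sketch of the user-facing hook this PR affects: a LightningModule that overrides optimizer_step, the hook that could previously be skipped when the optimizer was wrapped in a LightningOptimizer. The hook signature follows the 1.1-era API and the model is a placeholder, not code from this diff.

import torch
import pytorch_lightning as pl

class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        return self.layer(batch).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    # Overridden hook: before this PR, this could be skipped when the
    # optimizer was wrapped in a LightningOptimizer.
    def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx,
                       optimizer_closure, on_tpu=False,
                       using_native_amp=False, using_lbfgs=False):
        optimizer.step(closure=optimizer_closure)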

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified; bug fixes should be included in bug-fix release milestones (m.f.X) and features in (m.X.b) releases.

Did you have fun?

Make sure you had fun coding 🙃

@tchaton tchaton self-assigned this Dec 5, 2020
@tchaton tchaton added the design Includes a design discussion label Dec 5, 2020
@tchaton tchaton added this to the 1.1 milestone Dec 5, 2020
@tchaton tchaton added the priority: 0 High priority task label Dec 5, 2020

codecov bot commented Dec 5, 2020

Codecov Report

Merging #4984 (dc9fa19) into master (ab7c947) will increase coverage by 0%.
The diff coverage is 93%.

@@          Coverage Diff           @@
##           master   #4984   +/-   ##
======================================
  Coverage      93%     93%           
======================================
  Files         129     129           
  Lines        9372    9397   +25     
======================================
+ Hits         8689    8713   +24     
- Misses        683     684    +1     

@tchaton tchaton marked this pull request as ready for review December 5, 2020 18:04

if not self.trainer.train_loop.automatic_optimization:
    trainer.scaler.unscale_(optimizer)
    trainer.call_hook("on_after_backward")
Contributor:

not sure about the hook call here. We may want to find a better place than in the plugin.
Can it not live inside the backward closure?

Member:

does it mean that this hook is called only in manual optim?

@tchaton (Contributor, author) Dec 6, 2020:

Hey both.
For automatic_optimization, it is fine: the hook is called in training_step_and_backward.

For manual_optimization, we can't predict when people are going to call .step. Therefore, the hook needs to be called after the latest closure and before .step.

However, I can move this to the precision plugin to reduce code.
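
To make the ordering constraint concrete, here is a rough, hypothetical sketch of a manual-optimization training_step under the 1.1-era API (run with Trainer(automatic_optimization=False)); the comment marks where on_after_backward has to fire. This is an illustration of the constraint, not code from this PR.

import torch
import pytorch_lightning as pl

class ManualModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()          # possibly a LightningOptimizer wrapper
        loss = self.layer(batch).sum()
        self.manual_backward(loss, opt)  # gradients exist only after this call
        # `on_after_backward` (and AMP unscaling) must run here:
        # after the last backward and before the step below.
        opt.step()
        opt.zero_grad()
        return loss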

Contributor (author):

This will be taken care of in a follow-up PR @awaelchli. I will need to clean up training_loop for automatic_optimization first, so it uses the same API as the manual one.

Resolved review thread: pytorch_lightning/trainer/training_loop.py
@Borda (Member) left a comment:

lgtm

Resolved review threads: pytorch_lightning/core/optimizer.py (6 outdated)

Resolved review threads: pytorch_lightning/trainer/configuration_validator.py (2 outdated)

pep8speaks commented Dec 6, 2020

Hello @tchaton! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-12-07 09:57:09 UTC

@tchaton tchaton added the ready PRs ready to be merged label Dec 6, 2020
@rohitgr7 (Contributor) left a comment:

Also, can you change this
https://github.com/PyTorchLightning/pytorch-lightning/blob/b00991efd8d6b7d1941d0eb3c1a499f95b4a3eea/pytorch_lightning/accelerators/accelerator.py#L163
to just
if self.trainer.testing

I'll close my #4982 then, since these were the only 2 issues it was resolving.

Comment on lines 86 to 88
raise MisconfigurationException(
    'When overriding `LightningModule` optimizer_step with `Trainer(automatic_optimization=True, ...)`,'
    ' `accumulate_grad_batches` should be 1. It ensures `optimizer_step` is called on every batch'
Contributor:

why is that? shouldn't I expect lightning to call optimizer.step at the correct batch if I set accumulate_grad_batches>1, whether I override it or not?

Contributor (author):

Yes, it will.
However, when people override optimizer_step they might not pay attention to the batch_idx indices.
They might do something like this:

Trainer(accumulate_grad_batches=2)
...
def optimizer_step(...):
    if batch_idx % 3 == 0:
        optimizer.step()

which would result in gradients being accumulated over 6 batches rather than the expected 2.
That is why I found it safer to make sure optimizer_step is called on every batch_idx.
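
By contrast, the configuration the check nudges users towards is to let the Trainer own the accumulation and leave optimizer_step alone. A hypothetical sketch (the model name is a placeholder, not code from this PR):

import torch
import pytorch_lightning as pl

class PlainModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        return self.layer(batch).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

# No optimizer_step override: Lightning accumulates gradients for 2 batches
# and performs the real optimizer step on every second batch.
trainer = pl.Trainer(accumulate_grad_batches=2)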

Resolved review thread: pytorch_lightning/trainer/configuration_validator.py (outdated)
Comment on lines +121 to +128
trainer.train_loop.on_before_zero_grad(self)

model.optimizer_zero_grad(
    trainer.current_epoch,
    trainer.batch_idx,
    optimizer,
    self._optimizer_idx
)
@rohitgr7 (Contributor) Dec 6, 2020:

it won't be called if enable_pl_optimizer=False with automatic_optimization=True??

@tchaton (Contributor, author) Dec 7, 2020:

When people use automatic_optimization=True and enable_pl_optimizer=False without overriding optimizer_step, we wrap their optimizer into a LightningOptimizer only for the duration of the step call, so this zero_grad will still be executed.

However, if they override optimizer_step with Trainer(automatic_optimization=True, enable_pl_optimizer=False), they will get a warning that we can't take care of zero_grad for them.

        if not isinstance(optimizer, LightningOptimizer):
            # wraps into LightningOptimizer only for running the step
            optimizer = LightningOptimizer.to_lightning_optimizer(optimizer, self.trainer)
        optimizer.step(closure=optimizer_closure, *args, **kwargs)

Comment on lines +94 to +96
if self.accumulate_grad_batches is None:
    return self._trainer.train_loop._accumulated_batches_reached()
return (self._trainer.batch_idx + 1) % self.accumulate_grad_batches == 0
@rohitgr7 (Contributor) Dec 6, 2020:

isn't _accumulate_grad_batches always None here? I can't find where it is set to a value.

Contributor (author):

Yes, _accumulate_grad_batches is None by default.
People will be able to set an accumulation value if they want:

optimizer = LightningOptimizer(optimizer, 3)

or

optimizer.accumulate_grad_batches = 3

or they could even make the .step random:

optimizer.step(make_optimizer_step=np.random.randint(100) > 90)
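
Putting the snippets above together, a hypothetical manual-optimization module that sets per-optimizer accumulation through the wrapper and randomly gates the step might look like this; signatures follow the 1.1-era API and the exact wiring may differ in the final release.

import numpy as np
import torch
import pytorch_lightning as pl
from pytorch_lightning.core.optimizer import LightningOptimizer

class AccumulatingModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def configure_optimizers(self):
        opt = torch.optim.SGD(self.parameters(), lr=0.1)
        # Per-optimizer accumulation set on the wrapper itself.
        return LightningOptimizer(opt, 3)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        loss = self.layer(batch).sum()
        self.manual_backward(loss, opt)
        # Randomly gated step, mirroring the example in the comment above.
        opt.step(make_optimizer_step=bool(np.random.randint(100) > 90))
        return loss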

Resolved review thread: pytorch_lightning/core/optimizer.py
Labels: design (design discussion) · priority: 0 (high priority) · ready (ready to be merged)
8 participants