Simplify optimization Logic #4984
Conversation
Codecov Report

@@           Coverage Diff           @@
##           master   #4984    +/-  ##
=======================================
  Coverage      93%      93%
=======================================
  Files         129      129
  Lines        9372     9397     +25
=======================================
+ Hits         8689     8713     +24
- Misses        683      684      +1
if not self.trainer.train_loop.automatic_optimization:
    trainer.scaler.unscale_(optimizer)
    trainer.call_hook("on_after_backward")
not sure about the hook call here. We may want to find a better place than in the plugin.
Can it not live inside the backward closure?
does it mean that this hook is called only in manual optim?
Hey both.
For automatic_optimization, it is fine: the hook is called in training_step_and_backward.
For manual_optimization, we can't predict when people are going to call .step. Therefore, the hook needs to be called after the latest closure and before .step.
However, I can move this to the precision plugin to reduce code.
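For reference, here is a minimal sketch of the ordering described above for manual optimization with native AMP. Only the unscale_/call_hook pair is taken from the diff; the helper name and the closure wiring around it are illustrative assumptions, not the PR's actual implementation.

def pre_optimizer_step(trainer, optimizer, optimizer_closure):
    # run the user-supplied closure (forward + backward) first
    optimizer_closure()
    if not trainer.train_loop.automatic_optimization:
        # in manual optimization we cannot know when the user will call .step(),
        # so gradients are unscaled and the hook fires after the latest closure
        # and right before the actual optimizer step
        trainer.scaler.unscale_(optimizer)
        trainer.call_hook("on_after_backward")
    # the real step (e.g. trainer.scaler.step(optimizer)) happens after this point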
This will be taken care of in a follow-up PR @awaelchli. I will need to clean out the training_loop for automatic_optimization first, so it uses the same API as the manual one.
lgtm
Hello @tchaton! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-12-07 09:57:09 UTC
Also, can you change this
https://github.com/PyTorchLightning/pytorch-lightning/blob/b00991efd8d6b7d1941d0eb3c1a499f95b4a3eea/pytorch_lightning/accelerators/accelerator.py#L163
to just
if self.trainer.testing
I'll close my #4982 then since these were the only 2 issues it's resolving.
raise MisconfigurationException(
    'When overriding `LightningModule` optimizer_step with `Trainer(automatic_optimization=True, ...)`,'
    ' `accumulate_grad_batches` should be 1. It ensures `optimizer_step` is called on every batch'
why is that? shouldn't I expect Lightning to call optimizer.step at the correct batch if I set accumulate_grad_batches > 1, whether I override it or not?
Yes, it will.
However, when people override optimizer_step, they might not pay attention to the batch_idx indices. They might do something like this:

Trainer(accumulate_grad_batches=2)
...
def optimizer_step(...):
    if batch_idx % 3 == 0:
        optimizer.step()

which will result in an effective accumulation of 6 batches, not the expected 2.
That is why I found it safer to make sure optimizer_step is called on every batch_idx.
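To make the arithmetic concrete, here is a small standalone illustration (not Lightning code): Lightning only reaches optimizer_step on every 2nd batch with accumulate_grad_batches=2, and the override above only steps when batch_idx % 3 == 0, so an actual step happens only when both conditions line up.

# counts which batch indices would actually trigger a step
step_batches = [
    batch_idx
    for batch_idx in range(24)
    if (batch_idx + 1) % 2 == 0  # Lightning's accumulation boundary (accumulate_grad_batches=2)
    and batch_idx % 3 == 0       # the user's extra condition in the override
]
print(step_batches)  # [3, 9, 15, 21] -> one step every 6 batches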
trainer.train_loop.on_before_zero_grad(self)

model.optimizer_zero_grad(
    trainer.current_epoch,
    trainer.batch_idx,
    optimizer,
    self._optimizer_idx
)
it won't be called if enable_pl_optimizer=False with automatic_optimization=True?
When people use automatic_optimization=True and enable_pl_optimizer=False without overriding optimizer_step, we wrap their optimizer only for the duration of the step call, so this zero_grad will still be executed.
However, if they override optimizer_step with Trainer(automatic_optimization=True, enable_pl_optimizer=False), they will get a warning that we can't take care of zero_grad for them.
if not isinstance(optimizer, LightningOptimizer):
    # wrap into LightningOptimizer only for running the step
    optimizer = LightningOptimizer.to_lightning_optimizer(optimizer, self.trainer)
optimizer.step(closure=optimizer_closure, *args, **kwargs)
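As a hedged illustration of the warning case mentioned above (not code from this PR): when optimizer_step is overridden with enable_pl_optimizer=False, Lightning can no longer call optimizer_zero_grad on the user's behalf, so the override has to do it. The class, the collapsed **kwargs, and the exact hook arguments below are assumptions for the sake of the example.

from pytorch_lightning import LightningModule

class MyModel(LightningModule):
    # hypothetical override used with Trainer(automatic_optimization=True, enable_pl_optimizer=False)
    def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx,
                       optimizer_closure, **kwargs):
        optimizer.step(closure=optimizer_closure)
        # zero_grad is not handled by Lightning in this configuration,
        # so it must be called manually here
        optimizer.zero_grad()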
if self.accumulate_grad_batches is None:
    return self._trainer.train_loop._accumulated_batches_reached()
return (self._trainer.batch_idx + 1) % self.accumulate_grad_batches == 0
isn't _accumulate_grad_batches always None here? I can't find where it is set to a value anywhere.
Yes, _accumulate_grad_batches is None by default.
People will be able to set an accumulation value if they want:

optimizer = LightningOptimizer(optimizer, 3)

or

optimizer.accumulate_grad_batches = 3

They could even make .step conditional at random:

optimizer.step(make_optimizer_step=np.random.randint(100) > 90)
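For context, here is a short manual-optimization sketch of how these knobs could be used inside training_step. The LightningOptimizer constructor argument and the make_optimizer_step keyword come from the comment above; the loss helper and the exact manual_backward call are assumptions.

import numpy as np

def training_step(self, batch, batch_idx):
    opt = self.optimizers()          # the LightningOptimizer wrapper
    loss = self.compute_loss(batch)  # hypothetical loss helper
    self.manual_backward(loss, opt)
    # gradients keep accumulating; an actual optimizer step runs ~10% of the time
    opt.step(make_optimizer_step=np.random.randint(100) > 90)
    return loss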
Co-authored-by: Rohit Gupta <[email protected]>
What does this PR do?
This PR attempts to unify the logic between automatic and manual optimization.
It also resolves a bug where the model's optimizer_step wasn't called when using LightningOptimizer.
Fixes # (issue)
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
Make sure you had fun coding 🙃