-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add learning rate scheduling support for DeepSpeedStrategy
#20320
base: master
Are you sure you want to change the base?
Conversation
Thanks for the contribution @amorehead! Let's get to a green CI and take it from there |
hey @amorehead looks like CI failures are legit, let me know if you can fix those |
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @amorehead! I added a few comments. Essentially we need to turn this into a non-breaking change.
Also a small update to docs is needed.
|
||
Currently, only a single optimizer is supported. | ||
self, module: Module, optimizers: list[Optimizer], scheduler: Optional[_LRScheduler] = None | ||
) -> tuple["DeepSpeedEngine", list[Optimizer], Optional[_LRScheduler]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will return None
, we need to return Any
here so we can ignore the scheduler
if it is not provided in input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Addressed in this commit.
src/lightning/fabric/fabric.py
Outdated
@@ -266,7 +269,7 @@ def setup( | |||
|
|||
if optimizers: | |||
# join both types in a tuple for API convenience | |||
return (module, *optimizers) | |||
return (module, *optimizers, scheduler) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change, it will cause existing user code to fail, because scheduler
is returned unconditionally.
Since scheduler
is Optional
in the signature, I suggest we only return it if it was not None
as an argument, so we won't break anyone's code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Addressed in this commit.
optimizer: Optional[Optimizer] = None, | ||
) -> tuple["DeepSpeedEngine", Optimizer]: | ||
self, model: Module, optimizer: Optional[Optimizer] = None, scheduler: Optional[_LRScheduler] = None | ||
) -> tuple["DeepSpeedEngine", Optimizer, Optional[_LRScheduler]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in this commit.
@@ -104,7 +104,10 @@ def pl_worker_init_function(worker_id: int, rank: Optional[int] = None) -> None: | |||
if _NUMPY_AVAILABLE: | |||
import numpy as np | |||
|
|||
np.random.seed(seed_sequence[3] & 0xFFFFFFFF) # numpy takes 32-bit seed only | |||
ss = np.random.SeedSequence([base_seed, worker_id, global_rank]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an unrelated change, it shouldn't be included
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is now merged into master
per this previous pull request, is this comment still relevant?
@amorehead I'm wrapping up the last few PRs for the release. Do you have time to fix this one in the next couple of days? |
@lantiga, apologies. Just now getting to fixing this pull request up. I've updated the |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #20320 +/- ##
========================================
- Coverage 88% 87% -1%
========================================
Files 267 267
Lines 23380 23385 +5
========================================
- Hits 20481 20337 -144
- Misses 2899 3048 +149 |
What does this PR do?
DeepSpeedStrategy
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:
Reviewer checklist
📚 Documentation preview 📚: https://pytorch-lightning--20320.org.readthedocs.build/en/20320/