Bugfix for BF16 grad reductions with distopt #6340
Merged
Conversation
ericharper approved these changes on Mar 31, 2023:
LGTM. Thanks!
timmoon10 added a commit to timmoon10/NeMo that referenced this pull request on Mar 31, 2023:

* Debug distopt support for BF16 grad reductions
* Dump and load FP32 main params
* Style tweaks

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Mikołaj Błaż <[email protected]>
ericharper pushed a commit that referenced this pull request on Apr 3, 2023:

* GPT support for BF16 grad reductions (#5920)
  * Add support for BF16 grad reductions with distopt
  * Fix style issues
  * Fix style issues
  * Update Apex commit
* Add custom functions to launch distopt communication in interleaved pipeline parallelism (#6183)
* Bugfix for BF16 grad reductions with distopt (#6340)
  * Debug distopt support for BF16 grad reductions
  * Dump and load FP32 main params
  * Style tweaks

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Mikołaj Błaż <[email protected]>
mikolajblaz added a commit to mikolajblaz/NeMo that referenced this pull request on Apr 5, 2023:

* Debug distopt support for BF16 grad reductions
* Dump and load FP32 main params
* Style tweaks

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Mikołaj Błaż <[email protected]>
mikolajblaz added a commit to mikolajblaz/NeMo that referenced this pull request on Apr 5, 2023:

* Debug distopt support for BF16 grad reductions
* Dump and load FP32 main params
* Style tweaks

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
ericharper pushed a commit that referenced this pull request on Apr 5, 2023:

* Debug distopt support for BF16 grad reductions
* Dump and load FP32 main params
* Style tweaks

Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
hsiehjackson pushed a commit to hsiehjackson/NeMo that referenced this pull request on Jun 2, 2023:

* Debug distopt support for BF16 grad reductions
* Dump and load FP32 main params
* Style tweaks

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Mikołaj Błaż <[email protected]>
Signed-off-by: hsiehjackson <[email protected]>
What does this PR do?
#5920 adds support for BF16 grad reductions with distopt, with embedding grad reductions done in FP32. @mikolajblaz found some bugs: it turns out the FP32 reductions were not being done at all. This PR fixes those issues. Running GPT-3 175B, I confirmed that the embedding grads are now optimized with the FP32 optimizer, and that loss values are the same with FP32 and BF16 grad reductions for 50 steps, within numerical accuracy.
This turned out messier than I would have liked. It would have been better to integrate distopt support for multiple grad dtypes into Apex.
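To make the intended behavior concrete, here is a toy sketch of a mixed-dtype grad reduction (this is an illustration, not the actual Apex distopt implementation; the function name and the FP32 parameter-name set are assumptions for the example):

```python
import torch
import torch.distributed as dist

def reduce_grads_mixed_dtype(named_params, fp32_names):
    """Toy sketch, not the distopt code: all-reduce most grads in BF16,
    but keep params listed in fp32_names (e.g. embeddings) in FP32."""
    for name, param in named_params:
        if param.grad is None:
            continue
        if name in fp32_names:
            # Embedding grads: reduce in full FP32 precision.
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
        else:
            # Other grads: cast to BF16 to halve communication volume,
            # then copy the reduced result back into the FP32 grad buffer.
            buf = param.grad.to(torch.bfloat16)
            dist.all_reduce(buf, op=dist.ReduceOp.SUM)
            param.grad.copy_(buf)
```

The bug fixed here corresponds to the FP32 branch above: the embedding grads were silently not being reduced at all before this PR.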
Collection: NLP
Changelog
Usage
Set the optimizer to `distributed_fused_adam` in the config file (see `NeMo/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml`, line 207 at commit aaa0cca), and configure it with `grad_sync_dtype: bf16`.
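A minimal sketch of the relevant `optim` section, assuming the layout of `megatron_gpt_config.yaml`; only `name` and `grad_sync_dtype` matter here, the other values are illustrative placeholders rather than the file's defaults:

```yaml
optim:
  name: distributed_fused_adam  # Apex distributed optimizer
  lr: 2e-4                      # placeholder value
  weight_decay: 0.01            # placeholder value
  betas:
    - 0.9
    - 0.98
  grad_sync_dtype: bf16         # reduce grads in BF16 (embedding grads stay FP32)
```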
Before your PR is "Ready for review"
Pre checks:
PR Type:
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.
Additional Information
This PR targets main instead of r1.17. See that PR for discussions.