Upgrade NeMo to latest mcore and TE #7862
Signed-off-by: dimapihtar <[email protected]>
examples/nlp/language_modeling/conf/megatron_gpt_inference.yaml
@@ -3150,51 +3160,51 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
sh "rm -rf examples/nlp/language_modeling/gpt_index_mappings"
}
}
stage('L2: Megatron GPT with Rope Pretraining and Resume Training TP=2') {
Do we need to remove this test?
@ericharper this one uses position_embedding_type=rope, which causes an error (TE bug).
This test is not even using mcore_gpt or transformer_engine flags though....
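The point above can be checked mechanically. The following is a minimal sketch, not taken from the PR: it parses Hydra-style `key=value` overrides and reports whether `model.mcore_gpt` or `model.transformer_engine` is switched on. The helper names and the example override list are hypothetical, and the sketch assumes that a flag left unset defaults to off.

```python
def parse_overrides(args):
    """Turn Hydra-style 'key=value' strings into a dict keyed by the dotted path."""
    overrides = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if sep:  # ignore anything that is not a key=value pair
            overrides[key.strip()] = value.strip()
    return overrides


def uses_te_or_mcore(overrides):
    """Return True if either flag is explicitly enabled.

    Assumption: a flag that is absent from the overrides counts as disabled.
    """
    truthy = {"true", "1"}
    flags = ("model.mcore_gpt", "model.transformer_engine")
    return any(overrides.get(flag, "false").lower() in truthy for flag in flags)


# Hypothetical overrides for illustration; not copied from the Jenkinsfile.
example_args = [
    "model.position_embedding_type=rope",
    "model.tensor_model_parallel_size=2",
]
overrides = parse_overrides(example_args)
print(overrides.get("model.position_embedding_type"), uses_te_or_mcore(overrides))
```

Run against the real command line of the stage, a check like this would show whether the rope test goes through the mcore or TE code path at all.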
LGTM. Thanks!
* mcore upgrade
  Signed-off-by: dimapihtar <[email protected]>
* remove GPTEmbedding import (deprecated)
  Signed-off-by: dimapihtar <[email protected]>
* switch to LanguageModelEmbedding
  Signed-off-by: dimapihtar <[email protected]>
* reset config
  Signed-off-by: dimapihtar <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* pass attn_mask-type through the forward method
  Signed-off-by: dimapihtar <[email protected]>
* reset conf
  Signed-off-by: dimapihtar <[email protected]>
* add more attn_mask_type params
  Signed-off-by: dimapihtar <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* attn_,ask_type fixes
  Signed-off-by: dimapihtar <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* remove attn_mask_type param
  Signed-off-by: dimapihtar <[email protected]>
* t5/mt5 fix
  Signed-off-by: dimapihtar <[email protected]>
* revert configs
  Signed-off-by: dimapihtar <[email protected]>
* revert configs
  Signed-off-by: dimapihtar <[email protected]>
* remove attn_mask_type param
  Signed-off-by: dimapihtar <[email protected]>
* update mcore commit
  Signed-off-by: dimapihtar <[email protected]>
* change mcore commit
  Signed-off-by: dimapihtar <[email protected]>
* add TE installation
  Signed-off-by: dimapihtar <[email protected]>
* add env var
  Signed-off-by: dimapihtar <[email protected]>
* comment out rope test
  Signed-off-by: dimapihtar <[email protected]>
* change mcore installation
  Signed-off-by: dimapihtar <[email protected]>
* update mcore commit
  Signed-off-by: dimapihtar <[email protected]>
* change mcore installation
  Signed-off-by: dimapihtar <[email protected]>
* change mcore commit
  Signed-off-by: dimapihtar <[email protected]>
* add -e
  Signed-off-by: eharper <[email protected]>
* revert
  Signed-off-by: eharper <[email protected]>
* revert jenkins test comment
  Signed-off-by: eharper <[email protected]>
---------
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: eharper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: eharper <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
What does this PR do?
Upgrades NeMo to the latest Megatron Core (mcore) and Transformer Engine (TE) versions.