Fix fast-glu activation in change partitions #6909
Conversation
@@ -199,7 +199,7 @@ def compute_tp_splits(
     # alias the global index to idx
     idx = global_idx

-    swiglu_activation = 'swiglu' in str(model_cfg.get('activation', '')).lower()
+    fast_glu_activation = str(model_cfg.get('activation', '')).lower() in ['fast-geglu', 'fast-swiglu', 'fast-reglu']
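For context, the flag introduced above decides whether a weight needs chunk-aware treatment when repartitioning. A rough sketch of what that treatment means when splitting a fast GLU weight into TP shards (the split_glu_weight helper and the shapes are illustrative assumptions, not the actual compute_tp_splits code):

import torch

# Illustrative sketch only (not NeMo's compute_tp_splits): a fast GLU
# dense_h_to_4h weight stacks the "x" half and the "gate" half along its
# output dimension, so each TP rank must receive matching slices of both.
def split_glu_weight(weight: torch.Tensor, tp: int):
    x_half, gate_half = torch.chunk(weight, 2, dim=0)   # [x | gate]
    x_parts = torch.chunk(x_half, tp, dim=0)
    gate_parts = torch.chunk(gate_half, tp, dim=0)
    # Each rank keeps its x slice stacked on top of its gate slice,
    # preserving the layout that torch.chunk expects at runtime.
    return [torch.cat([x_parts[r], gate_parts[r]], dim=0) for r in range(tp)]

shards = split_glu_weight(torch.randn(16, 8), tp=2)
print([tuple(s.shape) for s in shards])  # [(8, 8), (8, 8)]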
Add swiglu to the list
swiglu doesn't use the torch chunk trick, so we don't need to handle the partition for it. Only fast_glu_activation needs the special handling.
https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/nlp/modules/common/megatron/mlp.py#L230-L231
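For reference, the fast GLU path linked above splits the projected tensor in two with torch.chunk and uses the activated half to gate the other. A minimal sketch of that pattern (fast_swiglu here is an illustrative stand-in, not the exact NeMo code):

import torch
import torch.nn.functional as F

# Minimal sketch of the chunk-based fast GLU pattern (illustrative only):
# one projection produces both halves, torch.chunk splits them, and the
# activated first half gates the second half.
def fast_swiglu(intermediate: torch.Tensor) -> torch.Tensor:
    x, gate = torch.chunk(intermediate, 2, dim=-1)
    return F.silu(x) * gate

out = fast_swiglu(torch.randn(4, 16))
print(out.shape)  # torch.Size([4, 8])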
From Megatron-LM, the special handling of swiglu during partition conversion (code) exists because their implementation applies a chunk operation for swiglu (code). In NeMo, we also have a chunk operation, but it is only used when the activation is a fast_glu_activation (code). Therefore, I changed the partition conversion script from checking swiglu to checking fast_glu_activation.
To explain why the chunk operation needs special handling during partition conversion:
TP=2:
GPU0: tensor A [a1, a2] -> chunk into a1 and a2 -> activation(a1) * a2
GPU1: tensor B [b1, b2] -> chunk into b1 and b2 -> activation(b1) * b2
(Wrong) TP=1:
GPU0: tensor C = [a1, a2, b1, b2] (plain TP concatenation) -> chunk into [a1, a2] and [b1, b2] -> activation([a1, a2]) * [b1, b2]
(Correct) TP=1:
GPU0: tensor C = [a1, b1, a2, b2] (special handling) -> chunk into [a1, b1] and [a2, b2] -> activation([a1, b1]) * [a2, b2]
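A small runnable repro of the merge above (toy tensors, illustrative only) shows that plain concatenation changes the result while the interleaved layout reproduces the sharded computation:

import torch
import torch.nn.functional as F

# Toy reproduction of the TP=2 -> TP=1 merge described above (illustrative only).
torch.manual_seed(0)
a = torch.randn(8)   # shard held by GPU0: [a1, a2]
b = torch.randn(8)   # shard held by GPU1: [b1, b2]

def glu(t: torch.Tensor) -> torch.Tensor:
    # chunk-based GLU, as in the fast-* activations
    x, gate = torch.chunk(t, 2, dim=-1)
    return F.silu(x) * gate

# Reference: each rank applies GLU to its own shard; outputs are concatenated.
reference = torch.cat([glu(a), glu(b)])

# Wrong TP=1 merge: plain concatenation, so the chunk mixes a2 into the "x" half.
naive = glu(torch.cat([a, b]))

# Correct TP=1 merge: interleave the halves so x = [a1, b1] and gate = [a2, b2].
a1, a2 = torch.chunk(a, 2)
b1, b2 = torch.chunk(b, 2)
interleaved = glu(torch.cat([a1, b1, a2, b2]))

print(torch.allclose(reference, naive))        # False
print(torch.allclose(reference, interleaved))  # True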
Ok makes sense.
Thanks for the fix and explanation!
* Fix fast-swiglu

Signed-off-by: hsiehjackson <[email protected]>

* change to all fast glu activation

Signed-off-by: hsiehjackson <[email protected]>

---------

Signed-off-by: hsiehjackson <[email protected]>
Signed-off-by: Sudhakar Singh <[email protected]>
What does this PR do?
Change the swiglu check to cover all fast GLU activations (fast-geglu, fast-swiglu, fast-reglu) for partition conversion.

Collection: [NLP]
Changelog
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.
Additional Information