Fix fast-glu activation in change partitions #6909
Conversation
@@ -199,7 +199,7 @@ def compute_tp_splits(
     # alias the global index to idx
     idx = global_idx

-    swiglu_activation = 'swiglu' in str(model_cfg.get('activation', '')).lower()
+    fast_glu_activation = str(model_cfg.get('activation', '')).lower() in ['fast-geglu', 'fast-swiglu', 'fast-reglu']
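For context, the flag introduced above decides whether a weight needs chunk-aware treatment when repartitioning. A rough sketch of what that treatment means when splitting a fast GLU weight into TP shards (the split_glu_weight helper and the shapes are illustrative assumptions, not the actual compute_tp_splits code):

import torch

# Illustrative sketch only (not NeMo's compute_tp_splits): a fast GLU
# dense_h_to_4h weight stacks the "x" half and the "gate" half along its
# output dimension, so each TP rank must receive matching slices of both.
def split_glu_weight(weight: torch.Tensor, tp: int):
    x_half, gate_half = torch.chunk(weight, 2, dim=0)   # [x | gate]
    x_parts = torch.chunk(x_half, tp, dim=0)
    gate_parts = torch.chunk(gate_half, tp, dim=0)
    # Each rank keeps its x slice stacked on top of its gate slice,
    # preserving the layout that torch.chunk expects at runtime.
    return [torch.cat([x_parts[r], gate_parts[r]], dim=0) for r in range(tp)]

shards = split_glu_weight(torch.randn(16, 8), tp=2)
print([tuple(s.shape) for s in shards])  # [(8, 8), (8, 8)]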
Add swiglu to the list
swiglu doesn't use the torch chunk trick, so we don't need to handle the partition for it. Only fast_glu_activation needs the special handling.
https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/nlp/modules/common/megatron/mlp.py#L230-L231
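For reference, the fast GLU path linked above splits the projected tensor in two with torch.chunk and uses the activated half to gate the other. A minimal sketch of that pattern (fast_swiglu here is an illustrative stand-in, not the exact NeMo code):

import torch
import torch.nn.functional as F

# Minimal sketch of the chunk-based fast GLU pattern (illustrative only):
# one projection produces both halves, torch.chunk splits them, and the
# activated first half gates the second half.
def fast_swiglu(intermediate: torch.Tensor) -> torch.Tensor:
    x, gate = torch.chunk(intermediate, 2, dim=-1)
    return F.silu(x) * gate

out = fast_swiglu(torch.randn(4, 16))
print(out.shape)  # torch.Size([4, 8])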
From Megatron-LM, the special handling of swiglu during partition conversion (code) exists because their implementation applies a chunk operation for swiglu (code). In NeMo, we also have a chunk operation, but it is only used when the activation is a fast_glu_activation (code). Therefore, I changed the partition conversion script from checking swiglu to checking fast_glu_activation.
To explain why the chunk operation needs special handling during partition conversion:
TP=2:
GPU0: tensor A [a1, a2] -> chunk into a1 and a2 -> activation(a1) * a2
GPU1: tensor B [b1, b2] -> chunk into b1 and b2 -> activation(b1) * b2
(Wrong) TP=1:
GPU0: tensor C = [a1, a2, b1, b2] (plain TP concatenation) -> chunk into [a1, a2] and [b1, b2] -> activation([a1, a2]) * [b1, b2]
(Correct) TP=1:
GPU0: tensor C = [a1, b1, a2, b2] (special handling) -> chunk into [a1, b1] and [a2, b2] -> activation([a1, b1]) * [a2, b2]
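A small runnable repro of the merge above (toy tensors, illustrative only) shows that plain concatenation changes the result while the interleaved layout reproduces the sharded computation:

import torch
import torch.nn.functional as F

# Toy reproduction of the TP=2 -> TP=1 merge described above (illustrative only).
torch.manual_seed(0)
a = torch.randn(8)   # shard held by GPU0: [a1, a2]
b = torch.randn(8)   # shard held by GPU1: [b1, b2]

def glu(t: torch.Tensor) -> torch.Tensor:
    # chunk-based GLU, as in the fast-* activations
    x, gate = torch.chunk(t, 2, dim=-1)
    return F.silu(x) * gate

# Reference: each rank applies GLU to its own shard; outputs are concatenated.
reference = torch.cat([glu(a), glu(b)])

# Wrong TP=1 merge: plain concatenation, so the chunk mixes a2 into the "x" half.
naive = glu(torch.cat([a, b]))

# Correct TP=1 merge: interleave the halves so x = [a1, b1] and gate = [a2, b2].
a1, a2 = torch.chunk(a, 2)
b1, b2 = torch.chunk(b, 2)
interleaved = glu(torch.cat([a1, b1, a2, b2]))

print(torch.allclose(reference, naive))        # False
print(torch.allclose(reference, interleaved))  # True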
Ok makes sense.
Thanks for the fix and explanation!
* Fix fast-swiglu

Signed-off-by: hsiehjackson <[email protected]>

* change to all fast glu activation

Signed-off-by: hsiehjackson <[email protected]>

---------

Signed-off-by: hsiehjackson <[email protected]>
Signed-off-by: Sudhakar Singh <[email protected]>
What does this PR do?
Change the swiglu check to cover all fast GLU activations (fast-geglu, fast-swiglu, fast-reglu) for partition conversion.

Collection: [NLP]
Changelog
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.
Additional Information