Fix fast-glu activation in change partitions #6909

Merged: 2 commits merged into main from fix-change-partitions on Jun 23, 2023

Conversation

@hsiehjackson (Collaborator) commented Jun 23, 2023

What does this PR do?

Change the partition-conversion check from swiglu only to all fast GLU activations (fast-geglu, fast-swiglu, fast-reglu).

Collection: [NLP]

Changelog

  • Change compute_tp_splits to detect all fast GLU activations (fast-geglu, fast-swiglu, fast-reglu) instead of only swiglu when converting partitions.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: hsiehjackson <[email protected]>
@hsiehjackson changed the title from "Fix fast-swiglu in change partitions" to "Fix fast-glu activation in change partitions" on Jun 23, 2023
@@ -199,7 +199,7 @@ def compute_tp_splits(
     # alias the global index to idx
     idx = global_idx
 
-    swiglu_activation = 'swiglu' in str(model_cfg.get('activation', '')).lower()
+    fast_glu_activation = str(model_cfg.get('activation', '')).lower() in ['fast-geglu', 'fast-swiglu', 'fast-reglu']
Collaborator:

Add swiglu to the list

@hsiehjackson (Author):

swiglu doesn't use the torch chunk trick, so its partitions don't need special handling. Only the fast GLU activations (fast_glu_activation) need it.
https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/nlp/modules/common/megatron/mlp.py#L230-L231
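
For reference, a minimal sketch of the chunk trick those linked lines use, assuming a silu-gated (fast-swiglu style) MLP; the function name here is illustrative, not NeMo's exact code:

import torch
import torch.nn.functional as F

def fast_glu_forward(intermediate_parallel: torch.Tensor) -> torch.Tensor:
    # The fused column-parallel projection produces both GLU halves in one tensor;
    # torch.chunk splits it into the gate half and the value half along the hidden dim.
    gate, value = torch.chunk(intermediate_parallel, 2, dim=-1)
    # fast-swiglu applies silu to the gate; fast-geglu/fast-reglu swap in gelu/relu.
    return F.silu(gate) * value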

@hsiehjackson (Author) commented Jun 23, 2023:

Megatron-LM does special handling for swiglu during partition conversion (code) because its implementation uses a chunk operation for swiglu (code). NeMo also has the chunk operation, but it is only used when the activation is a fast GLU variant, i.e. fast_glu_activation (code). Therefore, I changed the partition conversion script to key on fast_glu_activation instead of swiglu.

Here is why the chunk operation needs special handling during partition conversion:

TP=2:
    GPU0: tensor A = [a1, a2] -> chunk into a1 and a2 -> activation(a1) * a2
    GPU1: tensor B = [b1, b2] -> chunk into b1 and b2 -> activation(b1) * b2
(Wrong) TP=1:
    GPU0: tensor C = [a1, a2, b1, b2] (normal TP concatenation) -> chunk into [a1, a2] and [b1, b2] -> activation([a1, a2]) * [b1, b2]
(Correct) TP=1:
    GPU0: tensor C = [a1, b1, a2, b2] (special handling) -> chunk into [a1, b1] and [a2, b2] -> activation([a1, b1]) * [a2, b2]
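
A toy reproduction of the example above (illustrative only; the actual conversion script reorders the weight partitions rather than activations, and silu stands in for whichever fast GLU activation is configured):

import torch
import torch.nn.functional as F

def glu(x):
    # Same chunk trick as the fast GLU MLP: split into gate and value halves.
    gate, value = torch.chunk(x, 2, dim=-1)
    return F.silu(gate) * value

# Two TP=2 shards; each shard holds its own [gate_half, value_half] pair.
a1, a2 = torch.randn(4), torch.randn(4)  # GPU0: A = [a1, a2]
b1, b2 = torch.randn(4), torch.randn(4)  # GPU1: B = [b1, b2]

# Reference TP=2 result: each GPU applies the GLU to its own shard.
reference = torch.cat([glu(torch.cat([a1, a2])), glu(torch.cat([b1, b2]))])

# Wrong TP=1 merge: [a1, a2, b1, b2] chunks into ([a1, a2], [b1, b2]) -- halves are mispaired.
naive = glu(torch.cat([a1, a2, b1, b2]))

# Correct TP=1 merge: [a1, b1, a2, b2] chunks into ([a1, b1], [a2, b2]) -- halves stay paired.
interleaved = glu(torch.cat([a1, b1, a2, b2]))

print(torch.allclose(reference, naive))        # False (in general)
print(torch.allclose(reference, interleaved))  # True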

Collaborator:

Ok makes sense.

@titu1994 (Collaborator) left a review:

Thanks for the fix and explanation!

@hsiehjackson merged commit 74cbbb2 into main on Jun 23, 2023 (15 checks passed)
@hsiehjackson deleted the fix-change-partitions branch on Jun 23, 2023 at 19:19
sudhakarsingh27 pushed a commit that referenced this pull request Jun 27, 2023
* Fix fast-swiglu

Signed-off-by: hsiehjackson <[email protected]>

* change to all fast glu activation

Signed-off-by: hsiehjackson <[email protected]>

---------

Signed-off-by: hsiehjackson <[email protected]>
Signed-off-by: Sudhakar Singh <[email protected]>