Option to remove bias terms from Megatron transformers #3973
Conversation
Signed-off-by: MaximumEntropy <[email protected]>
This pull request introduces 1 alert when merging a53d663 into ef59a86 - view on LGTM.com new alerts:
This pull request introduces 1 alert when merging 9784ec0 into ef59a86 - view on LGTM.com new alerts:
This pull request introduces 4 alerts when merging d506340 into ef59a86 - view on LGTM.com new alerts:
This pull request introduces 4 alerts when merging ce455f6 into 1a0575b - view on LGTM.com new alerts:
This pull request introduces 4 alerts when merging 77d75d0 into d4408cc - view on LGTM.com new alerts:
This pull request introduces 4 alerts when merging c9ffe41 into 0be1e94 - view on LGTM.com new alerts:
LGTM! See my minor comment about the two config lines (if they are not independent, consider using one?).
@@ -69,6 +69,8 @@ model:
gradient_as_bucket_view: True # Allocate gradients in a contiguous bucket to save memory (less fragmentation and buffer memory)
bias_gelu_fusion: True # Use a kernel that fuses the bias addition from weight matrices with the subsequent gelu activation.
masked_softmax_fusion: True # Use a kernel that fuses the attention softmax with its mask.
bias_dropout_add_fusion: True # Use a kernel that fuses the bias addition, dropout and residual connection addition.
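A minimal sketch of what the two bias fusions in this config compute, written as unfused plain-PyTorch reference functions (the function names here are illustrative, not NeMo's actual kernel names). It also shows why removing the bias terms interacts with these flags: with no bias, there is nothing for the fused kernels to add.

```python
import torch
import torch.nn.functional as F

def bias_gelu(x, bias):
    # Unfused reference for bias_gelu_fusion: add the weight-matrix bias,
    # then apply the GELU activation.
    return F.gelu(x + bias)

def bias_dropout_add(x, bias, residual, p, training):
    # Unfused reference for bias_dropout_add_fusion: add the bias, apply
    # dropout, then add the residual connection.
    return residual + F.dropout(x + bias, p=p, training=training)

x = torch.randn(2, 4)
b = torch.zeros(4)          # a removed bias behaves like a zero bias
r = torch.randn(2, 4)

# With a zero (i.e. removed) bias and p=0, both reduce to the plain ops.
out = bias_dropout_add(x, b, r, p=0.0, training=False)
```

When `bias` is disabled, the fused variants degenerate to a plain GELU and a plain dropout-plus-residual, so the corresponding fusion flags would typically be turned off as well.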
Can you set the two independently to either True/False?
This pull request introduces 4 alerts when merging fedb582 into e4ee26b - view on LGTM.com new alerts:
LGTM. Thanks!
This pull request introduces 4 alerts when merging ae39532 into 9005f23 - view on LGTM.com new alerts:
What does this PR do ?
Adds an option to remove bias terms from Megatron transformer weight matrices.
Collection: NLP
Changelog
Usage
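A hedged sketch of how the new option might be set in the model config; the `bias` key name is an assumption based on this PR's title, and since the bias-related fused kernels have no bias to add, they are shown disabled alongside it:

```yaml
model:
  bias: False                     # hypothetical key: drop bias terms from transformer weight matrices
  bias_gelu_fusion: False         # no bias to fuse with the GELU activation
  bias_dropout_add_fusion: False  # no bias to fuse with dropout + residual add
```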
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.
Additional Information