Make pipelined TP comm overlap available with mcore #8005
jenkins

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.
Signed-off-by: Sangkug Lym <[email protected]>
Force-pushed the branch from 53ba3f4 to f4721ca.
jenkins
```diff
@@ -792,12 +792,14 @@ def _validate_and_override_config(self):
             ) % vp_size == 0, 'Make sure the number of model chunks is the same across all pipeline stages.'
 
         if self.cfg.get('ub_tp_comm_overlap', False):
-            if not self.cfg.get('transformer_engine', False) or not self.cfg.get('sequence_parallel', False):
+            if self.cfg.get('ub_tp_comm_overlap', False) and not self.cfg.get('sequence_parallel', False):
```
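The hunk narrows the guard: the old inner check fired when either `transformer_engine` or `sequence_parallel` was off, while the new one depends only on `sequence_parallel`. A minimal sketch comparing the two predicates, assuming a plain dict stands in for NeMo's OmegaConf `cfg`; `old_guard` and `new_guard` are hypothetical names, and the sketch only reports whether the guarded branch is entered, since the branch body is not shown in this excerpt.

```python
from typing import Any, Mapping


def old_guard(cfg: Mapping[str, Any]) -> bool:
    """True when the pre-PR guard enters its branch (TE or SP not enabled)."""
    if cfg.get("ub_tp_comm_overlap", False):
        return bool(not cfg.get("transformer_engine", False) or not cfg.get("sequence_parallel", False))
    return False


def new_guard(cfg: Mapping[str, Any]) -> bool:
    """True when the reviewed guard enters its branch (SP not enabled)."""
    if cfg.get("ub_tp_comm_overlap", False):
        return bool(cfg.get("ub_tp_comm_overlap", False) and not cfg.get("sequence_parallel", False))
    return False


# mcore-style config: overlap and sequence parallelism on, transformer_engine unset.
mcore_cfg = {"ub_tp_comm_overlap": True, "sequence_parallel": True}
print(old_guard(mcore_cfg), new_guard(mcore_cfg))  # True False
```

With `transformer_engine` unset, the old check trips on this config, while the new one lets it through as long as `sequence_parallel` is on.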
@erhoo82 `if self.cfg.get('ub_tp_comm_overlap', False)` seems redundant on L795, as it already appears on L794.
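Because the inner `if` is reached only when the outer `if self.cfg.get('ub_tp_comm_overlap', False)` holds, repeating that test inside cannot change the result. A quick truth-table sketch over the two boolean flags (function names are hypothetical) checks that dropping the repeated test leaves the guard unchanged, which is what the later "remove unnecessary condition" commit does.

```python
from itertools import product


def reviewed_guard(ub: bool, sp: bool) -> bool:
    # Outer `if` on ub, then an inner `if` that repeats the ub test (flagged in review).
    return ub and (ub and not sp)


def simplified_guard(ub: bool, sp: bool) -> bool:
    # Same guard with the repeated ub test removed from the inner `if`.
    return ub and (not sp)


# Exhaustively compare both guards over all four (ub, sp) combinations.
assert all(
    reviewed_guard(ub, sp) == simplified_guard(ub, sp)
    for ub, sp in product([False, True], repeat=2)
)
print("equivalent")
```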
Signed-off-by: Sangkug Lym <[email protected]>
jenkins
LGTM, thanks!
* Make pipelined TP comm overlap available with mcore
  Signed-off-by: Sangkug Lym <[email protected]>
* remove unnecessary condition
  Signed-off-by: Sangkug Lym <[email protected]>
---------
Signed-off-by: Sangkug Lym <[email protected]>

* Make pipelined TP comm overlap available with mcore
  Signed-off-by: Sangkug Lym <[email protected]>
* remove unnecessary condition
  Signed-off-by: Sangkug Lym <[email protected]>
---------
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>
What does this PR do?
Enable pipelined tensor-parallel (TP) communication overlap on the Megatron Core (mcore) path.
Removed the condition that allowed `ub_tp_comm_overlap` only with `transformer_engine=True`; the `sequence_parallel` check is unchanged.
Usage
# Add a code snippet demonstrating how to use this
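A usage sketch, assuming the NeMo GPT model-config keys `mcore_gpt`, `tensor_model_parallel_size`, `sequence_parallel`, and `ub_tp_comm_overlap`, rendered as Hydra-style `model.<key>=<value>` command-line overrides. The helper `to_hydra_overrides` and the TP size of 2 are illustrative assumptions. Check the key names against your config before use.

```python
from typing import Any, Dict, List

# Illustrative model-config overrides for the mcore path (key names assumed
# from the NeMo GPT config; TP size 2 is only an example value).
overrides: Dict[str, Any] = {
    "mcore_gpt": True,
    "tensor_model_parallel_size": 2,
    "sequence_parallel": True,   # still checked by _validate_and_override_config
    "ub_tp_comm_overlap": True,  # no longer gated on transformer_engine=True
}


def to_hydra_overrides(cfg: Dict[str, Any], prefix: str = "model") -> List[str]:
    """Render a flat dict as Hydra-style `prefix.key=value` overrides (hypothetical helper)."""

    def fmt(value: Any) -> str:
        # Emit booleans in lowercase YAML style; other values use str().
        return str(value).lower() if isinstance(value, bool) else str(value)

    return [f"{prefix}.{key}={fmt(value)}" for key, value in cfg.items()]


print(" ".join(to_hydra_overrides(overrides)))
```

The resulting strings can be appended to a training command. The boolean check comes first because `bool` is a subclass of `int` in Python.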
Jenkins CI
To run Jenkins, a NeMo User with write access must comment `jenkins` on the PR.
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs in various areas.
Additional Information