Conversation
```python
self.dp_cp_group = self.pg_collection.dp_cp
self.pp_group = self.pg_collection.pp
from megatron.core.pipeline_parallel.p2p_communication import P2PCommunicator
self.p2p_communicator = P2PCommunicator(pp_group=self.pp_group, config=self.config)
```
What is this self.config? It looks like the P2PCommunicator needs a ModelParallelConfig, not a TransformerConfig.
The type of this config is megatron.bridge.models.qwen_vl.qwen3_vl_provider.Qwen3VLMoEModelProvider.
Isn't this cuda graphs file agnostic to a particular model? I'm wondering how it would be related to the Qwen3VL provider, also given this is Megatron-LM and not Bridge.
This PR is not for a particular model; it is for M4.
In order to support M4, we need to use the pg groups in pg_collection everywhere, including in this cuda_graph file.
Yeah, I get that. My question/confusion was about the type of the config we pass here to P2PCommunicator: is it TransformerConfig or ModelParallelConfig?
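For context on why passing the provider config can still type-check: in the usual Megatron-LM hierarchy, TransformerConfig subclasses ModelParallelConfig, and model provider configs in turn derive from TransformerConfig, so any of them satisfies a parameter typed as ModelParallelConfig. A minimal sketch with stand-in classes (all class bodies and the `p2p_communicator_accepts` helper are placeholders, not the real Megatron-LM definitions):

```python
# Stand-in dataclasses mirroring the (assumed) Megatron-LM config hierarchy:
# ModelParallelConfig <- TransformerConfig <- model provider config.
from dataclasses import dataclass

@dataclass
class ModelParallelConfig:  # stand-in for megatron.core.model_parallel_config.ModelParallelConfig
    pipeline_dtype: object = None

@dataclass
class TransformerConfig(ModelParallelConfig):  # the real class also subclasses ModelParallelConfig
    num_layers: int = 0

@dataclass
class Qwen3VLMoEModelProvider(TransformerConfig):  # stand-in for the bridge provider class
    pass

def p2p_communicator_accepts(config: ModelParallelConfig) -> bool:
    # Anything typed against ModelParallelConfig accepts any subclass instance.
    return isinstance(config, ModelParallelConfig)

print(p2p_communicator_accepts(Qwen3VLMoEModelProvider()))  # True
```

So even if the runtime type is the Qwen3VLMoEModelProvider, the P2PCommunicator only relies on the ModelParallelConfig portion of it.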
Cuda graph is very helpful for Qwen3-VL training performance, and many customers want to try training Qwen3-VL with both cuda graph and M4. Hi @jiemingz, I have addressed the reviewers' comments. Can you help review my update? If @jiemingz is out of office for Chinese New Year, is there any other expert who can help review this PR? @NVIDIA/core-adlr @NVIDIA/mcore-oncall
/ok to test 81d8781

/ok to test 9dc24c7
/ok to test 2c8b58e

/ok to test ad6d2d4

Hi @yashaswikarnati, do you have any more comments?

Hi @NVIDIA/mcore-oncall, can you help review this PR?

/ok to test b714709
What does this PR do?
Fixes issue #3135.
Global pg groups, such as _TENSOR_MODEL_PARALLEL_GROUP in parallel_state.py, will be deprecated after M4.
So we need to pass pg_collection when initializing TECudaGraphHelper, and use the pg groups from pg_collection afterwards.
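The pattern described above can be sketched as follows. This is a simplified stand-in, not the real TECudaGraphHelper: the class body, the `SimpleNamespace` collection, and the string placeholders are all hypothetical, and only the idea (reading groups from an injected pg_collection instead of the parallel_state globals) comes from the PR:

```python
# Hypothetical sketch: the helper receives a pg_collection at init and reads
# process groups from it, rather than calling the global getters in
# megatron.core.parallel_state (deprecated after M4).
from types import SimpleNamespace

class TECudaGraphHelper:  # simplified stand-in for the real helper
    def __init__(self, config, pg_collection):
        self.config = config
        self.pg_collection = pg_collection
        # Groups come from the injected collection, not from module-level globals.
        self.dp_cp_group = pg_collection.dp_cp
        self.pp_group = pg_collection.pp

# Placeholder "groups"; in real code these would be torch.distributed process groups.
pg_collection = SimpleNamespace(dp_cp="dp_cp_group", pp="pp_group")
helper = TECudaGraphHelper(config=None, pg_collection=pg_collection)
print(helper.pp_group)  # pp_group
```

Injecting the collection keeps the cuda-graph code model-agnostic and avoids any dependency on the soon-to-be-deprecated globals.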