M4 leftover for QWen3-VL with MCore vision encoder #2370
chtruong814 merged 7 commits into NVIDIA-NeMo:main from
Conversation
Hi @yaoyu-33, can you help review this PR?
📝 Walkthrough

The changes propagate a process group collection (`pg_collection`) parameter through the Qwen3VL model hierarchy, enabling distributed parallel components (vision model, transformer blocks, rotary embeddings) to access context-parallel and tensor-parallel groups directly instead of via global parallel state queries.
Sequence Diagram

```mermaid
sequenceDiagram
    actor Caller
    participant Qwen3VLModel
    participant Qwen3VLVisionModel
    participant Qwen3VLVisionTransformerBlock
    participant Qwen3VLGPTModel
    participant Qwen3VLMultimodalRotaryEmbedding
    Caller->>Qwen3VLModel: __init__(pg_collection)
    Qwen3VLModel->>Qwen3VLVisionModel: __init__(pg_collection=pg_collection)
    Qwen3VLVisionModel->>Qwen3VLVisionTransformerBlock: __init__(pg_collection=pg_collection)
    Qwen3VLVisionTransformerBlock->>Qwen3VLVisionTransformerBlock: extract tp_group, cp_group, pp_group
    Qwen3VLModel->>Qwen3VLGPTModel: __init__()
    Qwen3VLGPTModel->>Qwen3VLMultimodalRotaryEmbedding: __init__(cp_group=self.pg_collection.cp)
    Qwen3VLMultimodalRotaryEmbedding->>Qwen3VLMultimodalRotaryEmbedding: assert cp_group is not None
```
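The propagation pattern in the diagram above can be sketched as follows. This is a minimal illustration only: the class names mirror the PR, but the `ProcessGroupCollection` here is a hypothetical stand-in (real process groups would be `torch.distributed.ProcessGroup` objects from `megatron.core.process_groups_config`):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for megatron.core.process_groups_config.ProcessGroupCollection.
@dataclass
class ProcessGroupCollection:
    tp: object  # tensor-parallel group
    cp: object  # context-parallel group
    pp: object  # pipeline-parallel group

class VisionTransformerBlock:
    """Mirrors the pattern: extract tp/cp/pp groups from pg_collection once, at init."""
    def __init__(self, pg_collection: Optional[ProcessGroupCollection] = None):
        if pg_collection is None:
            raise ValueError("pg_collection must be provided")
        self.pg_collection = pg_collection
        self.tp_group = pg_collection.tp
        self.cp_group = pg_collection.cp
        self.pp_group = pg_collection.pp

class VisionModel:
    def __init__(self, pg_collection: Optional[ProcessGroupCollection] = None):
        # Pass the collection down instead of querying global parallel state.
        self.decoder = VisionTransformerBlock(pg_collection=pg_collection)

pg = ProcessGroupCollection(tp="tp_pg", cp="cp_pg", pp="pp_pg")
model = VisionModel(pg_collection=pg)
```

The point of the pattern is that each submodule receives its groups explicitly from its parent, so no component needs to reach into global `parallel_state`.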
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (3 warnings)
✅ Passed checks (2 passed)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/vision_model.py (1)
51-51: ⚠️ Potential issue | 🟠 Major

Same incorrect type annotation as in `transformer_block.py`. Should be `ProcessGroupCollection | None` (or just `ProcessGroupCollection` if made required). See the comment on `transformer_block.py` line 71.

Proposed fix:

```diff
-    pg_collection: Optional[torch.distributed.ProcessGroup] = None,
+    pg_collection: ProcessGroupCollection | None = None,
```

Also add the missing import:

```python
from megatron.core.process_groups_config import ProcessGroupCollection
```

src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/rope.py (1)

20-20: ⚠️ Potential issue | 🟡 Minor

Remove unused `parallel_state` import on line 20. The `parallel_state` module is imported but never referenced in the file. With the codebase using `self.cp_group` instead, this import is no longer needed.
🤖 Fix all issues with AI agents
In `@src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/transformer_block.py`:
- Line 71: The pg_collection parameter in transformer_block.py (and the similar
parameter in vision_model.py) is incorrectly annotated as
Optional[torch.distributed.ProcessGroup]; update the type annotation to
Optional[ProcessGroupCollection] (the imported class) so attributes .cp, .tp,
.pp are valid and to remove the unused-import lint; change the annotation on the
function/method signature(s) that declare pg_collection and adjust any imports
if necessary to reference ProcessGroupCollection unambiguously.
- Around line 84-87: The constructor currently assigns self.pg_collection =
pg_collection and immediately dereferences pg_collection.cp / .tp / .pp into
self.cp_group, self.tp_group, self.pp_group without checking for None; make
pg_collection required or add a null-guard: either remove the default None so
pg_collection must be passed, or validate at the top of the constructor (e.g.,
if pg_collection is None: raise ValueError("pg_collection must be provided for
TransformerBlock")) before accessing pg_collection.cp / pg_collection.tp /
pg_collection.pp — update the code that sets self.cp_group, self.tp_group,
self.pp_group accordingly.
- Line 514: Qwen3VLTransformerBlock uses self.tp_group but lacks an __init__
that initializes it from pg_collection; add an __init__(self, config, spec,
pre_process=True, post_process=True, vp_stage=None, pg_collection=None) that
calls super().__init__(...) with the same args and then sets self.tp_group =
pg_collection.tp and self.pp_group = pg_collection.pp (mirror the pattern in
Qwen3VLVisionTransformerBlock) so tp_group/pp_group exist before use.
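The suggested `__init__` for `Qwen3VLTransformerBlock` could look roughly like the sketch below. The base class here is a hypothetical stand-in for MCore's `TransformerBlock` (the real one takes many more arguments); only the group-capture pattern is the point:

```python
from types import SimpleNamespace

class _TransformerBlockBase:
    """Hypothetical stand-in for megatron.core's TransformerBlock."""
    def __init__(self, config, spec, pre_process=True, post_process=True,
                 vp_stage=None, pg_collection=None):
        self.config = config
        self.spec = spec
        self.pre_process = pre_process
        self.post_process = post_process

class Qwen3VLTransformerBlock(_TransformerBlockBase):
    def __init__(self, config, spec, pre_process=True, post_process=True,
                 vp_stage=None, pg_collection=None):
        super().__init__(config, spec, pre_process=pre_process,
                         post_process=post_process, vp_stage=vp_stage,
                         pg_collection=pg_collection)
        # Mirror Qwen3VLVisionTransformerBlock: capture the groups at init so
        # self.tp_group / self.pp_group exist before forward-pass code reads them.
        self.tp_group = pg_collection.tp
        self.pp_group = pg_collection.pp

# Minimal usage with a mock collection standing in for real process groups.
pg = SimpleNamespace(tp="tp_pg", pp="pp_pg")
block = Qwen3VLTransformerBlock(config=None, spec=None, pg_collection=pg)
```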
🧹 Nitpick comments (1)
src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/rope.py (1)
55-55: `cp_group` is effectively required but typed with a `None` default.

The assertion on line 73 enforces that `cp_group` must not be `None`, making the `= None` default misleading. Consider removing the default to make the API contract explicit.

Proposed fix:

```diff
-    cp_group: torch.distributed.ProcessGroup = None,
+    cp_group: torch.distributed.ProcessGroup,
```

Also applies to: 73-74
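The contract the nitpick asks for, a required `cp_group` with no `None` default, can be sketched as below. The class name and `kv_channels` parameter are illustrative stand-ins, not the actual rope.py signature:

```python
class MultimodalRotaryEmbedding:
    """Hypothetical sketch: cp_group has no default, so omitting it is a
    TypeError at call time rather than an assertion failure later."""
    def __init__(self, kv_channels: int, cp_group):
        # The runtime assertion becomes a backstop rather than the only guard.
        assert cp_group is not None, "cp_group must be provided"
        self.kv_channels = kv_channels
        self.cp_group = cp_group

# A string stands in for a real torch.distributed.ProcessGroup here.
rope = MultimodalRotaryEmbedding(kv_channels=128, cp_group="cp_pg")
```

With the default removed, a caller that forgets the group fails immediately with a clear signature error instead of passing `None` deeper into the model.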
7ed06e2 to 5c3ed06
/ok to test d75dac9

d75dac9 to 9ac3b0d
/ok to test e4c5bc6

e4c5bc6 to 0c6c01e
/ok to test c200198

c200198 to a456f67
/ok to test badc17a

1d34e0b to d3e503f
/ok to test 03a123f

Signed-off-by: Shifang Xu <shifangx@nvidia.com>

/ok to test f33487a
What does this PR do?
!1943 built the vision model class with MCore and defined a new RoPE function, but it did not pass pg_collection or process groups into the newly defined classes and functions.
This PR ensures that pg_collection and process groups are passed to every submodule and function of QWen3-VL.
Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open "Draft" PR.
Additional Information