
M4 leftover for QWen3-VL with MCore vision encoder #2370

Merged
chtruong814 merged 7 commits into NVIDIA-NeMo:main from shifangx:shifang/qwen3_vl_m4_leftover
Feb 27, 2026

Conversation


@shifangx shifangx commented Feb 13, 2026

What does this PR do?

!1943 built the vision model class with MCore and defined a new rope function, but it did not pass pg_collection or process groups into the newly defined class and function.

This PR makes sure pg_collection and process groups are passed to every submodule and function of QWen3-VL.

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

copy-pr-bot bot commented Feb 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@shifangx
Contributor Author

Hi @yaoyu-33, can you help review this PR?


coderabbitai bot commented Feb 13, 2026

📝 Walkthrough

The changes propagate a process group collection (pg_collection) parameter through the Qwen3VL model hierarchy, enabling distributed parallel components (vision model, transformer blocks, rotary embeddings) to access context-parallel and tensor-parallel groups directly instead of via global parallel state queries.
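The propagation pattern described above can be sketched as follows. This is a minimal stand-alone illustration, not the PR's actual code: the dataclass stands in for megatron.core's ProcessGroupCollection, and the class names and constructor signatures are simplified assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessGroupCollection:
    """Stand-in for megatron.core.process_groups_config.ProcessGroupCollection."""
    tp: object = None  # tensor-parallel group
    cp: object = None  # context-parallel group
    pp: object = None  # pipeline-parallel group


class VisionTransformerBlock:
    def __init__(self, pg_collection: Optional[ProcessGroupCollection] = None):
        if pg_collection is None:
            raise ValueError("pg_collection must be provided")
        # Extract the groups once, instead of querying global parallel state.
        self.tp_group = pg_collection.tp
        self.cp_group = pg_collection.cp
        self.pp_group = pg_collection.pp


class VisionModel:
    def __init__(self, pg_collection: Optional[ProcessGroupCollection] = None):
        # Forward the collection down the hierarchy unchanged.
        self.decoder = VisionTransformerBlock(pg_collection=pg_collection)


class Qwen3VLModelSketch:
    def __init__(self, pg_collection: Optional[ProcessGroupCollection] = None):
        self.vision_model = VisionModel(pg_collection=pg_collection)


pg = ProcessGroupCollection(tp="tp0", cp="cp0", pp="pp0")
model = Qwen3VLModelSketch(pg_collection=pg)
print(model.vision_model.decoder.cp_group)  # -> cp0
```

The point of the pattern is that leaf modules receive their process groups explicitly at construction time, so they no longer depend on global parallel state being initialized in a particular way.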

Changes

Model Initialization with Process Group Collection
Files: src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/model.py, src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/vision_model.py, src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/transformer_block.py
Introduces an optional pg_collection parameter on Qwen3VLModel and Qwen3VLVisionTransformerBlock; extracts tp_group, cp_group, and pp_group from pg_collection; forwards pg_collection through the model component hierarchy to the vision and transformer blocks.

Rotary Embedding Context-Parallel Integration
Files: src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/rope.py, src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/text_model.py
Adds a required cp_group parameter to Qwen3VLMultimodalRotaryEmbedding and replaces parallel_state.get_context_parallel_group() calls with self.cp_group; text_model passes cp_group from pg_collection.cp when instantiating the embedding.

Sequence Diagram

sequenceDiagram
    actor Caller
    participant Qwen3VLModel
    participant Qwen3VLVisionModel
    participant Qwen3VLVisionTransformerBlock
    participant Qwen3VLGPTModel
    participant Qwen3VLMultimodalRotaryEmbedding

    Caller->>Qwen3VLModel: __init__(pg_collection)
    Qwen3VLModel->>Qwen3VLVisionModel: __init__(pg_collection=pg_collection)
    Qwen3VLVisionModel->>Qwen3VLVisionTransformerBlock: __init__(pg_collection=pg_collection)
    Qwen3VLVisionTransformerBlock->>Qwen3VLVisionTransformerBlock: extract tp_group, cp_group, pp_group
    
    Qwen3VLModel->>Qwen3VLGPTModel: __init__()
    Qwen3VLGPTModel->>Qwen3VLMultimodalRotaryEmbedding: __init__(cp_group=self.pg_collection.cp)
    Qwen3VLMultimodalRotaryEmbedding->>Qwen3VLMultimodalRotaryEmbedding: assert cp_group is not None

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


Suggested labels

r0.3.0

Suggested reviewers

  • malay-nagda
  • erhoo82
  • ko3n1g
🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (3 warnings)

Docstring Coverage ⚠️ Warning: Docstring coverage is 54.55%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
Merge Conflict Detection ⚠️ Warning: Merge conflicts detected (75 files):

⚔️ 3rdparty/Megatron-LM (content)
⚔️ docs/models/vlm/qwen3-vl.md (content)
⚔️ docs/training/multi-token-prediction.md (content)
⚔️ docs/training/resiliency.md (content)
⚔️ examples/conversion/create_hf_toy_model.py (content)
⚔️ examples/conversion/hf_to_megatron_generate_text.py (content)
⚔️ examples/conversion/hf_to_megatron_generate_vlm.py (content)
⚔️ examples/models/vlm/gemma3_vl/peft.sh (content)
⚔️ examples/models/vlm/gemma3_vl/sft.sh (content)
⚔️ examples/models/vlm/glm_45v/inference.sh (content)
⚔️ examples/models/vlm/glm_45v/slurm_peft.sh (content)
⚔️ examples/models/vlm/glm_45v/slurm_sft.sh (content)
⚔️ examples/models/vlm/ministral3/conversion.sh (content)
⚔️ examples/models/vlm/ministral3/inference.sh (content)
⚔️ examples/models/vlm/ministral3/peft.sh (content)
⚔️ examples/models/vlm/ministral3/sft.sh (content)
⚔️ examples/resiliency/fault_tolerance/fault_tolerance_example.py (content)
⚔️ examples/resiliency/fault_tolerance/run_fault_tolerance.sh (content)
⚔️ examples/rl/rlhf_with_bridge.py (content)
⚔️ pyproject.toml (content)
⚔️ scripts/performance/configs/qwen/qwen3_workload_base_configs.py (content)
⚔️ scripts/performance/setup_experiment.py (content)
⚔️ scripts/performance/utils/overrides.py (content)
⚔️ scripts/run_ci_tests.sh (content)
⚔️ src/megatron/bridge/data/builders/finetuning_dataset.py (content)
⚔️ src/megatron/bridge/data/loaders.py (content)
⚔️ src/megatron/bridge/models/gemma_vl/gemma3_vl_bridge.py (content)
⚔️ src/megatron/bridge/models/nemotron_vl/nemotron_vl_bridge.py (content)
⚔️ src/megatron/bridge/models/qwen_vl/__init__.py (content)
⚔️ src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/model.py (content)
⚔️ src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/rope.py (content)
⚔️ src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/text_model.py (content)
⚔️ src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/transformer_block.py (content)
⚔️ src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/vision_model.py (content)
⚔️ src/megatron/bridge/models/qwen_vl/qwen25_vl_bridge.py (content)
⚔️ src/megatron/bridge/models/qwen_vl/qwen3_vl_bridge.py (content)
⚔️ src/megatron/bridge/models/qwen_vl/qwen3_vl_provider.py (content)
⚔️ src/megatron/bridge/recipes/deepseek/deepseek_v3.py (content)
⚔️ src/megatron/bridge/recipes/gemma/gemma2.py (content)
⚔️ src/megatron/bridge/recipes/gemma/gemma3.py (content)
⚔️ src/megatron/bridge/recipes/glm/glm45.py (content)
⚔️ src/megatron/bridge/recipes/gpt_oss/gpt_oss.py (content)
⚔️ src/megatron/bridge/recipes/kimi/kimi_k2.py (content)
⚔️ src/megatron/bridge/recipes/llama/llama3.py (content)
⚔️ src/megatron/bridge/recipes/moonlight/moonlight_16b.py (content)
⚔️ src/megatron/bridge/recipes/nemotronh/nemotron_3_nano.py (content)
⚔️ src/megatron/bridge/recipes/nemotronh/nemotronh.py (content)
⚔️ src/megatron/bridge/recipes/olmoe/olmoe_7b.py (content)
⚔️ src/megatron/bridge/recipes/qwen/qwen2.py (content)
⚔️ src/megatron/bridge/recipes/qwen/qwen3.py (content)
⚔️ src/megatron/bridge/recipes/qwen/qwen3_moe.py (content)
⚔️ src/megatron/bridge/recipes/qwen/qwen3_next.py (content)
⚔️ src/megatron/bridge/recipes/utils/finetune_utils.py (content)
⚔️ src/megatron/bridge/training/config.py (content)
⚔️ src/megatron/bridge/training/eval.py (content)
⚔️ src/megatron/bridge/training/fault_tolerance.py (content)
⚔️ src/megatron/bridge/training/initialize.py (content)
⚔️ src/megatron/bridge/training/profiling.py (content)
⚔️ src/megatron/bridge/training/tokenizers/config.py (content)
⚔️ src/megatron/bridge/training/tokenizers/tokenizer.py (content)
⚔️ src/megatron/bridge/training/train.py (content)
⚔️ tests/functional_tests/L2_Launch_training.sh (content)
⚔️ tests/functional_tests/data/builders/test_hf_dataset.py (content)
⚔️ tests/functional_tests/data/datasets/test_chat_template.py (content)
⚔️ tests/functional_tests/data/datasets/test_sft.py (content)
⚔️ tests/functional_tests/data/test_utils.py (content)
⚔️ tests/unit_tests/data/datasets/test_chat_template.py (content)
⚔️ tests/unit_tests/models/gemma_vl/test_gemma3_vl_bridge.py (content)
⚔️ tests/unit_tests/models/nemotron_vl/test_nemotron_vl_bridge.py (content)
⚔️ tests/unit_tests/models/qwen_vl/test_qwen25_vl_bridge.py (content)
⚔️ tests/unit_tests/models/qwen_vl/test_qwen25_vl_provider.py (content)
⚔️ tests/unit_tests/recipes/test_qwen_recipes.py (content)
⚔️ tests/unit_tests/training/test_tokenizer.py (content)
⚔️ tutorials/training/reduced_precision_training.ipynb (content)
⚔️ uv.lock (content)

These conflicts must be resolved before merging into main.
Resolve conflicts locally and push changes to this branch.
Test Results For Major Changes ⚠️ Warning: The PR lacks test results and testing documentation for infrastructure-level changes to distributed training components, despite comprehensive existing test suites being available. Resolution: execute the existing unit/functional test suites, fix the identified type annotation errors (ProcessGroupCollection vs torch.distributed.ProcessGroup), add null guards, and document CI results in the PR.
✅ Passed checks (2 passed)

Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
Title Check ✅ Passed: The title refers to 'M4 leftover' and to passing pg_collection to QWen3-VL vision model components, which matches the core change across multiple files: propagating process group collections through the vision model hierarchy.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/vision_model.py (1)

51-51: ⚠️ Potential issue | 🟠 Major

Same incorrect type annotation as in transformer_block.py.

Should be ProcessGroupCollection | None (or just ProcessGroupCollection if made required). See the comment on transformer_block.py Line 71.

Proposed fix
-        pg_collection: Optional[torch.distributed.ProcessGroup] = None,
+        pg_collection: ProcessGroupCollection | None = None,

Also add the missing import:

from megatron.core.process_groups_config import ProcessGroupCollection
src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/rope.py (1)

20-20: ⚠️ Potential issue | 🟡 Minor

Remove unused parallel_state import on line 20.

The parallel_state module is imported but never referenced in the file. With the codebase using self.cp_group instead, this import is no longer needed.

🤖 Fix all issues with AI agents
In `@src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/transformer_block.py`:
- Line 71: The pg_collection parameter in transformer_block.py (and the similar
parameter in vision_model.py) is incorrectly annotated as
Optional[torch.distributed.ProcessGroup]; update the type annotation to
Optional[ProcessGroupCollection] (the imported class) so attributes .cp, .tp,
.pp are valid and to remove the unused-import lint; change the annotation on the
function/method signature(s) that declare pg_collection and adjust any imports
if necessary to reference ProcessGroupCollection unambiguously.
- Around line 84-87: The constructor currently assigns self.pg_collection =
pg_collection and immediately dereferences pg_collection.cp / .tp / .pp into
self.cp_group, self.tp_group, self.pp_group without checking for None; make
pg_collection required or add a null-guard: either remove the default None so
pg_collection must be passed, or validate at the top of the constructor (e.g.,
if pg_collection is None: raise ValueError("pg_collection must be provided for
TransformerBlock")) before accessing pg_collection.cp / pg_collection.tp /
pg_collection.pp — update the code that sets self.cp_group, self.tp_group,
self.pp_group accordingly.
- Line 514: Qwen3VLTransformerBlock uses self.tp_group but lacks an __init__
that initializes it from pg_collection; add an __init__(self, config, spec,
pre_process=True, post_process=True, vp_stage=None, pg_collection=None) that
calls super().__init__(...) with the same args and then sets self.tp_group =
pg_collection.tp and self.pp_group = pg_collection.pp (mirror the pattern in
Qwen3VLVisionTransformerBlock) so tp_group/pp_group exist before use.
🧹 Nitpick comments (1)
src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/rope.py (1)

55-55: cp_group is effectively required but typed with a None default.

The assertion on Line 73 enforces that cp_group must not be None, making the = None default misleading. Consider removing the default to make the API contract explicit.

Proposed fix
-        cp_group: torch.distributed.ProcessGroup = None,
+        cp_group: torch.distributed.ProcessGroup,

Also applies to: 73-74
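The nitpick above is a general API-contract point: a parameter that is immediately asserted non-None should not carry a None default, because the signature then advertises an optionality that does not exist. A minimal sketch with hypothetical class names:

```python
class WithMisleadingDefault:
    def __init__(self, cp_group=None):
        # The default is a lie: passing nothing always fails at runtime.
        assert cp_group is not None, "cp_group must be provided"
        self.cp_group = cp_group


class WithExplicitContract:
    def __init__(self, cp_group):
        # Required parameter: callers (and type checkers) see the true contract,
        # and a missing argument is a TypeError at the call site.
        self.cp_group = cp_group


try:
    WithMisleadingDefault()
except AssertionError:
    print("None default rejected only at runtime, inside the constructor")

try:
    WithExplicitContract()  # missing required argument
except TypeError:
    print("missing argument caught by Python itself, at the call site")
```

The second form surfaces the error earlier and keeps the signature honest, which is what the proposed fix of dropping `= None` achieves.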

@shifangx shifangx changed the title qwen3-vl m4 leftover after pr1943 m4 leftover for QWen3-VL Feb 13, 2026
@shifangx shifangx changed the title m4 leftover for QWen3-VL M4 leftover for QWen3-VL Feb 13, 2026
@shifangx shifangx force-pushed the shifang/qwen3_vl_m4_leftover branch from 7ed06e2 to 5c3ed06 on February 14, 2026 at 11:43
@shifangx
Contributor Author

/ok to test d75dac9

@shifangx
Contributor Author

/ok to test e4c5bc6

@shifangx
Contributor Author

/ok to test c200198

@shifangx shifangx force-pushed the shifang/qwen3_vl_m4_leftover branch from c200198 to a456f67 on February 14, 2026 at 12:32
@shifangx
Contributor Author

/ok to test badc17a

@shifangx
Contributor Author

/ok to test 03a123f

@shifangx shifangx changed the title M4 leftover for QWen3-VL M4 leftover for QWen3-VL after !1943 merged Feb 26, 2026
Signed-off-by: Shifang Xu <shifangx@nvidia.com>
@shifangx
Contributor Author

/ok to test f33487a

