
M4 leftover for QWen3-VL with MCore vision encoder #2370

Merged
chtruong814 merged 7 commits into NVIDIA-NeMo:main from shifangx:shifang/qwen3_vl_m4_leftover
Feb 27, 2026

Conversation


@shifangx shifangx commented Feb 13, 2026

What does this PR do?

!1943 built the vision model class with MCore and defined a new rope function, but it did not pass pg_collection or process groups into the newly defined class and function.

This PR makes sure pg_collection and process groups are passed to every submodule and function of QWen3-VL.

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

copy-pr-bot bot commented Feb 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@shifangx
Contributor Author

Hi @yaoyu-33, can you help review this PR?


coderabbitai bot commented Feb 13, 2026

📝 Walkthrough

The changes propagate a process group collection (pg_collection) parameter through the Qwen3VL model hierarchy, enabling distributed parallel components (vision model, transformer blocks, rotary embeddings) to access context-parallel and tensor-parallel groups directly instead of via global parallel state queries.
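The propagation pattern described above can be sketched as follows. This is a minimal stand-alone illustration, not the PR's actual code: the dataclass stands in for megatron.core's ProcessGroupCollection, and the class names and constructor signatures are simplified assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessGroupCollection:
    """Stand-in for megatron.core.process_groups_config.ProcessGroupCollection."""
    tp: object = None  # tensor-parallel group
    cp: object = None  # context-parallel group
    pp: object = None  # pipeline-parallel group


class VisionTransformerBlock:
    def __init__(self, pg_collection: Optional[ProcessGroupCollection] = None):
        if pg_collection is None:
            raise ValueError("pg_collection must be provided")
        # Extract the groups once, instead of querying global parallel state.
        self.tp_group = pg_collection.tp
        self.cp_group = pg_collection.cp
        self.pp_group = pg_collection.pp


class VisionModel:
    def __init__(self, pg_collection: Optional[ProcessGroupCollection] = None):
        # Forward the collection down the hierarchy unchanged.
        self.decoder = VisionTransformerBlock(pg_collection=pg_collection)


class Qwen3VLModelSketch:
    def __init__(self, pg_collection: Optional[ProcessGroupCollection] = None):
        self.vision_model = VisionModel(pg_collection=pg_collection)


pg = ProcessGroupCollection(tp="tp0", cp="cp0", pp="pp0")
model = Qwen3VLModelSketch(pg_collection=pg)
print(model.vision_model.decoder.cp_group)  # -> cp0
```

The point of the pattern is that leaf modules receive their process groups explicitly at construction time, so they no longer depend on global parallel state being initialized in a particular way.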

Changes

Model Initialization with Process Group Collection
Files: src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/model.py, src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/vision_model.py, src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/transformer_block.py
Introduces an optional pg_collection parameter on Qwen3VLModel and Qwen3VLVisionTransformerBlock; extracts tp_group, cp_group, and pp_group from pg_collection; forwards pg_collection through the model component hierarchy to the vision and transformer blocks.

Rotary Embedding Context-Parallel Integration
Files: src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/rope.py, src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/text_model.py
Adds a required cp_group parameter to Qwen3VLMultimodalRotaryEmbedding and replaces parallel_state.get_context_parallel_group() calls with self.cp_group; text_model passes cp_group from pg_collection.cp when instantiating the embedding.

Sequence Diagram

sequenceDiagram
    actor Caller
    participant Qwen3VLModel
    participant Qwen3VLVisionModel
    participant Qwen3VLVisionTransformerBlock
    participant Qwen3VLGPTModel
    participant Qwen3VLMultimodalRotaryEmbedding

    Caller->>Qwen3VLModel: __init__(pg_collection)
    Qwen3VLModel->>Qwen3VLVisionModel: __init__(pg_collection=pg_collection)
    Qwen3VLVisionModel->>Qwen3VLVisionTransformerBlock: __init__(pg_collection=pg_collection)
    Qwen3VLVisionTransformerBlock->>Qwen3VLVisionTransformerBlock: extract tp_group, cp_group, pp_group
    
    Qwen3VLModel->>Qwen3VLGPTModel: __init__()
    Qwen3VLGPTModel->>Qwen3VLMultimodalRotaryEmbedding: __init__(cp_group=self.pg_collection.cp)
    Qwen3VLMultimodalRotaryEmbedding->>Qwen3VLMultimodalRotaryEmbedding: assert cp_group is not None

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


Suggested labels

r0.3.0

Suggested reviewers

  • malay-nagda
  • erhoo82
  • ko3n1g
🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (3 warnings)

Docstring Coverage ⚠️ Warning: Docstring coverage is 54.55%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
Merge Conflict Detection ⚠️ Warning: Merge conflicts detected (75 files):

⚔️ 3rdparty/Megatron-LM (content)
⚔️ docs/models/vlm/qwen3-vl.md (content)
⚔️ docs/training/multi-token-prediction.md (content)
⚔️ docs/training/resiliency.md (content)
⚔️ examples/conversion/create_hf_toy_model.py (content)
⚔️ examples/conversion/hf_to_megatron_generate_text.py (content)
⚔️ examples/conversion/hf_to_megatron_generate_vlm.py (content)
⚔️ examples/models/vlm/gemma3_vl/peft.sh (content)
⚔️ examples/models/vlm/gemma3_vl/sft.sh (content)
⚔️ examples/models/vlm/glm_45v/inference.sh (content)
⚔️ examples/models/vlm/glm_45v/slurm_peft.sh (content)
⚔️ examples/models/vlm/glm_45v/slurm_sft.sh (content)
⚔️ examples/models/vlm/ministral3/conversion.sh (content)
⚔️ examples/models/vlm/ministral3/inference.sh (content)
⚔️ examples/models/vlm/ministral3/peft.sh (content)
⚔️ examples/models/vlm/ministral3/sft.sh (content)
⚔️ examples/resiliency/fault_tolerance/fault_tolerance_example.py (content)
⚔️ examples/resiliency/fault_tolerance/run_fault_tolerance.sh (content)
⚔️ examples/rl/rlhf_with_bridge.py (content)
⚔️ pyproject.toml (content)
⚔️ scripts/performance/configs/qwen/qwen3_workload_base_configs.py (content)
⚔️ scripts/performance/setup_experiment.py (content)
⚔️ scripts/performance/utils/overrides.py (content)
⚔️ scripts/run_ci_tests.sh (content)
⚔️ src/megatron/bridge/data/builders/finetuning_dataset.py (content)
⚔️ src/megatron/bridge/data/loaders.py (content)
⚔️ src/megatron/bridge/models/gemma_vl/gemma3_vl_bridge.py (content)
⚔️ src/megatron/bridge/models/nemotron_vl/nemotron_vl_bridge.py (content)
⚔️ src/megatron/bridge/models/qwen_vl/__init__.py (content)
⚔️ src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/model.py (content)
⚔️ src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/rope.py (content)
⚔️ src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/text_model.py (content)
⚔️ src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/transformer_block.py (content)
⚔️ src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/vision_model.py (content)
⚔️ src/megatron/bridge/models/qwen_vl/qwen25_vl_bridge.py (content)
⚔️ src/megatron/bridge/models/qwen_vl/qwen3_vl_bridge.py (content)
⚔️ src/megatron/bridge/models/qwen_vl/qwen3_vl_provider.py (content)
⚔️ src/megatron/bridge/recipes/deepseek/deepseek_v3.py (content)
⚔️ src/megatron/bridge/recipes/gemma/gemma2.py (content)
⚔️ src/megatron/bridge/recipes/gemma/gemma3.py (content)
⚔️ src/megatron/bridge/recipes/glm/glm45.py (content)
⚔️ src/megatron/bridge/recipes/gpt_oss/gpt_oss.py (content)
⚔️ src/megatron/bridge/recipes/kimi/kimi_k2.py (content)
⚔️ src/megatron/bridge/recipes/llama/llama3.py (content)
⚔️ src/megatron/bridge/recipes/moonlight/moonlight_16b.py (content)
⚔️ src/megatron/bridge/recipes/nemotronh/nemotron_3_nano.py (content)
⚔️ src/megatron/bridge/recipes/nemotronh/nemotronh.py (content)
⚔️ src/megatron/bridge/recipes/olmoe/olmoe_7b.py (content)
⚔️ src/megatron/bridge/recipes/qwen/qwen2.py (content)
⚔️ src/megatron/bridge/recipes/qwen/qwen3.py (content)
⚔️ src/megatron/bridge/recipes/qwen/qwen3_moe.py (content)
⚔️ src/megatron/bridge/recipes/qwen/qwen3_next.py (content)
⚔️ src/megatron/bridge/recipes/utils/finetune_utils.py (content)
⚔️ src/megatron/bridge/training/config.py (content)
⚔️ src/megatron/bridge/training/eval.py (content)
⚔️ src/megatron/bridge/training/fault_tolerance.py (content)
⚔️ src/megatron/bridge/training/initialize.py (content)
⚔️ src/megatron/bridge/training/profiling.py (content)
⚔️ src/megatron/bridge/training/tokenizers/config.py (content)
⚔️ src/megatron/bridge/training/tokenizers/tokenizer.py (content)
⚔️ src/megatron/bridge/training/train.py (content)
⚔️ tests/functional_tests/L2_Launch_training.sh (content)
⚔️ tests/functional_tests/data/builders/test_hf_dataset.py (content)
⚔️ tests/functional_tests/data/datasets/test_chat_template.py (content)
⚔️ tests/functional_tests/data/datasets/test_sft.py (content)
⚔️ tests/functional_tests/data/test_utils.py (content)
⚔️ tests/unit_tests/data/datasets/test_chat_template.py (content)
⚔️ tests/unit_tests/models/gemma_vl/test_gemma3_vl_bridge.py (content)
⚔️ tests/unit_tests/models/nemotron_vl/test_nemotron_vl_bridge.py (content)
⚔️ tests/unit_tests/models/qwen_vl/test_qwen25_vl_bridge.py (content)
⚔️ tests/unit_tests/models/qwen_vl/test_qwen25_vl_provider.py (content)
⚔️ tests/unit_tests/recipes/test_qwen_recipes.py (content)
⚔️ tests/unit_tests/training/test_tokenizer.py (content)
⚔️ tutorials/training/reduced_precision_training.ipynb (content)
⚔️ uv.lock (content)

These conflicts must be resolved before merging into main.
Resolve conflicts locally and push changes to this branch.
Test Results For Major Changes ⚠️ Warning: The PR lacks test results and testing documentation for infrastructure-level changes to distributed training components, despite comprehensive existing test suites being available. Resolution: execute the existing unit/functional test suites, fix the identified type annotation errors (ProcessGroupCollection vs torch.distributed.ProcessGroup), add null guards, and document CI results in the PR.
✅ Passed checks (2 passed)

Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
Title Check ✅ Passed: The title refers to 'M4 leftover' and to passing pg_collection to QWen3-VL vision model components, which matches the core change across multiple files: propagating process group collections through the vision model hierarchy.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/vision_model.py (1)

51-51: ⚠️ Potential issue | 🟠 Major

Same incorrect type annotation as in transformer_block.py.

Should be ProcessGroupCollection | None (or just ProcessGroupCollection if made required). See the comment on transformer_block.py Line 71.

Proposed fix
-        pg_collection: Optional[torch.distributed.ProcessGroup] = None,
+        pg_collection: ProcessGroupCollection | None = None,

Also add the missing import:

from megatron.core.process_groups_config import ProcessGroupCollection
src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/rope.py (1)

20-20: ⚠️ Potential issue | 🟡 Minor

Remove unused parallel_state import on line 20.

The parallel_state module is imported but never referenced in the file. With the codebase using self.cp_group instead, this import is no longer needed.

🤖 Fix all issues with AI agents
In `@src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/transformer_block.py`:
- Line 71: The pg_collection parameter in transformer_block.py (and the similar
parameter in vision_model.py) is incorrectly annotated as
Optional[torch.distributed.ProcessGroup]; update the type annotation to
Optional[ProcessGroupCollection] (the imported class) so attributes .cp, .tp,
.pp are valid and to remove the unused-import lint; change the annotation on the
function/method signature(s) that declare pg_collection and adjust any imports
if necessary to reference ProcessGroupCollection unambiguously.
- Around line 84-87: The constructor currently assigns self.pg_collection =
pg_collection and immediately dereferences pg_collection.cp / .tp / .pp into
self.cp_group, self.tp_group, self.pp_group without checking for None; make
pg_collection required or add a null-guard: either remove the default None so
pg_collection must be passed, or validate at the top of the constructor (e.g.,
if pg_collection is None: raise ValueError("pg_collection must be provided for
TransformerBlock")) before accessing pg_collection.cp / pg_collection.tp /
pg_collection.pp — update the code that sets self.cp_group, self.tp_group,
self.pp_group accordingly.
- Line 514: Qwen3VLTransformerBlock uses self.tp_group but lacks an __init__
that initializes it from pg_collection; add an __init__(self, config, spec,
pre_process=True, post_process=True, vp_stage=None, pg_collection=None) that
calls super().__init__(...) with the same args and then sets self.tp_group =
pg_collection.tp and self.pp_group = pg_collection.pp (mirror the pattern in
Qwen3VLVisionTransformerBlock) so tp_group/pp_group exist before use.
🧹 Nitpick comments (1)
src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/rope.py (1)

55-55: cp_group is effectively required but typed with a None default.

The assertion on Line 73 enforces that cp_group must not be None, making the = None default misleading. Consider removing the default to make the API contract explicit.

Proposed fix
-        cp_group: torch.distributed.ProcessGroup = None,
+        cp_group: torch.distributed.ProcessGroup,

Also applies to: 73-74
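The nitpick above is a general API-contract point: a parameter that is immediately asserted non-None should not carry a None default, because the signature then advertises an optionality that does not exist. A minimal sketch with hypothetical class names:

```python
class WithMisleadingDefault:
    def __init__(self, cp_group=None):
        # The default is a lie: passing nothing always fails at runtime.
        assert cp_group is not None, "cp_group must be provided"
        self.cp_group = cp_group


class WithExplicitContract:
    def __init__(self, cp_group):
        # Required parameter: callers (and type checkers) see the true contract,
        # and a missing argument is a TypeError at the call site.
        self.cp_group = cp_group


try:
    WithMisleadingDefault()
except AssertionError:
    print("None default rejected only at runtime, inside the constructor")

try:
    WithExplicitContract()  # missing required argument
except TypeError:
    print("missing argument caught by Python itself, at the call site")
```

The second form surfaces the error earlier and keeps the signature honest, which is what the proposed fix of dropping `= None` achieves.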

@shifangx shifangx changed the title qwen3-vl m4 leftover after pr1943 m4 leftover for QWen3-VL Feb 13, 2026
@shifangx shifangx changed the title m4 leftover for QWen3-VL M4 leftover for QWen3-VL Feb 13, 2026
@shifangx shifangx force-pushed the shifang/qwen3_vl_m4_leftover branch from 7ed06e2 to 5c3ed06 on February 14, 2026 at 11:43
@shifangx
Contributor Author

/ok to test d75dac9

@shifangx
Contributor Author

/ok to test e4c5bc6

@shifangx
Contributor Author

/ok to test c200198

@shifangx shifangx force-pushed the shifang/qwen3_vl_m4_leftover branch from c200198 to a456f67 on February 14, 2026 at 12:32
@shifangx
Contributor Author

/ok to test badc17a

@shifangx
Contributor Author

/ok to test 03a123f

@shifangx shifangx changed the title M4 leftover for QWen3-VL M4 leftover for QWen3-VL after !1943 merged Feb 26, 2026
Signed-off-by: Shifang Xu <shifangx@nvidia.com>
@shifangx
Contributor Author

/ok to test f33487a

