
Support Energon dataset format for updated Qwen3-VL finetuning pipeline#2679

Closed
aub123 wants to merge 1 commit into NVIDIA-NeMo:main from aub123:fix/qwen3vl-energon-compat

Conversation


@aub123 aub123 commented Mar 6, 2026

Motivation

Recent updates to the Qwen3-VL finetuning pipeline changed the expected multimodal input format, and the current implementation does not directly support the Energon-format datasets used in our training pipeline.

Changes

  • Update multimodal sample parsing in qwen3_vl_bridge.py
  • Add compatibility logic for Energon dataset structure
  • Align data processing with the updated Qwen3-VL finetuning interface

Notes

This change ensures Energon-format datasets can be directly used in the Qwen3-VL finetuning pipeline without additional preprocessing.
Tested on Qwen3-VL-8B model.
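As a rough illustration of the compatibility goal, an Energon-style multimodal sample can be adapted into the chat-style record that Qwen3-VL finetuning consumes. This is a hypothetical sketch: the field names (`image`, `context`, `answers`) follow megatron-energon's VQA-style samples, and the function name is illustrative, not the PR's actual code.

```python
# Hypothetical sketch: adapting an Energon-style multimodal sample to a
# Qwen3-VL chat-format record. Field names ("image", "context", "answers")
# mirror megatron-energon's VQA-style samples but are assumptions, not the
# PR's exact implementation.

def energon_sample_to_qwen3vl(sample: dict) -> dict:
    """Convert one Energon-style sample into a chat-format record."""
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image"},  # placeholder; pixel data travels separately
                {"type": "text", "text": sample["context"]},
            ],
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": sample["answers"][0]}],
        },
    ]
    return {"messages": messages, "images": [sample["image"]]}

sample = {"image": "img_0.jpg", "context": "What is shown?", "answers": ["A cat."]}
record = energon_sample_to_qwen3vl(sample)
```

The point of the change is that this kind of adaptation happens inside the pipeline, so Energon datasets need no offline preprocessing step.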

Summary by CodeRabbit

Release Notes

  • Refactor

    • Streamlined Qwen3VL and Qwen3VL MoE model provider initialization with enhanced dtype consistency and explicit rotary position embedding configuration.
    • Simplified model bridge registration API for improved clarity and maintainability.
    • Consolidated multiple parameter mappings into a unified configuration approach.
  • Breaking Changes

    • Removed legacy weight alignment utility functions; dependent code must migrate to updated parameter mappings.


copy-pr-bot bot commented Mar 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Mar 6, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b4a1e265-8c09-4cd7-b226-aab9697a0303

📥 Commits

Reviewing files that changed from the base of the PR and between e3e340b and edb3c63.

📒 Files selected for processing (1)
  • src/megatron/bridge/models/qwen_vl/qwen3_vl_bridge.py

📝 Walkthrough

Walkthrough

The Qwen3VL bridge initialization was refactored to shift from generic provider_kwargs-based construction to explicit parameter instantiation with integrated dtype handling and RoPE configuration extraction. New mapping classes were introduced for expert MLPs, and legacy weight alignment utilities were removed.

Changes

Cohort / File(s) | Summary
Qwen3VL Bridge Provider Construction
src/megatron/bridge/models/qwen_vl/qwen3_vl_bridge.py
Replaced the provider factory approach with direct explicit instantiation of Qwen3VLModelProvider and the MoE variant, incorporating model_dtype inference from the HF text_config, vision config synchronization, rotary/RoPE configuration extraction, and consolidated Qwen3-specific parameters (gated linear units, QKV bias, attention config, vision tokens, vocab settings).
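The "direct explicit instantiation" pattern described above can be sketched as follows. This is a simplified stand-in under stated assumptions: `Qwen3VLModelProvider`, the HF config keys (`text_config`, `torch_dtype`, `rope_theta`), and the fallback values are illustrative, not the bridge's verbatim code.

```python
# Hypothetical sketch of explicit provider instantiation with dtype inference
# from the HF text_config, replacing a generic provider_kwargs dict.
# The provider class here is a stand-in, not the real Qwen3VLModelProvider.
from dataclasses import dataclass


@dataclass
class Qwen3VLModelProvider:  # stand-in for the real provider class
    num_layers: int
    hidden_size: int
    rotary_base: float
    params_dtype: str


def build_provider(hf_config: dict) -> Qwen3VLModelProvider:
    text_cfg = hf_config["text_config"]
    # dtype inference: fall back to bfloat16 when torch_dtype is absent
    dtype = text_cfg.get("torch_dtype", "bfloat16")
    return Qwen3VLModelProvider(
        num_layers=text_cfg["num_hidden_layers"],
        hidden_size=text_cfg["hidden_size"],
        # RoPE base extracted from the text config, with an assumed default
        rotary_base=text_cfg.get("rope_theta", 1_000_000.0),
        params_dtype=dtype,
    )


provider = build_provider({"text_config": {"num_hidden_layers": 36, "hidden_size": 4096}})
```

Passing each parameter explicitly makes the dtype and RoPE handling visible at the construction site instead of hiding it behind a kwargs dict.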
Mapping Registry & Classes
src/megatron/bridge/models/qwen_vl/qwen3_vl_bridge.py
Added ReplicatedMapping for vision_model.* to model.visual.* paths and expanded language-model-centric mappings (embeddings, output, layernorms, attention projections, MLP projections, QK layernorm). Introduced ExpertMLPDownProjMapping and ExpertMLPGateUpProjMapping with transpose-based weight handling and relaxed validation patterns for wildcard flexibility.
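The transpose-based weight handling mentioned for the expert MLP mappings can be sketched roughly as below. The layout assumption (HF storing stacked expert weights as `(experts, in, out)` while the target expects `(out, in)` per expert) and the function name are illustrative, not the actual `ExpertMLPDownProjMapping` code.

```python
# Hypothetical sketch of transpose-based expert-weight mapping, in the spirit
# of ExpertMLPDownProjMapping: a stacked HF expert weight of assumed shape
# (num_experts, ffn_hidden, hidden) is split into per-expert weights of shape
# (hidden, ffn_hidden) via a transpose. Shapes are illustrative.
import numpy as np


def map_expert_down_proj(hf_weight: np.ndarray) -> list[np.ndarray]:
    """Split a stacked HF expert weight into per-expert transposed weights."""
    return [w.T for w in hf_weight]


# Two experts, ffn_hidden=4, hidden=3
w = np.arange(2 * 4 * 3, dtype=np.float32).reshape(2, 4, 3)
per_expert = map_expert_down_proj(w)
```

Folding the transpose into the mapping class is what lets the separate weight-alignment utilities be deleted.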
Utilities & Helpers Removal
src/megatron/bridge/models/qwen_vl/qwen3_vl_bridge.py
Removed maybe_modify_converted_hf_weight and _align_weight_to_shape legacy utilities; simplified MOE weight alignment via new ExpertMLP mapping transpose logic.
Bridge Registration API
src/megatron/bridge/models/qwen_vl/qwen3_vl_bridge.py
Updated MegatronModelBridge.register_bridge decorator usage on Qwen3VLBridge and Qwen3VLMoEBridge from four-parameter form (source, target, provider, model_type) to two-parameter form (source, target only).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Suggested reviewers

  • yaoyu-33
  • chtruong814
