
Support Energon dataset format for updated Qwen3-VL finetuning pipeline#2679

Closed
aub123 wants to merge 1 commit into NVIDIA-NeMo:main from aub123:fix/qwen3vl-energon-compat

Conversation


@aub123 aub123 commented Mar 6, 2026

Motivation

Recent updates to the Qwen3-VL finetuning pipeline changed the expected multimodal input format, and the current implementation does not directly support the Energon-format datasets used in our training pipeline.

Changes

  • Update multimodal sample parsing in qwen3_vl_bridge.py
  • Add compatibility logic for Energon dataset structure
  • Align data processing with the updated Qwen3-VL finetuning interface

Notes

This change ensures Energon-format datasets can be directly used in the Qwen3-VL finetuning pipeline without additional preprocessing.
Tested on Qwen3-VL-8B model.
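As a rough illustration of the compatibility goal, an Energon-style multimodal sample can be adapted into the chat-style record that Qwen3-VL finetuning consumes. This is a hypothetical sketch: the field names (`image`, `context`, `answers`) follow megatron-energon's VQA-style samples, and the function name is illustrative, not the PR's actual code.

```python
# Hypothetical sketch: adapting an Energon-style multimodal sample to a
# Qwen3-VL chat-format record. Field names ("image", "context", "answers")
# mirror megatron-energon's VQA-style samples but are assumptions, not the
# PR's exact implementation.

def energon_sample_to_qwen3vl(sample: dict) -> dict:
    """Convert one Energon-style sample into a chat-format record."""
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image"},  # placeholder; pixel data travels separately
                {"type": "text", "text": sample["context"]},
            ],
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": sample["answers"][0]}],
        },
    ]
    return {"messages": messages, "images": [sample["image"]]}

sample = {"image": "img_0.jpg", "context": "What is shown?", "answers": ["A cat."]}
record = energon_sample_to_qwen3vl(sample)
```

The point of the change is that this kind of adaptation happens inside the pipeline, so Energon datasets need no offline preprocessing step.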

Summary by CodeRabbit

Release Notes

  • Refactor

    • Streamlined Qwen3VL and Qwen3VL MoE model provider initialization with enhanced dtype consistency and explicit rotary position embedding configuration.
    • Simplified model bridge registration API for improved clarity and maintainability.
    • Consolidated multiple parameter mappings into a unified configuration approach.
  • Breaking Changes

    • Removed legacy weight alignment utility functions; dependent code must migrate to updated parameter mappings.


copy-pr-bot bot commented Mar 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Mar 6, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b4a1e265-8c09-4cd7-b226-aab9697a0303

📥 Commits

Reviewing files that changed from the base of the PR and between e3e340b and edb3c63.

📒 Files selected for processing (1)
  • src/megatron/bridge/models/qwen_vl/qwen3_vl_bridge.py

📝 Walkthrough

Walkthrough

The Qwen3VL bridge initialization was refactored to shift from generic provider_kwargs-based construction to explicit parameter instantiation with integrated dtype handling and RoPE configuration extraction. New mapping classes were introduced for expert MLPs, and legacy weight alignment utilities were removed.

Changes

Cohort / File(s) | Summary
Qwen3VL Bridge Provider Construction
src/megatron/bridge/models/qwen_vl/qwen3_vl_bridge.py
Replaced the provider factory approach with direct explicit instantiation of Qwen3VLModelProvider and the MoE variant, incorporating model_dtype inference from the HF text_config, vision config synchronization, rotary/RoPE configuration extraction, and consolidated Qwen3-specific parameters (gated linear units, QKV bias, attention config, vision tokens, vocab settings).
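The "direct explicit instantiation" pattern described above can be sketched as follows. This is a simplified stand-in under stated assumptions: `Qwen3VLModelProvider`, the HF config keys (`text_config`, `torch_dtype`, `rope_theta`), and the fallback values are illustrative, not the bridge's verbatim code.

```python
# Hypothetical sketch of explicit provider instantiation with dtype inference
# from the HF text_config, replacing a generic provider_kwargs dict.
# The provider class here is a stand-in, not the real Qwen3VLModelProvider.
from dataclasses import dataclass


@dataclass
class Qwen3VLModelProvider:  # stand-in for the real provider class
    num_layers: int
    hidden_size: int
    rotary_base: float
    params_dtype: str


def build_provider(hf_config: dict) -> Qwen3VLModelProvider:
    text_cfg = hf_config["text_config"]
    # dtype inference: fall back to bfloat16 when torch_dtype is absent
    dtype = text_cfg.get("torch_dtype", "bfloat16")
    return Qwen3VLModelProvider(
        num_layers=text_cfg["num_hidden_layers"],
        hidden_size=text_cfg["hidden_size"],
        # RoPE base extracted from the text config, with an assumed default
        rotary_base=text_cfg.get("rope_theta", 1_000_000.0),
        params_dtype=dtype,
    )


provider = build_provider({"text_config": {"num_hidden_layers": 36, "hidden_size": 4096}})
```

Passing each parameter explicitly makes the dtype and RoPE handling visible at the construction site instead of hiding it behind a kwargs dict.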
Mapping Registry & Classes
src/megatron/bridge/models/qwen_vl/qwen3_vl_bridge.py
Added ReplicatedMapping for vision_model.* to model.visual.* paths and expanded language-model-centric mappings (embeddings, output, layernorms, attention projections, MLP projections, QK layernorm). Introduced ExpertMLPDownProjMapping and ExpertMLPGateUpProjMapping with transpose-based weight handling and relaxed validation patterns for wildcard flexibility.
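The transpose-based weight handling mentioned for the expert MLP mappings can be sketched roughly as below. The layout assumption (HF storing stacked expert weights as `(experts, in, out)` while the target expects `(out, in)` per expert) and the function name are illustrative, not the actual `ExpertMLPDownProjMapping` code.

```python
# Hypothetical sketch of transpose-based expert-weight mapping, in the spirit
# of ExpertMLPDownProjMapping: a stacked HF expert weight of assumed shape
# (num_experts, ffn_hidden, hidden) is split into per-expert weights of shape
# (hidden, ffn_hidden) via a transpose. Shapes are illustrative.
import numpy as np


def map_expert_down_proj(hf_weight: np.ndarray) -> list[np.ndarray]:
    """Split a stacked HF expert weight into per-expert transposed weights."""
    return [w.T for w in hf_weight]


# Two experts, ffn_hidden=4, hidden=3
w = np.arange(2 * 4 * 3, dtype=np.float32).reshape(2, 4, 3)
per_expert = map_expert_down_proj(w)
```

Folding the transpose into the mapping class is what lets the separate weight-alignment utilities be deleted.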
Utilities & Helpers Removal
src/megatron/bridge/models/qwen_vl/qwen3_vl_bridge.py
Removed maybe_modify_converted_hf_weight and _align_weight_to_shape legacy utilities; simplified MOE weight alignment via new ExpertMLP mapping transpose logic.
Bridge Registration API
src/megatron/bridge/models/qwen_vl/qwen3_vl_bridge.py
Updated MegatronModelBridge.register_bridge decorator usage on Qwen3VLBridge and Qwen3VLMoEBridge from four-parameter form (source, target, provider, model_type) to two-parameter form (source, target only).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Suggested reviewers

  • yaoyu-33
  • chtruong814
