[model] Refactor Qwen3-VL and Ministral3 fine-tuning scripts#2735

Merged
kamran-nvidia merged 13 commits into main from kamran/variuos_fixes
Mar 12, 2026
Conversation

@kamran-nvidia
Contributor

@kamran-nvidia kamran-nvidia commented Mar 10, 2026

What does this PR do ?

[model] Refactor Qwen3-VL and Ministral3 fine-tuning scripts and recipes

Changelog

Summary

  • Refactor Qwen3-VL recipes: Remove dataset_type parameter from all SFT/PEFT config functions, replacing the inline Energon branching with a dedicated qwen3_vl_8b_peft_energon_config recipe. This simplifies each config function and makes dataset selection explicit via recipe choice rather than a runtime argument.
  • Remove --dataset_type CLI argument from run_recipe.py and all recipe loading functions, reducing interface complexity.
  • Rename Ministral3 scripts (peft.sh → peft_unpacked.sh, sft.sh → sft_unpacked.sh) and rename sft_energon.sh → peft_energon.sh for Qwen3-VL to better reflect their actual usage.
  • Fix bugs: Correct the train.eval_iters → validation.eval_iters parameter reference in sft_unpacked.sh.
  • Update parallelism configs: Adjust TP/PP/EP settings in peft_unpacked.sh and sft_unpacked.sh for more practical finetuning configurations.
  • Add W&B report links for expected training dynamics in Qwen3-VL and Ministral3 READMEs.
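The first two bullets describe replacing a runtime `dataset_type` flag with explicit per-dataset recipe functions. A minimal sketch of that pattern, with hypothetical stand-ins — `RecipeConfig` and these function bodies are illustrative, not the actual signatures in `src/megatron/bridge/recipes/qwen_vl/qwen3_vl.py`:

```python
from dataclasses import dataclass, field


@dataclass
class RecipeConfig:
    """Illustrative stand-in for a training recipe configuration."""
    model: str = "qwen3-vl-8b"
    dataset: str = "hf_chat"  # default (non-Energon) dataset, name assumed
    peft: bool = False
    overrides: dict = field(default_factory=dict)


# Before: one config function with inline branching on a runtime argument.
def qwen3_vl_8b_peft_config_old(dataset_type: str = "hf_chat") -> RecipeConfig:
    cfg = RecipeConfig(peft=True)
    if dataset_type == "energon":  # the inline branching this PR removes
        cfg.dataset = "energon"
    return cfg


# After: dataset selection is explicit via recipe choice.
def qwen3_vl_8b_peft_config() -> RecipeConfig:
    return RecipeConfig(peft=True)


def qwen3_vl_8b_peft_energon_config() -> RecipeConfig:
    # Dedicated Energon recipe instead of a dataset_type flag.
    cfg = qwen3_vl_8b_peft_config()
    cfg.dataset = "energon"
    return cfg
```

The upside of the second shape is that the set of supported dataset/recipe combinations is visible in the module's public API rather than hidden behind a runtime argument.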

Test plan

  • Verify Qwen3-VL 8B SFT with sft_unpacked.sh runs correctly with updated parallelism (TP=4,1 and TP=2,1)
  • Verify Qwen3-VL 8B PEFT with peft_unpacked.sh runs with updated parallelism configs
  • Verify peft_energon.sh works with the new qwen3_vl_8b_peft_energon_config recipe
  • Confirm run_recipe.py no longer accepts --dataset_type and existing scripts still work
  • Verify W&B report links in READMEs are accessible
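The eval_iters fix above illustrates a general pitfall with dotted-path config overrides: the key must target the section that actually owns it (validation.eval_iters, not train.eval_iters). A sketch assuming a plain nested-dict config — `apply_override` is a hypothetical helper, not the project's actual override mechanism:

```python
def apply_override(config: dict, dotted_key: str, value):
    """Set config[a][b]... = value for a dotted key like 'validation.eval_iters'."""
    *sections, leaf = dotted_key.split(".")
    node = config
    for section in sections:
        if section not in node:
            raise KeyError(f"unknown config section: {section!r}")
        node = node[section]
    if leaf not in node:
        # Strict lookup surfaces the bug instead of silently creating a
        # dead key under the wrong section.
        raise KeyError(f"{dotted_key!r} does not exist in the config")
    node[leaf] = value


config = {
    "train": {"train_iters": 100, "global_batch_size": 16},
    "validation": {"eval_iters": 10, "eval_interval": 20},
}

apply_override(config, "validation.eval_iters", 20)  # the corrected key
```

With a strict lookup like this, the old `train.eval_iters` reference would raise immediately rather than being ignored, which is the kind of silent misconfiguration the sft_unpacked.sh fix addresses.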

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • New Features

    • Added Energon dataset support for Qwen3-VL PEFT training workflows.
  • Documentation

    • Updated training guides with new "Expected Training Dynamics" sections including performance benchmarks via WandB reports.
  • Chores

    • Adjusted training hyperparameters (iterations, batch sizes, learning rates) across VLM example scripts for optimized performance.
    • Updated model configurations to use Instruct model variants.
    • Simplified training script parameter handling by removing dataset type configuration option.

…e-tuning

- Updated README.md to reference new unpacked scripts for SFT and PEFT.
- Introduced new `sft_unpacked.sh` and `peft_unpacked.sh` scripts for Ministral3 with enhanced configurations.
- Modified `peft.sh` and `sft.sh` scripts for Qwen3-VL to improve training parameters and evaluation intervals.
- Added `peft_energon.sh` for Energon dataset fine-tuning with LoRA.
- Enhanced `qwen3_vl.py` to support Energon dataset and updated dataset handling in SFT and PEFT configurations.
- Adjusted parallelism configurations and training parameters across various scripts for consistency and performance.

Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Mar 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yaoyu-33 yaoyu-33 added the area:model (Model implementations and HF bridge logic) label and removed the model-qwen label Mar 11, 2026
Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
…_unpacked.sh and sft_unpacked.sh

Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
… in peft_unpacked.sh

Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
…n README files

Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
@kamran-nvidia kamran-nvidia changed the title from "Refactor and enhance Qwen3-VL and Ministral3 scripts for improved fin…" to "[model] Refactor Qwen3-VL and Ministral3 fine-tuning scripts" Mar 12, 2026
@kamran-nvidia
Contributor Author

/ok to test 37d2baa

@kamran-nvidia
Contributor Author

/ok to test 4011f62

@kamran-nvidia kamran-nvidia marked this pull request as ready for review March 12, 2026 18:06
@kamran-nvidia kamran-nvidia requested a review from cuichenx March 12, 2026 18:06
@coderabbitai
Contributor

coderabbitai bot commented Mar 12, 2026

📝 Walkthrough

Walkthrough

This PR updates VLM training configurations and framework across ministral3 and qwen3_vl models. Key changes include adjusting hyperparameters (training iterations, batch sizes, learning rates) in shell scripts, updating recipe references from finetune to SFT/PEFT configs, removing dataset_type support from the training framework, adding new Energon dataset support with corresponding configurations, and updating model references to Instruct variants.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation Updates**<br>`examples/models/vlm/ministral3/README.md`, `examples/models/vlm/qwen3_vl/README.md` | Added "Expected Training Dynamics" subsections with WandB report links; updated script references from sft.sh/peft.sh to unpacked/energon variants. |
| **Ministral3 Training Configuration**<br>`examples/models/vlm/ministral3/sft_unpacked.sh`, `examples/models/vlm/ministral3/peft_unpacked.sh` | Updated hyperparameters (TRAIN_ITERS: 50→100, GLOBAL_BATCH_SIZE: 32→16, EVAL_ITERS: 10→20, learning rates adjusted); changed recipe names to sft_config/peft_config; moved eval_iters from train to validation section; added dataset.pack_sequences_in_batch=False. |
| **Qwen3-VL Training Configuration**<br>`examples/models/vlm/qwen3_vl/peft.sh`, `examples/models/vlm/qwen3_vl/peft_unpacked.sh`, `examples/models/vlm/qwen3_vl/peft_energon.sh`, `examples/models/vlm/qwen3_vl/sft.sh`, `examples/models/vlm/qwen3_vl/sft_unpacked.sh` | Standardized hyperparameters across configs (TRAIN_ITERS: 50→100, batch size reductions, EVAL_ITERS: 10→20, added EVAL_INTERVAL: 20); updated recipe references to sft_config/peft_config; migrated evaluation parameters to validation section; updated parallelism configurations; removed MoE test blocks from sft scripts; added energon-specific recipe variant. |
| **Training Framework**<br>`scripts/training/run_recipe.py` | Removed dataset_type parameter from CLI arguments, load_recipe signature, and all dynamic kwargs construction; simplified configuration passing by eliminating conditional dataset_type handling. |
| **VLM Recipe Configuration**<br>`src/megatron/bridge/recipes/qwen_vl/__init__.py`, `src/megatron/bridge/recipes/qwen_vl/qwen3_vl.py` | Added new qwen3_vl_8b_peft_energon_config function with Energon dataset support via _make_energon_dataset helper; updated HF model references from standard to Instruct variants (8B, 30B-A3B, 235B-A22B); added alltoall token dispatcher for MoE configs; exposed new Energon config in public API. |
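The run_recipe.py change can be sketched with a toy argparse CLI. Everything here is illustrative — the recipe registry, `load_recipe` body, and argument set are stand-ins for the real script, which has a different interface:

```python
import argparse

# Hypothetical recipe registry; the real recipes are functions exported from
# src/megatron/bridge/recipes/qwen_vl/.
RECIPES = {
    "qwen3_vl_8b_peft_config": lambda: {"dataset": "hf_chat", "peft": True},
    "qwen3_vl_8b_peft_energon_config": lambda: {"dataset": "energon", "peft": True},
}


def load_recipe(name: str) -> dict:
    # No dataset_type kwarg and no conditional kwargs construction anymore:
    # the recipe name alone determines the dataset.
    return RECIPES[name]()


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--recipe", required=True, choices=sorted(RECIPES))
    # --dataset_type is intentionally absent after this PR.
    return parser.parse_args(argv)


args = parse_args(["--recipe", "qwen3_vl_8b_peft_energon_config"])
cfg = load_recipe(args.recipe)
```

Passing the removed `--dataset_type` flag to a parser like this fails fast with an "unrecognized arguments" error, which matches the test-plan item that run_recipe.py no longer accepts it.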

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • yaoyu-33
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately captures the primary change: refactoring fine-tuning scripts for Qwen3-VL and Ministral3 models, which aligns with the major modifications across multiple script files and recipe configurations. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Test Results For Major Changes | ✅ Passed | PR includes W&B report links in modified READMEs documenting loss curves and grad norms for hyperparameter and parallelism changes, with PR objectives confirming completion of testing for major modifications. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/models/vlm/ministral3/README.md`:
- Around line 127-128: The README's "Expected Training Dynamics" section now
links an existing Weights & Biases report but older sentences still say "W&B
report coming soon" — remove or replace those stale lines (the instances
referencing "W&B report coming soon" near the W&B link and earlier in the file)
so the document consistently reflects that the W&B report exists; update any
surrounding text to reference the provided link (the W&B report URL and the
"Expected Training Dynamics" section) instead of the placeholder.
- Around line 104-110: Update the "Available recipes" section in
examples/models/vlm/ministral3/README.md to match the actual exports in
src/megatron/bridge/recipes/ministral3/__init__.py by replacing any occurrences
of ministral3_*_finetune_config with ministral3_*_sft_config and
ministral3_*_peft_config where appropriate, and ensure the README recipe
examples and script references (sft_unpacked.sh, peft_unpacked.sh) consistently
use the *_sft_config and *_peft_config names so users can load the recipes as
exported.
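The second review comment is a naming-drift problem: README recipe names falling out of sync with a module's exports. A sketch of a small doc-consistency check, using inline stand-in data — `readme_text` and `exported` are made up, not the real files:

```python
import re

# Stand-in for examples/models/vlm/ministral3/README.md content.
readme_text = """
Available recipes: `ministral3_24b_sft_config`, `ministral3_24b_peft_config`
Run with sft_unpacked.sh or peft_unpacked.sh.
"""

# Stand-in for the names exported from the recipes __init__.py.
exported = {"ministral3_24b_sft_config", "ministral3_24b_peft_config"}

# Collect every recipe-shaped name the README mentions, including the old
# *_finetune_config spelling so stale references are caught too.
referenced = set(
    re.findall(r"ministral3_\w+_(?:sft|peft|finetune)_config", readme_text)
)

# Names the README mentions but the package no longer exports.
stale = referenced - exported
```

Wired into CI against the real README and `__init__.py`, a check like this would have flagged the `*_finetune_config` references before review.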

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: de9f982b-b22c-4869-8ffc-e57c667e1a18

📥 Commits

Reviewing files that changed from the base of the PR and between 52eaf8c and 4011f62.

📒 Files selected for processing (12)
  • examples/models/vlm/ministral3/README.md
  • examples/models/vlm/ministral3/peft_unpacked.sh
  • examples/models/vlm/ministral3/sft_unpacked.sh
  • examples/models/vlm/qwen3_vl/README.md
  • examples/models/vlm/qwen3_vl/peft.sh
  • examples/models/vlm/qwen3_vl/peft_energon.sh
  • examples/models/vlm/qwen3_vl/peft_unpacked.sh
  • examples/models/vlm/qwen3_vl/sft.sh
  • examples/models/vlm/qwen3_vl/sft_unpacked.sh
  • scripts/training/run_recipe.py
  • src/megatron/bridge/recipes/qwen_vl/__init__.py
  • src/megatron/bridge/recipes/qwen_vl/qwen3_vl.py
💤 Files with no reviewable changes (1)
  • scripts/training/run_recipe.py

…n3 VL PEFT tests

Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
@kamran-nvidia
Contributor Author

/ok to test b61857f

Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
@kamran-nvidia
Contributor Author

/ok to test 2299ea6

Contributor

@cuichenx cuichenx left a comment


LGTM


Labels

area:model Model implementations and HF bridge logic
