[model] Refactor Qwen3-VL and Ministral3 fine-tuning scripts #2735

kamran-nvidia merged 13 commits into main from
Conversation
…e-tuning

- Updated README.md to reference new unpacked scripts for SFT and PEFT.
- Introduced new `sft_unpacked.sh` and `peft_unpacked.sh` scripts for Ministral3 with enhanced configurations.
- Modified `peft.sh` and `sft.sh` scripts for Qwen3-VL to improve training parameters and evaluation intervals.
- Added `peft_energon.sh` for Energon dataset fine-tuning with LoRA.
- Enhanced `qwen3_vl.py` to support the Energon dataset and updated dataset handling in SFT and PEFT configurations.
- Adjusted parallelism configurations and training parameters across various scripts for consistency and performance.

Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
…_unpacked.sh and sft_unpacked.sh Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
… in peft_unpacked.sh Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
…n README files Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
/ok to test 37d2baa
Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>

/ok to test 4011f62
📝 Walkthrough

This PR updates VLM training configurations and framework across ministral3 and qwen3_vl models. Key changes include adjusting hyperparameters (training iterations, batch sizes, learning rates) in shell scripts, updating recipe references from finetune to SFT/PEFT configs, removing dataset_type support from the training framework, adding new Energon dataset support with corresponding configurations, and updating model references to Instruct variants.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (4 passed)
Actionable comments posted: 2
Inline comments:
In `@examples/models/vlm/ministral3/README.md`:
- Around line 127-128: The README's "Expected Training Dynamics" section now
links an existing Weights & Biases report but older sentences still say "W&B
report coming soon" — remove or replace those stale lines (the instances
referencing "W&B report coming soon" near the W&B link and earlier in the file)
so the document consistently reflects that the W&B report exists; update any
surrounding text to reference the provided link (the W&B report URL and the
"Expected Training Dynamics" section) instead of the placeholder.
- Around line 104-110: Update the "Available recipes" section in
examples/models/vlm/ministral3/README.md to match the actual exports in
src/megatron/bridge/recipes/ministral3/__init__.py by replacing any occurrences
of ministral3_*_finetune_config with ministral3_*_sft_config and
ministral3_*_peft_config where appropriate, and ensure the README recipe
examples and script references (sft_unpacked.sh, peft_unpacked.sh) consistently
use the *_sft_config and *_peft_config names so users can load the recipes as
exported.
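The naming drift flagged above (a README still referencing `ministral3_*_finetune_config` while the module exports `*_sft_config` and `*_peft_config`) is easy to catch mechanically. Below is a minimal, hypothetical sketch of such a check; the function name, regex, and sample strings are illustrative and not taken from the repo, which would instead read the real README and the module's `__all__`:

```python
import re

def stale_recipe_refs(readme_text: str, exported_names: set[str]) -> set[str]:
    """Return recipe names mentioned in the README that the module does not export."""
    mentioned = set(re.findall(r"ministral3_\w+_config", readme_text))
    return mentioned - exported_names

# Illustrative inputs: one stale reference alongside a renamed export.
readme = "Load ministral3_3b_finetune_config or ministral3_3b_peft_config."
exports = {"ministral3_3b_sft_config", "ministral3_3b_peft_config"}

# Any name returned here is a doc reference that would fail at import time.
print(sorted(stale_recipe_refs(readme, exports)))
```

A check like this could run in CI so future recipe renames fail fast instead of surfacing as review comments.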
ℹ️ Review info

⚙️ Run configuration
- Configuration used: Path: .coderabbit.yaml
- Review profile: CHILL
- Plan: Pro
- Run ID: de9f982b-b22c-4869-8ffc-e57c667e1a18
📒 Files selected for processing (12)

- examples/models/vlm/ministral3/README.md
- examples/models/vlm/ministral3/peft_unpacked.sh
- examples/models/vlm/ministral3/sft_unpacked.sh
- examples/models/vlm/qwen3_vl/README.md
- examples/models/vlm/qwen3_vl/peft.sh
- examples/models/vlm/qwen3_vl/peft_energon.sh
- examples/models/vlm/qwen3_vl/peft_unpacked.sh
- examples/models/vlm/qwen3_vl/sft.sh
- examples/models/vlm/qwen3_vl/sft_unpacked.sh
- scripts/training/run_recipe.py
- src/megatron/bridge/recipes/qwen_vl/__init__.py
- src/megatron/bridge/recipes/qwen_vl/qwen3_vl.py
💤 Files with no reviewable changes (1)
- scripts/training/run_recipe.py
…n3 VL PEFT tests Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
/ok to test b61857f
/ok to test 2299ea6
What does this PR do?
[model] Refactor Qwen3-VL and Ministral3 fine-tuning scripts and recipes
Changelog
Summary
- Removes the `dataset_type` parameter from all SFT/PEFT config functions, replacing the inline Energon branching with a dedicated `qwen3_vl_8b_peft_energon_config` recipe. This simplifies each config function and makes dataset selection explicit via recipe choice rather than a runtime argument.
- Removes the `--dataset_type` CLI argument from `run_recipe.py` and all recipe loading functions, reducing interface complexity.
- Renames scripts (`peft.sh` → `peft_unpacked.sh`, `sft.sh` → `sft_unpacked.sh`) and renames `sft_energon.sh` → `peft_energon.sh` for Qwen3-VL to better reflect their actual usage.
- Fixes the `train.eval_iters` → `validation.eval_iters` parameter reference in `sft_unpacked.sh`.
- Updates `peft_unpacked.sh` and `sft_unpacked.sh` for more practical finetuning configurations.

Test plan
- `sft_unpacked.sh` runs correctly with updated parallelism (TP=4,1 and TP=2,1)
- `peft_unpacked.sh` runs with updated parallelism configs
- `peft_energon.sh` works with the new `qwen3_vl_8b_peft_energon_config` recipe
- `run_recipe.py` no longer accepts `--dataset_type` and existing scripts still work

GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open a "Draft" PR.
Additional Information
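The core refactor in the changelog, replacing a runtime `dataset_type` argument with explicitly named recipes, can be sketched as follows. The dataclass fields and the `*_old` function are hypothetical stand-ins; only the recipe name `qwen3_vl_8b_peft_energon_config` comes from this PR, and the real config functions have a different shape:

```python
from dataclasses import dataclass

@dataclass
class RecipeConfig:
    model: str
    dataset: str
    use_lora: bool

# Before (sketch): one function, dataset chosen by a runtime string argument.
def qwen3_vl_8b_peft_config_old(dataset_type: str = "hf") -> RecipeConfig:
    if dataset_type == "energon":
        return RecipeConfig("qwen3_vl_8b", "energon", use_lora=True)
    return RecipeConfig("qwen3_vl_8b", "hf", use_lora=True)

# After (sketch): one recipe per dataset, so the choice is visible in the name
# and no --dataset_type flag is needed in the launcher.
def qwen3_vl_8b_peft_config() -> RecipeConfig:
    return RecipeConfig("qwen3_vl_8b", "hf", use_lora=True)

def qwen3_vl_8b_peft_energon_config() -> RecipeConfig:
    return RecipeConfig("qwen3_vl_8b", "energon", use_lora=True)
```

The trade-off is more exported names in exchange for simpler config functions and a smaller CLI surface, which matches the "recipe choice rather than a runtime argument" rationale above.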
Summary by CodeRabbit
New Features
Documentation
Chores