
Qwen3-VL Sequence Packing Example scripts#2380

Merged
cuichenx merged 10 commits into main from kamran/qwen3_vl_seq_packing on Feb 18, 2026


Conversation

@kamran-nvidia
Contributor

@kamran-nvidia kamran-nvidia commented Feb 13, 2026

What does this PR do ?

Adds Qwen3-VL scripts for testing sequence packing (examples/models/vlm/qwen3_vl/).

Results for the dense and MoE models with different context-parallel sizes are shown below:

[loss-curve images: dense_model, moe_model]

Changelog

Adds Qwen3-VL scripts for testing sequence packing under examples/models/vlm/qwen3_vl/:

  • peft_seq_packed.sh: Script to verify sequence packing with LoRA fine-tuning and different CP sizes
  • peft_seq_unpacked.sh: Script to verify different parallelism configs with LoRA fine-tuning
  • sft_seq_packed.sh: Script to verify sequence packing with full fine-tuning and different CP sizes
  • sft_seq_unpacked.sh: Script to verify different parallelism configs with full fine-tuning
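As a rough illustration of what such a verification script does, the sketch below sweeps context-parallel (CP) sizes in a loop; the variable names and CP values are illustrative, not the actual ones used in the PR's scripts:

```shell
#!/usr/bin/env bash
set -euo pipefail
# Hypothetical sketch of a CP sweep like the one peft_seq_packed.sh performs.
# Names and values below are illustrative only.
CP_SIZES=(1 2 4)
launched=0
for cp in "${CP_SIZES[@]}"; do
  # A real script would launch a distributed LoRA run here,
  # e.g. via torchrun with the chosen context-parallel size.
  echo "would launch LoRA run with context_parallel_size=${cp}"
  launched=$((launched + 1))
done
echo "launched ${launched} configurations"
```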

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Documentation

    • Added documentation for sequence-packed parameter-efficient fine-tuning with LoRA for vision-language models.
  • New Features

    • Added training script for orchestrating LoRA fine-tuning experiments with sequence packing across multiple model configurations.
    • Extended support for Qwen3 VL model training with sequence packing capability.

Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Feb 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@kamran-nvidia
Contributor Author

/ok to test afe227a

Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
@kamran-nvidia
Contributor Author

/ok to test d45ca41

@kamran-nvidia marked this pull request as ready for review on February 17, 2026, 16:55
@coderabbitai
Contributor

coderabbitai bot commented Feb 17, 2026

📝 Walkthrough

Walkthrough

This PR adds sequence-packed parameter-efficient fine-tuning support for Qwen3-VL by introducing documentation, a new bash script for orchestrating LoRA finetuning experiments with sequence packing configurations, registering a new forward step function in the training dispatcher, and modifying the forward step to reshape loss_mask during sequence packing.

Changes

Cohort / File(s) Summary
Documentation
examples/models/vlm/qwen3_vl/README.md
Added documentation section describing Sequence-Packed Parameter-Efficient Fine-Tuning (PEFT) with LoRA, linking to seq_packing.sh script and W&B reporting.
Example Scripts
examples/models/vlm/qwen3_vl/seq_packing.sh
New Bash script that orchestrates LoRA finetuning experiments with sequence packing. Defines workspace, model checkpoints, shared hyperparameters, and nested loops over SEQ_PACKING_CONFIGS and PARALLELISM_CONFIGS to launch distributed training runs for 8B and 30B model variants with varying parallelism strategies.
Training Integration
scripts/training/run_recipe.py
Registered new Qwen3-VL forward step function (qwen3_vl_step) in STEP_FUNCTIONS dispatcher by importing qwen3_vl_forward_step.
Model Forward Logic
src/megatron/bridge/models/qwen_vl/qwen3_vl_step.py
Modified forward_step to reshape loss_mask to shape (1, -1) in forward_args when pack_sequences_in_batch is enabled, ensuring loss_mask alignment with model expectations.
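The loss_mask handling described above can be sketched as follows, using a plain nested list as a stand-in for a tensor (the actual code calls `tensor.reshape(1, -1)`). The `None` guard mirrors the reviewer's suggested fix for non-last pipeline-parallel stages, where loss_mask may be absent; the helper name is hypothetical:

```python
# Hypothetical sketch of the loss_mask reshape under sequence packing.
# maybe_flatten_loss_mask is an illustrative name, not the real function.
def maybe_flatten_loss_mask(forward_args, pack_sequences_in_batch):
    mask = forward_args.get("loss_mask")
    if pack_sequences_in_batch and mask is not None:
        # Equivalent of reshape(1, -1): one row holding every element,
        # aligning the mask with the packed token stream.
        forward_args["loss_mask"] = [[x for row in mask for x in row]]
    return forward_args

args = {"loss_mask": [[1, 1, 0], [1, 0, 0]]}
maybe_flatten_loss_mask(args, pack_sequences_in_batch=True)
print(args["loss_mask"])  # [[1, 1, 0, 1, 0, 0]]

# With loss_mask absent (e.g. a non-last pipeline stage), the guard
# prevents the AttributeError flagged in the review:
maybe_flatten_loss_mask({"loss_mask": None}, pack_sequences_in_batch=True)
```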

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Suggested labels

Run CICD

Suggested reviewers

  • cuichenx
  • yashaswikarnati
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Test Results For Major Changes — ⚠️ Warning: The PR description includes training loss curves for dense and MoE models across multiple configurations, demonstrating normal convergence without regressions. However, a critical bug was identified at line 286 in qwen3_vl_step.py: loss_mask is reshaped without a None guard when pipeline_model_parallel_size > 1, causing an AttributeError. Must be fixed before merge.
✅ Passed checks (3 passed)
Title check — ✅ Passed: The title accurately summarizes the main change: adding sequence packing example scripts for Qwen3-VL models.
Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
Description Check — ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.


Contributor

@coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/training/run_recipe.py (1)

109-109: ⚠️ Potential issue | 🟡 Minor

Help text for --step_func doesn't mention the new qwen3_vl_step option.

The help string still only lists gpt_step, vlm_step, and llava_step. Users won't discover qwen3_vl_step from --help.

Proposed fix
-        help="Step function: gpt_step (text-only), vlm_step (vision-language), or llava_step (LLaVA models)",
+        help="Step function: gpt_step (text-only), vlm_step (vision-language), llava_step (LLaVA models), or qwen3_vl_step (Qwen3 VL models)",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/training/run_recipe.py` at line 109, Update the help string for the
--step_func argument to include the new qwen3_vl_step option so users see it in
--help; locate the argparse add_argument call that defines "--step_func" in
run_recipe.py (the help parameter mentioning gpt_step, vlm_step, llava_step) and
append ", qwen3_vl_step (Qwen-3 vision-language)" or similar to the existing
help text.
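The dispatcher registration this comment refers to can be sketched as below; the helper step functions and their return values are illustrative stand-ins, not the real forward-step implementations in run_recipe.py:

```python
import argparse

# Hypothetical sketch of the STEP_FUNCTIONS dispatcher pattern: each step
# function is registered by name and selected via --step_func. The bodies
# here are placeholders for the real forward-step functions.
def gpt_step(batch):
    return "gpt"

def qwen3_vl_step(batch):
    return "qwen3_vl"

STEP_FUNCTIONS = {"gpt_step": gpt_step, "qwen3_vl_step": qwen3_vl_step}

parser = argparse.ArgumentParser()
parser.add_argument(
    "--step_func",
    choices=sorted(STEP_FUNCTIONS),
    default="gpt_step",
    # Keeping choices and help derived from the dict avoids the stale
    # help-text problem flagged in this comment.
    help="Step function: " + ", ".join(sorted(STEP_FUNCTIONS)),
)
args = parser.parse_args(["--step_func", "qwen3_vl_step"])
print(STEP_FUNCTIONS[args.step_func](None))  # qwen3_vl
```

Deriving the `choices` and `help` strings from the dispatcher dict itself is one way to keep `--help` in sync as new step functions are registered.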
🧹 Nitpick comments (3)
examples/models/vlm/qwen3_vl/seq_packing.sh (3)

1-1: Add set -euo pipefail for safer script execution.

Per Google Shell Style Guide, scripts should fail early on errors. Without this, a failing training run will be silently ignored and the loop will continue to the next configuration.

Proposed fix
 #!/usr/bin/env bash
+set -euo pipefail
 # Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.

As per coding guidelines: **/*.sh: Follow Google Shell Style Guide.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/models/vlm/qwen3_vl/seq_packing.sh` at line 1, The script lacks
strict error handling; update seq_packing.sh to enable safe-failure mode by
adding a shell option line immediately after the shebang to set -euo pipefail
(and optionally set IFS=$'\n\t') so the script exits on errors, treats unset
variables as failures, and propagates pipe errors; place this change at the top
of seq_packing.sh (right after #!/usr/bin/env bash) so all subsequent commands
run under these safer semantics.

74-87: Hyperparameters are duplicated between dense and MoE sections.

Lines 22–31 and 77–86 define identical values for DATASET_NAME, SEQ_LENGTH, TRAIN_ITERS, GLOBAL_BATCH_SIZE, MICRO_BATCH_SIZE, EVAL_ITERS, LR, MIN_LR, LR_WARMUP_ITERS, LOG_INTERVAL, and WANDB_PROJECT. Consider defining shared defaults once at the top and only overriding what differs (checkpoint path, model name, parallelism configs).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/models/vlm/qwen3_vl/seq_packing.sh` around lines 74 - 87, Move the
duplicated hyperparameter definitions into a single shared defaults block and
reference those variables in both the dense and MoE sections instead of
redefining them; specifically keep only differing values like
PRETRAINED_CHECKPOINT and MODEL_NAME (and any parallelism-specific vars) in each
section and remove the repeated definitions of DATASET_NAME, SEQ_LENGTH,
TRAIN_ITERS, GLOBAL_BATCH_SIZE, MICRO_BATCH_SIZE, EVAL_ITERS, LR, MIN_LR,
LR_WARMUP_ITERS, LOG_INTERVAL, and WANDB_PROJECT so the top-level defaults are
used and individual sections only override what changes.
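The refactor suggested above could look roughly like this; the variable names follow the review comment, while the model names and checkpoint paths are illustrative:

```shell
#!/usr/bin/env bash
set -euo pipefail
# Hypothetical sketch: shared hyperparameter defaults defined once, with
# each section overriding only what differs. Paths and model names are
# illustrative, not the actual values in seq_packing.sh.
SEQ_LENGTH=4096
GLOBAL_BATCH_SIZE=32
LR=1e-4

launch_run() {
  local model_name="$1" checkpoint="$2"
  # A real script would launch training here; we just echo the config.
  echo "model=${model_name} ckpt=${checkpoint} seq=${SEQ_LENGTH} gbs=${GLOBAL_BATCH_SIZE} lr=${LR}"
}

# Dense section: only the model name and checkpoint differ.
launch_run "qwen3-vl-dense" "/ckpts/dense"
# MoE section: same shared defaults, different checkpoint.
launch_run "qwen3-vl-moe" "/ckpts/moe"
```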

43-43: Use uv run instead of python directly.

The coding guidelines require using uv run to execute scripts instead of activating a virtual environment and calling python directly. Both invocations on lines 43 and 98 should be updated.

Proposed fix
-        python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
+        uv run -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \

As per coding guidelines: {**/*.sh,examples/**/*.py}: Use 'uv run' to execute scripts instead of activating a virtual environment and calling 'python' directly.

Also applies to: 98-98

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/models/vlm/qwen3_vl/seq_packing.sh` at line 43, Replace direct
Python invocations with uv run: find the two occurrences of the command "python
-m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py" (the
one at line 43 and the one at line 98) and update them to invoke the interpreter
via uv run (e.g., "uv run -- python -m torch.distributed.run --nproc_per_node=8
scripts/training/run_recipe.py") so the script is executed through uv rather
than calling python directly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/megatron/bridge/models/qwen_vl/qwen3_vl_step.py`:
- Around line 286-287: The reshape of forward_args["loss_mask"] is missing a
None guard and will raise on non-last pipeline-parallel stages; mirror the
existing guard used for forward_args["labels"] (see the check around the labels
reshape in qwen3_vl_step.py) by wrapping the line that does
forward_args["loss_mask"] = forward_args["loss_mask"].reshape(1, -1) in an if
forward_args["loss_mask"] is not None: block so loss_mask is only reshaped when
present.


Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: kamran-nvidia <kjafarisadeg@nvidia.com>
…urations

Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
…Megatron-Bridge into kamran/qwen3_vl_seq_packing

Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
@kamran-nvidia
Contributor Author

/ok to test e36e86b

Contributor

@cuichenx left a comment


LGTM thanks!
