
Qwen3-VL Sequence Packing Example scripts#2380

Merged
cuichenx merged 10 commits into main from kamran/qwen3_vl_seq_packing on Feb 18, 2026


Conversation

@kamran-nvidia
Contributor

@kamran-nvidia kamran-nvidia commented Feb 13, 2026

What does this PR do ?

Adds Qwen3-VL scripts for testing sequence packing (examples/models/vlm/qwen3_vl/).

Results for the dense and MoE models with different context-parallel sizes are shown below:

[loss-curve images: dense_model, moe_model]

Changelog

Adds Qwen3-VL scripts for testing sequence packing under examples/models/vlm/qwen3_vl/:

  • peft_seq_packed.sh: Script to verify sequence packing with LoRA fine-tuning and different CP sizes
  • peft_seq_unpacked.sh: Script to verify different parallelism configs with LoRA fine-tuning
  • sft_seq_packed.sh: Script to verify sequence packing with full fine-tuning and different CP sizes
  • sft_seq_unpacked.sh: Script to verify different parallelism configs with full fine-tuning
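As a rough illustration of what such a verification script does, the sketch below sweeps context-parallel (CP) sizes in a loop; the variable names and CP values are illustrative, not the actual ones used in the PR's scripts:

```shell
#!/usr/bin/env bash
set -euo pipefail
# Hypothetical sketch of a CP sweep like the one peft_seq_packed.sh performs.
# Names and values below are illustrative only.
CP_SIZES=(1 2 4)
launched=0
for cp in "${CP_SIZES[@]}"; do
  # A real script would launch a distributed LoRA run here,
  # e.g. via torchrun with the chosen context-parallel size.
  echo "would launch LoRA run with context_parallel_size=${cp}"
  launched=$((launched + 1))
done
echo "launched ${launched} configurations"
```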

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Documentation

    • Added documentation for sequence-packed parameter-efficient fine-tuning with LoRA for vision-language models.
  • New Features

    • Added training script for orchestrating LoRA fine-tuning experiments with sequence packing across multiple model configurations.
    • Extended support for Qwen3 VL model training with sequence packing capability.

Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Feb 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@kamran-nvidia
Contributor Author

/ok to test afe227a

Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
@kamran-nvidia
Contributor Author

/ok to test d45ca41

@kamran-nvidia marked this pull request as ready for review on February 17, 2026, 16:55
@coderabbitai
Contributor

coderabbitai bot commented Feb 17, 2026

📝 Walkthrough

Walkthrough

This PR adds sequence-packed parameter-efficient fine-tuning support for Qwen3-VL by introducing documentation, a new bash script for orchestrating LoRA finetuning experiments with sequence packing configurations, registering a new forward step function in the training dispatcher, and modifying the forward step to reshape loss_mask during sequence packing.

Changes

Cohort / File(s) Summary
Documentation
examples/models/vlm/qwen3_vl/README.md
Added documentation section describing Sequence-Packed Parameter-Efficient Fine-Tuning (PEFT) with LoRA, linking to seq_packing.sh script and W&B reporting.
Example Scripts
examples/models/vlm/qwen3_vl/seq_packing.sh
New Bash script that orchestrates LoRA finetuning experiments with sequence packing. Defines workspace, model checkpoints, shared hyperparameters, and nested loops over SEQ_PACKING_CONFIGS and PARALLELISM_CONFIGS to launch distributed training runs for 8B and 30B model variants with varying parallelism strategies.
Training Integration
scripts/training/run_recipe.py
Registered new Qwen3-VL forward step function (qwen3_vl_step) in STEP_FUNCTIONS dispatcher by importing qwen3_vl_forward_step.
Model Forward Logic
src/megatron/bridge/models/qwen_vl/qwen3_vl_step.py
Modified forward_step to reshape loss_mask to shape (1, -1) in forward_args when pack_sequences_in_batch is enabled, ensuring loss_mask alignment with model expectations.
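The loss_mask handling described above can be sketched as follows, using a plain nested list as a stand-in for a tensor (the actual code calls `tensor.reshape(1, -1)`). The `None` guard mirrors the reviewer's suggested fix for non-last pipeline-parallel stages, where loss_mask may be absent; the helper name is hypothetical:

```python
# Hypothetical sketch of the loss_mask reshape under sequence packing.
# maybe_flatten_loss_mask is an illustrative name, not the real function.
def maybe_flatten_loss_mask(forward_args, pack_sequences_in_batch):
    mask = forward_args.get("loss_mask")
    if pack_sequences_in_batch and mask is not None:
        # Equivalent of reshape(1, -1): one row holding every element,
        # aligning the mask with the packed token stream.
        forward_args["loss_mask"] = [[x for row in mask for x in row]]
    return forward_args

args = {"loss_mask": [[1, 1, 0], [1, 0, 0]]}
maybe_flatten_loss_mask(args, pack_sequences_in_batch=True)
print(args["loss_mask"])  # [[1, 1, 0, 1, 0, 0]]

# With loss_mask absent (e.g. a non-last pipeline stage), the guard
# prevents the AttributeError flagged in the review:
maybe_flatten_loss_mask({"loss_mask": None}, pack_sequences_in_batch=True)
```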

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Suggested labels

Run CICD

Suggested reviewers

  • cuichenx
  • yashaswikarnati
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Test Results For Major Changes — ⚠️ Warning: The PR description includes training loss curves for dense and MoE models across multiple configurations, demonstrating normal convergence without regressions. However, a critical bug was identified at line 286 in qwen3_vl_step.py: loss_mask is reshaped without a None guard when pipeline_model_parallel_size > 1, causing an AttributeError. Must be fixed before merge.
✅ Passed checks (3 passed)
Title check — ✅ Passed: The title accurately summarizes the main change: adding sequence packing example scripts for Qwen3-VL models.
Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
Description Check — ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.


Contributor

@coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/training/run_recipe.py (1)

109-109: ⚠️ Potential issue | 🟡 Minor

Help text for --step_func doesn't mention the new qwen3_vl_step option.

The help string still only lists gpt_step, vlm_step, and llava_step. Users won't discover qwen3_vl_step from --help.

Proposed fix
-        help="Step function: gpt_step (text-only), vlm_step (vision-language), or llava_step (LLaVA models)",
+        help="Step function: gpt_step (text-only), vlm_step (vision-language), llava_step (LLaVA models), or qwen3_vl_step (Qwen3 VL models)",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/training/run_recipe.py` at line 109, Update the help string for the
--step_func argument to include the new qwen3_vl_step option so users see it in
--help; locate the argparse add_argument call that defines "--step_func" in
run_recipe.py (the help parameter mentioning gpt_step, vlm_step, llava_step) and
append ", qwen3_vl_step (Qwen-3 vision-language)" or similar to the existing
help text.
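The dispatcher registration this comment refers to can be sketched as below; the helper step functions and their return values are illustrative stand-ins, not the real forward-step implementations in run_recipe.py:

```python
import argparse

# Hypothetical sketch of the STEP_FUNCTIONS dispatcher pattern: each step
# function is registered by name and selected via --step_func. The bodies
# here are placeholders for the real forward-step functions.
def gpt_step(batch):
    return "gpt"

def qwen3_vl_step(batch):
    return "qwen3_vl"

STEP_FUNCTIONS = {"gpt_step": gpt_step, "qwen3_vl_step": qwen3_vl_step}

parser = argparse.ArgumentParser()
parser.add_argument(
    "--step_func",
    choices=sorted(STEP_FUNCTIONS),
    default="gpt_step",
    # Keeping choices and help derived from the dict avoids the stale
    # help-text problem flagged in this comment.
    help="Step function: " + ", ".join(sorted(STEP_FUNCTIONS)),
)
args = parser.parse_args(["--step_func", "qwen3_vl_step"])
print(STEP_FUNCTIONS[args.step_func](None))  # qwen3_vl
```

Deriving the `choices` and `help` strings from the dispatcher dict itself is one way to keep `--help` in sync as new step functions are registered.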
🧹 Nitpick comments (3)
examples/models/vlm/qwen3_vl/seq_packing.sh (3)

1-1: Add set -euo pipefail for safer script execution.

Per Google Shell Style Guide, scripts should fail early on errors. Without this, a failing training run will be silently ignored and the loop will continue to the next configuration.

Proposed fix
 #!/usr/bin/env bash
+set -euo pipefail
 # Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.

As per coding guidelines: **/*.sh: Follow Google Shell Style Guide.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/models/vlm/qwen3_vl/seq_packing.sh` at line 1, The script lacks
strict error handling; update seq_packing.sh to enable safe-failure mode by
adding a shell option line immediately after the shebang to set -euo pipefail
(and optionally set IFS=$'\n\t') so the script exits on errors, treats unset
variables as failures, and propagates pipe errors; place this change at the top
of seq_packing.sh (right after #!/usr/bin/env bash) so all subsequent commands
run under these safer semantics.

74-87: Hyperparameters are duplicated between dense and MoE sections.

Lines 22–31 and 77–86 define identical values for DATASET_NAME, SEQ_LENGTH, TRAIN_ITERS, GLOBAL_BATCH_SIZE, MICRO_BATCH_SIZE, EVAL_ITERS, LR, MIN_LR, LR_WARMUP_ITERS, LOG_INTERVAL, and WANDB_PROJECT. Consider defining shared defaults once at the top and only overriding what differs (checkpoint path, model name, parallelism configs).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/models/vlm/qwen3_vl/seq_packing.sh` around lines 74 - 87, Move the
duplicated hyperparameter definitions into a single shared defaults block and
reference those variables in both the dense and MoE sections instead of
redefining them; specifically keep only differing values like
PRETRAINED_CHECKPOINT and MODEL_NAME (and any parallelism-specific vars) in each
section and remove the repeated definitions of DATASET_NAME, SEQ_LENGTH,
TRAIN_ITERS, GLOBAL_BATCH_SIZE, MICRO_BATCH_SIZE, EVAL_ITERS, LR, MIN_LR,
LR_WARMUP_ITERS, LOG_INTERVAL, and WANDB_PROJECT so the top-level defaults are
used and individual sections only override what changes.
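The refactor suggested above could look roughly like this; the variable names follow the review comment, while the model names and checkpoint paths are illustrative:

```shell
#!/usr/bin/env bash
set -euo pipefail
# Hypothetical sketch: shared hyperparameter defaults defined once, with
# each section overriding only what differs. Paths and model names are
# illustrative, not the actual values in seq_packing.sh.
SEQ_LENGTH=4096
GLOBAL_BATCH_SIZE=32
LR=1e-4

launch_run() {
  local model_name="$1" checkpoint="$2"
  # A real script would launch training here; we just echo the config.
  echo "model=${model_name} ckpt=${checkpoint} seq=${SEQ_LENGTH} gbs=${GLOBAL_BATCH_SIZE} lr=${LR}"
}

# Dense section: only the model name and checkpoint differ.
launch_run "qwen3-vl-dense" "/ckpts/dense"
# MoE section: same shared defaults, different checkpoint.
launch_run "qwen3-vl-moe" "/ckpts/moe"
```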

43-43: Use uv run instead of python directly.

The coding guidelines require using uv run to execute scripts instead of activating a virtual environment and calling python directly. Both invocations on lines 43 and 98 should be updated.

Proposed fix
-        python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
+        uv run -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \

As per coding guidelines: {**/*.sh,examples/**/*.py}: Use 'uv run' to execute scripts instead of activating a virtual environment and calling 'python' directly.

Also applies to: 98-98

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/models/vlm/qwen3_vl/seq_packing.sh` at line 43, Replace direct
Python invocations with uv run: find the two occurrences of the command "python
-m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py" (the
one at line 43 and the one at line 98) and update them to invoke the interpreter
via uv run (e.g., "uv run -- python -m torch.distributed.run --nproc_per_node=8
scripts/training/run_recipe.py") so the script is executed through uv rather
than calling python directly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/megatron/bridge/models/qwen_vl/qwen3_vl_step.py`:
- Around line 286-287: The reshape of forward_args["loss_mask"] is missing a
None guard and will raise on non-last pipeline-parallel stages; mirror the
existing guard used for forward_args["labels"] (see the check around the labels
reshape in qwen3_vl_step.py) by wrapping the line that does
forward_args["loss_mask"] = forward_args["loss_mask"].reshape(1, -1) in an if
forward_args["loss_mask"] is not None: block so loss_mask is only reshaped when
present.


Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: kamran-nvidia <kjafarisadeg@nvidia.com>
…urations

Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
…Megatron-Bridge into kamran/qwen3_vl_seq_packing

Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
@kamran-nvidia
Contributor Author

/ok to test e36e86b

Contributor

@cuichenx left a comment


LGTM thanks!
