
[docs] Cherrypick 2267 2362#2382

Merged
ko3n1g merged 2 commits into r0.3.0 from
chcui/cp-2267-2362
Feb 14, 2026

Conversation

Contributor

@cuichenx cuichenx commented Feb 14, 2026

What does this PR do ?

Add a one-line overview of what this PR aims to accomplish.

Changelog

  • Add specific, line-by-line info on the high-level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre-checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Documentation

    • Streamlined Qwen3-VL documentation with consolidated examples section.
    • Updated training example code to reflect API changes.
  • New Features

    • Added comprehensive Qwen3-VL example workflows including checkpoint conversion, inference, and finetuning (LoRA and full finetuning).
    • Added new README with setup and usage guidance for Qwen3-VL models.
  • Chores

    • Added WandB logging configuration notes to training example scripts.
    • Improved execution reliability of Ministral3 example scripts.
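The WandB note in the Chores above boils down to setting one of two standard Weights & Biases environment variables; a minimal sketch (these are wandb's documented variables, not text copied from the scripts in this PR):

```shell
# Option 1: authenticate wandb so training runs log to your account.
# export WANDB_API_KEY=<your-api-key>

# Option 2: disable wandb logging entirely, e.g. for quick local smoke tests.
export WANDB_MODE=disabled
echo "wandb mode: ${WANDB_MODE}"   # prints: wandb mode: disabled
```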

@cuichenx cuichenx added the docs-only ("With great power comes great responsibility.") label Feb 14, 2026
@ko3n1g ko3n1g merged commit 31ca75b into r0.3.0 Feb 14, 2026
22 checks passed
@ko3n1g ko3n1g deleted the chcui/cp-2267-2362 branch February 14, 2026 01:07
@coderabbitai
Contributor

coderabbitai bot commented Feb 14, 2026

📝 Walkthrough

Walkthrough

The PR consolidates Qwen3-VL documentation by redirecting detailed examples to an external README, updates training example code for API changes, adds WandB configuration guidance to multiple VLM example scripts, integrates the --no-sync flag into Ministral3 scripts, and introduces comprehensive new example scripts for Qwen3-VL model workflows.
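The `--no-sync` change described above can be sketched as follows; the script name is a placeholder, and the command is built as a string here purely for illustration:

```shell
# --no-sync tells uv to run inside the existing environment instead of first
# syncing it to the lockfile. The Ministral3 scripts need this because the
# container already provides Transformers 5, and a sync would recreate the
# virtual environment and conflict with that install.
uv_cmd="uv run --no-sync python some_ministral3_script.py"   # placeholder script name
echo "${uv_cmd}"
```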

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation consolidation**<br>`docs/models/vlm/qwen3-vl.md` | Replaces multi-section conversion, inference, and finetuning details with a single "Examples" section pointing to the external Qwen3-VL Examples README. |
| **Training example updates**<br>`docs/training/multi-token-prediction.md` | Updates imports and config structure (MTP parameters nested under `cfg.model`), removes a logger line, converts an f-string to a plain string, and adds a `forward_step` argument to the `pretrain()` call. |
| **WandB logging guidance**<br>`examples/models/vlm/gemma3_vl/peft.sh`, `examples/models/vlm/gemma3_vl/sft.sh`, `examples/models/vlm/glm_45v/slurm_peft.sh`, `examples/models/vlm/glm_45v/slurm_sft.sh` | Adds commented instructions to set `WANDB_API_KEY` or disable wandb logging, with a `WANDB_MODE=disabled` export example. |
| **Ministral3 uv run flag additions**<br>`examples/models/vlm/ministral3/conversion.sh`, `examples/models/vlm/ministral3/inference.sh`, `examples/models/vlm/ministral3/peft.sh`, `examples/models/vlm/ministral3/sft.sh` | Adds the `--no-sync` flag to all `uv run` invocations, with explanatory comments about the Transformers 5 requirement and virtual-environment conflict avoidance. |
| **Qwen3-VL comprehensive examples**<br>`examples/models/vlm/qwen3_vl/README.md`, `examples/models/vlm/qwen3_vl/conversion.sh`, `examples/models/vlm/qwen3_vl/inference.sh`, `examples/models/vlm/qwen3_vl/peft.sh`, `examples/models/vlm/qwen3_vl/sft.sh` | New example package with a README covering setup, conversion, inference, and finetuning workflows; shell scripts orchestrate checkpoint conversion (HF ↔ Megatron), distributed inference across model variants, and LoRA/SFT training with configurable parallelism (TP/PP/EP). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

r0.3.0, cherry-pick

Suggested reviewers

  • kamran-nvidia
🚥 Pre-merge checks | ✅ 4 passed | ❌ 1 inconclusive

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title "[docs] Cherrypick 2267 2362" is vague and does not clearly communicate the main changes; it references internal PR numbers without explaining what was cherrypicked or why. | Use a more descriptive title that explains the primary change, e.g., "[docs] Add Qwen 3 VL examples and enhance example scripts", to clarify the actual content being merged. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check. |
| Merge Conflict Detection | ✅ Passed | No merge conflicts detected when merging into r0.3.0. |
| Test Results For Major Changes | ✅ Passed | All changes are documentation and example script updates without core functionality modifications or breaking changes. |






@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@examples/models/vlm/qwen3_vl/README.md`:
- Around line 24-27: Update the README code blocks that invoke scripts to use
"uv run python" instead of bare "python": replace the call for
convert_checkpoints.py (the block showing "python
examples/conversion/convert_checkpoints.py import ... --hf-model
Qwen/Qwen3-VL-8B-Instruct --megatron-path ..."), the export example mentioned
around the export command, and the inference example so they all invoke "uv run
python" consistently with the conversion.sh and inference.sh scripts; ensure the
exact script names (convert_checkpoints.py, the export script, and the inference
script shown) are changed in their respective fenced code blocks to use "uv run
python".
🧹 Nitpick comments (10)
examples/models/vlm/gemma3_vl/peft.sh (1)

1-1: Consider adding error handling setup.

Following the Google Shell Style Guide, consider adding set -euo pipefail after the copyright header to enable robust error handling. This causes the script to exit on errors, undefined variables, and pipe failures.

🛡️ Suggested addition for error handling
 # limitations under the License.
+
+set -euo pipefail
 
 # Workspace directory for checkpoints and results

As per coding guidelines: Follow Google Shell Style Guide.
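For readers unfamiliar with the idiom, here is a self-contained illustration of what `set -euo pipefail` does; the commands are toy examples, not lines from the PR's scripts:

```shell
set -euo pipefail
# -e           : exit immediately if a command fails (unless the failure is handled)
# -u           : treat expansion of an unset variable as an error
# -o pipefail  : a pipeline fails if any stage fails, not just the last one

false || echo "failure handled explicitly, script continues"
WORKSPACE="${WORKSPACE:-/workspace}"   # a default assignment keeps -u satisfied
echo "workspace: ${WORKSPACE}"
```

Without `-e`, a failed conversion or training step would be silently ignored and the script would march on to the next command; with it, the run stops at the first error.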

examples/models/vlm/qwen3_vl/conversion.sh (2)

16-17: Add set -euo pipefail after the license header.

Per the Google Shell Style Guide, scripts should fail early on errors. Without this, a failed conversion step will silently continue to the next command, potentially operating on missing/corrupt checkpoints.

Also, quote ${WORKSPACE} expansions throughout to guard against paths with spaces.

Proposed fix
+set -euo pipefail
+
 # Workspace directory for checkpoints and results
 WORKSPACE=${WORKSPACE:-/workspace}

And for variable expansions (apply similarly to all occurrences):

-    --megatron-path ${WORKSPACE}/models/Qwen3-VL-8B-Instruct
+    --megatron-path "${WORKSPACE}/models/Qwen3-VL-8B-Instruct"

46-47: Missing trailing newline at end of file.

POSIX requires a trailing newline; some tools may warn or misbehave without one.

examples/models/vlm/qwen3_vl/README.md (1)

60-60: Add a language specifier to the fenced code block.

Static analysis flags this as MD040. Since this shows terminal output, use ```text or ```console.

examples/models/vlm/qwen3_vl/sft.sh (2)

16-17: Add set -euo pipefail after the license header.

Same as conversion.sh — without this, a failed training run won't stop the loop, and subsequent iterations will launch on a potentially unhealthy node. As per coding guidelines, shell scripts should follow the Google Shell Style Guide.

Proposed fix
+set -euo pipefail
+
 # Workspace directory for checkpoints and results
 WORKSPACE=${WORKSPACE:-/workspace}

23-36: Dense and MoE configuration blocks are near-duplicates.

The two variable blocks (lines 23–36 and 68–81) are almost identical. Consider extracting shared defaults into a common block at the top and only overriding PRETRAINED_CHECKPOINT and MODEL_NAME per model. This would reduce maintenance burden if hyperparameters change.

Also applies to: 68-81
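One way to act on this, sketched under the assumption that only the checkpoint path and model name differ between the two blocks (variable names follow the review's description, not the actual sft.sh contents):

```shell
# Shared hyperparameter defaults, defined once at the top of the script.
WORKSPACE="${WORKSPACE:-/workspace}"
TP="${TP:-2}"   # tensor-parallel size (placeholder default)
PP="${PP:-1}"   # pipeline-parallel size (placeholder default)

# Per-model overrides: only what actually differs between dense and MoE runs.
configure_model() {
  MODEL_NAME="$1"
  PRETRAINED_CHECKPOINT="${WORKSPACE}/models/${MODEL_NAME}"
}

configure_model "Qwen3-VL-8B-Instruct"   # dense variant
echo "checkpoint: ${PRETRAINED_CHECKPOINT}"
```

A shared-defaults block like this means a future hyperparameter change is edited in one place instead of two.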

examples/models/vlm/qwen3_vl/inference.sh (2)

16-17: Add set -euo pipefail after the license header.

Consistent with the other new scripts — fail fast on errors. As per coding guidelines, shell scripts should follow the Google Shell Style Guide.


69-70: Missing trailing newline at end of file.

examples/models/vlm/qwen3_vl/peft.sh (2)

16-17: Add set -euo pipefail after the license header.

Same recommendation as the other scripts. As per coding guidelines, shell scripts should follow the Google Shell Style Guide.


104-104: Trailing extra space before the backslash.

Line 104 has a double space before \. Minor formatting nit.

-        checkpoint.save=${WORKSPACE}/results/${MODEL_NAME}_lora_ep${EP}_tp${TP}_pp${PP}  \
+        checkpoint.save=${WORKSPACE}/results/${MODEL_NAME}_lora_ep${EP}_tp${TP}_pp${PP} \

Comment on lines +24 to +27

```shell
python examples/conversion/convert_checkpoints.py import \
    --hf-model Qwen/Qwen3-VL-8B-Instruct \
    --megatron-path ${WORKSPACE}/models/Qwen3-VL-8B-Instruct
```

⚠️ Potential issue | 🟡 Minor

README code snippets use bare python but scripts use uv run python.

The actual shell scripts (conversion.sh, inference.sh) consistently use uv run python, but the README examples use bare python. This inconsistency will confuse users who copy-paste from the README. Update all code blocks to match:

-python examples/conversion/convert_checkpoints.py import \
+uv run python examples/conversion/convert_checkpoints.py import \

Apply the same change to the export example (line 31) and the inference example (line 42). As per coding guidelines, uv run should be used to execute scripts instead of calling python directly.

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test and benchmark the code to ensure it meets the requirements.

```diff
-python examples/conversion/convert_checkpoints.py import \
+uv run python examples/conversion/convert_checkpoints.py import \
     --hf-model Qwen/Qwen3-VL-8B-Instruct \
     --megatron-path ${WORKSPACE}/models/Qwen3-VL-8B-Instruct
```


Labels

docs-only With great power comes great responsibility.


2 participants