
[docs] Cherrypick 2267 2362#2382

Merged
ko3n1g merged 2 commits into r0.3.0 from
chcui/cp-2267-2362
Feb 14, 2026

Conversation

Contributor

@cuichenx cuichenx commented Feb 14, 2026

What does this PR do ?

Add a one-line overview of what this PR aims to accomplish.

Changelog

  • Add specific, line-by-line info on the high-level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre-checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Documentation

    • Streamlined Qwen3-VL documentation with consolidated examples section.
    • Updated training example code to reflect API changes.
  • New Features

    • Added comprehensive Qwen3-VL example workflows including checkpoint conversion, inference, and finetuning (LoRA and full finetuning).
    • Added new README with setup and usage guidance for Qwen3-VL models.
  • Chores

    • Added WandB logging configuration notes to training example scripts.
    • Improved execution reliability of Ministral3 example scripts.
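The WandB note in the Chores above boils down to setting one of two standard Weights & Biases environment variables; a minimal sketch (these are wandb's documented variables, not text copied from the scripts in this PR):

```shell
# Option 1: authenticate wandb so training runs log to your account.
# export WANDB_API_KEY=<your-api-key>

# Option 2: disable wandb logging entirely, e.g. for quick local smoke tests.
export WANDB_MODE=disabled
echo "wandb mode: ${WANDB_MODE}"   # prints: wandb mode: disabled
```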

@cuichenx cuichenx added the docs-only ("With great power comes great responsibility.") label Feb 14, 2026
@ko3n1g ko3n1g merged commit 31ca75b into r0.3.0 Feb 14, 2026
22 checks passed
@ko3n1g ko3n1g deleted the chcui/cp-2267-2362 branch February 14, 2026 01:07
@coderabbitai
Contributor

coderabbitai bot commented Feb 14, 2026

📝 Walkthrough

Walkthrough

The PR consolidates Qwen3-VL documentation by redirecting detailed examples to an external README, updates training example code for API changes, adds WandB configuration guidance to multiple VLM example scripts, integrates the --no-sync flag into Ministral3 scripts, and introduces comprehensive new example scripts for Qwen3-VL model workflows.
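The `--no-sync` change described above can be sketched as follows; the script name is a placeholder, and the command is built as a string here purely for illustration:

```shell
# --no-sync tells uv to run inside the existing environment instead of first
# syncing it to the lockfile. The Ministral3 scripts need this because the
# container already provides Transformers 5, and a sync would recreate the
# virtual environment and conflict with that install.
uv_cmd="uv run --no-sync python some_ministral3_script.py"   # placeholder script name
echo "${uv_cmd}"
```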

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation consolidation**<br>`docs/models/vlm/qwen3-vl.md` | Replaces multi-section conversion, inference, and finetuning details with a single "Examples" section pointing to the external Qwen3-VL Examples README. |
| **Training example updates**<br>`docs/training/multi-token-prediction.md` | Updates imports and config structure (MTP parameters nested under `cfg.model`), removes a logger line, converts an f-string to a plain string, and adds a `forward_step` argument to the `pretrain()` call. |
| **WandB logging guidance**<br>`examples/models/vlm/gemma3_vl/peft.sh`, `examples/models/vlm/gemma3_vl/sft.sh`, `examples/models/vlm/glm_45v/slurm_peft.sh`, `examples/models/vlm/glm_45v/slurm_sft.sh` | Adds commented instructions to set `WANDB_API_KEY` or disable wandb logging, with a `WANDB_MODE=disabled` export example. |
| **Ministral3 uv run flag additions**<br>`examples/models/vlm/ministral3/conversion.sh`, `examples/models/vlm/ministral3/inference.sh`, `examples/models/vlm/ministral3/peft.sh`, `examples/models/vlm/ministral3/sft.sh` | Adds the `--no-sync` flag to all `uv run` invocations, with explanatory comments about the Transformers 5 requirement and virtual-environment conflict avoidance. |
| **Qwen3-VL comprehensive examples**<br>`examples/models/vlm/qwen3_vl/README.md`, `examples/models/vlm/qwen3_vl/conversion.sh`, `examples/models/vlm/qwen3_vl/inference.sh`, `examples/models/vlm/qwen3_vl/peft.sh`, `examples/models/vlm/qwen3_vl/sft.sh` | New example package with a README covering setup, conversion, inference, and finetuning workflows; shell scripts orchestrate checkpoint conversion (HF ↔ Megatron), distributed inference across model variants, and LoRA/SFT training with configurable parallelism (TP/PP/EP). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

r0.3.0, cherry-pick

Suggested reviewers

  • kamran-nvidia
🚥 Pre-merge checks | ✅ 4 passed | ❌ 1 inconclusive

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title "[docs] Cherrypick 2267 2362" is vague and does not clearly communicate the main changes; it references internal PR numbers without explaining what was cherrypicked or why. | Use a more descriptive title that explains the primary change, e.g., "[docs] Add Qwen 3 VL examples and enhance example scripts", to clarify the actual content being merged. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check. |
| Merge Conflict Detection | ✅ Passed | No merge conflicts detected when merging into r0.3.0. |
| Test Results For Major Changes | ✅ Passed | All changes are documentation and example script updates without core functionality modifications or breaking changes. |






@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@examples/models/vlm/qwen3_vl/README.md`:
- Around line 24-27: Update the README code blocks that invoke scripts to use
"uv run python" instead of bare "python": replace the call for
convert_checkpoints.py (the block showing "python
examples/conversion/convert_checkpoints.py import ... --hf-model
Qwen/Qwen3-VL-8B-Instruct --megatron-path ..."), the export example mentioned
around the export command, and the inference example so they all invoke "uv run
python" consistently with the conversion.sh and inference.sh scripts; ensure the
exact script names (convert_checkpoints.py, the export script, and the inference
script shown) are changed in their respective fenced code blocks to use "uv run
python".
🧹 Nitpick comments (10)
examples/models/vlm/gemma3_vl/peft.sh (1)

1-1: Consider adding error handling setup.

Following the Google Shell Style Guide, consider adding set -euo pipefail after the copyright header to enable robust error handling. This causes the script to exit on errors, undefined variables, and pipe failures.

🛡️ Suggested addition for error handling
 # limitations under the License.
+
+set -euo pipefail
 
 # Workspace directory for checkpoints and results

As per coding guidelines: Follow Google Shell Style Guide.
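For readers unfamiliar with the idiom, here is a self-contained illustration of what `set -euo pipefail` does; the commands are toy examples, not lines from the PR's scripts:

```shell
set -euo pipefail
# -e           : exit immediately if a command fails (unless the failure is handled)
# -u           : treat expansion of an unset variable as an error
# -o pipefail  : a pipeline fails if any stage fails, not just the last one

false || echo "failure handled explicitly, script continues"
WORKSPACE="${WORKSPACE:-/workspace}"   # a default assignment keeps -u satisfied
echo "workspace: ${WORKSPACE}"
```

Without `-e`, a failed conversion or training step would be silently ignored and the script would march on to the next command; with it, the run stops at the first error.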

examples/models/vlm/qwen3_vl/conversion.sh (2)

16-17: Add set -euo pipefail after the license header.

Per the Google Shell Style Guide, scripts should fail early on errors. Without this, a failed conversion step will silently continue to the next command, potentially operating on missing/corrupt checkpoints.

Also, quote ${WORKSPACE} expansions throughout to guard against paths with spaces.

Proposed fix
+set -euo pipefail
+
 # Workspace directory for checkpoints and results
 WORKSPACE=${WORKSPACE:-/workspace}

And for variable expansions (apply similarly to all occurrences):

-    --megatron-path ${WORKSPACE}/models/Qwen3-VL-8B-Instruct
+    --megatron-path "${WORKSPACE}/models/Qwen3-VL-8B-Instruct"

46-47: Missing trailing newline at end of file.

POSIX requires a trailing newline; some tools may warn or misbehave without one.

examples/models/vlm/qwen3_vl/README.md (1)

60-60: Add a language specifier to the fenced code block.

Static analysis flags this as MD040. Since this shows terminal output, use ```text or ```console.

examples/models/vlm/qwen3_vl/sft.sh (2)

16-17: Add set -euo pipefail after the license header.

Same as conversion.sh — without this, a failed training run won't stop the loop, and subsequent iterations will launch on a potentially unhealthy node. As per coding guidelines, shell scripts should follow the Google Shell Style Guide.

Proposed fix
+set -euo pipefail
+
 # Workspace directory for checkpoints and results
 WORKSPACE=${WORKSPACE:-/workspace}

23-36: Dense and MoE configuration blocks are near-duplicates.

The two variable blocks (lines 23–36 and 68–81) are almost identical. Consider extracting shared defaults into a common block at the top and only overriding PRETRAINED_CHECKPOINT and MODEL_NAME per model. This would reduce maintenance burden if hyperparameters change.

Also applies to: 68-81
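One way to act on this, sketched under the assumption that only the checkpoint path and model name differ between the two blocks (variable names follow the review's description, not the actual sft.sh contents):

```shell
# Shared hyperparameter defaults, defined once at the top of the script.
WORKSPACE="${WORKSPACE:-/workspace}"
TP="${TP:-2}"   # tensor-parallel size (placeholder default)
PP="${PP:-1}"   # pipeline-parallel size (placeholder default)

# Per-model overrides: only what actually differs between dense and MoE runs.
configure_model() {
  MODEL_NAME="$1"
  PRETRAINED_CHECKPOINT="${WORKSPACE}/models/${MODEL_NAME}"
}

configure_model "Qwen3-VL-8B-Instruct"   # dense variant
echo "checkpoint: ${PRETRAINED_CHECKPOINT}"
```

A shared-defaults block like this means a future hyperparameter change is edited in one place instead of two.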

examples/models/vlm/qwen3_vl/inference.sh (2)

16-17: Add set -euo pipefail after the license header.

Consistent with the other new scripts — fail fast on errors. As per coding guidelines, shell scripts should follow the Google Shell Style Guide.


69-70: Missing trailing newline at end of file.

examples/models/vlm/qwen3_vl/peft.sh (2)

16-17: Add set -euo pipefail after the license header.

Same recommendation as the other scripts. As per coding guidelines, shell scripts should follow the Google Shell Style Guide.


104-104: Trailing extra space before the backslash.

Line 104 has a double space before \. Minor formatting nit.

-        checkpoint.save=${WORKSPACE}/results/${MODEL_NAME}_lora_ep${EP}_tp${TP}_pp${PP}  \
+        checkpoint.save=${WORKSPACE}/results/${MODEL_NAME}_lora_ep${EP}_tp${TP}_pp${PP} \

Comment on lines +24 to +27

```shell
python examples/conversion/convert_checkpoints.py import \
    --hf-model Qwen/Qwen3-VL-8B-Instruct \
    --megatron-path ${WORKSPACE}/models/Qwen3-VL-8B-Instruct
```

⚠️ Potential issue | 🟡 Minor

README code snippets use bare python but scripts use uv run python.

The actual shell scripts (conversion.sh, inference.sh) consistently use uv run python, but the README examples use bare python. This inconsistency will confuse users who copy-paste from the README. Update all code blocks to match:

-python examples/conversion/convert_checkpoints.py import \
+uv run python examples/conversion/convert_checkpoints.py import \

Apply the same change to the export example (line 31) and the inference example (line 42). As per coding guidelines, uv run should be used to execute scripts instead of calling python directly.

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test and benchmark the code to ensure it meets the requirements.

```diff
-python examples/conversion/convert_checkpoints.py import \
+uv run python examples/conversion/convert_checkpoints.py import \
     --hf-model Qwen/Qwen3-VL-8B-Instruct \
     --megatron-path ${WORKSPACE}/models/Qwen3-VL-8B-Instruct
```


Labels

docs-only With great power comes great responsibility.


2 participants