Conversation
📝 Walkthrough

The PR consolidates Qwen3-VL documentation by redirecting detailed examples to an external README, updates training example code for API changes, adds WandB configuration guidance to multiple VLM example scripts, integrates the --no-sync flag into Ministral3 scripts, and introduces comprehensive new example scripts for Qwen3-VL model workflows.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
Tip: Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@examples/models/vlm/qwen3_vl/README.md`:
- Around line 24-27: Update the README code blocks that invoke scripts to use
"uv run python" instead of bare "python": replace the call for
convert_checkpoints.py (the block showing "python
examples/conversion/convert_checkpoints.py import ... --hf-model
Qwen/Qwen3-VL-8B-Instruct --megatron-path ..."), the export example mentioned
around the export command, and the inference example so they all invoke "uv run
python" consistently with the conversion.sh and inference.sh scripts; ensure the
exact script names (convert_checkpoints.py, the export script, and the inference
script shown) are changed in their respective fenced code blocks to use "uv run
python".
🧹 Nitpick comments (10)
examples/models/vlm/gemma3_vl/peft.sh (1)
1-1: Consider adding error handling setup.

Following the Google Shell Style Guide, consider adding `set -euo pipefail` after the copyright header to enable robust error handling. This causes the script to exit on errors, undefined variables, and pipe failures.

🛡️ Suggested addition for error handling

```diff
 # limitations under the License.
+
+set -euo pipefail

 # Workspace directory for checkpoints and results
```

As per coding guidelines: Follow Google Shell Style Guide.
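The effect of the suggested flags can be sketched in a small standalone script (everything below is illustrative, not taken from the repository):

```shell
#!/bin/bash
# Exit on any command failure (-e), treat unset variables as errors (-u),
# and make a pipeline fail if any stage fails (pipefail).
set -euo pipefail

# Default-expansion still works under -u: an unset WORKSPACE falls back.
WORKSPACE=${WORKSPACE:-/workspace}
echo "workspace: ${WORKSPACE}"

# Without pipefail, `false | cat` would succeed because cat exits 0;
# with pipefail the pipeline's status is the failing stage's status.
if ! (false | cat); then
  echo "pipefail caught the failing stage"
fi
```

Referencing a truly unset variable (rather than using `${VAR:-default}`) would abort the script immediately under `-u`, which is the fail-fast behavior the comment is asking for.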
examples/models/vlm/qwen3_vl/conversion.sh (2)
16-17: Add `set -euo pipefail` after the license header.

Per the Google Shell Style Guide, scripts should fail early on errors. Without this, a failed conversion step will silently continue to the next command, potentially operating on missing/corrupt checkpoints.

Also, quote `${WORKSPACE}` expansions throughout to guard against paths with spaces.

Proposed fix

```diff
+set -euo pipefail
+
 # Workspace directory for checkpoints and results
 WORKSPACE=${WORKSPACE:-/workspace}
```

And for variable expansions (apply similarly to all occurrences):

```diff
-  --megatron-path ${WORKSPACE}/models/Qwen3-VL-8B-Instruct
+  --megatron-path "${WORKSPACE}/models/Qwen3-VL-8B-Instruct"
```
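Why the quoting matters can be demonstrated in isolation; the workspace path below is made up so that it contains a space:

```shell
#!/bin/bash
set -euo pipefail

# A workspace path containing a space (illustrative).
WORKSPACE="/tmp/qwen demo"
mkdir -p "${WORKSPACE}/models"

# Unquoted, the expansion is split on whitespace into two words:
# "/tmp/qwen" and "demo/models".
set -- ${WORKSPACE}/models
echo "unquoted expands to $# words"

# Quoted, it stays a single word and resolves as one path.
set -- "${WORKSPACE}/models"
echo "quoted expands to $# words"
[ -d "$1" ] && echo "quoted path exists"
```

Any command in the script that receives an unquoted `${WORKSPACE}` would see the split words as separate arguments, which is why the fix applies to every occurrence.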
46-47: Missing trailing newline at end of file.

POSIX requires a trailing newline; some tools may warn or misbehave without one.
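One way to detect and fix a missing trailing newline from the shell (the temp file here stands in for conversion.sh):

```shell
#!/bin/bash
set -euo pipefail

# Create a stand-in file whose last line lacks a newline.
f=$(mktemp)
printf 'last line, no newline' > "$f"

# Command substitution strips a trailing newline, so a non-empty
# result from `tail -c 1` means the final byte was not a newline.
if [ -n "$(tail -c 1 "$f")" ]; then
  echo "missing trailing newline, appending"
  echo >> "$f"
fi

[ -z "$(tail -c 1 "$f")" ] && echo "file now ends with a newline"
rm -f "$f"
```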
examples/models/vlm/qwen3_vl/README.md (1)
60-60: Add a language specifier to the fenced code block.

Static analysis flags this as MD040. Since this shows terminal output, use `` ```text `` or `` ```console ``.

examples/models/vlm/qwen3_vl/sft.sh (2)
16-17: Add `set -euo pipefail` after the license header.

Same as conversion.sh — without this, a failed training run won't stop the loop, and subsequent iterations will launch on a potentially unhealthy node. As per coding guidelines, shell scripts should follow the Google Shell Style Guide.

Proposed fix

```diff
+set -euo pipefail
+
 # Workspace directory for checkpoints and results
 WORKSPACE=${WORKSPACE:-/workspace}
```
23-36: Dense and MoE configuration blocks are near-duplicates.

The two variable blocks (lines 23–36 and 68–81) are almost identical. Consider extracting shared defaults into a common block at the top and only overriding `PRETRAINED_CHECKPOINT` and `MODEL_NAME` per model. This would reduce maintenance burden if hyperparameters change.

Also applies to: 68-81
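One possible shape for that refactor. The variable names and values below are hypothetical (the MoE model name, parallelism values, and the `MODEL_VARIANT` switch are not taken from the actual sft.sh); only the per-model settings vary:

```shell
#!/bin/bash
set -euo pipefail

# Shared defaults, defined once at the top.
WORKSPACE=${WORKSPACE:-/workspace}
TP=${TP:-2}   # tensor parallel size (illustrative value)
PP=${PP:-1}   # pipeline parallel size (illustrative value)

# Per-model overrides: only what actually differs.
MODEL_VARIANT=${MODEL_VARIANT:-dense}
case "$MODEL_VARIANT" in
  dense) MODEL_NAME="Qwen3-VL-8B-Instruct" ;;
  moe)   MODEL_NAME="Qwen3-VL-MoE" ;;  # placeholder name
  *)     echo "unknown variant: $MODEL_VARIANT" >&2; exit 1 ;;
esac
PRETRAINED_CHECKPOINT="${WORKSPACE}/models/${MODEL_NAME}"

echo "variant=${MODEL_VARIANT} model=${MODEL_NAME} tp=${TP} pp=${PP}"
echo "checkpoint=${PRETRAINED_CHECKPOINT}"
```

Running the script with `MODEL_VARIANT=moe` flips only the two model-specific variables, while every shared hyperparameter lives in one place.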
examples/models/vlm/qwen3_vl/inference.sh (2)
16-17: Add `set -euo pipefail` after the license header.

Consistent with the other new scripts — fail fast on errors. As per coding guidelines, shell scripts should follow the Google Shell Style Guide.

69-70: Missing trailing newline at end of file.

examples/models/vlm/qwen3_vl/peft.sh (2)
16-17: Add `set -euo pipefail` after the license header.

Same recommendation as the other scripts. As per coding guidelines, shell scripts should follow the Google Shell Style Guide.
104-104: Trailing extra space before the backslash.

Line 104 has a double space before `\`. Minor formatting nit.

```diff
-  checkpoint.save=${WORKSPACE}/results/${MODEL_NAME}_lora_ep${EP}_tp${TP}_pp${PP}  \
+  checkpoint.save=${WORKSPACE}/results/${MODEL_NAME}_lora_ep${EP}_tp${TP}_pp${PP} \
```
````
python examples/conversion/convert_checkpoints.py import \
  --hf-model Qwen/Qwen3-VL-8B-Instruct \
  --megatron-path ${WORKSPACE}/models/Qwen3-VL-8B-Instruct
```
````
README code snippets use bare `python` but scripts use `uv run python`.

The actual shell scripts (conversion.sh, inference.sh) consistently use `uv run python`, but the README examples use bare `python`. This inconsistency will confuse users who copy-paste from the README. Update all code blocks to match:

```diff
-python examples/conversion/convert_checkpoints.py import \
+uv run python examples/conversion/convert_checkpoints.py import \
```

Apply the same change to the export example (line 31) and the inference example (line 42). As per coding guidelines, `uv run` should be used to execute scripts instead of calling `python` directly.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
````diff
-python examples/conversion/convert_checkpoints.py import \
-  --hf-model Qwen/Qwen3-VL-8B-Instruct \
-  --megatron-path ${WORKSPACE}/models/Qwen3-VL-8B-Instruct
+uv run python examples/conversion/convert_checkpoints.py import \
+  --hf-model Qwen/Qwen3-VL-8B-Instruct \
+  --megatron-path ${WORKSPACE}/models/Qwen3-VL-8B-Instruct
 ```
````
What does this PR do?
Add a one line overview of what this PR aims to accomplish.
Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An Nvidia developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open "Draft" PR.
Additional Information
Summary by CodeRabbit
Documentation
New Features
Chores