[doc, model] feat: Add GLM-4.5V VL examples and update Gemma 3 VL docs#2151
Conversation
… distillation, decentralized_pg Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Add complete GLM-4.5V VLM example folder:
- README.md: Model documentation, architecture details, usage instructions
- slurm_sft.sh: Slurm job script for full SFT (16 nodes, 128 GPUs, TP=1/PP=2/EP=16)
- slurm_peft.sh: Slurm job script for LoRA (4 nodes, 32 GPUs, TP=1/PP=2/EP=4)
- conversion.sh: Checkpoint conversion scripts (HF <-> Megatron)
- inference.sh: Inference examples

Update Gemma 3 VL example scripts:
- README.md and peft.sh updates

GLM-4.5V is a 106B-parameter MoE vision-language model based on GLM-4.5 Air.
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Update PP values from PP=2 to PP=8 for both SFT and PEFT configurations to match the actual slurm scripts.
📝 Walkthrough
This pull request restructures script and documentation references, moving examples from the

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
examples/models/vlm/qwen_vl/data/convert_to_qwenvl_wds.py (1)

20-24: ⚠️ Potential issue | 🟡 Minor

Use `uv run` in the example command. The usage block currently invokes the script with bare `python`.

Suggested fix:

```diff
- python examples/models/vlm/qwen_vl/data/convert_to_qwenvl_wds.py \
+ uv run python examples/models/vlm/qwen_vl/data/convert_to_qwenvl_wds.py \
```

As per coding guidelines, "{**/*.sh,examples/**/*.py}: Use 'uv run' to execute scripts instead of activating a virtual environment and calling 'python' directly".
docs/models/vlm/nemotron-nano-v2-vl.md (1)

129-140: ⚠️ Potential issue | 🟡 Minor

Fix the LoRA flag typo in the PEFT command. `—-lora-on-vision-model` uses an em dash; the CLI expects `--lora-on-vision-model`, so this command will fail as written.

💡 Suggested fix:

```diff
-—-lora-on-vision-model \
+--lora-on-vision-model \
```
🤖 Fix all issues with AI agents
In `@docs/models/vlm/index.md`:
- Around line 5-10: In the toctree block, change the entry
'../../../../examples/models/vlm/gemma3_vl/README.md' to
'../../../examples/models/vlm/gemma3_vl/README.md' so the relative path matches
the one used in README.md and no longer resolves outside the repository.
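For reference, the corrected entry in context might look like this (a sketch assuming docs/models/vlm/index.md uses MyST's `{toctree}` directive; the directive options shown are illustrative, not taken from the actual file):

````markdown
```{toctree}
:hidden:

../../../examples/models/vlm/gemma3_vl/README.md
```
````

Three `../` segments climb from docs/models/vlm/ to the repository root, so the path lands inside examples/ rather than outside the checkout.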
In `@docs/models/vlm/README.md`:
- Around line 9-12: The relative link
"../../../examples/models/vlm/gemma3_vl/README.md" in docs/models/vlm/README.md
will break on the published docs site; update the table row for "Gemma 3 VL"
(and similarly check "Nemotron Nano V2 VL") to use a stable target: either
replace the relative path with the absolute GitHub URL to the examples repo
README or create a docs-local wrapper page (e.g., a small Markdown file inside
docs/models/vlm/ that redirects/links to the example) and point the table link
to that wrapper; ensure the link text and the table entry for the
Model/Documentation remain unchanged.
In `@examples/models/vlm/gemma3_vl/README.md`:
- Around line 173-175: Replace the bare URLs for the Gemma model cards with
inline Markdown links to satisfy MD034; for each line like "Gemma 3 VL 4B:
https://huggingface.co/google/gemma-3-4b-it" change it to use an inline link
format (e.g., "Gemma 3 VL 4B: [Gemma 3 VL
4B](https://huggingface.co/google/gemma-3-4b-it)"), and do the same for the
"Gemma 3 VL 12B" and "Gemma 3 VL 27B" entries so the model labels are linked
rather than presenting bare URLs.
- Around line 156-163: The markdown table's separator row is misaligned with the
header spacing and fails MD060; update the separator row under the header "|
Model | Mode | TP | PP | Global Batch Size | Learning Rate | Hardware |" so each
column divider and surrounding spaces match the header exactly (e.g., use
"|-------|------|----|----|-------------------|---------------|----------|" with
spacing consistent for each column), ensuring the separator columns align with
the header text for the table containing "Gemma 3 VL 4B/12B/27B" entries.
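A separator row aligned with that header might look like the sketch below (the cell values are placeholders, not the README's actual numbers):

```markdown
| Model          | Mode | TP | PP | Global Batch Size | Learning Rate | Hardware |
|----------------|------|----|----|-------------------|---------------|----------|
| Gemma 3 VL 4B  | SFT  | …  | …  | …                 | …             | …        |
```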
In `@examples/models/vlm/glm_45v/inference.sh`:
- Around line 19-21: The header comment currently specifies "TP=1, PP=4, EP=2"
but the later inference invocation blocks use "PP=2, EP=4", causing confusion;
either make the header match the later blocks or update the script to
consistently use one configuration and/or add a clarifying comment explaining
both valid PP/EP permutations for different inference variants. Locate the
TP/PP/EP flag mentions (search for "TP=", "PP=", "EP=" and the string "TP=1,
PP=4, EP=2" and the blocks using "PP=2, EP=4") and then (a) update the header
comment to list both variants and when to use each, or (b) change the later
blocks to match the header so all PP/EP values are consistent across the script.
In `@examples/models/vlm/glm_45v/README.md`:
- Around line 170-172: Replace the bare URL in the README entry "GLM-4.5V:
https://huggingface.co/zai-org/GLM-4.5V" with a Markdown link (e.g., GLM-4.5V:
[GLM-4.5V](https://huggingface.co/zai-org/GLM-4.5V)) so the URL is not displayed
raw and the file satisfies MD034; update the line containing "GLM-4.5V"
accordingly.
- Around line 164-165: Typo in the LoRA/DoRA note: replace the incorrect
fragment "allowing fo fewer GPUs" with the correct phrase "allowing for fewer
GPUs" in the README sentence that reads "**Note:** LoRA/DoRA significantly
reduces memory requirements, allowing fo fewer GPUs. Expert parallelism (EP) is
essential for efficient training of this MoE model." to fix the spelling error.
- Around line 123-126: Make the wording consistent by updating the heading or
the sentence so both use the same form; e.g., change the heading "Pretrain" to
"Pretraining" (or alter the sentence "Pretraining is not verified for this
model." to "Pretrain is not verified for this model.") so the heading text
"Pretrain" and the following sentence use the identical term.
- Around line 92-109: The fenced code block under the "Expected output:" section
in examples/models/vlm/glm_45v/README.md is missing a language tag; update the
opening triple-backtick for that block to include the language "text" (i.e.,
change ``` to ```text) so the block passes the MD040 lint rule and renders
correctly.
In `@examples/models/vlm/glm_45v/slurm_peft.sh`:
- Around line 23-26: Add a pre-submit guard to slurm_peft.sh that creates the
logs/ directory when the script is run locally and then submits itself with
sbatch; specifically, detect absence of SLURM_JOB_ID (e.g., if [ -z
"$SLURM_JOB_ID" ]; then mkdir -p logs && sbatch "$0" && exit; fi), so the logs/
directory exists before Slurm opens the --output/--error files; place this
immediately after the `#SBATCH` directives (sbatch stops parsing directives at
the first executable line) and guard it so it won't run inside the allocated
Slurm job.
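The guard described above can be sketched as follows (a minimal illustration; the function name and the demo are assumptions, and the `echo` stands in for the real `sbatch "$0"` submission):

```shell
#!/bin/bash
# Hedged sketch of the pre-submit guard suggested for slurm_peft.sh.

ensure_logs_and_submit() {
    # On a login node SLURM_JOB_ID is unset: create logs/ so Slurm can open
    # the --output/--error files, then (in the real script) resubmit via sbatch.
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        mkdir -p logs
        echo "submitting: sbatch $1"    # real script: sbatch "$1"; exit
        return 1                        # caller exits after submission
    fi
    return 0                            # already inside the allocation
}

# Demo of both paths:
unset SLURM_JOB_ID
ensure_logs_and_submit "slurm_peft.sh" || echo "login node: submitted, would exit"
SLURM_JOB_ID=123
ensure_logs_and_submit "slurm_peft.sh" && echo "inside Slurm job $SLURM_JOB_ID"
```

Because sbatch opens the output files before any script line runs, creating logs/ at submission time (not inside the job) is what prevents the failure.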
- Around line 19-22: Header guidance ("Recommended: TP=1, PP=2, EP=4") conflicts
with the runtime config (PP=8). Update either the header or the configuration so
they match: either change the header recommendation to "TP=1, PP=8, EP=4" to
reflect the script's PP=8 setting, or change the configured value PP=8 to PP=2
to match the header; ensure the related mentions (the same header text and the
occurrence of PP in the script) are updated consistently.
In `@examples/models/vlm/glm_45v/slurm_sft.sh`:
- Around line 23-26: Add a pre-submit step to ensure the logs/ directory exists
to prevent Slurm from failing when it opens --output/--error files; update the
top of slurm_sft.sh to run a safe command like mkdir -p logs (or an equivalent
check) before any SBATCH directives or before submission, and apply the same
precaution for the other output/error paths referenced around the
CONTAINER_IMAGE usage and lines 36-37 so all log paths are created prior to
Slurm opening them.
- Around line 19-21: Update the mismatch between the header guidance and the
actual parallelism configuration by either changing the header comment or the
variables so they match; specifically reconcile the PP value referenced in the
header ("PP=2") with the PP variable used in the script (PP=8) and ensure TP and
EP recommendations (TP=1, EP=16) in the header match the TP and EP values set in
the script (refer to the PP, TP, and EP variable assignments in slurm_sft.sh);
pick one source of truth (header or variables), make them consistent, and update
any other related header lines (63-65) to reflect the chosen configuration.
In `@src/megatron/bridge/recipes/__init__.py`:
- Around line 21-26: Static analysis is failing due to star imports from glm and
glm_vl in megatron.bridge.recipes.__init__ (star imports from deepseek, gemma,
gemma3_vl, glm, glm_vl, gpt); fix by either adding "# noqa: F401,F403" to the
glm and glm_vl import lines to silence Ruff/Flake8, or replace the star imports
with explicit exported symbols and an __all__ list (e.g., import the specific
functions/classes from megatron.bridge.recipes.glm and
megatron.bridge.recipes.glm_vl and add them to __all__) so the lint tools no
longer report F401/F403.
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Ao Tang <aot@nvidia.com>
Signed-off-by: Ao Tang <aot@nvidia.com>
Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ko3n1g <16716991+ko3n1g@users.noreply.github.com>
…)" This reverts commit bfbc759.
Signed-off-by: Chen Cui <chcui@nvidia.com>
…P8-CS (#2175) Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
- Split docs into model introduction (docs/) and examples (examples/)
- docs/models/vlm/: Model overview and architecture details
- examples/models/vlm/: Training scripts, conversion, and step-by-step guides
- Update GLM-4.5V pipeline layout for better vision encoder balance
- Update hardware requirements: GLM-4.5V SFT 64 nodes, LoRA 32 nodes
- Add multi-node uv cache setup instructions
- Update recommended configurations with actual script values
Signed-off-by: Ao Tang <aot@nvidia.com>
/ok to test b5030db
/ok to test bb011ea
Summary
This PR adds GLM-4.5V Vision-Language model examples and updates Gemma 3 VL documentation.
Changes
New: GLM-4.5V VL Examples
Updated: Gemma 3 VL Examples