2 changes: 1 addition & 1 deletion docs/models/llm/gemma3.md
@@ -180,7 +180,7 @@ torchrun --nproc-per-node=8 run/run_recipe.py \
- Gemma 3 1B: https://huggingface.co/google/gemma-3-1b-it

## Related Docs
- Gemma3 Vision-Language Models: [Gemma 3 VL](../vlm/gemma3-vl.md)
- Gemma3 Vision-Language Models: [Gemma 3 VL](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/models/vlm/gemma3_vl/README.md)
- Recipe usage: [Recipe usage](../../recipe-usage.md)
- Customizing the training recipe configuration: [Configuration overview](../../training/config-container-overview.md)
- Training entry points: [Entry points](../../training/entry-points.md)
159 changes: 2 additions & 157 deletions docs/models/vlm/gemma3-vl.md
@@ -44,163 +44,9 @@ Gemma 3 VL builds on the Gemma 3 architecture with additional multimodal capabilities:
- **Multimodal Integration**: Seamless integration of visual and textual information through learned projection layers
- **Flexible Image Handling**: Supports variable resolution images and multiple images per conversation

## Conversion with 🤗 Hugging Face

### Import HF → Megatron
To import the HF VL model to your desired Megatron path:
```bash
python examples/conversion/convert_checkpoints.py import \
--hf-model google/gemma-3-4b-it \
--megatron-path /models/gemma-3-4b-it
```

### Export Megatron → HF
```bash
python examples/conversion/convert_checkpoints.py export \
--hf-model google/gemma-3-4b-it \
  --megatron-path /results/gemma3_vl_4b/checkpoints/iter_0001000 \
--hf-path ./gemma3-vl-hf-export
```

### Run Inference on Converted Checkpoint

```bash
python examples/conversion/hf_to_megatron_generate_vlm.py \
--hf_model_path google/gemma-3-4b-it \
--megatron_model_path /models/gemma-3-4b-it \
--image_path <example image path> \
--prompt "Describe this image." \
--max_new_tokens 100
```

Note:
- `--megatron_model_path` is optional. If it is not specified, the script converts the HF model on the fly and then runs the forward pass.
- You can also use image URLs: `--image_path="https://example.com/image.jpg"`

## Finetune Recipes

- See: [bridge.recipes.gemma3_vl](../../apidocs/bridge/bridge.recipes.gemma3_vl.md)
- Available recipes:
- `gemma3_vl_4b_finetune_config`: Finetuning for 4B VL model with PEFT support
- `gemma3_vl_12b_finetune_config`: Finetuning for 12B VL model with PEFT support
- `gemma3_vl_27b_finetune_config`: Finetuning for 27B VL model with PEFT support

Before training, ensure the following environment variables are set:
1. `SAVE_DIR`: checkpoint and log saving directory
2. `HF_TOKEN`: to download models from HF Hub (if required)
3. `HF_HOME`: (optional) to avoid re-downloading models and datasets
4. `WANDB_API_KEY`: (optional) to enable WandB logging
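
For example, a minimal shell setup might look like the following sketch; the paths and key values below are placeholders rather than required locations:

```bash
export SAVE_DIR=/results/gemma3_vl            # checkpoint and log directory (placeholder path)
export HF_TOKEN=<your_hf_token>               # needed only for gated models on the HF Hub
export HF_HOME=/cache/huggingface             # optional: reuse downloaded models and datasets
export WANDB_API_KEY=<your_wandb_api_key>     # optional: enables WandB logging
```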

### Full Finetuning

```bash
torchrun --nproc-per-node=8 run/run_vlm_recipe.py \
--pretrained-checkpoint /models/gemma-3-4b-it \
--recipe gemma3_vl_4b_finetune_config \
--dataset-type hf \
dataset.maker_name=make_cord_v2_dataset \
train.global_batch_size=64 \
train.train_iters=1000 \
checkpoint.save=$SAVE_DIR/gemma3_vl_4b_finetune
```

Or programmatically:
```python
from megatron.bridge.recipes.gemma3_vl import gemma3_vl_4b_finetune_config

# Full finetuning
config = gemma3_vl_4b_finetune_config(
name="gemma3_vl_4b_full_finetune",
pretrained_checkpoint="/models/gemma-3-4b-it",
dataset_type="hf",
peft=None,
train_iters=1000,
global_batch_size=64,
)
```

### Parameter-Efficient Finetuning (PEFT) with LoRA

```bash
torchrun --nproc-per-node=8 run/run_vlm_recipe.py \
--pretrained-checkpoint /models/gemma-3-4b-it \
--recipe gemma3_vl_4b_finetune_config \
--peft_scheme lora \
--dataset-type hf \
dataset.maker_name=make_cord_v2_dataset \
train.global_batch_size=128 \
checkpoint.save=$SAVE_DIR/gemma3_vl_4b_lora
```

PEFT options:
- `--peft_scheme`: Set to `lora` for LoRA or `dora` for DoRA. Omit for full finetuning.

You can also combine PEFT with freeze options:
- `model.freeze_language_model=True`: Freeze the language model
- `model.freeze_vision_model=True`: Freeze the vision encoder
- `model.freeze_vision_projection=True`: Freeze the vision projection layer

Example with freeze options:
```bash
torchrun --nproc-per-node=8 run/run_vlm_recipe.py \
--pretrained-checkpoint /models/gemma-3-4b-it \
--recipe gemma3_vl_4b_finetune_config \
--peft_scheme lora \
model.freeze_language_model=True \
model.freeze_vision_model=False \
checkpoint.save=$SAVE_DIR/gemma3_vl_4b_lora_vision
```

Programmatic configuration:
```python
from megatron.bridge.recipes.gemma3_vl import gemma3_vl_4b_finetune_config

# LoRA finetuning
config = gemma3_vl_4b_finetune_config(
name="gemma3_vl_4b_lora_finetune",
pretrained_checkpoint="/models/gemma-3-4b-it",
dataset_type="hf",
peft="lora", # or "dora"
train_iters=1000,
global_batch_size=128,
)

# LoRA with vision model frozen
config = gemma3_vl_4b_finetune_config(
name="gemma3_vl_4b_lora_language_only",
pretrained_checkpoint="/models/gemma-3-4b-it",
peft="lora",
freeze_vision_model=True,
freeze_vision_projection=True,
)
```

### Recommended Configurations

| Model | Mode | TP | PP | Global Batch Size | Learning Rate | Hardware |
|-------|------|----|----|-------------------|---------------|----------|
| Gemma 3 VL 4B | Full SFT | 1 | 1 | 32-64 | 5e-6 | 8 GPUs |
| Gemma 3 VL 4B | LoRA/DoRA | 1 | 1 | 64-128 | 1e-4 | 8 GPUs |
| Gemma 3 VL 12B | Full SFT | 4 | 1 | 32-64 | 5e-6 | 8 GPUs |
| Gemma 3 VL 12B | LoRA/DoRA | 1 | 1 | 64-128 | 1e-4 | 8 GPUs |
| Gemma 3 VL 27B | Full SFT | 8 | 2 | 16-32 | 5e-6 | 16 GPUs |
| Gemma 3 VL 27B | LoRA/DoRA | 4 | 1 | 32-64 | 1e-4 | 16 GPUs |

**Note:** LoRA/DoRA significantly reduce memory requirements, allowing larger batch sizes and training on fewer GPUs.
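
As a rough illustration of the 12B full-SFT row, the sketch below adds parallelism overrides to the launch command used earlier. The `model.tensor_model_parallel_size` and `model.pipeline_model_parallel_size` override names are assumptions following the dotted-override style shown above, and the pretrained checkpoint path is a placeholder for a previously imported 12B checkpoint, so verify both against the recipe configuration before use:

```bash
# Note: the model.* parallelism override names below are assumed; check the recipe config.
torchrun --nproc-per-node=8 run/run_vlm_recipe.py \
    --pretrained-checkpoint /models/gemma-3-12b-it \
    --recipe gemma3_vl_12b_finetune_config \
    --dataset-type hf \
    dataset.maker_name=make_cord_v2_dataset \
    model.tensor_model_parallel_size=4 \
    model.pipeline_model_parallel_size=1 \
    train.global_batch_size=32 \
    checkpoint.save=$SAVE_DIR/gemma3_vl_12b_finetune
```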

## Example Datasets

| Dataset | Maker Name | Description |
|---------|------------|-------------|
| [cord-v2](https://huggingface.co/datasets/naver-clova-ix/cord-v2) | `make_cord_v2_dataset` | OCR receipts: Single-image-text dataset for receipt understanding |
| [MedPix-VQA](https://huggingface.co/datasets/mmoukouba/MedPix-VQA) | `make_medpix_dataset` | Medical VQA: Single-image Q&A for clinical images |
| [The Cauldron (Raven subset)](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron) | `make_raven_dataset` | Visual reasoning: Multi-image analogical reasoning |

To change the dataset, specify `dataset.maker_name=<maker_name>` in your command.
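
For example, a sketch of pointing the earlier 4B LoRA command at the MedPix-VQA dataset instead; only the maker name and the (placeholder) save path change:

```bash
# Same as the LoRA command above, but with the MedPix-VQA dataset maker.
torchrun --nproc-per-node=8 run/run_vlm_recipe.py \
    --pretrained-checkpoint /models/gemma-3-4b-it \
    --recipe gemma3_vl_4b_finetune_config \
    --peft_scheme lora \
    --dataset-type hf \
    dataset.maker_name=make_medpix_dataset \
    train.global_batch_size=128 \
    checkpoint.save=$SAVE_DIR/gemma3_vl_4b_lora_medpix
```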

## Examples
- Checkpoint import/export: [examples/conversion/convert_checkpoints.py](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/conversion/convert_checkpoints.py)
- Generate with VLM (HF→Megatron): [examples/conversion/hf_to_megatron_generate_vlm.py](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/conversion/hf_to_megatron_generate_vlm.py)

For checkpoint conversion, inference, finetuning recipes, and step-by-step training guides, see the [Gemma 3 VL Examples](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/models/vlm/gemma3_vl/README.md).

## Hugging Face Model Cards

@@ -213,4 +59,3 @@ To change the dataset, specify `dataset.maker_name=<maker_name>` in your command.
- Recipe usage: [Recipe usage](../../recipe-usage.md)
- Customizing the training recipe configuration: [Configuration overview](../../training/config-container-overview.md)
- Training entry points: [Entry points](../../training/entry-points.md)

131 changes: 3 additions & 128 deletions docs/models/vlm/glm-45v.md
@@ -19,7 +19,7 @@ Please update `transformers` to version 4.57.1 or higher to use the GLM-4.5V model.
- 128 MoE experts with shared experts
- ~12B active parameters per token
- Sequence length: 131,072 tokens
- Recommended: 4 nodes, 32 GPUs (LoRA/DoRA) or 16 nodes, 128 GPUs (Full SFT)
- Recommended: 32 nodes, 256 GPUs (LoRA/DoRA) or 64 nodes, 512 GPUs (Full SFT)

## Model Architecture Features

@@ -39,134 +39,9 @@ GLM-4.5V combines efficient sparse MoE language modeling with multimodal capabilities:
- **Image and Video Support**: Handles both static images and video inputs
- **Flexible Image Handling**: Supports variable resolution images and multiple images per conversation

## Conversion with 🤗 Hugging Face

### Import HF → Megatron
To import the HF VL model to your desired Megatron path:
```bash
python examples/conversion/convert_checkpoints.py import \
--hf-model zai-org/GLM-4.5V \
--megatron-path /models/glm-45v
```

### Export Megatron → HF
```bash
python examples/conversion/convert_checkpoints.py export \
--hf-model zai-org/GLM-4.5V \
--megatron-path /results/glm_45v/checkpoints/iter_0001000 \
--hf-path ./glm-45v-hf-export
```

### Run Inference on Converted Checkpoint

```bash
python examples/conversion/hf_to_megatron_generate_vlm.py \
--hf_model_path zai-org/GLM-4.5V \
--megatron_model_path /models/glm-45v \
--image_path <example image path> \
--prompt "Describe this image." \
--max_new_tokens 100
```

Note:
- `--megatron_model_path` is optional. If it is not specified, the script converts the HF model on the fly and then runs the forward pass.
- You can also use image URLs: `--image_path="https://example.com/image.jpg"`

## Finetune Recipes

- See: [bridge.recipes.glm_vl](../../apidocs/bridge/bridge.recipes.glm_vl.md)
- Available recipes:
- `glm_45v_finetune_config`: Finetuning for GLM-4.5V model with PEFT support

Before training, ensure the following environment variables are set:
1. `SAVE_DIR`: checkpoint and log saving directory
2. `HF_TOKEN`: to download models from HF Hub (if required)
3. `HF_HOME`: (optional) to avoid re-downloading models and datasets
4. `WANDB_API_KEY`: (optional) to enable WandB logging

### Full Finetuning

```python
from megatron.bridge.recipes.glm_vl import glm_45v_finetune_config

# Full finetuning
config = glm_45v_finetune_config(
name="glm_45v_full_finetune",
pretrained_checkpoint="/models/glm-45v",
dataset_type="hf",
peft=None,
train_iters=1000,
global_batch_size=32,
)
```

### Parameter-Efficient Finetuning (PEFT) with LoRA

```python
config = glm_45v_finetune_config(
    name="glm_45v_lora_finetune",
    pretrained_checkpoint="/models/glm-45v",
    dataset_type="hf",
    peft="lora",
    train_iters=1000,
    global_batch_size=32,
)
```

PEFT options:
- `--peft-scheme`: Set to `lora` for LoRA or `dora` for DoRA. Omit for full finetuning.

You can also combine PEFT with freeze options:
- `--freeze-language-model`: Freeze the language model
- `--freeze-vision-model`: Freeze the vision encoder
- `--freeze-vision-projection`: Freeze the vision projection layer

Example with freeze options:
```python
from megatron.bridge.recipes.glm_vl import glm_45v_finetune_config

# LoRA finetuning
config = glm_45v_finetune_config(
name="glm_45v_lora_finetune",
pretrained_checkpoint="/models/glm-45v",
dataset_type="hf",
peft="lora", # or "dora"
train_iters=1000,
global_batch_size=64,
)

# LoRA with vision model frozen
config = glm_45v_finetune_config(
name="glm_45v_lora_language_only",
pretrained_checkpoint="/models/glm-45v",
peft="lora",
freeze_vision_model=True,
freeze_vision_projection=True,
)
```

### Recommended Configurations

| Model | Mode | TP | PP | EP | Global Batch Size | Learning Rate | Hardware |
|-------|------|----|----|-----|-------------------|---------------|----------|
| GLM-4.5V | Full SFT | 1 | 8 | 16 | 16-32 | 5e-6 | 128 GPUs (16 nodes) |
| GLM-4.5V | LoRA/DoRA | 1 | 8 | 4 | 32-64 | 1e-4 | 32 GPUs (4 nodes) |

**Note:** LoRA/DoRA significantly reduce memory requirements, allowing larger batch sizes and training on fewer GPUs. The sparse MoE architecture requires Expert Parallelism (EP) for efficient training.
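
As a sketch of how the LoRA/DoRA row above might be applied programmatically, the snippet below sets Megatron-style parallelism fields on the returned config. The `config.model.*` attribute paths are assumptions about the recipe's config container, so check the actual field names in [bridge.recipes.glm_vl](../../apidocs/bridge/bridge.recipes.glm_vl.md) before relying on them:

```python
from megatron.bridge.recipes.glm_vl import glm_45v_finetune_config

config = glm_45v_finetune_config(
    name="glm_45v_lora_finetune",
    pretrained_checkpoint="/models/glm-45v",
    peft="lora",
    global_batch_size=32,
)

# Hypothetical attribute paths: assumes the config exposes Megatron-Core
# parallelism settings on its model sub-config. Verify before use.
config.model.tensor_model_parallel_size = 1
config.model.pipeline_model_parallel_size = 8
config.model.expert_model_parallel_size = 4
```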

## Example Datasets

| Dataset | Maker Name | Description |
|---------|------------|-------------|
| [cord-v2](https://huggingface.co/datasets/naver-clova-ix/cord-v2) | `make_cord_v2_dataset` | OCR receipts: Single-image-text dataset for receipt understanding |
| [MedPix-VQA](https://huggingface.co/datasets/mmoukouba/MedPix-VQA) | `make_medpix_dataset` | Medical VQA: Single-image Q&A for clinical images |
| [The Cauldron (Raven subset)](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron) | `make_raven_dataset` | Visual reasoning: Multi-image analogical reasoning |

To change the dataset, specify `dataset.maker_name=<maker_name>` in your command.

## Examples
- Checkpoint import/export: [examples/conversion/convert_checkpoints.py](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/conversion/convert_checkpoints.py)
- Generate with VLM (HF→Megatron): [examples/conversion/hf_to_megatron_generate_vlm.py](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/conversion/hf_to_megatron_generate_vlm.py)

For checkpoint conversion, inference, finetuning recipes, and step-by-step training guides, see the [GLM-4.5V Examples](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/models/vlm/glm_45v/README.md).

## Hugging Face Model Cards
