diff --git a/docs/models/llm/gemma3.md b/docs/models/llm/gemma3.md
index 60e726b5c9..0d623920a4 100644
--- a/docs/models/llm/gemma3.md
+++ b/docs/models/llm/gemma3.md
@@ -180,7 +180,7 @@ torchrun --nproc-per-node=8 run/run_recipe.py \
 - Gemma 3 1B: https://huggingface.co/google/gemma-3-1b-it
 
 ## Related Docs
-- Gemma3 Vision-Language Models: [Gemma 3 VL](../vlm/gemma3-vl.md)
+- Gemma 3 Vision-Language Models: [Gemma 3 VL](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/models/vlm/gemma3_vl/README.md)
 - Recipe usage: [Recipe usage](../../recipe-usage.md)
 - Customizing the training recipe configuration: [Configuration overview](../../training/config-container-overview.md)
 - Training entry points: [Entry points](../../training/entry-points.md)
diff --git a/docs/models/vlm/gemma3-vl.md b/docs/models/vlm/gemma3-vl.md
index 4a04f59c08..d38488fd86 100644
--- a/docs/models/vlm/gemma3-vl.md
+++ b/docs/models/vlm/gemma3-vl.md
@@ -44,163 +44,9 @@ Gemma 3 VL builds on the Gemma 3 architecture with additional multimodal capabilities
 - **Multimodal Integration**: Seamless integration of visual and textual information through learned projection layers
 - **Flexible Image Handling**: Supports variable resolution images and multiple images per conversation
 
-## Conversion with 🤗 Hugging Face
-
-### Import HF → Megatron
-To import the HF VL model to your desired Megatron path:
-```bash
-python examples/conversion/convert_checkpoints.py import \
---hf-model google/gemma-3-4b-it \
---megatron-path /models/gemma-3-4b-it
-```
-
-### Export Megatron → HF
-```bash
-python examples/conversion/convert_checkpoints.py export \
---hf-model google/gemma-3-4b-it \
---megatron-path /results/gemma3_vl_4b/checkpoints/iter_00001000 \
---hf-path ./gemma3-vl-hf-export
-```
-
-### Run Inference on Converted Checkpoint
-
-```bash
-python examples/conversion/hf_to_megatron_generate_vlm.py \
---hf_model_path google/gemma-3-4b-it \
---megatron_model_path /models/gemma-3-4b-it \
---image_path \
---prompt "Describe this image." \
---max_new_tokens 100
-```
-
-Note:
-- `--megatron_model_path` is optional. If not specified, the script will convert the model and then run forward.
-- You can also use image URLs: `--image_path="https://example.com/image.jpg"`
-
-## Finetune Recipes
-
-- See: [bridge.recipes.gemma3_vl](../../apidocs/bridge/bridge.recipes.gemma3_vl.md)
-- Available recipes:
-  - `gemma3_vl_4b_finetune_config`: Finetuning for 4B VL model with PEFT support
-  - `gemma3_vl_12b_finetune_config`: Finetuning for 12B VL model with PEFT support
-  - `gemma3_vl_27b_finetune_config`: Finetuning for 27B VL model with PEFT support
-
-Before training, ensure the following environment variables are set:
-1. `SAVE_DIR`: checkpoint and log saving directory
-2. `HF_TOKEN`: to download models from HF Hub (if required)
-3. `HF_HOME`: (optional) to avoid re-downloading models and datasets
-4. `WANDB_API_KEY`: (optional) to enable WandB logging
-
-### Full Finetuning
-
-```bash
-torchrun --nproc-per-node=8 run/run_vlm_recipe.py \
---pretrained-checkpoint /models/gemma-3-4b-it \
---recipe gemma3_vl_4b_finetune_config \
---dataset-type hf \
-dataset.maker_name=make_cord_v2_dataset \
-train.global_batch_size=64 \
-train.train_iters=1000 \
-checkpoint.save=$SAVE_DIR/gemma3_vl_4b_finetune
-```
-
-Or programmatically:
-```python
-from megatron.bridge.recipes.gemma3_vl import gemma3_vl_4b_finetune_config
-
-# Full finetuning
-config = gemma3_vl_4b_finetune_config(
-    name="gemma3_vl_4b_full_finetune",
-    pretrained_checkpoint="/models/gemma-3-4b-it",
-    dataset_type="hf",
-    peft=None,
-    train_iters=1000,
-    global_batch_size=64,
-)
-```
-
-### Parameter-Efficient Finetuning (PEFT) with LoRA
-
-```bash
-torchrun --nproc-per-node=8 run/run_vlm_recipe.py \
---pretrained-checkpoint /models/gemma-3-4b-it \
---recipe gemma3_vl_4b_finetune_config \
---peft_scheme lora \
---dataset-type hf \
-dataset.maker_name=make_cord_v2_dataset \
-train.global_batch_size=128 \
-checkpoint.save=$SAVE_DIR/gemma3_vl_4b_lora
-```
-
-PEFT options:
-- `--peft_scheme`: Set to `lora` for LoRA or `dora` for DoRA. Omit for full finetuning.
-
-You can also combine PEFT with freeze options:
-- `model.freeze_language_model=True`: Freeze the language model
-- `model.freeze_vision_model=True`: Freeze the vision encoder
-- `model.freeze_vision_projection=True`: Freeze the vision projection layer
-
-Example with freeze options:
-```bash
-torchrun --nproc-per-node=8 run/run_vlm_recipe.py \
---pretrained-checkpoint /models/gemma-3-4b-it \
---recipe gemma3_vl_4b_finetune_config \
---peft_scheme lora \
-model.freeze_language_model=True \
-model.freeze_vision_model=False \
-checkpoint.save=$SAVE_DIR/gemma3_vl_4b_lora_vision
-```
-
-Programmatic configuration:
-```python
-from megatron.bridge.recipes.gemma3_vl import gemma3_vl_4b_finetune_config
-
-# LoRA finetuning
-config = gemma3_vl_4b_finetune_config(
-    name="gemma3_vl_4b_lora_finetune",
-    pretrained_checkpoint="/models/gemma-3-4b-it",
-    dataset_type="hf",
-    peft="lora", # or "dora"
-    train_iters=1000,
-    global_batch_size=128,
-)
-
-# LoRA with vision model frozen
-config = gemma3_vl_4b_finetune_config(
-    name="gemma3_vl_4b_lora_language_only",
-    pretrained_checkpoint="/models/gemma-3-4b-it",
-    peft="lora",
-    freeze_vision_model=True,
-    freeze_vision_projection=True,
-)
-```
-
-### Recommended Configurations
-
-| Model | Mode | TP | PP | Global Batch Size | Learning Rate | Hardware |
-|-------|------|----|----|-------------------|---------------|----------|
-| Gemma 3 VL 4B | Full SFT | 1 | 1 | 32-64 | 5e-6 | 8 GPUs |
-| Gemma 3 VL 4B | LoRA/DoRA | 1 | 1 | 64-128 | 1e-4 | 8 GPUs |
-| Gemma 3 VL 12B | Full SFT | 4 | 1 | 32-64 | 5e-6 | 8 GPUs |
-| Gemma 3 VL 12B | LoRA/DoRA | 1 | 1 | 64-128 | 1e-4 | 8 GPUs |
-| Gemma 3 VL 27B | Full SFT | 8 | 2 | 16-32 | 5e-6 | 16 GPUs |
-| Gemma 3 VL 27B | LoRA/DoRA | 4 | 1 | 32-64 | 1e-4 | 16 GPUs |
-
-**Note:** LoRA/DoRA significantly reduces memory requirements, allowing for larger batch sizes and fewer GPUs.
-
-## Example Datasets
-
-| Dataset | Maker Name | Description |
-|---------|------------|-------------|
-| [cord-v2](https://huggingface.co/datasets/naver-clova-ix/cord-v2) | `make_cord_v2_dataset` | OCR receipts: Single-image-text dataset for receipt understanding |
-| [MedPix-VQA](https://huggingface.co/datasets/mmoukouba/MedPix-VQA) | `make_medpix_dataset` | Medical VQA: Single-image Q&A for clinical images |
-| [The Cauldron (Raven subset)](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron) | `make_raven_dataset` | Visual reasoning: Multi-image analogical reasoning |
-
-To change the dataset, specify `dataset.maker_name=` in your command.
-
 ## Examples
-- Checkpoint import/export: [examples/conversion/convert_checkpoints.py](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/conversion/convert_checkpoints.py)
-- Generate with VLM (HF→Megatron): [examples/conversion/hf_to_megatron_generate_vlm.py](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/conversion/hf_to_megatron_generate_vlm.py)
+
+For checkpoint conversion, inference, finetuning recipes, and step-by-step training guides, see the [Gemma 3 VL Examples](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/models/vlm/gemma3_vl/README.md).
 
 ## Hugging Face Model Cards
 
@@ -213,4 +59,3 @@ To change the dataset, specify `dataset.maker_name=` in your command
 - Recipe usage: [Recipe usage](../../recipe-usage.md)
 - Customizing the training recipe configuration: [Configuration overview](../../training/config-container-overview.md)
 - Training entry points: [Entry points](../../training/entry-points.md)
-
diff --git a/docs/models/vlm/glm-45v.md b/docs/models/vlm/glm-45v.md
index 5bf400870d..8879862739 100644
--- a/docs/models/vlm/glm-45v.md
+++ b/docs/models/vlm/glm-45v.md
@@ -19,7 +19,7 @@ Please update `transformers` version to 4.57.1 or higher in order to use the GLM
   - 128 MoE experts with shared experts
   - ~12B active parameters per token
   - Sequence length: 131,072 tokens
-  - Recommended: 4 nodes, 32 GPUs (LoRA/DoRA) or 16 nodes, 128 GPUs (Full SFT)
+  - Recommended: 8 nodes, 64 GPUs (LoRA/DoRA) or 64 nodes, 512 GPUs (Full SFT)
 
 ## Model Architecture Features
 
@@ -39,134 +39,9 @@ GLM-4.5V combines efficient sparse MoE language modeling with multimodal capabilities
 - **Image and Video Support**: Handles both static images and video inputs
 - **Flexible Image Handling**: Supports variable resolution images and multiple images per conversation
 
-## Conversion with 🤗 Hugging Face
-
-### Import HF → Megatron
-To import the HF VL model to your desired Megatron path:
-```bash
-python examples/conversion/convert_checkpoints.py import \
---hf-model zai-org/GLM-4.5V \
---megatron-path /models/glm-45v
-```
-
-### Export Megatron → HF
-```bash
-python examples/conversion/convert_checkpoints.py export \
---hf-model zai-org/GLM-4.5V \
---megatron-path /results/glm_45v/checkpoints/iter_0001000 \
---hf-path ./glm-45v-hf-export
-```
-
-### Run Inference on Converted Checkpoint
-
-```bash
-python examples/conversion/hf_to_megatron_generate_vlm.py \
---hf_model_path zai-org/GLM-4.5V \
---megatron_model_path /models/glm-45v \
---image_path \
---prompt "Describe this image." \
---max_new_tokens 100
-```
-
-Note:
-- `--megatron_model_path` is optional. If not specified, the script will convert the model and then run forward.
-- You can also use image URLs: `--image_path="https://example.com/image.jpg"`
-
-## Finetune Recipes
-
-- See: [bridge.recipes.glm_vl](../../apidocs/bridge/bridge.recipes.glm_vl.md)
-- Available recipes:
-  - `glm_45v_finetune_config`: Finetuning for GLM-4.5V model with PEFT support
-
-Before training, ensure the following environment variables are set:
-1. `SAVE_DIR`: checkpoint and log saving directory
-2. `HF_TOKEN`: to download models from HF Hub (if required)
-3. `HF_HOME`: (optional) to avoid re-downloading models and datasets
-4. `WANDB_API_KEY`: (optional) to enable WandB logging
-
-### Full Finetuning
-
-```python
-from megatron.bridge.recipes.glm_vl import glm_45v_finetune_config
-
-# Full finetuning
-config = glm_45v_finetune_config(
-    name="glm_45v_full_finetune",
-    pretrained_checkpoint="/models/glm-45v",
-    dataset_type="hf",
-    peft=None,
-    train_iters=1000,
-    global_batch_size=32,
-)
-```
-
-### Parameter-Efficient Finetuning (PEFT) with LoRA
-
-```python
-config = glm_45v_finetune_config(
-    name="glm_45v_full_finetune",
-    pretrained_checkpoint="/models/glm-45v",
-    dataset_type="hf",
-    peft='lora',
-    train_iters=1000,
-    global_batch_size=32,
-)
-```
-
-PEFT options:
-- `--peft-scheme`: Set to `lora` for LoRA or `dora` for DoRA. Omit for full finetuning.
-
-You can also combine PEFT with freeze options:
-- `--freeze-language-model`: Freeze the language model
-- `--freeze-vision-model`: Freeze the vision encoder
-- `--freeze-vision-projection`: Freeze the vision projection layer
-
-Example with freeze options:
-```python
-from megatron.bridge.recipes.glm_vl import glm_45v_finetune_config
-
-# LoRA finetuning
-config = glm_45v_finetune_config(
-    name="glm_45v_lora_finetune",
-    pretrained_checkpoint="/models/glm-45v",
-    dataset_type="hf",
-    peft="lora", # or "dora"
-    train_iters=1000,
-    global_batch_size=64,
-)
-
-# LoRA with vision model frozen
-config = glm_45v_finetune_config(
-    name="glm_45v_lora_language_only",
-    pretrained_checkpoint="/models/glm-45v",
-    peft="lora",
-    freeze_vision_model=True,
-    freeze_vision_projection=True,
-)
-```
-
-### Recommended Configurations
-
-| Model | Mode | TP | PP | EP | Global Batch Size | Learning Rate | Hardware |
-|-------|------|----|----|-----|-------------------|---------------|----------|
-| GLM-4.5V | Full SFT | 1 | 8 | 16 | 16-32 | 5e-6 | 128 GPUs (16 nodes) |
-| GLM-4.5V | LoRA/DoRA | 1 | 8 | 4 | 32-64 | 1e-4 | 32 GPUs (4 nodes) |
-
-**Note:** LoRA/DoRA significantly reduces memory requirements, allowing for larger batch sizes and fewer GPUs. The sparse MoE architecture requires Expert Parallelism (EP) for efficient training.
-
-## Example Datasets
-
-| Dataset | Maker Name | Description |
-|---------|------------|-------------|
-| [cord-v2](https://huggingface.co/datasets/naver-clova-ix/cord-v2) | `make_cord_v2_dataset` | OCR receipts: Single-image-text dataset for receipt understanding |
-| [MedPix-VQA](https://huggingface.co/datasets/mmoukouba/MedPix-VQA) | `make_medpix_dataset` | Medical VQA: Single-image Q&A for clinical images |
-| [The Cauldron (Raven subset)](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron) | `make_raven_dataset` | Visual reasoning: Multi-image analogical reasoning |
-
-To change the dataset, specify `dataset.maker_name=` in your command.
-
 ## Examples
-- Checkpoint import/export: [examples/conversion/convert_checkpoints.py](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/conversion/convert_checkpoints.py)
-- Generate with VLM (HF→Megatron): [examples/conversion/hf_to_megatron_generate_vlm.py](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/conversion/hf_to_megatron_generate_vlm.py)
+
+For checkpoint conversion, inference, finetuning recipes, and step-by-step training guides, see the [GLM-4.5V Examples](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/models/vlm/glm_45v/README.md).
 
 ## Hugging Face Model Cards
 
diff --git a/examples/models/vlm/gemma3_vl/README.md b/examples/models/vlm/gemma3_vl/README.md
index 285688e4c6..b389ecbd1d 100644
--- a/examples/models/vlm/gemma3_vl/README.md
+++ b/examples/models/vlm/gemma3_vl/README.md
@@ -1,6 +1,8 @@
-# Gemma 3 VL - Vision Language Model
+# Gemma 3 VL Examples
 
-This directory contains examples for Gemma 3 Vision Language Model, including checkpoint conversion, inference, and fine-tuning.
+This directory contains example scripts for Gemma 3 VL vision-language models.
+
+For model introduction and architecture details, see the [Gemma 3 VL documentation](../../../../docs/models/vlm/gemma3-vl.md).
 
 ## Workspace Configuration
 
@@ -16,15 +18,43 @@ Directory structure:
 
 ## Checkpoint Conversion
 
-See the [conversion.sh](conversion.sh) script for commands to:
-- Import Hugging Face checkpoints to Megatron format
-- Export Megatron checkpoints back to Hugging Face format
-- Run multi-GPU round-trip validation between formats
+### Import HF → Megatron
+To import the HF VL model to your desired Megatron path:
+```bash
+python examples/conversion/convert_checkpoints.py import \
+--hf-model google/gemma-3-4b-it \
+--megatron-path /models/gemma-3-4b-it
+```
+### Export Megatron → HF
+```bash
+python examples/conversion/convert_checkpoints.py export \
+--hf-model google/gemma-3-4b-it \
+--megatron-path /results/gemma3_vl_4b/checkpoints/iter_00001000 \
+--hf-path ./gemma3-vl-hf-export
+```
+
+See the [conversion.sh](conversion.sh) script for more examples, including:
+- Multi-GPU round-trip validation between formats
 
 ## Inference
 
-**See the [inference.sh](inference.sh) script for commands to:
+### Run Inference on Converted Checkpoint
+
+```bash
+python examples/conversion/hf_to_megatron_generate_vlm.py \
+--hf_model_path google/gemma-3-4b-it \
+--megatron_model_path /models/gemma-3-4b-it \
+--image_path \
+--prompt "Describe this image." \
+--max_new_tokens 100
+```
+
+Note:
+- `--megatron_model_path` is optional. If not specified, the script will convert the model on the fly and then run the forward pass.
+- You can also use image URLs: `--image_path="https://example.com/image.jpg"`
+
+See the [inference.sh](inference.sh) script for commands to:
 - Run inference with Hugging Face checkpoints
 - Run inference with imported Megatron checkpoints
 - Run inference with exported Hugging Face checkpoints
 
@@ -51,22 +81,49 @@ The image is a table comparing the technical specifications of two
 =======================================
 ```
 
-## Pretrain
+## Finetune Recipes
+
+- See: [bridge.recipes.gemma3_vl](../../../../docs/apidocs/bridge/bridge.recipes.gemma3_vl.md)
+- Available recipes:
+  - `gemma3_vl_4b_finetune_config`: Finetuning for 4B VL model with PEFT support
+  - `gemma3_vl_12b_finetune_config`: Finetuning for 12B VL model with PEFT support
+  - `gemma3_vl_27b_finetune_config`: Finetuning for 27B VL model with PEFT support
+
+Before training, ensure the following environment variables are set:
+1. `SAVE_DIR`: checkpoint and log saving directory
+2. `HF_TOKEN`: to download models from HF Hub (if required)
+3. `HF_HOME`: (optional) to avoid re-downloading models and datasets
+4. `WANDB_API_KEY`: (optional) to enable WandB logging
+
+### Pretrain
 
 Pretraining is not verified for this model.
 
-## Supervised Fine-Tuning (SFT)
+### Supervised Fine-Tuning (SFT)
 
 See the [sft.sh](sft.sh) script for full parameter fine-tuning with configurable model parallelisms.
 
-[W&B Report](TODO)
+W&B report coming soon.
 
-## Parameter-Efficient Fine-Tuning (PEFT)
+### Parameter-Efficient Fine-Tuning (PEFT) with LoRA
 
 See the [peft.sh](peft.sh) script for LoRA fine-tuning with configurable tensor and pipeline parallelism.
 
-[W&B Report](TODO)
+W&B report coming soon.
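+
+The same recipes can also be configured programmatically. A minimal sketch for the 4B recipe (see the recipe docs above for the full argument list):
+
+```python
+from megatron.bridge.recipes.gemma3_vl import gemma3_vl_4b_finetune_config
+
+# Full finetuning; set peft="lora" or "dora" for PEFT instead
+config = gemma3_vl_4b_finetune_config(
+    name="gemma3_vl_4b_full_finetune",
+    pretrained_checkpoint="/models/gemma-3-4b-it",
+    dataset_type="hf",
+    peft=None,
+    train_iters=1000,
+    global_batch_size=64,
+)
+```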
+
+### Recommended Configurations
+
+| Model | Mode | TP | PP | Global Batch Size | Learning Rate | Hardware |
+|-------|------|----|----|-------------------|---------------|----------|
+| Gemma 3 VL 4B | Full SFT | 2 | 1 | 32 | 5e-5 | 8 GPUs |
+| Gemma 3 VL 4B | LoRA/DoRA | 2 | 1 | 32 | 2e-4 | 8 GPUs |
+| Gemma 3 VL 12B | Full SFT | 4 | 1 | 32 | 5e-5 | 8 GPUs |
+| Gemma 3 VL 12B | LoRA/DoRA | 2 | 1 | 32 | 2e-4 | 8 GPUs |
+| Gemma 3 VL 27B | Full SFT | 8 | 2 | 32 | 5e-5 | 16 GPUs |
+| Gemma 3 VL 27B | LoRA/DoRA | 4 | 1 | 32 | 2e-4 | 8 GPUs |
+
+**Note:** LoRA/DoRA significantly reduces memory requirements, allowing for larger batch sizes and fewer GPUs.
 
 ## Evaluation
 
-TBD
\ No newline at end of file
+Coming soon.
diff --git a/examples/models/vlm/glm_45v/README.md b/examples/models/vlm/glm_45v/README.md
new file mode 100644
index 0000000000..615550ac4b
--- /dev/null
+++ b/examples/models/vlm/glm_45v/README.md
@@ -0,0 +1,177 @@
+# GLM-4.5V Examples
+
+This directory contains example scripts for the GLM-4.5V vision-language model.
+
+For model introduction and architecture details, see the [GLM-4.5V documentation](../../../../docs/models/vlm/glm-45v.md).
+
+## Workspace Configuration
+
+All scripts use a `WORKSPACE` environment variable to define the base directory for checkpoints and results. By default, this is set to `/workspace`. You can override it:
+
+```bash
+export WORKSPACE=/your/custom/path
+```
+
+Directory structure:
+- `${WORKSPACE}/models/` - Converted checkpoints
+- `${WORKSPACE}/results/` - Training outputs and experiment results
+
+## Checkpoint Conversion
+
+### Import HF → Megatron
+To import the HF VL model to your desired Megatron path:
+```bash
+python examples/conversion/convert_checkpoints.py import \
+--hf-model zai-org/GLM-4.5V \
+--megatron-path /models/GLM-4.5V
+```
+
+### Export Megatron → HF
+```bash
+python examples/conversion/convert_checkpoints.py export \
+--hf-model zai-org/GLM-4.5V \
+--megatron-path /results/glm_45v/checkpoints/iter_00001000 \
+--hf-path ./glm-45v-hf-export
+```
+
+See the [conversion.sh](conversion.sh) script for more examples, including:
+- Multi-GPU round-trip validation between formats
+
+## Inference
+
+### Run Inference on Converted Checkpoint
+
+```bash
+python examples/conversion/hf_to_megatron_generate_vlm.py \
+--hf_model_path zai-org/GLM-4.5V \
+--megatron_model_path /models/GLM-4.5V \
+--image_path \
+--prompt "Describe this image." \
+--max_new_tokens 100 \
+--trust_remote_code
+```
+
+Note:
+- `--megatron_model_path` is optional. If not specified, the script will convert the model on the fly and then run the forward pass.
+- You can also use image URLs: `--image_path="https://example.com/image.jpg"`
+- GLM-4.5V requires the `--trust_remote_code` flag
+
+See the [inference.sh](inference.sh) script for commands to:
+- Run inference with Hugging Face checkpoints
+- Run inference with imported Megatron checkpoints
+- Run inference with exported Hugging Face checkpoints
+
+**Expected output:**
+```text
+...
+Generation step 46
+Generation step 47
+Generation step 48
+Generation step 49
+======== GENERATED TEXT OUTPUT ========
+Image: https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png
+Prompt: Describe this image.
+Generated: [gMASK]<|user|>
+<|begin_of_image|><|image|>...<|end_of_image|>Describe this image.<|assistant|>
+The image shows a technical specifications table comparing two NVIDIA GPU models: H100 SXM and H100 NVL. The table is organized with rows representing different technical specifications and columns for each GPU model.
+
+Here's a breakdown of the information presented:
+
+=======================================
+```
+
+## Finetune Recipes
+
+- See: [bridge.recipes.glm_vl](../../../../docs/apidocs/bridge/bridge.recipes.glm_vl.md)
+- Available recipes:
+  - `glm_45v_finetune_config`: Finetuning for GLM-4.5V model with PEFT support
+
+Before training, ensure the following environment variables are set:
+1. `SAVE_DIR`: checkpoint and log saving directory
+2. `HF_TOKEN`: to download models from HF Hub (if required)
+3. `HF_HOME`: (optional) to avoid re-downloading models and datasets
+4. `WANDB_API_KEY`: (optional) to enable WandB logging
+
+### Pretraining
+
+Pretraining is not verified for this model.
+
+### Supervised Fine-Tuning (SFT)
+
+Full parameter fine-tuning requires 64 nodes (512 GPUs) with TP=1, PP=8, EP=16.
+
+**Usage:**
+```bash
+# 1. Edit slurm_sft.sh to configure:
+#    - #SBATCH directives (partition, account, etc.)
+#    - CONTAINER_IMAGE path
+
+# 2. Submit the job:
+sbatch slurm_sft.sh
+```
+
+See [slurm_sft.sh](slurm_sft.sh) for the full Slurm job script.
+
+W&B report coming soon.
+
+### Parameter-Efficient Fine-Tuning (PEFT) with LoRA
+
+LoRA fine-tuning requires 8 nodes (64 GPUs) with TP=1, PP=8, EP=4.
+
+**Usage:**
+```bash
+# 1. Edit slurm_peft.sh to configure:
+#    - #SBATCH directives (partition, account, etc.)
+#    - CONTAINER_IMAGE path
+
+# 2. Submit the job:
+sbatch slurm_peft.sh
+```
+
+See [slurm_peft.sh](slurm_peft.sh) for the full Slurm job script.
+
+W&B report coming soon.
+
+**Note:** LoRA/DoRA significantly reduces memory requirements, allowing for fewer GPUs. Expert parallelism (EP) is essential for efficient training of this MoE model.
+
+### Recommended Configurations
+
+| Model | Mode | TP | PP | EP | Global Batch Size | Learning Rate | Hardware |
+|-------|------|----|----|-----|-------------------|---------------|----------|
+| GLM-4.5V | Full SFT | 1 | 8 | 16 | 32 | 5e-6 | 512 GPUs (64 nodes) |
+| GLM-4.5V | LoRA/DoRA | 1 | 8 | 4 | 32 | 1e-4 | 64 GPUs (8 nodes) |
+
+### Multi-Node Setup with Local Repository
+
+If you are mounting a local Megatron Bridge repository, you must pre-sync the uv cache to avoid race conditions when multiple nodes attempt to sync simultaneously. Follow these steps:
+
+1. **Start a container with your mounts and run `uv sync`:**
+   ```bash
+   # Start an interactive container with the same mounts you'll use in Slurm
+   srun --nodes=1 --ntasks=1 --container-image=/path/to/container.sqsh \
+     --container-mounts=/path/to/Megatron-Bridge:/opt/Megatron-Bridge,/shared/uv_cache:/shared/uv_cache \
+     --pty bash
+
+   # Inside the container, pre-sync to the shared cache
+   cd /opt/Megatron-Bridge
+   UV_CACHE_DIR=/shared/uv_cache uv sync
+   ```
+
+2. **Update the Slurm script with UV_CACHE_DIR and mounts:**
+   ```bash
+   # In slurm_sft.sh or slurm_peft.sh, set:
+   export UV_CACHE_DIR="/shared/uv_cache"
+
+   # And configure container mounts:
+   CONTAINER_MOUNTS="/path/to/Megatron-Bridge:/opt/Megatron-Bridge,/shared/uv_cache:/shared/uv_cache"
+   ```
+
+3. **Submit the job:**
+   ```bash
+   sbatch slurm_sft.sh  # or slurm_peft.sh
+   ```
+
+## Evaluation
+
+Coming soon.
diff --git a/examples/models/vlm/glm_45v/conversion.sh b/examples/models/vlm/glm_45v/conversion.sh
new file mode 100755
index 0000000000..3947834215
--- /dev/null
+++ b/examples/models/vlm/glm_45v/conversion.sh
@@ -0,0 +1,33 @@
+#!/usr/bin/env bash
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Workspace directory for checkpoints and results
+WORKSPACE=${WORKSPACE:-/workspace}
+
+# Import HF → Megatron
+uv run python examples/conversion/convert_checkpoints.py import \
+    --hf-model zai-org/GLM-4.5V \
+    --megatron-path ${WORKSPACE}/models/GLM-4.5V
+
+# Export Megatron → HF
+uv run python examples/conversion/convert_checkpoints.py export \
+    --hf-model zai-org/GLM-4.5V \
+    --megatron-path ${WORKSPACE}/models/GLM-4.5V/iter_0000000 \
+    --hf-path ${WORKSPACE}/models/GLM-4.5V-hf-export
+
+# Round-trip validation
+# Note: GLM-4.5V is a large MoE model, adjust parallelism as needed
+uv run python -m torch.distributed.run --nproc_per_node=8 examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
+    --hf-model-id zai-org/GLM-4.5V --tp 1 --pp 2 --ep 4 --trust-remote-code
diff --git a/examples/models/vlm/glm_45v/inference.sh b/examples/models/vlm/glm_45v/inference.sh
new file mode 100755
index 0000000000..497c18134a
--- /dev/null
+++ b/examples/models/vlm/glm_45v/inference.sh
@@ -0,0 +1,54 @@
+#!/usr/bin/env bash
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Workspace directory for checkpoints and results
+WORKSPACE=${WORKSPACE:-/workspace}
+
+# GLM-4.5V is a large MoE model (106B parameters)
+# The examples below run on 8 GPUs, using TP=1 with either PP=4, EP=2 or PP=2, EP=4
+
+# Inference with Hugging Face checkpoints
+uv run python -m torch.distributed.run --nproc_per_node=8 examples/conversion/hf_to_megatron_generate_vlm.py \
+    --hf_model_path zai-org/GLM-4.5V \
+    --image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
+    --prompt "Describe this image." \
+    --max_new_tokens 50 \
+    --tp 1 \
+    --pp 4 \
+    --ep 2 \
+    --trust_remote_code
+
+# Inference with imported Megatron checkpoints
+uv run python -m torch.distributed.run --nproc_per_node=8 examples/conversion/hf_to_megatron_generate_vlm.py \
+    --hf_model_path zai-org/GLM-4.5V \
+    --megatron_model_path ${WORKSPACE}/models/GLM-4.5V/iter_0000000 \
+    --image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
+    --prompt "Describe this image." \
+    --max_new_tokens 50 \
+    --tp 1 \
+    --pp 2 \
+    --ep 4 \
+    --trust_remote_code
+
+# Inference with exported HF checkpoints
+uv run python -m torch.distributed.run --nproc_per_node=8 examples/conversion/hf_to_megatron_generate_vlm.py \
+    --hf_model_path ${WORKSPACE}/models/GLM-4.5V-hf-export \
+    --image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
+    --prompt "Describe this image." \
+    --max_new_tokens 50 \
+    --tp 1 \
+    --pp 2 \
+    --ep 4 \
+    --trust_remote_code
diff --git a/examples/models/vlm/glm_45v/slurm_peft.sh b/examples/models/vlm/glm_45v/slurm_peft.sh
new file mode 100755
index 0000000000..017da6a74c
--- /dev/null
+++ b/examples/models/vlm/glm_45v/slurm_peft.sh
@@ -0,0 +1,166 @@
+#!/bin/bash
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# ==============================================================================
+# GLM-4.5V Parameter-Efficient Fine-Tuning (PEFT) with LoRA
+#
+# GLM-4.5V is a large MoE model (106B parameters)
+# LoRA/DoRA significantly reduces memory requirements
+# Recommended: TP=1, PP=8, EP=4 for LoRA (64 GPUs, 8 nodes)
+#
+# Usage:
+#   1. Modify the #SBATCH directives below for your cluster
+#   2. Set CONTAINER_IMAGE to your container path
+#   3. Submit: sbatch slurm_peft.sh
+# ==============================================================================
+
+#SBATCH --job-name=glm45v-lora
+#SBATCH --nodes=8
+#SBATCH --ntasks-per-node=8
+#SBATCH --gpus-per-node=8
+#SBATCH --time=08:00:00
+#SBATCH --partition=gpu
+#SBATCH --account=my_account
+#SBATCH --output=logs/glm45v_lora_%j.out
+#SBATCH --error=logs/glm45v_lora_%j.err
+#SBATCH --exclusive
+
+# ==============================================================================
+# CONFIGURATION
+# ==============================================================================
+
+# Workspace directory for checkpoints and results
+WORKSPACE=${WORKSPACE:-/workspace}
+
+# Model and training configurations
+PRETRAINED_CHECKPOINT=${WORKSPACE}/models/GLM-4.5V
+MODEL_NAME=glm_45v
+DATASET_NAME=cord_v2
+SEQ_LENGTH=8192
+TRAIN_ITERS=50
+GLOBAL_BATCH_SIZE=32
+MICRO_BATCH_SIZE=1
+EVAL_ITERS=10
+LR=0.0001
+MIN_LR=0.00001
+LR_WARMUP_ITERS=10
+LOG_INTERVAL=1
+WANDB_PROJECT=megatron-bridge-${DATASET_NAME}
+
+# Parallelism configuration
+TP=1
+PP=8
+EP=4
+
+# Container image (required)
+CONTAINER_IMAGE=""
+# CONTAINER_IMAGE="/path/to/container.sqsh"
+
+# Container mounts (optional, space-separated)
+CONTAINER_MOUNTS=""
+# CONTAINER_MOUNTS="/data:/data /workspace:/workspace"
+
+# ==============================================================================
+# Environment Setup
+# ==============================================================================
+
+# NCCL optimizations for large-scale training
+export TORCH_NCCL_AVOID_RECORD_STREAMS=1
+export NCCL_NVLS_ENABLE=0
+
+# UV cache on shared filesystem (recommended for multi-node setups)
+# Pre-sync once before submitting jobs: UV_CACHE_DIR=/path/to/cache uv sync
+# export UV_CACHE_DIR="/path/to/shared/uv_cache"
+
+# HuggingFace cache directory (recommended for shared filesystem)
+# export HF_HOME="/path/to/shared/HF_HOME"
+
+# Authentication tokens (set these for your environment)
+# export HF_TOKEN="hf_your_token_here"
+# export WANDB_API_KEY="your_wandb_key_here"
+
+# ==============================================================================
+# Job Execution
+# ==============================================================================
+
+echo "======================================"
+echo "GLM-4.5V LoRA Fine-Tuning Job"
+echo "======================================"
+echo "Job ID: $SLURM_JOB_ID"
+echo "Nodes: $SLURM_JOB_NUM_NODES"
+echo "GPUs per node: $SLURM_GPUS_PER_NODE"
+echo "Model: $MODEL_NAME"
+echo "Parallelism: TP=$TP, PP=$PP, EP=$EP"
+echo "PEFT: LoRA"
+echo "======================================"
+
+# Create logs directory if it doesn't exist
+mkdir -p logs
+
+# Build CLI overrides
+CLI_OVERRIDES="
+    checkpoint.pretrained_checkpoint=$PRETRAINED_CHECKPOINT \
+    model.seq_length=$SEQ_LENGTH \
+    train.train_iters=$TRAIN_ITERS \
+    train.global_batch_size=$GLOBAL_BATCH_SIZE \
+    train.micro_batch_size=$MICRO_BATCH_SIZE \
+    train.eval_iters=$EVAL_ITERS \
+    optimizer.lr=$LR \
+    optimizer.min_lr=$MIN_LR \
+    scheduler.lr_warmup_iters=$LR_WARMUP_ITERS \
+    checkpoint.save=${WORKSPACE}/results/${MODEL_NAME}_lora_tp${TP}_pp${PP}_ep${EP} \
+    logger.log_interval=$LOG_INTERVAL \
+    logger.wandb_project=$WANDB_PROJECT \
+    logger.wandb_exp_name=${MODEL_NAME}_${DATASET_NAME}_lora_tp${TP}_pp${PP}_ep${EP} \
+    dataset.maker_name=make_${DATASET_NAME}_dataset \
+    dataset.seq_length=$SEQ_LENGTH \
+    model.tensor_model_parallel_size=$TP \
+    model.pipeline_model_parallel_size=$PP \
+    model.expert_model_parallel_size=$EP
+"
+
+# Build command
+# Only local rank 0 on each node runs uv sync, then all ranks run with --no-sync
+CMD="if [ \"\$SLURM_LOCALID\" -eq 0 ]; then uv sync; else sleep 2; fi && "
+CMD="$CMD uv run --no-sync python scripts/training/run_recipe.py"
+CMD="$CMD --recipe ${MODEL_NAME}_finetune_config"
+CMD="$CMD --step_func vlm_step"
+CMD="$CMD --peft_scheme lora"
+CMD="$CMD $CLI_OVERRIDES"
+
+echo "Executing command..."
+echo "======================================"
+
+# Require container image
+if [ -z "$CONTAINER_IMAGE" ]; then
+    echo "ERROR: CONTAINER_IMAGE must be set. Please specify a valid container image."
+    exit 1
+fi
+
+# Build srun command
+SRUN_CMD="srun --mpi=pmix --container-image=$CONTAINER_IMAGE"
+
+# Add container mounts
+if [ -n "$CONTAINER_MOUNTS" ]; then
+    for mount in $CONTAINER_MOUNTS; do
+        SRUN_CMD="$SRUN_CMD --container-mounts=$mount"
+    done
+fi
+
+$SRUN_CMD bash -c "$CMD"
+
+echo "======================================"
+echo "Job completed"
+echo "======================================"
diff --git a/examples/models/vlm/glm_45v/slurm_sft.sh b/examples/models/vlm/glm_45v/slurm_sft.sh
new file mode 100644
index 0000000000..f23dee3c43
--- /dev/null
+++ b/examples/models/vlm/glm_45v/slurm_sft.sh
@@ -0,0 +1,164 @@
+#!/bin/bash
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# ==============================================================================
+# GLM-4.5V Full Supervised Fine-Tuning (SFT)
+#
+# GLM-4.5V is a large MoE model (106B parameters)
+# Recommended: TP=1, PP=8, EP=16 for full SFT (512 GPUs, 64 nodes)
+# For smaller setups, use LoRA/DoRA instead (see slurm_peft.sh)
+#
+# Usage:
+# 1. Modify the #SBATCH directives below for your cluster
+# 2. Set CONTAINER_IMAGE to your container path
+# 3. Submit: sbatch slurm_sft.sh
+# ==============================================================================
+
+#SBATCH --job-name=glm45v-sft
+#SBATCH --nodes=64
+#SBATCH --ntasks-per-node=8
+#SBATCH --gpus-per-node=8
+#SBATCH --time=24:00:00
+#SBATCH --partition=gpu
+#SBATCH --account=my_account
+#SBATCH --output=logs/glm45v_sft_%j.out
+#SBATCH --error=logs/glm45v_sft_%j.err
+#SBATCH --exclusive
+
+# ==============================================================================
+# CONFIGURATION
+# ==============================================================================
+
+# Workspace directory for checkpoints and results
+WORKSPACE=${WORKSPACE:-/workspace}
+
+# Model and training configurations
+PRETRAINED_CHECKPOINT=${WORKSPACE}/models/GLM-4.5V
+MODEL_NAME=glm_45v
+DATASET_NAME=cord_v2
+SEQ_LENGTH=8192
+TRAIN_ITERS=50
+GLOBAL_BATCH_SIZE=64
+MICRO_BATCH_SIZE=1
+EVAL_ITERS=10
+LR=0.000005
+MIN_LR=0.0000005
+LR_WARMUP_ITERS=10
+LOG_INTERVAL=1
+WANDB_PROJECT=megatron-bridge-${DATASET_NAME}
+
+# Parallelism configuration
+TP=1
+PP=8
+EP=16
+
+# Container image (required)
+CONTAINER_IMAGE=""
+# CONTAINER_IMAGE="/path/to/container.sqsh"
+
+# Container mounts (optional, space-separated)
+CONTAINER_MOUNTS=""
+# CONTAINER_MOUNTS="/data:/data /workspace:/workspace"
+
+# ==============================================================================
+# Environment Setup
+# ==============================================================================
+
+# NCCL optimizations for large-scale training
+export TORCH_NCCL_AVOID_RECORD_STREAMS=1
+export NCCL_NVLS_ENABLE=0
+
+# UV cache on shared filesystem (recommended for multi-node setups)
+# Pre-sync once before submitting jobs: UV_CACHE_DIR=/path/to/cache uv sync
+# export UV_CACHE_DIR="/path/to/shared/uv_cache"
+
+# HuggingFace cache directory (recommended for shared filesystem)
+# export HF_HOME="/path/to/shared/HF_HOME"
+
+# Authentication tokens (set these for your environment)
+# export HF_TOKEN="hf_your_token_here"
+# export WANDB_API_KEY="your_wandb_key_here"
+
+# ==============================================================================
+# Job Execution
+# ==============================================================================
+
+echo "======================================"
+echo "GLM-4.5V Full SFT Training Job"
+echo "======================================"
+echo "Job ID: $SLURM_JOB_ID"
+echo "Nodes: $SLURM_JOB_NUM_NODES"
+echo "GPUs per node: $SLURM_GPUS_PER_NODE"
+echo "Model: $MODEL_NAME"
+echo "Parallelism: TP=$TP, PP=$PP, EP=$EP"
+echo "======================================"
+
+# Create logs directory if it doesn't exist
+mkdir -p logs
+
+# Build CLI overrides
+CLI_OVERRIDES="
+    checkpoint.pretrained_checkpoint=$PRETRAINED_CHECKPOINT \
+    model.seq_length=$SEQ_LENGTH \
+    train.train_iters=$TRAIN_ITERS \
+    train.global_batch_size=$GLOBAL_BATCH_SIZE \
+    train.micro_batch_size=$MICRO_BATCH_SIZE \
+    train.eval_iters=$EVAL_ITERS \
+    optimizer.lr=$LR \
+    optimizer.min_lr=$MIN_LR \
+    scheduler.lr_warmup_iters=$LR_WARMUP_ITERS \
+    checkpoint.save=${WORKSPACE}/results/${MODEL_NAME}_sft_tp${TP}_pp${PP}_ep${EP} \
+    logger.log_interval=$LOG_INTERVAL \
+    logger.wandb_project=$WANDB_PROJECT \
+    logger.wandb_exp_name=${MODEL_NAME}_${DATASET_NAME}_sft_tp${TP}_pp${PP}_ep${EP} \
+    dataset.maker_name=make_${DATASET_NAME}_dataset \
+    dataset.seq_length=$SEQ_LENGTH \
+    model.tensor_model_parallel_size=$TP \
+    model.pipeline_model_parallel_size=$PP \
+    model.expert_model_parallel_size=$EP
+"
+
+# Build command
+# Only local rank 0 on each node runs uv sync, then all ranks run with --no-sync
+CMD="if [ \"\$SLURM_LOCALID\" -eq 0 ]; then uv sync; else sleep 2; fi && "
+CMD="$CMD uv run --no-sync python scripts/training/run_recipe.py"
+CMD="$CMD --recipe ${MODEL_NAME}_finetune_config"
+CMD="$CMD --step_func vlm_step"
+CMD="$CMD $CLI_OVERRIDES"
+
+echo "Executing command..."
+echo "======================================"
+
+# Require container image
+if [ -z "$CONTAINER_IMAGE" ]; then
+    echo "ERROR: CONTAINER_IMAGE must be set. Please specify a valid container image."
+    exit 1
+fi
+
+# Build srun command
+SRUN_CMD="srun --mpi=pmix --container-image=$CONTAINER_IMAGE"
+
+# Add container mounts
+if [ -n "$CONTAINER_MOUNTS" ]; then
+    for mount in $CONTAINER_MOUNTS; do
+        SRUN_CMD="$SRUN_CMD --container-mounts=$mount"
+    done
+fi
+
+$SRUN_CMD bash -c "$CMD"
+
+echo "======================================"
+echo "Job completed"
+echo "======================================"
diff --git a/src/megatron/bridge/recipes/__init__.py b/src/megatron/bridge/recipes/__init__.py
index 10618ae372..746f988560 100644
--- a/src/megatron/bridge/recipes/__init__.py
+++ b/src/megatron/bridge/recipes/__init__.py
@@ -21,6 +21,8 @@
 from megatron.bridge.recipes.deepseek import *
 from megatron.bridge.recipes.gemma import *
 from megatron.bridge.recipes.gemma3_vl import *
+from megatron.bridge.recipes.glm import *
+from megatron.bridge.recipes.glm_vl import *
 from megatron.bridge.recipes.gpt import *
 from megatron.bridge.recipes.gpt_oss import *
 from megatron.bridge.recipes.llama import *
diff --git a/src/megatron/bridge/recipes/glm_vl/glm_45v.py b/src/megatron/bridge/recipes/glm_vl/glm_45v.py
index 69bb78261c..63a8cb2bf7 100644
--- a/src/megatron/bridge/recipes/glm_vl/glm_45v.py
+++ b/src/megatron/bridge/recipes/glm_vl/glm_45v.py
@@ -44,7 +44,7 @@
 def set_glm_45v_pipeline_model_parallel_layout(
-    model_cfg: GPTModelProvider, layout: Optional[Union[str, List[List[str]]]] = None
+    model_cfg: GPTModelProvider, layout: Optional[Union[str, List[List[str]]]] = None, is_peft: bool = False
 ) -> None:
     """Set the GLM-4.5V pipeline model parallel layout.
@@ -54,6 +54,7 @@ def set_glm_45v_pipeline_model_parallel_layout(
     Args:
         model_cfg: The model provider configuration to modify.
         layout: Optional custom layout. If None, uses predefined layouts based on PP/VP sizes.
+        is_peft: Whether the model is trained with PEFT.
""" # GLM-4.5V has no MTP layers last_layer = ["loss"] @@ -61,14 +62,31 @@ def set_glm_45v_pipeline_model_parallel_layout( vp_size = model_cfg.virtual_pipeline_model_parallel_size or 1 # GLM-4.5 Air has 46 decoder layers + # GLM-4.5 Vision Encoder is huge, we need to balance the first stage with the least number of layers # Layout maps for common PP/VP combinations - layout_map = { - (1, 1): None, - (2, 1): [["embedding"] + ["decoder"] * 23, ["decoder"] * 23 + last_layer], - (4, 1): [["embedding"] + ["decoder"] * 11, ["decoder"] * 12, ["decoder"] * 12, ["decoder"] * 11 + last_layer], - (8, 1): [["embedding"] + ["decoder"] * 5] + [["decoder"] * 6] * 6 + [["decoder"] * 5 + last_layer], - (16, 1): [["embedding"] + ["decoder"] * 2] + [["decoder"] * 3] * 14 + [["decoder"] * 2 + last_layer], - } + # We use different layouts for PEFT and full SFT. + if is_peft: + layout_map = { + (4, 1): [ + ["embedding"] + ["decoder"] * 11, + ["decoder"] * 12, + ["decoder"] * 12, + ["decoder"] * 11 + last_layer, + ], + (8, 1): [["embedding"] + ["decoder"] * 5] + [["decoder"] * 6] * 6 + [["decoder"] * 5 + last_layer], + (16, 1): [["embedding"] + ["decoder"] * 2] + [["decoder"] * 3] * 14 + [["decoder"] * 2 + last_layer], + } + else: + layout_map = { + (4, 1): [ + ["embedding"] + ["decoder"] * 11, + ["decoder"] * 12, + ["decoder"] * 12, + ["decoder"] * 11 + last_layer, + ], + (8, 1): [["embedding"] + ["decoder"]] + [["decoder"] * 7] * 6 + [["decoder"] * 3 + last_layer], + (16, 1): [["embedding"]] + [["decoder"] * 3] * 14 + [["decoder"] * 3 + last_layer], + } if layout is not None: model_cfg.pipeline_model_parallel_layout = layout @@ -133,9 +151,9 @@ class GLM45VCommonKwargs(TypedDict, total=False): def glm_45v_finetune_config(**user_kwargs: Unpack[GLM45VCommonKwargs]) -> ConfigContainer: """Return a fine-tuning config for GLM-4.5V (based on GLM-4.5 Air 106B). 
-    Default configuration: 4 nodes, 32 GPUs total
-    - LoRA/DoRA: TP=1, PP=8, EP=4 (32 GPUs, 4 nodes), LR=1e-4
-    - Full SFT: TP=1, PP=8, EP=16 (128 GPUs, 16 nodes), LR=5e-6
+    Default configuration:
+    - LoRA/DoRA: TP=1, PP=8, EP=4 (64 GPUs, 8 nodes), LR=1e-4
+    - Full SFT: TP=1, PP=8, EP=16 (512 GPUs, 64 nodes), LR=5e-6
 
     GLM-4.5V is a Vision-Language model with:
     - 106B total parameters (based on GLM-4.5 Air)
@@ -151,9 +169,10 @@ def glm_45v_finetune_config(**user_kwargs: Unpack[GLM45VCommonKwargs]) -> Config
     recommended_kwargs: GLM45VCommonKwargs = {
         "hf_path": "zai-org/GLM-4.5V",
         "tensor_model_parallel_size": 1,
-        "pipeline_model_parallel_size": 4,
+        "pipeline_model_parallel_size": 8,
         "pipeline_dtype": torch.bfloat16,
-        "expert_model_parallel_size": 16 if is_full_sft else 2,
+        "expert_model_parallel_size": 16 if is_full_sft else 4,
+        "global_batch_size": 64 if is_full_sft else 32,
         "peft": peft_value,
         "finetune_lr": 5e-6 if is_full_sft else 1e-4,
     }
@@ -186,7 +205,7 @@
     train_iters: int = 300000,
     global_batch_size: int = 32,
     micro_batch_size: int = 1,
-    seq_length: int = 4096,
+    seq_length: int = 8192,
     lr: float = 3e-4,
     min_lr: float = 3e-5,
     lr_warmup_iters: int = 500,
@@ -242,7 +261,7 @@
     model_cfg.seq_length = seq_length
 
     # Set pipeline model parallel layout for asymmetric stages
-    set_glm_45v_pipeline_model_parallel_layout(model_cfg, layout)
+    set_glm_45v_pipeline_model_parallel_layout(model_cfg, layout, is_peft=peft is not None)
 
     # Pipeline split for asymmetric stages are specified with the layout above
     model_cfg.account_for_embedding_in_pipeline_split = False
diff --git a/tests/unit_tests/recipes/test_glm_45v_recipes.py b/tests/unit_tests/recipes/test_glm_45v_recipes.py
index f4614b3fbc..d60d6636c2 100644
--- a/tests/unit_tests/recipes/test_glm_45v_recipes.py
+++ b/tests/unit_tests/recipes/test_glm_45v_recipes.py
@@ -252,10 +252,10 @@
     _assert_basic_config(cfg)
 
-    # For LoRA, GLM-4.5V should use TP=1, PP=4, EP=2
+    # For LoRA, GLM-4.5V should use TP=1, PP=8, EP=4
     assert cfg.model.tensor_model_parallel_size == 1
-    assert cfg.model.pipeline_model_parallel_size == 4
-    assert cfg.model.expert_model_parallel_size == 2
+    assert cfg.model.pipeline_model_parallel_size == 8
+    assert cfg.model.expert_model_parallel_size == 4
 
     # Check PEFT config
     assert cfg.peft is not None
@@ -284,8 +284,8 @@
     # For DoRA, GLM-4.5V should use same parallelism as LoRA
     assert cfg.model.tensor_model_parallel_size == 1
-    assert cfg.model.pipeline_model_parallel_size == 4
-    assert cfg.model.expert_model_parallel_size == 2
+    assert cfg.model.pipeline_model_parallel_size == 8
+    assert cfg.model.expert_model_parallel_size == 4
 
     # Check PEFT config (DoRA has alpha=64 by default, unlike LoRA's alpha=32)
     assert cfg.peft is not None
@@ -311,9 +311,9 @@
     _assert_basic_config(cfg)
 
-    # For full SFT, GLM-4.5V should use TP=1, PP=4, EP=16
+    # For full SFT, GLM-4.5V should use TP=1, PP=8, EP=16
     assert cfg.model.tensor_model_parallel_size == 1
-    assert cfg.model.pipeline_model_parallel_size == 4
+    assert cfg.model.pipeline_model_parallel_size == 8
     assert cfg.model.expert_model_parallel_size == 16
 
     assert cfg.peft is None
@@ -363,80 +363,88 @@ def test_glm_45v_peft_with_freeze_options(monkeypatch: pytest.MonkeyPatch):
 # Pipeline layout tests
-def test_glm_45v_pipeline_layout_pp1():
-    """Test pipeline layout for PP=1."""
+
+
+def test_glm_45v_pipeline_layout_pp4():
+    """Test pipeline layout for PP=4."""
     model_cfg = _FakeModelCfg()
-    model_cfg.pipeline_model_parallel_size = 1
+    model_cfg.pipeline_model_parallel_size = 4
     model_cfg.virtual_pipeline_model_parallel_size = 1
 
     _glm_45v_module.set_glm_45v_pipeline_model_parallel_layout(model_cfg)
 
-    # PP=1 should have no layout (None)
-    assert model_cfg.pipeline_model_parallel_layout is None
+    # PP=4 should have 4 stages
+    assert model_cfg.pipeline_model_parallel_layout is not None
+    assert len(model_cfg.pipeline_model_parallel_layout) == 4
+    # First stage: embedding + 11 decoder layers
+    assert model_cfg.pipeline_model_parallel_layout[0][0] == "embedding"
+    # Last stage should have loss
+    assert "loss" in model_cfg.pipeline_model_parallel_layout[-1]
 
 
-def test_glm_45v_pipeline_layout_pp2():
-    """Test pipeline layout for PP=2."""
+def test_glm_45v_pipeline_layout_pp8():
+    """Test pipeline layout for PP=8."""
     model_cfg = _FakeModelCfg()
-    model_cfg.pipeline_model_parallel_size = 2
+    model_cfg.pipeline_model_parallel_size = 8
     model_cfg.virtual_pipeline_model_parallel_size = 1
 
     _glm_45v_module.set_glm_45v_pipeline_model_parallel_layout(model_cfg)
 
-    # PP=2 should split 46 layers: first stage 1+23=24, second stage 23
+    # PP=8 should have 8 stages (full SFT layout: embedding+1, then 7*6, then 3+loss)
     assert model_cfg.pipeline_model_parallel_layout is not None
-    assert len(model_cfg.pipeline_model_parallel_layout) == 2
-    # First stage: embedding + 23 decoder layers
+    assert len(model_cfg.pipeline_model_parallel_layout) == 8
+    # First stage: embedding + 1 decoder layer
     assert model_cfg.pipeline_model_parallel_layout[0][0] == "embedding"
-    assert model_cfg.pipeline_model_parallel_layout[0].count("decoder") == 23
-    # Last stage: 23 decoder layers + loss
-    assert model_cfg.pipeline_model_parallel_layout[1].count("decoder") == 23
-    assert "loss" in model_cfg.pipeline_model_parallel_layout[1]
+    assert model_cfg.pipeline_model_parallel_layout[0].count("decoder") == 1
+    # Last stage should have loss
+    assert "loss" in model_cfg.pipeline_model_parallel_layout[-1]
 
 
-def test_glm_45v_pipeline_layout_pp4():
-    """Test pipeline layout for PP=4."""
+def test_glm_45v_pipeline_layout_pp16():
+    """Test pipeline layout for PP=16."""
     model_cfg = _FakeModelCfg()
-    model_cfg.pipeline_model_parallel_size = 4
+    model_cfg.pipeline_model_parallel_size = 16
    model_cfg.virtual_pipeline_model_parallel_size = 1
 
     _glm_45v_module.set_glm_45v_pipeline_model_parallel_layout(model_cfg)
 
-    # PP=4 should have 4 stages
+    # PP=16 should have 16 stages (full SFT layout: embedding alone, then 14 decoder stages, then the final stage with loss)
     assert model_cfg.pipeline_model_parallel_layout is not None
-    assert len(model_cfg.pipeline_model_parallel_layout) == 4
-    # First stage: embedding + 11 decoder layers
+    assert len(model_cfg.pipeline_model_parallel_layout) == 16
+    # First stage: embedding only (no decoder layers, to balance vision encoder cost)
     assert model_cfg.pipeline_model_parallel_layout[0][0] == "embedding"
+    assert model_cfg.pipeline_model_parallel_layout[0].count("decoder") == 0
     # Last stage should have loss
     assert "loss" in model_cfg.pipeline_model_parallel_layout[-1]
 
 
-def test_glm_45v_pipeline_layout_pp8():
-    """Test pipeline layout for PP=8."""
+def test_glm_45v_pipeline_layout_pp8_peft():
+    """Test pipeline layout for PP=8 with PEFT."""
     model_cfg = _FakeModelCfg()
     model_cfg.pipeline_model_parallel_size = 8
     model_cfg.virtual_pipeline_model_parallel_size = 1
 
-    _glm_45v_module.set_glm_45v_pipeline_model_parallel_layout(model_cfg)
+    _glm_45v_module.set_glm_45v_pipeline_model_parallel_layout(model_cfg, is_peft=True)
 
-    # PP=8 should have 8 stages
+    # PP=8 PEFT layout: embedding+5, then 6*6, then 5+loss
     assert model_cfg.pipeline_model_parallel_layout is not None
     assert len(model_cfg.pipeline_model_parallel_layout) == 8
     # First stage: embedding + 5 decoder layers
     assert model_cfg.pipeline_model_parallel_layout[0][0] == "embedding"
+    assert model_cfg.pipeline_model_parallel_layout[0].count("decoder") == 5
     # Last stage should have loss
     assert "loss" in model_cfg.pipeline_model_parallel_layout[-1]
 
 
-def test_glm_45v_pipeline_layout_pp16():
-    """Test pipeline layout for PP=16."""
+def test_glm_45v_pipeline_layout_pp16_peft():
+    """Test pipeline layout for PP=16 with PEFT."""
     model_cfg = _FakeModelCfg()
     model_cfg.pipeline_model_parallel_size = 16
     model_cfg.virtual_pipeline_model_parallel_size = 1
 
-    _glm_45v_module.set_glm_45v_pipeline_model_parallel_layout(model_cfg)
+    _glm_45v_module.set_glm_45v_pipeline_model_parallel_layout(model_cfg, is_peft=True)
 
-    # PP=16 should have 16 stages
+    # PP=16 PEFT layout: embedding+2, then 3*14, then 2+loss
     assert model_cfg.pipeline_model_parallel_layout is not None
     assert len(model_cfg.pipeline_model_parallel_layout) == 16
     # First stage: embedding + 2 decoder layers
@@ -465,7 +473,7 @@ def test_glm_45v_pipeline_layout_in_config(monkeypatch: pytest.MonkeyPatch):
     monkeypatch.setattr(_glm_45v_module, "AutoBridge", _FakeAutoBridge)
 
     overrides = _safe_overrides_for("glm_45v_finetune_config")
-    overrides["pipeline_model_parallel_size"] = 2
+    overrides["pipeline_model_parallel_size"] = 8
 
     cfg = _glm_45v_module.glm_45v_finetune_config(**overrides)