Merged
29 commits
f723594  [doc] refactor: Restructure examples folder - move recipes to models,… (yaoyu-33, Jan 29, 2026)
5527465  Add GLM-4.5V and update Gemma 3 VL example scripts (yaoyu-33, Jan 29, 2026)
dcae909  docs update (yaoyu-33, Jan 30, 2026)
166d3e0  update run scripts (yaoyu-33, Jan 30, 2026)
b8d29d4  docs(glm_45v): fix parallelism config in README to match scripts (yaoyu-33, Jan 30, 2026)
1232791  fix comments (yaoyu-33, Jan 30, 2026)
6a41455  Merge branch 'main' into add-glm45v-gemma3vl-examples (yaoyu-33, Jan 30, 2026)
f71dbce  balance the decoder layer number on first PP rank for PP8 case (suiyoubi, Feb 3, 2026)
e0b62a1  update examples for glm45v (suiyoubi, Feb 3, 2026)
0f4af5c  Add resiliency examples for ft launcher and straggler detection (#2115) (ananthsub, Feb 2, 2026)
f283ad1  [🤖]: Update docs-versions after code-freeze for r0.3.0 (github-actions[bot], Feb 2, 2026)
219195c  [🤖]: Update docs-versions after code-freeze for r0.3.0 (github-actions[bot], Feb 2, 2026)
61e91b3  ci(fix): code freeze workflow (#2177) (ko3n1g, Feb 2, 2026)
89dddf0  Add refactored recipe files for pretrain configs of LLMs (#2067) (athitten, Feb 2, 2026)
ae8ae0d  docs: remove empty bullet point from News section (#2179) (yaoyu-33, Feb 2, 2026)
78f0e92  Dsv3 Recipe Update (#2152) (dingqingy-nv, Feb 3, 2026)
779dce3  Version bump to `0.4.0rc0.dev0` (#2176) (github-actions[bot], Feb 3, 2026)
9ebb2c6  Revert "Add refactored recipe files for pretrain configs of LLMs (#20… (ko3n1g, Feb 3, 2026)
8866c35  Revert packed seq extra checks (#2180) (cuichenx, Feb 3, 2026)
dc8b7af  DSv3 EP=8 for B200, PP8-VP2 for B300 BF16, Lm3.1 405B TP4-CP1 GB300 F… (malay-nagda, Feb 3, 2026)
e2fdda2  ci: Add secrets detector (#2154) (chtruong814, Feb 3, 2026)
1109437  docs: Reorganize GLM-4.5V and Gemma 3 VL documentation (yaoyu-33, Feb 3, 2026)
85de002  Merge branch 'main' into add-glm45v-gemma3vl-examples (yaoyu-33, Feb 3, 2026)
26d4546  Merge branch 'main' into add-glm45v-gemma3vl-examples (yaoyu-33, Feb 3, 2026)
11a0e70  Fix MyST xref warnings by using GitHub URLs for example links (yaoyu-33, Feb 3, 2026)
3f5b063  Merge branch 'main' into add-glm45v-gemma3vl-examples (yaoyu-33, Feb 4, 2026)
2a95b5f  fix(tests): update GLM-45V recipe tests for new PP=8 defaults (yaoyu-33, Feb 4, 2026)
b5030db  update glm45 layout and test (suiyoubi, Feb 4, 2026)
bb011ea  ruff (suiyoubi, Feb 4, 2026)
2 changes: 1 addition & 1 deletion docs/models/llm/gemma3.md
@@ -180,7 +180,7 @@ torchrun --nproc-per-node=8 run/run_recipe.py \
- Gemma 3 1B: https://huggingface.co/google/gemma-3-1b-it

## Related Docs
-- Gemma3 Vision-Language Models: [Gemma 3 VL](../vlm/gemma3-vl.md)
+- Gemma3 Vision-Language Models: [Gemma 3 VL](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/models/vlm/gemma3_vl/README.md)
- Recipe usage: [Recipe usage](../../recipe-usage.md)
- Customizing the training recipe configuration: [Configuration overview](../../training/config-container-overview.md)
- Training entry points: [Entry points](../../training/entry-points.md)
159 changes: 2 additions & 157 deletions docs/models/vlm/gemma3-vl.md
@@ -44,163 +44,9 @@ Gemma 3 VL builds on the Gemma 3 architecture with additional multimodal capabilities:
- **Multimodal Integration**: Seamless integration of visual and textual information through learned projection layers
- **Flexible Image Handling**: Supports variable resolution images and multiple images per conversation

## Conversion with 🤗 Hugging Face

### Import HF → Megatron
To import the HF VL model to your desired Megatron path:
```bash
python examples/conversion/convert_checkpoints.py import \
--hf-model google/gemma-3-4b-it \
--megatron-path /models/gemma-3-4b-it
```

### Export Megatron → HF
```bash
python examples/conversion/convert_checkpoints.py export \
--hf-model google/gemma-3-4b-it \
--megatron-path /results/gemma3_vl_4b/checkpoints/iter_00001000 \
--hf-path ./gemma3-vl-hf-export
```

### Run Inference on Converted Checkpoint

```bash
python examples/conversion/hf_to_megatron_generate_vlm.py \
--hf_model_path google/gemma-3-4b-it \
--megatron_model_path /models/gemma-3-4b-it \
--image_path <example image path> \
--prompt "Describe this image." \
--max_new_tokens 100
```

Note:
- `--megatron_model_path` is optional. If it is not specified, the script converts the model on the fly and then runs the forward pass.
- You can also use image URLs: `--image_path="https://example.com/image.jpg"`

## Finetune Recipes

- See: [bridge.recipes.gemma3_vl](../../apidocs/bridge/bridge.recipes.gemma3_vl.md)
- Available recipes:
- `gemma3_vl_4b_finetune_config`: Finetuning for 4B VL model with PEFT support
- `gemma3_vl_12b_finetune_config`: Finetuning for 12B VL model with PEFT support
- `gemma3_vl_27b_finetune_config`: Finetuning for 27B VL model with PEFT support

Before training, ensure the following environment variables are set:
1. `SAVE_DIR`: checkpoint and log saving directory
2. `HF_TOKEN`: to download models from HF Hub (if required)
3. `HF_HOME`: (optional) to avoid re-downloading models and datasets
4. `WANDB_API_KEY`: (optional) to enable WandB logging
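The variables above can be set in the launch shell before invoking `torchrun`; a minimal sketch, where the paths and key values are placeholders rather than recipe defaults:

```shell
# Required: checkpoint and log output directory (placeholder path)
export SAVE_DIR=/results/gemma3_vl
# Required for gated models: Hugging Face Hub token (placeholder value)
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx
# Optional: cache models and datasets across runs
export HF_HOME=/cache/huggingface
# Optional: enable WandB logging (placeholder value)
export WANDB_API_KEY=xxxxxxxxxxxxxxxx
```
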

### Full Finetuning

```bash
torchrun --nproc-per-node=8 run/run_vlm_recipe.py \
--pretrained-checkpoint /models/gemma-3-4b-it \
--recipe gemma3_vl_4b_finetune_config \
--dataset-type hf \
dataset.maker_name=make_cord_v2_dataset \
train.global_batch_size=64 \
train.train_iters=1000 \
checkpoint.save=$SAVE_DIR/gemma3_vl_4b_finetune
```

Or programmatically:
```python
from megatron.bridge.recipes.gemma3_vl import gemma3_vl_4b_finetune_config

# Full finetuning
config = gemma3_vl_4b_finetune_config(
    name="gemma3_vl_4b_full_finetune",
    pretrained_checkpoint="/models/gemma-3-4b-it",
    dataset_type="hf",
    peft=None,
    train_iters=1000,
    global_batch_size=64,
)
```

### Parameter-Efficient Finetuning (PEFT) with LoRA

```bash
torchrun --nproc-per-node=8 run/run_vlm_recipe.py \
--pretrained-checkpoint /models/gemma-3-4b-it \
--recipe gemma3_vl_4b_finetune_config \
--peft_scheme lora \
--dataset-type hf \
dataset.maker_name=make_cord_v2_dataset \
train.global_batch_size=128 \
checkpoint.save=$SAVE_DIR/gemma3_vl_4b_lora
```

PEFT options:
- `--peft_scheme`: Set to `lora` for LoRA or `dora` for DoRA. Omit for full finetuning.

You can also combine PEFT with freeze options:
- `model.freeze_language_model=True`: Freeze the language model
- `model.freeze_vision_model=True`: Freeze the vision encoder
- `model.freeze_vision_projection=True`: Freeze the vision projection layer

Example with freeze options:
```bash
torchrun --nproc-per-node=8 run/run_vlm_recipe.py \
--pretrained-checkpoint /models/gemma-3-4b-it \
--recipe gemma3_vl_4b_finetune_config \
--peft_scheme lora \
model.freeze_language_model=True \
model.freeze_vision_model=False \
checkpoint.save=$SAVE_DIR/gemma3_vl_4b_lora_vision
```

Programmatic configuration:
```python
from megatron.bridge.recipes.gemma3_vl import gemma3_vl_4b_finetune_config

# LoRA finetuning
config = gemma3_vl_4b_finetune_config(
    name="gemma3_vl_4b_lora_finetune",
    pretrained_checkpoint="/models/gemma-3-4b-it",
    dataset_type="hf",
    peft="lora",  # or "dora"
    train_iters=1000,
    global_batch_size=128,
)

# LoRA with vision model frozen
config = gemma3_vl_4b_finetune_config(
    name="gemma3_vl_4b_lora_language_only",
    pretrained_checkpoint="/models/gemma-3-4b-it",
    peft="lora",
    freeze_vision_model=True,
    freeze_vision_projection=True,
)
```

### Recommended Configurations

| Model | Mode | TP | PP | Global Batch Size | Learning Rate | Hardware |
|-------|------|----|----|-------------------|---------------|----------|
| Gemma 3 VL 4B | Full SFT | 1 | 1 | 32-64 | 5e-6 | 8 GPUs |
| Gemma 3 VL 4B | LoRA/DoRA | 1 | 1 | 64-128 | 1e-4 | 8 GPUs |
| Gemma 3 VL 12B | Full SFT | 4 | 1 | 32-64 | 5e-6 | 8 GPUs |
| Gemma 3 VL 12B | LoRA/DoRA | 1 | 1 | 64-128 | 1e-4 | 8 GPUs |
| Gemma 3 VL 27B | Full SFT | 8 | 2 | 16-32 | 5e-6 | 16 GPUs |
| Gemma 3 VL 27B | LoRA/DoRA | 4 | 1 | 32-64 | 1e-4 | 16 GPUs |

**Note:** LoRA/DoRA significantly reduces memory requirements, allowing for larger batch sizes and fewer GPUs.
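As a quick sanity check on the table, the data-parallel width is the GPU count divided by TP×PP, and the gradient-accumulation step count follows from the global batch size. A sketch of that arithmetic (the micro-batch size of 1 is an assumption for illustration, not a value taken from the recipes):

```python
# Relate the table's TP/PP settings to data-parallel size and
# gradient accumulation for a given global batch size.

def data_parallel_size(num_gpus: int, tp: int, pp: int) -> int:
    # TP*PP GPUs form one model replica; the rest is data parallelism.
    assert num_gpus % (tp * pp) == 0, "TP*PP must divide the GPU count"
    return num_gpus // (tp * pp)

def grad_accum_steps(global_batch: int, dp: int, micro_batch: int = 1) -> int:
    # Each accumulation step processes dp * micro_batch samples.
    assert global_batch % (dp * micro_batch) == 0
    return global_batch // (dp * micro_batch)

# Gemma 3 VL 27B full SFT row: TP=8, PP=2 on 16 GPUs, global batch 32
dp = data_parallel_size(num_gpus=16, tp=8, pp=2)
steps = grad_accum_steps(global_batch=32, dp=dp)
print(dp, steps)  # → 1 32
```

The same helper explains why the 12B full-SFT row (TP=4 on 8 GPUs) leaves a 2-way data-parallel dimension.
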

## Example Datasets

| Dataset | Maker Name | Description |
|---------|------------|-------------|
| [cord-v2](https://huggingface.co/datasets/naver-clova-ix/cord-v2) | `make_cord_v2_dataset` | OCR receipts: Single-image-text dataset for receipt understanding |
| [MedPix-VQA](https://huggingface.co/datasets/mmoukouba/MedPix-VQA) | `make_medpix_dataset` | Medical VQA: Single-image Q&A for clinical images |
| [The Cauldron (Raven subset)](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron) | `make_raven_dataset` | Visual reasoning: Multi-image analogical reasoning |

To change the dataset, specify `dataset.maker_name=<maker_name>` in your command.

## Examples
- Checkpoint import/export: [examples/conversion/convert_checkpoints.py](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/conversion/convert_checkpoints.py)
- Generate with VLM (HF→Megatron): [examples/conversion/hf_to_megatron_generate_vlm.py](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/conversion/hf_to_megatron_generate_vlm.py)

For checkpoint conversion, inference, finetuning recipes, and step-by-step training guides, see the [Gemma 3 VL Examples](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/models/vlm/gemma3_vl/README.md).

## Hugging Face Model Cards

@@ -213,4 +59,3 @@ To change the dataset, specify `dataset.maker_name=<maker_name>` in your command.
- Recipe usage: [Recipe usage](../../recipe-usage.md)
- Customizing the training recipe configuration: [Configuration overview](../../training/config-container-overview.md)
- Training entry points: [Entry points](../../training/entry-points.md)

131 changes: 3 additions & 128 deletions docs/models/vlm/glm-45v.md
@@ -19,7 +19,7 @@ Please update `transformers` version to 4.57.1 or higher in order to use the GLM
- 128 MoE experts with shared experts
- ~12B active parameters per token
- Sequence length: 131,072 tokens
-- Recommended: 4 nodes, 32 GPUs (LoRA/DoRA) or 16 nodes, 128 GPUs (Full SFT)
+- Recommended: 32 nodes, 256 GPUs (LoRA/DoRA) or 64 nodes, 512 GPUs (Full SFT)

## Model Architecture Features

@@ -39,134 +39,9 @@ GLM-4.5V combines efficient sparse MoE language modeling with multimodal capabilities:
- **Image and Video Support**: Handles both static images and video inputs
- **Flexible Image Handling**: Supports variable resolution images and multiple images per conversation

## Conversion with 🤗 Hugging Face

### Import HF → Megatron
To import the HF VL model to your desired Megatron path:
```bash
python examples/conversion/convert_checkpoints.py import \
--hf-model zai-org/GLM-4.5V \
--megatron-path /models/glm-45v
```

### Export Megatron → HF
```bash
python examples/conversion/convert_checkpoints.py export \
--hf-model zai-org/GLM-4.5V \
--megatron-path /results/glm_45v/checkpoints/iter_0001000 \
--hf-path ./glm-45v-hf-export
```

### Run Inference on Converted Checkpoint

```bash
python examples/conversion/hf_to_megatron_generate_vlm.py \
--hf_model_path zai-org/GLM-4.5V \
--megatron_model_path /models/glm-45v \
--image_path <example image path> \
--prompt "Describe this image." \
--max_new_tokens 100
```

Note:
- `--megatron_model_path` is optional. If it is not specified, the script converts the model on the fly and then runs the forward pass.
- You can also use image URLs: `--image_path="https://example.com/image.jpg"`

## Finetune Recipes

- See: [bridge.recipes.glm_vl](../../apidocs/bridge/bridge.recipes.glm_vl.md)
- Available recipes:
- `glm_45v_finetune_config`: Finetuning for GLM-4.5V model with PEFT support

Before training, ensure the following environment variables are set:
1. `SAVE_DIR`: checkpoint and log saving directory
2. `HF_TOKEN`: to download models from HF Hub (if required)
3. `HF_HOME`: (optional) to avoid re-downloading models and datasets
4. `WANDB_API_KEY`: (optional) to enable WandB logging

### Full Finetuning

```python
from megatron.bridge.recipes.glm_vl import glm_45v_finetune_config

# Full finetuning
config = glm_45v_finetune_config(
    name="glm_45v_full_finetune",
    pretrained_checkpoint="/models/glm-45v",
    dataset_type="hf",
    peft=None,
    train_iters=1000,
    global_batch_size=32,
)
```

### Parameter-Efficient Finetuning (PEFT) with LoRA

```python
config = glm_45v_finetune_config(
    name="glm_45v_lora_finetune",
    pretrained_checkpoint="/models/glm-45v",
    dataset_type="hf",
    peft="lora",
    train_iters=1000,
    global_batch_size=32,
)
```

PEFT options:
- `--peft-scheme`: Set to `lora` for LoRA or `dora` for DoRA. Omit for full finetuning.

You can also combine PEFT with freeze options:
- `--freeze-language-model`: Freeze the language model
- `--freeze-vision-model`: Freeze the vision encoder
- `--freeze-vision-projection`: Freeze the vision projection layer

Example with freeze options:
```python
from megatron.bridge.recipes.glm_vl import glm_45v_finetune_config

# LoRA finetuning
config = glm_45v_finetune_config(
    name="glm_45v_lora_finetune",
    pretrained_checkpoint="/models/glm-45v",
    dataset_type="hf",
    peft="lora",  # or "dora"
    train_iters=1000,
    global_batch_size=64,
)

# LoRA with vision model frozen
config = glm_45v_finetune_config(
    name="glm_45v_lora_language_only",
    pretrained_checkpoint="/models/glm-45v",
    peft="lora",
    freeze_vision_model=True,
    freeze_vision_projection=True,
)
```
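For command-line launches, a hedged sketch assuming the same `run/run_vlm_recipe.py` entry point and flag spellings used by the Gemma 3 VL recipes (verify the flag names, including hyphen vs. underscore in the PEFT flag, against the script before use):

```shell
torchrun --nproc-per-node=8 run/run_vlm_recipe.py \
    --pretrained-checkpoint /models/glm-45v \
    --recipe glm_45v_finetune_config \
    --peft_scheme lora \
    --dataset-type hf \
    dataset.maker_name=make_cord_v2_dataset \
    train.global_batch_size=64 \
    checkpoint.save=$SAVE_DIR/glm_45v_lora
```
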

### Recommended Configurations

| Model | Mode | TP | PP | EP | Global Batch Size | Learning Rate | Hardware |
|-------|------|----|----|-----|-------------------|---------------|----------|
| GLM-4.5V | Full SFT | 1 | 8 | 16 | 16-32 | 5e-6 | 128 GPUs (16 nodes) |
| GLM-4.5V | LoRA/DoRA | 1 | 8 | 4 | 32-64 | 1e-4 | 32 GPUs (4 nodes) |

**Note:** LoRA/DoRA significantly reduces memory requirements, allowing for larger batch sizes and fewer GPUs. The sparse MoE architecture requires Expert Parallelism (EP) for efficient training.
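The table's layouts can be checked with simple divisibility arithmetic: EP must divide the 128-expert count (stated in the overview above), and TP×PP must divide the GPU count so a data-parallel dimension remains. A generic sketch, not a Megatron-Bridge API:

```python
# Sanity-check the table's parallelism layouts for the 128-expert MoE.

NUM_EXPERTS = 128  # from the GLM-4.5V overview above

def experts_per_rank(num_experts: int, ep: int) -> int:
    # Expert parallelism shards the expert set across EP ranks.
    assert num_experts % ep == 0, "EP must divide the expert count"
    return num_experts // ep

def data_parallel_size(num_gpus: int, tp: int, pp: int) -> int:
    # TP*PP GPUs form one model replica; the rest is data parallelism.
    assert num_gpus % (tp * pp) == 0
    return num_gpus // (tp * pp)

# Full SFT row: 128 GPUs, TP=1, PP=8, EP=16
print(experts_per_rank(NUM_EXPERTS, 16))    # → 8 experts per EP rank
print(data_parallel_size(128, tp=1, pp=8))  # → 16-way data parallelism

# LoRA/DoRA row: 32 GPUs, TP=1, PP=8, EP=4
print(experts_per_rank(NUM_EXPERTS, 4))     # → 32 experts per EP rank
print(data_parallel_size(32, tp=1, pp=8))   # → 4-way data parallelism
```

In both rows the EP width fits inside the data-parallel dimension (16 ≤ 16 and 4 ≤ 4), which is what the note means by the architecture requiring EP for efficient training.
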

## Example Datasets

| Dataset | Maker Name | Description |
|---------|------------|-------------|
| [cord-v2](https://huggingface.co/datasets/naver-clova-ix/cord-v2) | `make_cord_v2_dataset` | OCR receipts: Single-image-text dataset for receipt understanding |
| [MedPix-VQA](https://huggingface.co/datasets/mmoukouba/MedPix-VQA) | `make_medpix_dataset` | Medical VQA: Single-image Q&A for clinical images |
| [The Cauldron (Raven subset)](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron) | `make_raven_dataset` | Visual reasoning: Multi-image analogical reasoning |

To change the dataset, specify `dataset.maker_name=<maker_name>` in your command.

## Examples
- Checkpoint import/export: [examples/conversion/convert_checkpoints.py](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/conversion/convert_checkpoints.py)
- Generate with VLM (HF→Megatron): [examples/conversion/hf_to_megatron_generate_vlm.py](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/conversion/hf_to_megatron_generate_vlm.py)

For checkpoint conversion, inference, finetuning recipes, and step-by-step training guides, see the [GLM-4.5V Examples](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/models/vlm/glm_45v/README.md).

## Hugging Face Model Cards
