Merged
102 changes: 2 additions & 100 deletions docs/models/vlm/qwen3-vl.md
@@ -13,107 +13,9 @@ Unless explicitly stated, any megatron model path in the commands below should N
[here](https://docs.nvidia.com/nemo/megatron-bridge/latest/training/checkpointing.html#checkpoint-contents)
```

## Conversion with 🤗 Hugging Face
## Examples

### Import HF → Megatron
To import the HF model to your desired `$MEGATRON_MODEL_PATH`, run the following command.
```bash
python examples/conversion/convert_checkpoints.py import \
--hf-model $HF_MODEL_PATH \
--megatron-path $MEGATRON_MODEL_PATH
```

### Export Megatron → HF
You can export a trained model with the following command.
```bash
python examples/conversion/convert_checkpoints.py export \
--hf-model $HF_MODEL_PATH \
--megatron-path <trained megatron model path> \
--hf-path <output hf model path>
```

### Run In-Framework Inference on Converted Checkpoint
You can run a quick sanity check on the converted checkpoint with the following command.
```bash
python examples/conversion/hf_to_megatron_generate_vlm.py \
--hf_model_path $HF_MODEL_PATH \
--megatron_model_path $MEGATRON_MODEL_PATH \
--image_path <example image path> \
--prompt "Describe this image." \
--max_new_tokens 100
```

## Finetuning Recipes
Before training, ensure the following environment variables are set:
1. `SAVE_DIR`: to specify a checkpoint and log saving directory
2. `HF_TOKEN`: to download models from HF Hub (if required)
3. `HF_HOME`: (optional) to avoid re-downloading models and datasets
4. `WANDB_API_KEY`: (optional) to enable WandB logging
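
As a concrete starting point, a minimal setup might look like this (all values below are placeholders, not required names):

```shell
# Placeholder values -- substitute your own paths and keys.
export SAVE_DIR=/workspace/experiments/qwen3-vl
export HF_TOKEN=hf_your_token_here        # only needed for gated models
export HF_HOME=/workspace/hf_cache        # optional: reuse downloaded models/datasets
export WANDB_API_KEY=your_wandb_key_here  # optional: enable WandB logging
```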

### Full Finetuning

Example usage for full parameter finetuning:

```bash
torchrun --nproc-per-node=8 examples/models/vlm/qwen_vl/finetune_qwen_vl.py \
--pretrained-checkpoint $MEGATRON_MODEL_PATH \
--recipe qwen3_vl_8b_finetune_config \
--dataset-type hf \
dataset.maker_name=make_cord_v2_dataset \
train.global_batch_size=<batch size> \
train.train_iters=<number of iterations> \
logger.wandb_project=<optional wandb project name> \
logger.wandb_save_dir=$SAVE_DIR \
checkpoint.save=$SAVE_DIR/<experiment name>
```

For MoE models with expert parallelism:
```bash
torchrun --nproc-per-node=8 examples/models/vlm/qwen_vl/finetune_qwen_vl.py \
--pretrained-checkpoint $MEGATRON_MODEL_PATH \
--recipe qwen3_vl_30b_a3b_finetune_config \
--dataset-type hf \
dataset.maker_name=make_cord_v2_dataset \
train.global_batch_size=<batch size> \
train.train_iters=<number of iterations> \
checkpoint.save=$SAVE_DIR/<experiment name>
```

Note:
- The `--recipe` parameter selects the model configuration:
- `qwen3_vl_8b_finetune_config` - for 8B dense model
- `qwen3_vl_30b_a3b_finetune_config` - for 30B MoE model
- For dataset formats and additional information, refer to the [Qwen2.5-VL documentation]
- See the full script with examples at [`examples/models/vlm/qwen_vl/finetune_qwen_vl.py`](../../../examples/models/vlm/qwen_vl/finetune_qwen_vl.py)

### PEFT (Parameter-Efficient Fine-Tuning)

Qwen3-VL supports PEFT methods such as LoRA and DoRA for memory-efficient training. PEFT trains only the adapter parameters (roughly 1-2% of the model), significantly reducing memory requirements and speeding up training.
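
As a back-of-the-envelope check on that 1-2% figure, here is a small standalone sketch (plain Python, not Megatron-Bridge code; the dimensions and rank are illustrative, not Qwen3-VL's actual shapes):

```python
# Rough parameter count for a LoRA adapter on a single linear layer.
def lora_param_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Fraction of a d_in x d_out linear layer's weights that a
    rank-r LoRA adapter (A: d_in x r, B: r x d_out) adds as trainable."""
    full_params = d_in * d_out          # frozen base weight
    adapter_params = rank * (d_in + d_out)  # trainable A and B factors
    return adapter_params / full_params

# e.g. a 4096x4096 projection with a rank-16 adapter:
frac = lora_param_fraction(4096, 4096, 16)
print(f"trainable fraction: {frac:.2%}")  # -> trainable fraction: 0.78%
```

Summed over all adapted layers (with embeddings and most weights left frozen), this is roughly where the quoted 1-2% comes from.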

**LoRA with 8B Dense Model (1 GPU):**
```bash
torchrun --nproc-per-node=1 examples/models/vlm/qwen_vl/finetune_qwen_vl.py \
--pretrained-checkpoint $MEGATRON_MODEL_PATH \
--recipe qwen3_vl_8b_finetune_config \
--dataset-type hf \
--peft lora \
checkpoint.save=$SAVE_DIR/<experiment name>
```

**LoRA with 30B MoE Model (8 GPUs with Expert Parallelism):**
```bash
torchrun --nproc-per-node=8 examples/models/vlm/qwen_vl/finetune_qwen_vl.py \
--pretrained-checkpoint $MEGATRON_MODEL_PATH \
--recipe qwen3_vl_30b_a3b_finetune_config \
--dataset-type hf \
--peft lora \
checkpoint.save=$SAVE_DIR/<experiment name>
```

**DoRA Training:**

To use DoRA instead of LoRA, replace `--peft lora` in the commands above with:
```bash
--peft dora
```
For checkpoint conversion, inference, finetuning recipes, and step-by-step training guides, see the [Qwen3-VL Examples](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/models/vlm/qwen3_vl/README.md).

## Hugging Face Model Cards
- Qwen3-VL-8B: `https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct`
14 changes: 8 additions & 6 deletions docs/training/multi-token-prediction.md
@@ -66,12 +66,13 @@ where:
Here's a minimal example using the Qwen3 30B-A3B recipe with MTP enabled:

```python
from megatron.bridge.recipes.qwen import qwen3_30b_a3b_pretrain
from megatron.bridge.recipes.qwen.qwen3_moe import qwen3_30b_a3b_pretrain_config
from megatron.bridge.training.pretrain import pretrain
from megatron.bridge.training.gpt_step import forward_step
from megatron.bridge.training.config import ConfigContainer

log_dir = f"/path/to/log/dir"
log_dir = "/path/to/log/dir"
cfg: ConfigContainer = qwen3_30b_a3b_pretrain_config()
cfg.logger.log_dir = log_dir
cfg.logger.tensorboard_dir = log_dir + "/tb_logs"
cfg.checkpoint.save = log_dir + "/checkpoints"
cfg.checkpoint.load = log_dir + "/checkpoints"
@@ -82,10 +83,11 @@ cfg.dataset.blend=[[
], None]
cfg.dataset.split="9999,8,2"
cfg.dataset.path_to_cache = "/path/to/cache"
# cfg.model.num_layers = 8 # train a smaller model if OOM
# MTP Configuration
cfg.mtp_num_layers = 1
cfg.mtp_loss_scaling_factor = 0.1
pretrain(cfg)
cfg.model.mtp_num_layers = 1
cfg.model.mtp_loss_scaling_factor = 0.1
pretrain(cfg, forward_step)
```
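
To make the role of `mtp_loss_scaling_factor` concrete, here is a toy sketch of how such a knob typically enters the objective (illustrative semantics only, not the actual Megatron-Core loss code):

```python
# Toy combination of a main next-token loss with per-MTP-layer
# auxiliary losses, each scaled by the configured factor.
def combined_loss(main_loss: float, mtp_losses: list,
                  mtp_loss_scaling_factor: float = 0.1) -> float:
    return main_loss + mtp_loss_scaling_factor * sum(mtp_losses)

# With one MTP layer (mtp_num_layers = 1), as in the recipe above:
print(combined_loss(2.0, [1.5]))  # 2.0 + 0.1 * 1.5 = 2.15
```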
Follow the [DCLM Tutorial](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/tutorials/data/dclm) to prepare the training data.

4 changes: 4 additions & 0 deletions examples/models/vlm/gemma3_vl/peft.sh
@@ -16,6 +16,10 @@
# Workspace directory for checkpoints and results
WORKSPACE=${WORKSPACE:-/workspace}

# Before training, make sure to set WANDB_API_KEY or disable wandb logging
# export WANDB_API_KEY=<your_wandb_api_key>
# export WANDB_MODE=disabled

# Common configurations
PRETRAINED_CHECKPOINT=${WORKSPACE}/models/gemma-3-4b-it
MODEL_NAME=gemma3_vl_4b
4 changes: 4 additions & 0 deletions examples/models/vlm/gemma3_vl/sft.sh
@@ -16,6 +16,10 @@
# Workspace directory for checkpoints and results
WORKSPACE=${WORKSPACE:-/workspace}

# Before training, make sure to set WANDB_API_KEY or disable wandb logging
# export WANDB_API_KEY=<your_wandb_api_key>
# export WANDB_MODE=disabled

# Common configurations
PRETRAINED_CHECKPOINT=${WORKSPACE}/models/gemma-3-4b-it
MODEL_NAME=gemma3_vl_4b
2 changes: 2 additions & 0 deletions examples/models/vlm/glm_45v/slurm_peft.sh
@@ -90,6 +90,8 @@ export NCCL_NVLS_ENABLE=0
# Authentication tokens (set these for your environment)
# export HF_TOKEN="hf_your_token_here"
# export WANDB_API_KEY="your_wandb_key_here"
# or disable wandb logging
# export WANDB_MODE=disabled

# ==============================================================================
# Job Execution
2 changes: 2 additions & 0 deletions examples/models/vlm/glm_45v/slurm_sft.sh
@@ -90,6 +90,8 @@ export NCCL_NVLS_ENABLE=0
# Authentication tokens (set these for your environment)
# export HF_TOKEN="hf_your_token_here"
# export WANDB_API_KEY="your_wandb_key_here"
# or disable wandb logging
# export WANDB_MODE=disabled

# ==============================================================================
# Job Execution
10 changes: 7 additions & 3 deletions examples/models/vlm/ministral3/conversion.sh
@@ -16,17 +16,21 @@
# Workspace directory for checkpoints and results
WORKSPACE=${WORKSPACE:-/workspace}

# Note: Ministral 3 requires transformers version 5
# uv pip install --upgrade transformers
# Commands below use uv run --no-sync to avoid conflicts with the virtual environment.

# Import HF → Megatron
uv run python examples/conversion/convert_checkpoints.py import \
uv run --no-sync python examples/conversion/convert_checkpoints.py import \
--hf-model mistralai/Ministral-3-3B-Instruct-2512-BF16 \
--megatron-path ${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16

# Export Megatron → HF
uv run python examples/conversion/convert_checkpoints.py export \
uv run --no-sync python examples/conversion/convert_checkpoints.py export \
--hf-model mistralai/Ministral-3-3B-Instruct-2512-BF16 \
--megatron-path ${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16/iter_0000000 \
--hf-path ${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16-hf-export

# Round-trip validation
uv run python -m torch.distributed.run --nproc_per_node=8 examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
uv run --no-sync python -m torch.distributed.run --nproc_per_node=8 examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
--hf-model-id mistralai/Ministral-3-3B-Instruct-2512-BF16 --tp 2 --pp 2
10 changes: 7 additions & 3 deletions examples/models/vlm/ministral3/inference.sh
@@ -16,8 +16,12 @@
# Workspace directory for checkpoints and results
WORKSPACE=${WORKSPACE:-/workspace}

# Note: Ministral 3 requires transformers version 5
# uv pip install --upgrade transformers
# Commands below use uv run --no-sync to avoid conflicts with the virtual environment.

# Inference with Hugging Face checkpoints
uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
uv run --no-sync python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
--hf_model_path mistralai/Ministral-3-3B-Instruct-2512-BF16 \
--image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
--prompt "Describe this image." \
@@ -26,7 +30,7 @@ uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf
--pp 2

# Inference with imported Megatron checkpoints
uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
uv run --no-sync python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
--hf_model_path mistralai/Ministral-3-3B-Instruct-2512-BF16 \
--megatron_model_path ${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16/iter_0000000 \
--image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
@@ -36,7 +40,7 @@ uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf
--pp 2

# Inference with exported HF checkpoints
uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
uv run --no-sync python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
--hf_model_path ${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16-hf-export \
--image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
--prompt "Describe this image." \
10 changes: 9 additions & 1 deletion examples/models/vlm/ministral3/peft.sh
@@ -16,6 +16,14 @@
# Workspace directory for checkpoints and results
WORKSPACE=${WORKSPACE:-/workspace}

# Note: Ministral 3 requires transformers version 5
# uv pip install --upgrade transformers
# Commands below use uv run --no-sync to avoid conflicts with the virtual environment.

# Before training, make sure to set WANDB_API_KEY or disable wandb logging
# export WANDB_API_KEY=<your_wandb_api_key>
# export WANDB_MODE=disabled

# Common configurations
PRETRAINED_CHECKPOINT=${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16
MODEL_NAME=ministral3_3b
@@ -38,7 +46,7 @@ for config in "${PARALLELISM_CONFIGS[@]}"; do
IFS=',' read -r TP PP <<< "$config"

echo "Running LoRA finetuning with TP=$TP, PP=$PP"
uv run python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
uv run --no-sync python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
--recipe ${MODEL_NAME}_finetune_config \
--step_func vlm_step \
--peft_scheme lora \
10 changes: 9 additions & 1 deletion examples/models/vlm/ministral3/sft.sh
@@ -16,6 +16,14 @@
# Workspace directory for checkpoints and results
WORKSPACE=${WORKSPACE:-/workspace}

# Note: Ministral 3 requires transformers version 5
# uv pip install --upgrade transformers
# Commands below use uv run --no-sync to avoid conflicts with the virtual environment.

# Before training, make sure to set WANDB_API_KEY or disable wandb logging
# export WANDB_API_KEY=<your_wandb_api_key>
# export WANDB_MODE=disabled

# Common configurations
PRETRAINED_CHECKPOINT=${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16
MODEL_NAME=ministral3_3b
@@ -38,7 +46,7 @@ for config in "${PARALLELISM_CONFIGS[@]}"; do
IFS=',' read -r TP PP <<< "$config"

echo "Running full finetuning with TP=$TP, PP=$PP"
uv run python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
uv run --no-sync python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
--recipe ${MODEL_NAME}_finetune_config \
--step_func vlm_step \
checkpoint.pretrained_checkpoint=$PRETRAINED_CHECKPOINT \