diff --git a/docs/models/vlm/qwen3-vl.md b/docs/models/vlm/qwen3-vl.md
index ae8ed510b8..f87cfcf902 100644
--- a/docs/models/vlm/qwen3-vl.md
+++ b/docs/models/vlm/qwen3-vl.md
@@ -13,107 +13,9 @@ Unless explicitly stated, any megatron model path in the commands below should N
 [here](https://docs.nvidia.com/nemo/megatron-bridge/latest/training/checkpointing.html#checkpoint-contents)
 ```

-## Conversion with 🤗 Hugging Face
+## Examples

-### Import HF → Megatron
-To import the HF model to your desired `$MEGATRON_MODEL_PATH`, run the following command.
-```bash
-python examples/conversion/convert_checkpoints.py import \
---hf-model $HF_MODEL_PATH \
---megatron-path $MEGATRON_MODEL_PATH
-```
-
-### Export Megatron → HF
-You can export a trained model with the following command.
-```bash
-python examples/conversion/convert_checkpoints.py export \
---hf-model $HF_MODEL_PATH \
---megatron-path \
---hf-path
-```
-
-### Run In-Framework Inference on Converted Checkpoint
-You can run a quick sanity check on the converted checkpoint with the following command.
-```bash
-python examples/conversion/hf_to_megatron_generate_vlm.py \
---hf_model_path $HF_MODEL_PATH \
---megatron_model_path $MEGATRON_MODEL_PATH \
---image_path \
---prompt "Describe this image." \
---max_new_tokens 100
-```
-
-## Finetuning Recipes
-Before training, ensure the following environment variables are set:
-1. `SAVE_DIR`: to specify a checkpoint and log saving directory
-2. `HF_TOKEN`: to download models from HF Hub (if required)
-3. `HF_HOME`: (optional) to avoid re-downloading models and datasets
-4. `WANDB_API_KEY`: (optional) to enable WandB logging
-
-### Full Finetuning
-
-Example usage for full parameter finetuning:
-
-```bash
-torchrun --nproc-per-node=8 examples/models/vlm/qwen_vl/finetune_qwen_vl.py \
---pretrained-checkpoint $MEGATRON_MODEL_PATH \
---recipe qwen3_vl_8b_finetune_config \
---dataset-type hf \
-dataset.maker_name=make_cord_v2_dataset \
-train.global_batch_size= \
-train.train_iters= \
-logger.wandb_project= \
-logger.wandb_save_dir=$SAVE_DIR \
-checkpoint.save=$SAVE_DIR/
-```
-
-For MoE models with expert parallelism:
-```bash
-torchrun --nproc-per-node=8 examples/models/vlm/qwen_vl/finetune_qwen_vl.py \
---pretrained-checkpoint $MEGATRON_MODEL_PATH \
---recipe qwen3_vl_30b_a3b_finetune_config \
---dataset-type hf \
-dataset.maker_name=make_cord_v2_dataset \
-train.global_batch_size= \
-train.train_iters= \
-checkpoint.save=$SAVE_DIR/
-```
-
-Note:
-- The `--recipe` parameter selects the model configuration:
-  - `qwen3_vl_8b_finetune_config` - for 8B dense model
-  - `qwen3_vl_30b_a3b_finetune_config` - for 30B MoE model
-- For dataset formats and additional information, refer to the [Qwen2.5-VL documentation]
-- See the full script with examples at [`examples/models/vlm/qwen_vl/finetune_qwen_vl.py`](../../../examples/models/vlm/qwen_vl/finetune_qwen_vl.py)
-
-### PEFT (Parameter-Efficient Fine-Tuning)
-
-Qwen3-VL supports PEFT methods including LoRA and DoRA for memory-efficient training. PEFT trains only adapter parameters (~1-2% of model), significantly reducing memory requirements and enabling faster training.
-
-**LoRA with 8B Dense Model (1 GPU):**
-```bash
-torchrun --nproc-per-node=1 examples/models/vlm/qwen_vl/finetune_qwen_vl.py \
---pretrained-checkpoint $MEGATRON_MODEL_PATH \
---recipe qwen3_vl_8b_finetune_config \
---dataset-type hf \
---peft lora \
-checkpoint.save=$SAVE_DIR/
-```
-
-**LoRA with 30B MoE Model (8 GPUs with Expert Parallelism):**
-```bash
-torchrun --nproc-per-node=8 examples/models/vlm/qwen_vl/finetune_qwen_vl.py \
---pretrained-checkpoint $MEGATRON_MODEL_PATH \
---recipe qwen3_vl_30b_a3b_finetune_config \
---dataset-type hf \
---peft lora \
-checkpoint.save=$SAVE_DIR/
-```
-
-**DoRA Training:**
-```bash
---peft dora
-```
+For checkpoint conversion, inference, finetuning recipes, and step-by-step training guides, see the [Qwen3-VL Examples](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/models/vlm/qwen3_vl/README.md).

 ## Hugging Face Model Cards
 - Qwen3-VL-8B: `https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct`
diff --git a/docs/training/multi-token-prediction.md b/docs/training/multi-token-prediction.md
index d7c3f63b2b..3cfbd5a149 100644
--- a/docs/training/multi-token-prediction.md
+++ b/docs/training/multi-token-prediction.md
@@ -66,12 +66,13 @@ where:
 Here's a minimal example using the Qwen3 30B-A3B recipe with MTP enabled:

 ```python
-from megatron.bridge.recipes.qwen import qwen3_30b_a3b_pretrain
+from megatron.bridge.recipes.qwen.qwen3_moe import qwen3_30b_a3b_pretrain_config
 from megatron.bridge.training.pretrain import pretrain
+from megatron.bridge.training.gpt_step import forward_step
+from megatron.bridge.training.config import ConfigContainer

-log_dir = f"/path/to/log/dir"
+log_dir = "/path/to/log/dir"
 cfg: ConfigContainer = qwen3_30b_a3b_pretrain_config()
-cfg.logger.log_dir = log_dir
 cfg.logger.tensorboard_dir = log_dir + "/tb_logs"
 cfg.checkpoint.save = log_dir + "/checkpoints"
 cfg.checkpoint.load = log_dir + "/checkpoints"
@@ -82,10 +83,11 @@ cfg.dataset.blend=[[
 ], None]
 cfg.dataset.split="9999,8,2"
 cfg.dataset.path_to_cache = "/path/to/cache"
+# cfg.model.num_layers = 8 # train a smaller model if OOM

 # MTP Configuration
-cfg.mtp_num_layers = 1
-cfg.mtp_loss_scaling_factor = 0.1
-pretrain(cfg)
+cfg.model.mtp_num_layers = 1
+cfg.model.mtp_loss_scaling_factor = 0.1
+pretrain(cfg, forward_step)
 ```

 Follow the [DCLM Tutorial](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/tutorials/data/dclm) to prepare the training data
diff --git a/examples/models/vlm/gemma3_vl/peft.sh b/examples/models/vlm/gemma3_vl/peft.sh
index a4786e14d3..c966900ee4 100644
--- a/examples/models/vlm/gemma3_vl/peft.sh
+++ b/examples/models/vlm/gemma3_vl/peft.sh
@@ -16,6 +16,10 @@
 # Workspace directory for checkpoints and results
 WORKSPACE=${WORKSPACE:-/workspace}

+# Before training, make sure to set WANDB_API_KEY or disable wandb logging
+# export WANDB_API_KEY=
+# export WANDB_MODE=disabled
+
 # Common configurations
 PRETRAINED_CHECKPOINT=${WORKSPACE}/models/gemma-3-4b-it
 MODEL_NAME=gemma3_vl_4b
diff --git a/examples/models/vlm/gemma3_vl/sft.sh b/examples/models/vlm/gemma3_vl/sft.sh
index b7715c4eaf..820c9c3298 100755
--- a/examples/models/vlm/gemma3_vl/sft.sh
+++ b/examples/models/vlm/gemma3_vl/sft.sh
@@ -16,6 +16,10 @@
 # Workspace directory for checkpoints and results
 WORKSPACE=${WORKSPACE:-/workspace}

+# Before training, make sure to set WANDB_API_KEY or disable wandb logging
+# export WANDB_API_KEY=
+# export WANDB_MODE=disabled
+
 # Common configurations
 PRETRAINED_CHECKPOINT=${WORKSPACE}/models/gemma-3-4b-it
 MODEL_NAME=gemma3_vl_4b
diff --git a/examples/models/vlm/glm_45v/slurm_peft.sh b/examples/models/vlm/glm_45v/slurm_peft.sh
index 017da6a74c..e876089c34 100755
--- a/examples/models/vlm/glm_45v/slurm_peft.sh
+++ b/examples/models/vlm/glm_45v/slurm_peft.sh
@@ -90,6 +90,8 @@ export NCCL_NVLS_ENABLE=0
 # Authentication tokens (set these for your environment)
 # export HF_TOKEN="hf_your_token_here"
 # export WANDB_API_KEY="your_wandb_key_here"
+# or disable wandb logging
+# export WANDB_MODE=disabled

 # ==============================================================================
 # Job Execution
diff --git a/examples/models/vlm/glm_45v/slurm_sft.sh b/examples/models/vlm/glm_45v/slurm_sft.sh
index f23dee3c43..e76e6968a1 100644
--- a/examples/models/vlm/glm_45v/slurm_sft.sh
+++ b/examples/models/vlm/glm_45v/slurm_sft.sh
@@ -90,6 +90,8 @@ export NCCL_NVLS_ENABLE=0
 # Authentication tokens (set these for your environment)
 # export HF_TOKEN="hf_your_token_here"
 # export WANDB_API_KEY="your_wandb_key_here"
+# or disable wandb logging
+# export WANDB_MODE=disabled

 # ==============================================================================
 # Job Execution
diff --git a/examples/models/vlm/ministral3/conversion.sh b/examples/models/vlm/ministral3/conversion.sh
index 296af05d3c..7b0bbad008 100755
--- a/examples/models/vlm/ministral3/conversion.sh
+++ b/examples/models/vlm/ministral3/conversion.sh
@@ -16,17 +16,21 @@
 # Workspace directory for checkpoints and results
 WORKSPACE=${WORKSPACE:-/workspace}

+# Note: Ministral 3 requires transformers version 5
+# uv pip install --upgrade transformers
+# Commands below use uv run --no-sync to avoid conflicts with the virtual environment.
+
 # Import HF → Megatron
-uv run python examples/conversion/convert_checkpoints.py import \
+uv run --no-sync python examples/conversion/convert_checkpoints.py import \
   --hf-model mistralai/Ministral-3-3B-Instruct-2512-BF16 \
   --megatron-path ${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16

 # Export Megatron → HF
-uv run python examples/conversion/convert_checkpoints.py export \
+uv run --no-sync python examples/conversion/convert_checkpoints.py export \
   --hf-model mistralai/Ministral-3-3B-Instruct-2512-BF16 \
   --megatron-path ${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16/iter_0000000 \
   --hf-path ${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16-hf-export

 # Round-trip validation
-uv run python -m torch.distributed.run --nproc_per_node=8 examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
+uv run --no-sync python -m torch.distributed.run --nproc_per_node=8 examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
   --hf-model-id mistralai/Ministral-3-3B-Instruct-2512-BF16 --tp 2 --pp 2
diff --git a/examples/models/vlm/ministral3/inference.sh b/examples/models/vlm/ministral3/inference.sh
index 98e20c2050..de0b8bee29 100755
--- a/examples/models/vlm/ministral3/inference.sh
+++ b/examples/models/vlm/ministral3/inference.sh
@@ -16,8 +16,12 @@
 # Workspace directory for checkpoints and results
 WORKSPACE=${WORKSPACE:-/workspace}

+# Note: Ministral 3 requires transformers version 5
+# uv pip install --upgrade transformers
+# Commands below use uv run --no-sync to avoid conflicts with the virtual environment.
+
 # Inference with Hugging Face checkpoints
-uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
+uv run --no-sync python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
   --hf_model_path mistralai/Ministral-3-3B-Instruct-2512-BF16 \
   --image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
   --prompt "Describe this image." \
@@ -26,7 +30,7 @@ uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf
   --pp 2

 # Inference with imported Megatron checkpoints
-uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
+uv run --no-sync python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
   --hf_model_path mistralai/Ministral-3-3B-Instruct-2512-BF16 \
   --megatron_model_path ${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16/iter_0000000 \
   --image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
   --prompt "Describe this image." \
@@ -36,7 +40,7 @@ uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf
   --pp 2

 # Inference with exported HF checkpoints
-uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
+uv run --no-sync python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
   --hf_model_path ${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16-hf-export \
   --image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
   --prompt "Describe this image." \
diff --git a/examples/models/vlm/ministral3/peft.sh b/examples/models/vlm/ministral3/peft.sh
index 0fb8e1b38e..b3c44a2f86 100755
--- a/examples/models/vlm/ministral3/peft.sh
+++ b/examples/models/vlm/ministral3/peft.sh
@@ -16,6 +16,14 @@
 # Workspace directory for checkpoints and results
 WORKSPACE=${WORKSPACE:-/workspace}

+# Note: Ministral 3 requires transformers version 5
+# uv pip install --upgrade transformers
+# Commands below use uv run --no-sync to avoid conflicts with the virtual environment.
+
+# Before training, make sure to set WANDB_API_KEY or disable wandb logging
+# export WANDB_API_KEY=
+# export WANDB_MODE=disabled
+
 # Common configurations
 PRETRAINED_CHECKPOINT=${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16
 MODEL_NAME=ministral3_3b
@@ -38,7 +46,7 @@ for config in "${PARALLELISM_CONFIGS[@]}"; do
   IFS=',' read -r TP PP <<< "$config"

   echo "Running LoRA finetuning with TP=$TP, PP=$PP"
-  uv run python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
+  uv run --no-sync python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
     --recipe ${MODEL_NAME}_finetune_config \
     --step_func vlm_step \
     --peft_scheme lora \
diff --git a/examples/models/vlm/ministral3/sft.sh b/examples/models/vlm/ministral3/sft.sh
index 193afaf10e..a22eebbb03 100755
--- a/examples/models/vlm/ministral3/sft.sh
+++ b/examples/models/vlm/ministral3/sft.sh
@@ -16,6 +16,14 @@
 # Workspace directory for checkpoints and results
 WORKSPACE=${WORKSPACE:-/workspace}

+# Note: Ministral 3 requires transformers version 5
+# uv pip install --upgrade transformers
+# Commands below use uv run --no-sync to avoid conflicts with the virtual environment.
+
+# Before training, make sure to set WANDB_API_KEY or disable wandb logging
+# export WANDB_API_KEY=
+# export WANDB_MODE=disabled
+
 # Common configurations
 PRETRAINED_CHECKPOINT=${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16
 MODEL_NAME=ministral3_3b
@@ -38,7 +46,7 @@ for config in "${PARALLELISM_CONFIGS[@]}"; do
   IFS=',' read -r TP PP <<< "$config"

   echo "Running full finetuning with TP=$TP, PP=$PP"
-  uv run python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
+  uv run --no-sync python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
     --recipe ${MODEL_NAME}_finetune_config \
     --step_func vlm_step \
     checkpoint.pretrained_checkpoint=$PRETRAINED_CHECKPOINT \
diff --git a/examples/models/vlm/qwen3_vl/README.md b/examples/models/vlm/qwen3_vl/README.md
new file mode 100644
index 0000000000..f62008f601
--- /dev/null
+++ b/examples/models/vlm/qwen3_vl/README.md
@@ -0,0 +1,120 @@
+# Qwen 3 VL - Vision Language Model
+
+This directory contains example scripts for Qwen 3 vision-language models.
+
+For model introduction and architecture details, see the [Qwen 3 - VL documentation](../../../../docs/models/vlm/qwen3-vl.md).
+
+## Workspace Configuration
+
+All scripts use a `WORKSPACE` environment variable to define the base directory for checkpoints and results. By default, this is set to `/workspace`. You can override it:
+
+```bash
+export WORKSPACE=/your/custom/path
+```
+
+Directory structure:
+- `${WORKSPACE}/models/` - Converted checkpoints
+- `${WORKSPACE}/results/` - Training outputs and experiment results
+
+## Checkpoint Conversion
+
+### Import HF → Megatron
+To import the HF VL model to your desired Megatron path:
+```bash
+python examples/conversion/convert_checkpoints.py import \
+  --hf-model Qwen/Qwen3-VL-8B-Instruct \
+  --megatron-path ${WORKSPACE}/models/Qwen3-VL-8B-Instruct
+```
+
+### Export Megatron → HF
+```bash
+python examples/conversion/convert_checkpoints.py export \
+  --hf-model Qwen/Qwen3-VL-8B-Instruct \
+  --megatron-path ${WORKSPACE}/models/Qwen3-VL-8B-Instruct/iter_0000000 \
+  --hf-path ${WORKSPACE}/models/Qwen3-VL-8B-Instruct-hf-export
+```
+
+## Inference
+
+### Run Inference on Converted Checkpoint
+
+```bash
+python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
+  --hf_model_path Qwen/Qwen3-VL-8B-Instruct \
+  --megatron_model_path ${WORKSPACE}/models/Qwen3-VL-8B-Instruct/iter_0000000 \
+  --image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
+  --prompt "Describe this image." \
+  --max_new_tokens 100
+```
+
+Note:
+- `--megatron_model_path` is optional. If not specified, the script will convert the model and then run forward.
+- You can also use image URLs: `--image_path="https://example.com/image.jpg"`
+
+See the [inference.sh](inference.sh) script for commands to:
+- Run inference with Hugging Face checkpoints
+- Run inference with imported Megatron checkpoints
+- Run inference with exported Hugging Face checkpoints
+
+**Expected output:**
+```
+...
+Generation step 46
+Generation step 47
+Generation step 48
+Generation step 49
+======== GENERATED TEXT OUTPUT ========
+Image: https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png
+Prompt: Describe this image.
+Generated: <|im_start|>user
+<|vision_start|><|image_pad|><|image_pad|>
+...
+<|image_pad|><|vision_end|>Describe this image.<|im_end|>
+<|im_start|>assistant
+This image displays a **technical specifications table** comparing two variants of NVIDIA's H100 GPU: the **H100 SXM** and the **H100 NVL**.
+
+The table is organized into rows, each detailing a specific performance or hardware characteristic, with columns showing the corresponding value for each GPU variant.
+
+Here is a breakdown of the key specifications:
+
+**Performance (FLOPS & TOPS):**
+* **FP64 (Double Precision):** The
+=======================================
+```
+
+## Finetune Recipes
+
+- Available recipes:
+  - `qwen3_vl_8b_finetune_config`: Finetuning for 8B VL model with PEFT support
+  - `qwen3_vl_30b_a3b_finetune_config`: Finetuning for 30B-A3B VL model with PEFT support
+  - `qwen3_vl_235b_a22b_finetune_config`: Finetuning for 235B-A22B VL model with PEFT support
+
+Before training, ensure the following environment variables are set:
+1. `HF_TOKEN`: to download models from HF Hub (if required)
+2. `HF_HOME`: (optional) to avoid re-downloading models and datasets
+3. `WANDB_API_KEY`: (optional) to enable WandB logging
+
+### Pretrain
+
+- Available recipes:
+  - `qwen3_vl_8b_pretrain_config`: Pretraining for 8B VL model with PEFT support
+  - `qwen3_vl_30b_a3b_pretrain_config`: Pretraining for 30B-A3B VL model with PEFT support
+  - `qwen3_vl_235b_a22b_pretrain_config`: Pretraining for 235B-A22B VL model with PEFT support
+
+### Supervised Fine-Tuning (SFT)
+
+See the [sft.sh](sft.sh) script for full parameter fine-tuning with configurable model parallelisms.
+
+W&B report coming soon.
+
+### Parameter-Efficient Fine-Tuning (PEFT) with LoRA
+
+See the [peft.sh](peft.sh) script for LoRA fine-tuning with configurable tensor and pipeline parallelism.
+
+W&B report coming soon.
+
+**Note:** LoRA/DoRA significantly reduces memory requirements, allowing for larger batch sizes and fewer GPUs.
+
+## Evaluation
+
+Coming soon.
diff --git a/examples/models/vlm/qwen3_vl/conversion.sh b/examples/models/vlm/qwen3_vl/conversion.sh
new file mode 100755
index 0000000000..1a9a20a798
--- /dev/null
+++ b/examples/models/vlm/qwen3_vl/conversion.sh
@@ -0,0 +1,47 @@
+#!/usr/bin/env bash
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Workspace directory for checkpoints and results
+WORKSPACE=${WORKSPACE:-/workspace}
+
+# Import HF → Megatron for dense model
+uv run python examples/conversion/convert_checkpoints.py import \
+  --hf-model Qwen/Qwen3-VL-8B-Instruct \
+  --megatron-path ${WORKSPACE}/models/Qwen3-VL-8B-Instruct
+
+# Export Megatron → HF for dense model
+uv run python examples/conversion/convert_checkpoints.py export \
+  --hf-model Qwen/Qwen3-VL-8B-Instruct \
+  --megatron-path ${WORKSPACE}/models/Qwen3-VL-8B-Instruct/iter_0000000 \
+  --hf-path ${WORKSPACE}/models/Qwen3-VL-8B-Instruct-hf-export
+
+# Round-trip validation for dense model
+uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
+  --hf-model-id Qwen/Qwen3-VL-8B-Instruct --tp 2 --pp 2
+
+# Import HF → Megatron for MoE model
+uv run python examples/conversion/convert_checkpoints.py import \
+  --hf-model Qwen/Qwen3-VL-30B-A3B-Instruct \
+  --megatron-path ${WORKSPACE}/models/Qwen3-VL-30B-A3B-Instruct
+
+# Export Megatron → HF for MoE model
+uv run python examples/conversion/convert_checkpoints.py export \
+  --hf-model Qwen/Qwen3-VL-30B-A3B-Instruct \
+  --megatron-path ${WORKSPACE}/models/Qwen3-VL-30B-A3B-Instruct/iter_0000000 \
+  --hf-path ${WORKSPACE}/models/Qwen3-VL-30B-A3B-Instruct-hf-export
+
+# Round-trip validation for MoE model
+uv run python -m torch.distributed.run --nproc_per_node=8 examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
+  --hf-model-id Qwen/Qwen3-VL-30B-A3B-Instruct --ep 8
diff --git a/examples/models/vlm/qwen3_vl/inference.sh b/examples/models/vlm/qwen3_vl/inference.sh
new file mode 100755
index 0000000000..f4e9a8483f
--- /dev/null
+++ b/examples/models/vlm/qwen3_vl/inference.sh
@@ -0,0 +1,70 @@
+#!/usr/bin/env bash
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Workspace directory for checkpoints and results
+WORKSPACE=${WORKSPACE:-/workspace}
+
+# Inference with Hugging Face checkpoints - Dense model
+uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
+  --hf_model_path Qwen/Qwen3-VL-8B-Instruct \
+  --image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
+  --prompt "Describe this image." \
+  --max_new_tokens 100 \
+  --tp 2 \
+  --pp 2
+
+# Inference with imported Megatron checkpoints - Dense model
+uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
+  --hf_model_path Qwen/Qwen3-VL-8B-Instruct \
+  --megatron_model_path ${WORKSPACE}/models/Qwen3-VL-8B-Instruct/iter_0000000 \
+  --image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
+  --prompt "Describe this image." \
+  --max_new_tokens 100 \
+  --tp 2 \
+  --pp 2
+
+# Inference with exported HF checkpoints - Dense model
+uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
+  --hf_model_path ${WORKSPACE}/models/Qwen3-VL-8B-Instruct-hf-export \
+  --image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
+  --prompt "Describe this image." \
+  --max_new_tokens 100 \
+  --tp 2 \
+  --pp 2
+
+# Inference with Hugging Face checkpoints - MoE model
+uv run python -m torch.distributed.run --nproc_per_node=8 examples/conversion/hf_to_megatron_generate_vlm.py \
+  --hf_model_path Qwen/Qwen3-VL-30B-A3B-Instruct \
+  --image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
+  --prompt "Describe this image." \
+  --max_new_tokens 100 \
+  --ep 8
+
+# Inference with imported Megatron checkpoints - MoE model
+uv run python -m torch.distributed.run --nproc_per_node=8 examples/conversion/hf_to_megatron_generate_vlm.py \
+  --hf_model_path Qwen/Qwen3-VL-30B-A3B-Instruct \
+  --megatron_model_path ${WORKSPACE}/models/Qwen3-VL-30B-A3B-Instruct/iter_0000000 \
+  --image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
+  --prompt "Describe this image." \
+  --max_new_tokens 100 \
+  --ep 8
+
+# Inference with exported HF checkpoints - MoE model
+uv run python -m torch.distributed.run --nproc_per_node=8 examples/conversion/hf_to_megatron_generate_vlm.py \
+  --hf_model_path ${WORKSPACE}/models/Qwen3-VL-30B-A3B-Instruct-hf-export \
+  --image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
+  --prompt "Describe this image." \
+  --max_new_tokens 100 \
+  --ep 8
diff --git a/examples/models/vlm/qwen3_vl/peft.sh b/examples/models/vlm/qwen3_vl/peft.sh
new file mode 100644
index 0000000000..6cddb470a0
--- /dev/null
+++ b/examples/models/vlm/qwen3_vl/peft.sh
@@ -0,0 +1,114 @@
+#!/usr/bin/env bash
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Workspace directory for checkpoints and results
+WORKSPACE=${WORKSPACE:-/workspace}
+
+# Before training, make sure to set WANDB_API_KEY or disable wandb logging
+# export WANDB_API_KEY=
+# export WANDB_MODE=disabled
+
+# Common configurations for dense model finetuning
+PRETRAINED_CHECKPOINT=${WORKSPACE}/models/Qwen3-VL-8B-Instruct
+MODEL_NAME=qwen3_vl_8b
+DATASET_NAME=cord_v2
+SEQ_LENGTH=4096
+TRAIN_ITERS=50
+GLOBAL_BATCH_SIZE=32
+MICRO_BATCH_SIZE=1
+EVAL_ITERS=10
+LR=0.00005
+MIN_LR=0.000005
+LR_WARMUP_ITERS=10
+LOG_INTERVAL=1
+WANDB_PROJECT=megatron-bridge-${DATASET_NAME}
+
+# TP/PP combinations: "TP,PP"
+PARALLELISM_CONFIGS=("2,1" "1,2")
+
+for config in "${PARALLELISM_CONFIGS[@]}"; do
+  IFS=',' read -r TP PP <<< "$config"
+
+  echo "Running LoRA finetuning with TP=$TP, PP=$PP"
+  uv run python -m torch.distributed.run --nproc_per_node=2 scripts/training/run_recipe.py \
+    --recipe ${MODEL_NAME}_finetune_config \
+    --step_func vlm_step \
+    --peft_scheme lora \
+    checkpoint.pretrained_checkpoint=$PRETRAINED_CHECKPOINT \
+    model.seq_length=$SEQ_LENGTH \
+    train.train_iters=$TRAIN_ITERS \
+    train.global_batch_size=$GLOBAL_BATCH_SIZE \
+    train.micro_batch_size=$MICRO_BATCH_SIZE \
+    train.eval_iters=$EVAL_ITERS \
+    optimizer.lr=$LR \
+    optimizer.min_lr=$MIN_LR \
+    scheduler.lr_warmup_iters=$LR_WARMUP_ITERS \
+    checkpoint.save=${WORKSPACE}/results/${MODEL_NAME}_lora_tp${TP}_pp${PP} \
+    logger.log_interval=$LOG_INTERVAL \
+    logger.wandb_project=$WANDB_PROJECT \
+    logger.wandb_exp_name=${MODEL_NAME}_${DATASET_NAME}_lora_tp${TP}_pp${PP} \
+    dataset.maker_name=make_${DATASET_NAME}_dataset \
+    dataset.seq_length=$SEQ_LENGTH \
+    model.tensor_model_parallel_size=$TP \
+    model.pipeline_model_parallel_size=$PP
+done
+
+
+# Common configurations for MoE model finetuning
+PRETRAINED_CHECKPOINT=${WORKSPACE}/models/Qwen3-VL-30B-A3B-Instruct
+MODEL_NAME=qwen3_vl_30b_a3b
+DATASET_NAME=cord_v2
+SEQ_LENGTH=4096
+TRAIN_ITERS=50
+GLOBAL_BATCH_SIZE=32
+MICRO_BATCH_SIZE=1
+EVAL_ITERS=10
+LR=0.00005
+MIN_LR=0.000005
+LR_WARMUP_ITERS=10
+LOG_INTERVAL=1
+WANDB_PROJECT=megatron-bridge-${DATASET_NAME}
+
+# EP/TP/PP combinations: "EP,TP,PP" configurations
+PARALLELISM_CONFIGS=("8,1,1" "1,4,2")
+
+for config in "${PARALLELISM_CONFIGS[@]}"; do
+  IFS=',' read -r EP TP PP <<< "$config"
+
+  echo "Running LoRA finetuning with EP=$EP, TP=$TP, PP=$PP"
+  uv run python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
+    --recipe ${MODEL_NAME}_finetune_config \
+    --step_func vlm_step \
+    --peft_scheme lora \
+    checkpoint.pretrained_checkpoint=$PRETRAINED_CHECKPOINT \
+    model.seq_length=$SEQ_LENGTH \
+    train.train_iters=$TRAIN_ITERS \
+    train.global_batch_size=$GLOBAL_BATCH_SIZE \
+    train.micro_batch_size=$MICRO_BATCH_SIZE \
+    train.eval_iters=$EVAL_ITERS \
+    optimizer.lr=$LR \
+    optimizer.min_lr=$MIN_LR \
+    scheduler.lr_warmup_iters=$LR_WARMUP_ITERS \
+    checkpoint.save=${WORKSPACE}/results/${MODEL_NAME}_lora_ep${EP}_tp${TP}_pp${PP} \
+    logger.log_interval=$LOG_INTERVAL \
+    logger.wandb_project=$WANDB_PROJECT \
+    logger.wandb_exp_name=${MODEL_NAME}_${DATASET_NAME}_lora_ep${EP}_tp${TP}_pp${PP} \
+    dataset.maker_name=make_${DATASET_NAME}_dataset \
+    dataset.seq_length=$SEQ_LENGTH \
+    model.expert_model_parallel_size=$EP \
+    model.tensor_model_parallel_size=$TP \
+    model.pipeline_model_parallel_size=$PP
+done
diff --git a/examples/models/vlm/qwen3_vl/sft.sh b/examples/models/vlm/qwen3_vl/sft.sh
new file mode 100755
index 0000000000..0a26786273
--- /dev/null
+++ b/examples/models/vlm/qwen3_vl/sft.sh
@@ -0,0 +1,112 @@
+#!/usr/bin/env bash
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Workspace directory for checkpoints and results
+WORKSPACE=${WORKSPACE:-/workspace}
+
+# Before training, make sure to set WANDB_API_KEY or disable wandb logging
+# export WANDB_API_KEY=
+# export WANDB_MODE=disabled
+
+# Common configurations for dense model finetuning
+PRETRAINED_CHECKPOINT=${WORKSPACE}/models/Qwen3-VL-8B-Instruct
+MODEL_NAME=qwen3_vl_8b
+DATASET_NAME=cord_v2
+SEQ_LENGTH=4096
+TRAIN_ITERS=50
+GLOBAL_BATCH_SIZE=32
+MICRO_BATCH_SIZE=1
+EVAL_ITERS=10
+LR=0.00005
+MIN_LR=0.000005
+LR_WARMUP_ITERS=10
+LOG_INTERVAL=1
+WANDB_PROJECT=megatron-bridge-${DATASET_NAME}
+
+# TP/PP combinations: "TP,PP"
+PARALLELISM_CONFIGS=("2,1" "1,2")
+
+for config in "${PARALLELISM_CONFIGS[@]}"; do
+  IFS=',' read -r TP PP <<< "$config"
+
+  echo "Running full finetuning with TP=$TP, PP=$PP"
+  uv run python -m torch.distributed.run --nproc_per_node=2 scripts/training/run_recipe.py \
+    --recipe ${MODEL_NAME}_finetune_config \
+    --step_func vlm_step \
+    checkpoint.pretrained_checkpoint=$PRETRAINED_CHECKPOINT \
+    model.seq_length=$SEQ_LENGTH \
+    train.train_iters=$TRAIN_ITERS \
+    train.global_batch_size=$GLOBAL_BATCH_SIZE \
+    train.micro_batch_size=$MICRO_BATCH_SIZE \
+    train.eval_iters=$EVAL_ITERS \
+    optimizer.lr=$LR \
+    optimizer.min_lr=$MIN_LR \
+    scheduler.lr_warmup_iters=$LR_WARMUP_ITERS \
+    checkpoint.save=${WORKSPACE}/results/${MODEL_NAME}_sft_tp${TP}_pp${PP} \
+    logger.log_interval=$LOG_INTERVAL \
+    logger.wandb_project=$WANDB_PROJECT \
+    logger.wandb_exp_name=${MODEL_NAME}_${DATASET_NAME}_sft_tp${TP}_pp${PP} \
+    dataset.maker_name=make_${DATASET_NAME}_dataset \
+    dataset.seq_length=$SEQ_LENGTH \
+    model.tensor_model_parallel_size=$TP \
+    model.pipeline_model_parallel_size=$PP
+done
+
+
+# Common configurations for MoE model finetuning
+PRETRAINED_CHECKPOINT=${WORKSPACE}/models/Qwen3-VL-30B-A3B-Instruct
+MODEL_NAME=qwen3_vl_30b_a3b
+DATASET_NAME=cord_v2
+SEQ_LENGTH=4096
+TRAIN_ITERS=50
+GLOBAL_BATCH_SIZE=32
+MICRO_BATCH_SIZE=1
+EVAL_ITERS=10
+LR=0.00005
+MIN_LR=0.000005
+LR_WARMUP_ITERS=10
+LOG_INTERVAL=1
+WANDB_PROJECT=megatron-bridge-${DATASET_NAME}
+
+# EP/TP/PP/SP combinations: "EP,TP,PP,SP" configurations
+PARALLELISM_CONFIGS=("8,1,1,False" "1,4,2,False" "2,2,2,True")
+
+for config in "${PARALLELISM_CONFIGS[@]}"; do
+  IFS=',' read -r EP TP PP SP <<< "$config"
+
+  echo "Running full finetuning with EP=$EP, TP=$TP, PP=$PP, SP=$SP"
+  uv run python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
+    --recipe ${MODEL_NAME}_finetune_config \
+    --step_func vlm_step \
+    checkpoint.pretrained_checkpoint=$PRETRAINED_CHECKPOINT \
+    model.seq_length=$SEQ_LENGTH \
+    train.train_iters=$TRAIN_ITERS \
+    train.global_batch_size=$GLOBAL_BATCH_SIZE \
+    train.micro_batch_size=$MICRO_BATCH_SIZE \
+    train.eval_iters=$EVAL_ITERS \
+    optimizer.lr=$LR \
+    optimizer.min_lr=$MIN_LR \
+    scheduler.lr_warmup_iters=$LR_WARMUP_ITERS \
+    checkpoint.save=${WORKSPACE}/results/${MODEL_NAME}_sft_ep${EP}_tp${TP}_pp${PP}_sp_${SP} \
+    logger.log_interval=$LOG_INTERVAL \
+    logger.wandb_project=$WANDB_PROJECT \
+    logger.wandb_exp_name=${MODEL_NAME}_${DATASET_NAME}_sft_ep${EP}_tp${TP}_pp${PP}_sp_${SP} \
+    dataset.maker_name=make_${DATASET_NAME}_dataset \
+    dataset.seq_length=$SEQ_LENGTH \
+    model.expert_model_parallel_size=$EP \
+    model.tensor_model_parallel_size=$TP \
+    model.pipeline_model_parallel_size=$PP \
+    model.sequence_parallel=$SP
+done
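The sweep loops in `peft.sh` and `sft.sh` above all rely on the same pattern: each entry of `PARALLELISM_CONFIGS` packs a parallelism combination into a comma-separated string, which `read` splits into named variables via a one-off `IFS`. A minimal standalone sketch of that pattern (the variable names mirror the scripts; the `echo` stands in for the `torch.distributed.run` invocation):

```shell
#!/usr/bin/env bash
# Each entry encodes one tensor/pipeline parallel combination as "TP,PP".
PARALLELISM_CONFIGS=("2,1" "1,2")

for config in "${PARALLELISM_CONFIGS[@]}"; do
  # IFS=',' applies only to this `read`, so the global IFS is untouched;
  # -r prevents backslash interpretation in the input.
  IFS=',' read -r TP PP <<< "$config"
  echo "TP=$TP PP=$PP"
done
```

Because `IFS` is set only on the `read` command itself, word splitting elsewhere in the script is unaffected; adding a field (e.g. `EP` or `SP`, as the MoE loops do) only requires extending both the string entries and the variable list passed to `read`.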