Closed

34 commits
1e63871
chore: Update docs (#2178)
ko3n1g Feb 2, 2026
7f61c05
cp: `Dsv3 Recipe Update (2152)` into `r0.3.0` (#2186)
ko3n1g Feb 3, 2026
bf5ee44
cp: `Revert packed seq extra checks (2180)` into `r0.3.0` (#2196)
ko3n1g Feb 3, 2026
48a27fa
cp: `DSv3 EP=8 for B200, PP8-VP2 for B300 BF16, Lm3.1 405B TP4-CP1 GB…
ko3n1g Feb 3, 2026
77f5503
cp: `[docs] add MTP guide (2138)` into `r0.3.0` (#2202)
ko3n1g Feb 5, 2026
c22f858
cp: `add peft to recipe qwen3vl (2023)` into `r0.3.0` (#2220)
ko3n1g Feb 5, 2026
3cb4dee
cp: `[doc, model] feat: Add GLM-4.5V VL examples and update Gemma 3 V…
ko3n1g Feb 5, 2026
f91a086
cp: `[docs, model] Add Ministral 3 Examples (2139)` into `r0.3.0` (#2…
ko3n1g Feb 5, 2026
94af2ed
cp: `ci(fix): Wheel build (2192)` into `r0.3.0` (#2238)
ko3n1g Feb 6, 2026
98762f1
cp: `chore: Expose custom bash cmds (2237)` into `r0.3.0` (#2243)
ko3n1g Feb 6, 2026
37ba134
cp: `Fix Qwen2.5-VL huggingface conversion issue (#2107) (2156)` into…
ko3n1g Feb 6, 2026
ae58d30
cp: `fix: Use nargs for `custom_bash_cmds` (2261)` into `r0.3.0` (#2262)
ko3n1g Feb 6, 2026
b6661ea
cp: `gb300 lm3.1 495b nvfp4 fix (2258)` into `r0.3.0` (#2259)
ko3n1g Feb 6, 2026
241572b
cp: `Fix: perf script ddp nccl-ub (2158)` into `r0.3.0` (#2217)
ko3n1g Feb 6, 2026
d7a13b1
cp: `Update Qwen3 235B A22B MXFP8 GB200/300 recipe and resolve NaN gr…
ko3n1g Feb 6, 2026
78a5eba
cp: `b300 dsv3 bf16 hang fix (2260)` into `r0.3.0` (#2270)
ko3n1g Feb 7, 2026
98506a7
chore: Change submodule pointer for release (#2191)
ko3n1g Feb 7, 2026
8ae972e
cp: `feat: Add dataset compile helper (#2236)` (#2249)
ko3n1g Feb 7, 2026
843c2d7
Revert "cp: `Dsv3 Recipe Update (2152)` into `r0.3.0` (#2186)"
ko3n1g Feb 7, 2026
6d665f8
fix no submodule checkout
ko3n1g Feb 7, 2026
34aec47
Revert "cp: `Update Qwen3 235B A22B MXFP8 GB200/300 recipe and resolv…
ko3n1g Feb 7, 2026
861bbdd
Merge pull request #2271 from NVIDIA-NeMo/ko3n1g/fix/r030
ko3n1g Feb 7, 2026
f2fee27
Reapply "cp: `Update Qwen3 235B A22B MXFP8 GB200/300 recipe and resol…
ko3n1g Feb 8, 2026
595e767
Reapply "cp: `Dsv3 Recipe Update (2152)` into `r0.3.0` (#2186)"
ko3n1g Feb 8, 2026
a7a840d
Merge pull request #2273 from NVIDIA-NeMo/ko3n1g/chore/reapply-2152-a…
ko3n1g Feb 8, 2026
1db8398
cp: `dsv3_gb300_revert- BF16 & FP8-MX scale (#2277)` (#2286)
ko3n1g Feb 9, 2026
b39bd94
cp: mlflow upgrade (#2281)
ko3n1g Feb 9, 2026
be11e50
cp: `build: Address CVE-2025-68973` (#2290)
ko3n1g Feb 9, 2026
b10d7e3
cp: `docs: Update callback code snippets to include all imports neede…
ko3n1g Feb 10, 2026
669ad62
cp: `build: Bump modelopt and TE (2304)` into `r0.3.0` (#2314)
ko3n1g Feb 10, 2026
0a5db7e
cp: `Enabling TP Comm Overlap and Packed Sequencing Configs for LLAMA…
ko3n1g Feb 10, 2026
c9dcdb6
cp: `Updating Configs for LLAMA3 70B LoRa (2292)` into `r0.3.0` (#2311)
ko3n1g Feb 10, 2026
800e3ba
cp: `LLAMA3 70B: LoRa enabled in all modules instead of only LinearQK…
ko3n1g Feb 10, 2026
acc61ed
cp: `[training] fix: Add cu_seqlens_argmin to vlm packed sequence (22…
ko3n1g Feb 11, 2026
106 changes: 56 additions & 50 deletions .github/workflows/build-test-publish-wheel.yml
@@ -11,7 +11,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: Build, test, and publish a PyPi wheel (to testpypi).

on:
@@ -35,55 +34,62 @@ concurrency:

jobs:
pre-flight:
uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_cicd_preflight.yml@v0.64.2
uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_cicd_preflight.yml@v0.69.1
with:
default_runner_prefix: ${{ vars.DEFAULT_RUNNER_PREFIX }}
non_nvidia_runner_prefix: ${{ vars.NON_NVIDIA_RUNNER_PREFIX }}
default_test_data_path: ${{ vars.DEFAULT_TEST_DATA_PATH }}
non_nvidia_test_data_path: ${{ vars.NON_NVIDIA_TEST_DATA_PATH }}
secrets:
NVIDIA_MANAGEMENT_ORG_PAT: ${{ secrets.NVIDIA_MANAGEMENT_ORG_PAT }}

# build-test-publish-wheel:
# needs: [pre-flight]
# if: |
# !(needs.pre-flight.outputs.docs_only == 'true'
# || needs.pre-flight.outputs.is_deployment_workflow == 'true')
# uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_build_test_publish_wheel.yml@v0.65.1
# with:
# dry-run: true
# python-package: megatron.bridge
# python-version: "3.10"
# packaging: uv
# no-publish: ${{ !(github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/heads/r')) }}
# has-src-dir: true
# skip-test-wheel: true
# custom-container: nvcr.io/nvidia/pytorch:25.05-py3
# runner: self-hosted-nemo
# no-build-isolation: true
# submodules: recursive
# container-options: "--gpus all --runtime=nvidia"
# secrets:
# TWINE_USERNAME: ${{ secrets.TWINE_USERNAME }}
# TWINE_PASSWORD: ${{ secrets.TWINE_PASSWORD }}
# SLACK_WEBHOOK: ${{ secrets.SLACK_RELEASE_ENDPOINT }}
# SLACK_WEBHOOK_ADMIN: ${{ secrets.SLACK_WEBHOOK_ADMIN }}
# GH_TOKEN: ${{ secrets.PAT }}
build-test-publish-wheel:
needs: [pre-flight]
if: |
!(needs.pre-flight.outputs.docs_only == 'true'
|| needs.pre-flight.outputs.is_deployment_workflow == 'true')
uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_build_test_publish_wheel.yml@v0.70.1
with:
dry-run: true
python-package: megatron.bridge
python-version: "3.10"
packaging: uv
no-publish: ${{ !(github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/heads/r')) }}
has-src-dir: true
skip-test-wheel: true
custom-container: nvcr.io/nvidia/pytorch:25.11-py3
runner: ${{ needs.pre-flight.outputs.runner_prefix }}-gpu-x2-container
no-build-isolation: true
submodules: recursive
container-options: "--gpus all --runtime=nvidia"
secrets:
TWINE_USERNAME: ${{ secrets.TWINE_USERNAME }}
TWINE_PASSWORD: ${{ secrets.TWINE_PASSWORD }}
SLACK_WEBHOOK: ${{ secrets.SLACK_RELEASE_ENDPOINT }}
SLACK_WEBHOOK_ADMIN: ${{ secrets.SLACK_WEBHOOK_ADMIN }}
GH_TOKEN: ${{ secrets.PAT }}

# build-test-publish-wheel-summary:
# needs: [pre-flight, build-test-publish-wheel]
# if: |
# (
# needs.pre-flight.outputs.docs_only == 'true'
# || needs.pre-flight.outputs.is_deployment_workflow == 'true'
# || always()
# )
# && !cancelled()
# runs-on: ubuntu-latest
# steps:
# - name: Result
# run: |
# FAILED_JOBS=$(gh run view $GITHUB_RUN_ID --json jobs --jq '[.jobs[] | select(.status == "completed" and .conclusion != "success")] | length') || echo 0
build-test-publish-wheel-summary:
needs: [pre-flight, build-test-publish-wheel]
if: |
(
needs.pre-flight.outputs.docs_only == 'true'
|| needs.pre-flight.outputs.is_deployment_workflow == 'true'
|| always()
)
&& !cancelled()
runs-on: ubuntu-latest
steps:
- name: Result
run: |
FAILED_JOBS=$(gh run view $GITHUB_RUN_ID --json jobs --jq '[.jobs[] | select(.status == "completed" and .conclusion != "success")] | length') || echo 0

# if [ "${FAILED_JOBS:-0}" -eq 0 ] || [ "$SKIPPING_IS_ALLOWED" == "true" ]; then
# echo "✅ All previous jobs completed successfully"
# exit 0
# else
# echo "❌ Found $FAILED_JOBS failed job(s)"
# # Show which jobs failed
# gh run view $GITHUB_RUN_ID --json jobs --jq '.jobs[] | select(.status == "completed" and .conclusion != "success") | .name'
# exit 1
# fi
if [ "${FAILED_JOBS:-0}" -eq 0 ] || [ "$SKIPPING_IS_ALLOWED" == "true" ]; then
echo "✅ All previous jobs completed successfully"
exit 0
else
echo "❌ Found $FAILED_JOBS failed job(s)"
# Show which jobs failed
gh run view $GITHUB_RUN_ID --json jobs --jq '.jobs[] | select(.status == "completed" and .conclusion != "success") | .name'
exit 1
fi
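An aside on the un-commented summary step above: the failed-job count comes from a single `jq` filter over the JSON that `gh run view --json jobs` returns. A minimal sketch of that filter, run against a stand-in payload (the sample JSON below is an assumption for illustration, not real CI output):

```shell
# Stand-in for `gh run view $GITHUB_RUN_ID --json jobs` (hypothetical sample payload).
PAYLOAD='{"jobs":[
  {"name":"pre-flight","status":"completed","conclusion":"success"},
  {"name":"build-test-publish-wheel","status":"completed","conclusion":"failure"},
  {"name":"build-test-publish-wheel-summary","status":"in_progress","conclusion":null}
]}'

# Same filter as the workflow: collect completed jobs whose conclusion is not
# "success", then take the array length. In-progress jobs are ignored.
FAILED_JOBS=$(echo "$PAYLOAD" | jq '[.jobs[] | select(.status == "completed" and .conclusion != "success")] | length')
echo "$FAILED_JOBS"   # → 1
```

Note the `|| echo 0` fallback in the workflow only guards the `gh` call failing; with a healthy payload the count itself decides between `exit 0` and `exit 1`.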
25 changes: 13 additions & 12 deletions .github/workflows/cicd-main.yml
@@ -25,32 +25,32 @@ on:
workflow_dispatch:
inputs:
mcore_commit:
description: 'MCore commit SHA to test against'
description: "MCore commit SHA to test against"
required: false
type: string
mcore_branch:
description: 'MCore branch name (for reference)'
description: "MCore branch name (for reference)"
required: false
type: string
mcore_repo:
description: 'MCore repository URL (for fetching from forks)'
description: "MCore repository URL (for fetching from forks)"
required: false
type: string
default: 'https://github.com/NVIDIA/Megatron-LM.git'
default: "https://github.com/NVIDIA/Megatron-LM.git"
test_suite:
description: 'Test suite to run'
description: "Test suite to run"
required: false
type: choice
options:
- 'all'
- 'unit-only'
- 'functional-only'
default: 'all'
- "all"
- "unit-only"
- "functional-only"
default: "all"
triggered_by:
description: 'Trigger source (for tracking)'
description: "Trigger source (for tracking)"
required: false
type: string
default: 'manual'
default: "manual"

concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-${{ github.event.label.name || 'main' }}-${{ github.event_name }}
@@ -393,8 +393,9 @@ jobs:
- script: L2_Launch_models_nemotron_vl
- script: L2_Launch_models_olmoe
- script: L2_Launch_models_qwen
- script: L2_Launch_models_qwen_quantization
# - script: L2_Launch_models_qwen_quantization
- script: L2_Launch_models_qwen_vl
- script: L2_Launch_recipes_gemma_vl
- script: L2_Launch_recipes_gpt_oss
- script: L2_Launch_recipes_llama_1b
- script: L2_Launch_recipes_llama_3b
2 changes: 1 addition & 1 deletion 3rdparty/Megatron-LM
Submodule Megatron-LM updated 448 files
6 changes: 5 additions & 1 deletion docker/Dockerfile.ci
@@ -24,7 +24,11 @@ ENV UV_LINK_MODE=copy
ENV UV_VERSION="0.7.2"

RUN curl -LsSf https://astral.sh/uv/${UV_VERSION}/install.sh | sh && \
uv venv ${UV_PROJECT_ENVIRONMENT} --system-site-packages
uv venv ${UV_PROJECT_ENVIRONMENT} --system-site-packages && \
# Address CVE-2025-68973
apt-get update && apt install -y --only-upgrade gnupg && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

COPY pyproject.toml uv.lock /opt/Megatron-Bridge/
COPY src/megatron/bridge/__init__.py src/megatron/bridge/package_info.py /opt/Megatron-Bridge/src/megatron/bridge/
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -27,7 +27,7 @@
project = "Megatron Bridge"
copyright = "2025, NVIDIA Corporation"
author = "NVIDIA Corporation"
release = "latest"
release = "0.3.0"

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
Binary file added docs/images/mtp_loss.png
Binary file added docs/images/mtp_loss_comparison.png
1 change: 1 addition & 0 deletions docs/index.md
@@ -49,6 +49,7 @@ training/activation-recomputation.md
training/cpu-offloading.md
training/peft.md
training/packed-sequences.md
training/multi-token-prediction.md
training/distillation.md
training/callbacks.md
```
2 changes: 1 addition & 1 deletion docs/models/llm/gemma3.md
@@ -180,7 +180,7 @@ torchrun --nproc-per-node=8 run/run_recipe.py \
- Gemma 3 1B: https://huggingface.co/google/gemma-3-1b-it

## Related Docs
- Gemma3 Vision-Language Models: [Gemma 3 VL](../vlm/gemma3-vl.md)
- Gemma3 Vision-Language Models: [Gemma 3 VL](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/models/vlm/gemma3_vl/README.md)
- Recipe usage: [Recipe usage](../../recipe-usage.md)
- Customizing the training recipe configuration: [Configuration overview](../../training/config-container-overview.md)
- Training entry points: [Entry points](../../training/entry-points.md)
1 change: 1 addition & 0 deletions docs/models/vlm/README.md
@@ -9,6 +9,7 @@ Megatron Bridge supports the following VLM families:
| Model | Documentation | Description |
|-------|---------------|-------------|
| **Gemma 3 VL** | [gemma3-vl.md](gemma3-vl.md) | Google Gemma 3 Vision Language model |
| **Ministral 3** | [ministral3.md](ministral3.md) | Ministral 3 Vision Language model |
| **Nemotron Nano V2 VL** | [nemotron-nano-v2-vl.md](nemotron-nano-v2-vl.md) | NVIDIA Nemotron Nano V2 Vision Language model |
| **Qwen2.5 VL** | [qwen2.5-vl.md](qwen2.5-vl.md) | Alibaba Cloud Qwen2.5 Vision Language model |
| **Qwen3 VL** | [qwen3-vl.md](qwen3-vl.md) | Alibaba Cloud Qwen3 Vision Language model |
159 changes: 2 additions & 157 deletions docs/models/vlm/gemma3-vl.md
@@ -44,163 +44,9 @@ Gemma 3 VL builds on the Gemma 3 architecture with additional multimodal capabilities:
- **Multimodal Integration**: Seamless integration of visual and textual information through learned projection layers
- **Flexible Image Handling**: Supports variable resolution images and multiple images per conversation

## Conversion with 🤗 Hugging Face

### Import HF → Megatron
To import the HF VL model to your desired Megatron path:
```bash
python examples/conversion/convert_checkpoints.py import \
--hf-model google/gemma-3-4b-it \
--megatron-path /models/gemma-3-4b-it
```

### Export Megatron → HF
```bash
python examples/conversion/convert_checkpoints.py export \
--hf-model google/gemma-3-4b-it \
--megatron-path /results/gemma3_vl_4b/checkpoints/iter_00001000 \
--hf-path ./gemma3-vl-hf-export
```

### Run Inference on Converted Checkpoint

```bash
python examples/conversion/hf_to_megatron_generate_vlm.py \
--hf_model_path google/gemma-3-4b-it \
--megatron_model_path /models/gemma-3-4b-it \
--image_path <example image path> \
--prompt "Describe this image." \
--max_new_tokens 100
```

Note:
- `--megatron_model_path` is optional. If not specified, the script will convert the model and then run forward.
- You can also use image URLs: `--image_path="https://example.com/image.jpg"`

## Finetune Recipes

- See: [bridge.recipes.gemma3_vl](../../apidocs/bridge/bridge.recipes.gemma3_vl.md)
- Available recipes:
- `gemma3_vl_4b_finetune_config`: Finetuning for 4B VL model with PEFT support
- `gemma3_vl_12b_finetune_config`: Finetuning for 12B VL model with PEFT support
- `gemma3_vl_27b_finetune_config`: Finetuning for 27B VL model with PEFT support

Before training, ensure the following environment variables are set:
1. `SAVE_DIR`: checkpoint and log saving directory
2. `HF_TOKEN`: to download models from HF Hub (if required)
3. `HF_HOME`: (optional) to avoid re-downloading models and datasets
4. `WANDB_API_KEY`: (optional) to enable WandB logging

### Full Finetuning

```bash
torchrun --nproc-per-node=8 run/run_vlm_recipe.py \
--pretrained-checkpoint /models/gemma-3-4b-it \
--recipe gemma3_vl_4b_finetune_config \
--dataset-type hf \
dataset.maker_name=make_cord_v2_dataset \
train.global_batch_size=64 \
train.train_iters=1000 \
checkpoint.save=$SAVE_DIR/gemma3_vl_4b_finetune
```

Or programmatically:
```python
from megatron.bridge.recipes.gemma3_vl import gemma3_vl_4b_finetune_config

# Full finetuning
config = gemma3_vl_4b_finetune_config(
name="gemma3_vl_4b_full_finetune",
pretrained_checkpoint="/models/gemma-3-4b-it",
dataset_type="hf",
peft=None,
train_iters=1000,
global_batch_size=64,
)
```

### Parameter-Efficient Finetuning (PEFT) with LoRA

```bash
torchrun --nproc-per-node=8 run/run_vlm_recipe.py \
--pretrained-checkpoint /models/gemma-3-4b-it \
--recipe gemma3_vl_4b_finetune_config \
--peft_scheme lora \
--dataset-type hf \
dataset.maker_name=make_cord_v2_dataset \
train.global_batch_size=128 \
checkpoint.save=$SAVE_DIR/gemma3_vl_4b_lora
```

PEFT options:
- `--peft_scheme`: Set to `lora` for LoRA or `dora` for DoRA. Omit for full finetuning.

You can also combine PEFT with freeze options:
- `model.freeze_language_model=True`: Freeze the language model
- `model.freeze_vision_model=True`: Freeze the vision encoder
- `model.freeze_vision_projection=True`: Freeze the vision projection layer

Example with freeze options:
```bash
torchrun --nproc-per-node=8 run/run_vlm_recipe.py \
--pretrained-checkpoint /models/gemma-3-4b-it \
--recipe gemma3_vl_4b_finetune_config \
--peft_scheme lora \
model.freeze_language_model=True \
model.freeze_vision_model=False \
checkpoint.save=$SAVE_DIR/gemma3_vl_4b_lora_vision
```

Programmatic configuration:
```python
from megatron.bridge.recipes.gemma3_vl import gemma3_vl_4b_finetune_config

# LoRA finetuning
config = gemma3_vl_4b_finetune_config(
name="gemma3_vl_4b_lora_finetune",
pretrained_checkpoint="/models/gemma-3-4b-it",
dataset_type="hf",
peft="lora", # or "dora"
train_iters=1000,
global_batch_size=128,
)

# LoRA with vision model frozen
config = gemma3_vl_4b_finetune_config(
name="gemma3_vl_4b_lora_language_only",
pretrained_checkpoint="/models/gemma-3-4b-it",
peft="lora",
freeze_vision_model=True,
freeze_vision_projection=True,
)
```

### Recommended Configurations

| Model | Mode | TP | PP | Global Batch Size | Learning Rate | Hardware |
|-------|------|----|----|-------------------|---------------|----------|
| Gemma 3 VL 4B | Full SFT | 1 | 1 | 32-64 | 5e-6 | 8 GPUs |
| Gemma 3 VL 4B | LoRA/DoRA | 1 | 1 | 64-128 | 1e-4 | 8 GPUs |
| Gemma 3 VL 12B | Full SFT | 4 | 1 | 32-64 | 5e-6 | 8 GPUs |
| Gemma 3 VL 12B | LoRA/DoRA | 1 | 1 | 64-128 | 1e-4 | 8 GPUs |
| Gemma 3 VL 27B | Full SFT | 8 | 2 | 16-32 | 5e-6 | 16 GPUs |
| Gemma 3 VL 27B | LoRA/DoRA | 4 | 1 | 32-64 | 1e-4 | 16 GPUs |

**Note:** LoRA/DoRA significantly reduces memory requirements, allowing for larger batch sizes and fewer GPUs.

## Example Datasets

| Dataset | Maker Name | Description |
|---------|------------|-------------|
| [cord-v2](https://huggingface.co/datasets/naver-clova-ix/cord-v2) | `make_cord_v2_dataset` | OCR receipts: Single-image-text dataset for receipt understanding |
| [MedPix-VQA](https://huggingface.co/datasets/mmoukouba/MedPix-VQA) | `make_medpix_dataset` | Medical VQA: Single-image Q&A for clinical images |
| [The Cauldron (Raven subset)](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron) | `make_raven_dataset` | Visual reasoning: Multi-image analogical reasoning |

To change the dataset, specify `dataset.maker_name=<maker_name>` in your command.

## Examples
- Checkpoint import/export: [examples/conversion/convert_checkpoints.py](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/conversion/convert_checkpoints.py)
- Generate with VLM (HF→Megatron): [examples/conversion/hf_to_megatron_generate_vlm.py](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/conversion/hf_to_megatron_generate_vlm.py)

For checkpoint conversion, inference, finetuning recipes, and step-by-step training guides, see the [Gemma 3 VL Examples](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/models/vlm/gemma3_vl/README.md).

## Hugging Face Model Cards

@@ -213,4 +59,3 @@ To change the dataset, specify `dataset.maker_name=<maker_name>` in your command
- Recipe usage: [Recipe usage](../../recipe-usage.md)
- Customizing the training recipe configuration: [Configuration overview](../../training/config-container-overview.md)
- Training entry points: [Entry points](../../training/entry-points.md)
