Merged · 24 commits
fd6bd65  fix: recognize 'sft' and 'peft' keywords in train mode inference (yaoyu-33, Feb 26, 2026)
d7a20c8  feat: add HF PEFT adapter export for LoRA/DoRA checkpoints (yaoyu-33, Feb 26, 2026)
b551e45  test: add adapter export tests and README documentation (yaoyu-33, Feb 26, 2026)
6461edc  chore: revert unintended 3rdparty/Megatron-LM submodule change (yaoyu-33, Feb 26, 2026)
53a1d64  docs: use 'uv run python' in adapter example docstrings and READMEs (yaoyu-33, Feb 26, 2026)
eb0ed32  feat: align adapter CLI args with merge_lora, add GPU/TP/PP support t… (yaoyu-33, Mar 5, 2026)
0cfaf2b  fix: handle missing run_config.yaml and tied embeddings in verify_ada… (yaoyu-33, Mar 5, 2026)
1531e5f  Merge branch 'main' into yuya/add-hf-adapter-export (yaoyu-33, Mar 5, 2026)
40ca82b  build: add peft dependency and regenerate uv.lock (yaoyu-33, Mar 5, 2026)
325d1cb  build: regenerate uv.lock with peft>=0.18.1 dependency (yaoyu-33, Mar 5, 2026)
4d50f23  Merge branch 'main' into yuya/add-hf-adapter-export (yaoyu-33, Mar 6, 2026)
6ee1925  update uv.lock (yaoyu-33, Mar 6, 2026)
578403d  test: fix adapter export unit tests failing without parallel state init (yaoyu-33, Mar 7, 2026)
d0c76b2  ci: empty commit to re-trigger CI (yaoyu-33, Mar 9, 2026)
cd79d06  ci: re-trigger (nvrx/converter/nemotronh flaky again) (yaoyu-33, Mar 10, 2026)
e2f8cc7  Merge remote-tracking branch 'origin/main' into yuya/add-hf-adapter-e… (yaoyu-33, Mar 10, 2026)
6eb53b9  build: regenerate uv.lock after main merge (yaoyu-33, Mar 10, 2026)
eabcce3  test: add unit tests for export_adapter_ckpt coverage (yaoyu-33, Mar 11, 2026)
28aaf1a  Merge remote-tracking branch 'origin/main' into yuya/add-hf-adapter-e… (yaoyu-33, Mar 12, 2026)
8fa62fa  build: regenerate uv.lock after main merge (yaoyu-33, Mar 12, 2026)
1b02fa6  Merge remote-tracking branch 'origin/main' into yuya/add-hf-adapter-e… (yaoyu-33, Mar 13, 2026)
f2104df  build: regenerate uv.lock after main merge (yaoyu-33, Mar 13, 2026)
4e1fdd2  Merge branch 'main' into yuya/add-hf-adapter-export (yaoyu-33, Mar 14, 2026)
3ba210d  Fix uv.lock (ko3n1g, Mar 16, 2026)
24 changes: 24 additions & 0 deletions examples/conversion/README.md
@@ -449,3 +449,27 @@ Each log entry captures detailed tensor information for every module:
- **Model Verification**: Compare intermediate results between HuggingFace and Megatron models
- **Numerical Debugging**: Identify divergence points in model conversion

### 9. `adapter/` — LoRA/DoRA Adapter Export & Verification

Scripts for exporting Megatron-Bridge LoRA/DoRA adapter weights to HuggingFace PEFT format and verifying correctness. See [`adapter/README.md`](adapter/README.md) for full details.

| Script | Description |
|---|---|
| `adapter/export_adapter.py` | Export a Megatron PEFT checkpoint to HF PEFT format (CPU-only) |
| `adapter/verify_adapter.py` | Verify exported adapter via logit comparison |
| `adapter/stream_adapter_weights.py` | Stream individual adapter tensors for custom workflows |

**Quick start:**
```bash
# Export
uv run python examples/conversion/adapter/export_adapter.py \
--hf-model-path meta-llama/Llama-3.2-1B \
--lora-checkpoint /path/to/finetune_ckpt \
--output ./my_adapter

# Verify
uv run python examples/conversion/adapter/verify_adapter.py \
--hf-model-path meta-llama/Llama-3.2-1B \
--hf-adapter-path ./my_adapter
```

153 changes: 153 additions & 0 deletions examples/conversion/adapter/README.md
@@ -0,0 +1,153 @@
# Adapter Export & Verification

Scripts for exporting Megatron-Bridge LoRA/DoRA adapter weights to HuggingFace PEFT format and verifying the results.

## Overview

After fine-tuning a model with LoRA (or DoRA) in Megatron-Bridge, the adapter
weights live inside a Megatron distributed checkpoint. The scripts in this
directory let you:

1. **Export** the adapter to a HuggingFace PEFT-compatible directory
(`adapter_config.json` + `adapter_model.safetensors`).
2. **Verify** the export by loading it with the `peft` library and comparing
logits against the Megatron checkpoint.
3. **Stream** individual adapter tensors from a Megatron model for inspection
or custom workflows.

The exported adapter can be loaded with standard HuggingFace tooling:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
model = PeftModel.from_pretrained(base, "./my_adapter")
```

## Scripts

### 1. `export_adapter.py` — Checkpoint Export

Converts a Megatron-Bridge PEFT checkpoint to HuggingFace PEFT format. Runs
entirely on CPU — no GPU required.

```bash
uv run python examples/conversion/adapter/export_adapter.py \
--hf-model-path meta-llama/Llama-3.2-1B \
--lora-checkpoint /path/to/finetune_ckpt \
--output ./my_adapter
```

| Argument | Description |
|---|---|
| `--hf-model-path` | HuggingFace model name or local path (architecture + base weights) |
| `--lora-checkpoint` | Path to the Megatron-Bridge distributed checkpoint containing LoRA adapter weights |
| `--output` | Output directory (default: `./my_adapter`) |
| `--trust-remote-code` | Allow custom code from the HuggingFace repository |

**Output structure:**

```text
my_adapter/
├── adapter_config.json
└── adapter_model.safetensors
```
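
For reference, `adapter_config.json` is a standard PEFT configuration file. A typical file for a LoRA adapter might look like the following (all values are illustrative; your exported config will reflect the settings used during fine-tuning):

```json
{
  "peft_type": "LORA",
  "task_type": "CAUSAL_LM",
  "base_model_name_or_path": "meta-llama/Llama-3.2-1B",
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.0,
  "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"]
}
```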
Comment on lines +51 to +55 (Contributor)

⚠️ Potential issue | 🟡 Minor

**Add a language to the fenced output block.** The output structure block is missing a fence language and triggers markdownlint MD040 (fenced-code-language: "Fenced code blocks should have a language specified"). Suggested fix: open the block with ```` ```text ```` instead of a bare ```` ``` ````, and apply the same change to any other unlabeled file-tree fences in the README.

### 2. `verify_adapter.py` — Export Verification

Loads the exported adapter with the `peft` library and runs verification
checks:

- The PEFT model logits must differ from the base model (adapter has effect).
- When `--lora-checkpoint` is provided, the top-k predicted tokens
from the PEFT model must match those from the Megatron model with merged
weights.

Supports CPU-only, single-GPU, and multi-GPU (TP/PP) modes.

```bash
# Quick check (PEFT-only, no Megatron comparison, CPU)
uv run python examples/conversion/adapter/verify_adapter.py \
--hf-model-path meta-llama/Llama-3.2-1B \
--hf-adapter-path ./my_adapter \
--cpu

# Full verification on GPU (single GPU)
uv run python examples/conversion/adapter/verify_adapter.py \
--hf-model-path meta-llama/Llama-3.2-1B \
--hf-adapter-path ./my_adapter \
--lora-checkpoint /path/to/finetune_ckpt/iter_0000020

# Multi-GPU with TP=2
uv run python -m torch.distributed.run --nproc_per_node=2 \
examples/conversion/adapter/verify_adapter.py \
--hf-model-path meta-llama/Llama-3.2-1B \
--hf-adapter-path ./my_adapter \
--lora-checkpoint /path/to/finetune_ckpt/iter_0000020 \
--tp 2

# Multi-GPU with PP=4
uv run python -m torch.distributed.run --nproc_per_node=4 \
examples/conversion/adapter/verify_adapter.py \
--hf-model-path meta-llama/Llama-3.2-1B \
--hf-adapter-path ./my_adapter \
--lora-checkpoint /path/to/finetune_ckpt/iter_0000020 \
--pp 4
```
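
Conceptually, the top-k agreement check reduces to comparing the index sets of the k largest logits from the two models. A minimal stdlib-only sketch of that idea (toy logits, not the script's actual implementation):

```python
def top_k_ids(logits, k):
    """Return the indices of the k largest values, in descending order."""
    return [i for i, _ in sorted(enumerate(logits), key=lambda p: -p[1])[:k]]

# Toy next-token logits from the two model paths being compared.
peft_logits = [0.1, 2.5, 0.3, 1.9, 0.05]
megatron_logits = [0.12, 2.4, 0.28, 1.95, 0.06]

k = 3
a, b = top_k_ids(peft_logits, k), top_k_ids(megatron_logits, k)
print(a, b, a == b)  # → [1, 3, 2] [1, 3, 2] True
```

The exact logit values may differ slightly between backends, which is why the check compares top-k token identities rather than raw values.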
Comment on lines +35 to +97 (Contributor)

⚠️ Potential issue | 🟡 Minor

**Align command examples with `uv run` usage.** Use `uv run python ...` in this README so the commands match the rest of the conversion examples and avoid environment mismatch. Per the coding guidelines for `{**/*.sh,examples/**/*.py}`: use 'uv run' to execute scripts instead of activating a virtual environment and calling 'python' directly.

| Argument | Description |
|---|---|
| `--hf-model-path` | HuggingFace base model name or path |
| `--hf-adapter-path` | Exported HF PEFT adapter directory |
| `--lora-checkpoint` | *(optional)* Megatron checkpoint iter directory for cross-check |
| `--prompt` | Prompt for the forward pass (default: `"The capital of France is"`) |
| `--top-k` | Number of top tokens to compare (default: `5`) |
| `--tp` | Tensor parallel size (default: `1`) |
| `--pp` | Pipeline parallel size (default: `1`) |
| `--ep` | Expert parallel size (default: `1`) |
| `--cpu` | Run entirely on CPU (no GPU required, TP/PP/EP must be 1) |

### 3. `stream_adapter_weights.py` — Low-Level Adapter Streaming

Demonstrates how to use `AutoBridge.export_adapter_weights` to iterate through
adapter tensors one at a time. Useful for custom export pipelines or debugging.

Requires a GPU (uses NCCL backend).

```bash
# Single GPU
uv run python examples/conversion/adapter/stream_adapter_weights.py \
--output ./adapters/demo_lora.safetensors

# Multi-GPU (tensor + pipeline parallelism)
uv run python -m torch.distributed.run --nproc_per_node=4 \
examples/conversion/adapter/stream_adapter_weights.py \
--tensor-model-parallel-size 2 \
--pipeline-model-parallel-size 2 \
--output ./adapters/demo_tp2_pp2.safetensors
```
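
The control flow here is simply iterating named tensors and collecting them for a single save at the end. A toy stdlib sketch of that pattern (the generator below is a made-up stand-in for `AutoBridge.export_adapter_weights`; the key names and shapes are illustrative):

```python
def fake_adapter_stream():
    # Stand-in for the bridge's adapter-weight iterator: yields (key, tensor) pairs.
    # Nested lists stand in for real tensors in this sketch.
    yield ("base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight", [[0.0] * 32] * 8)
    yield ("base_model.model.model.layers.0.self_attn.q_proj.lora_B.weight", [[0.0] * 8] * 32)

collected = {}
for name, tensor in fake_adapter_stream():
    # In a real pipeline this dict would be handed to safetensors' save function.
    collected[name] = tensor

print(len(collected), "tensors collected")  # → 2 tensors collected
```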

## Programmatic API

The same functionality is available directly through `AutoBridge`:

```python
from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3.2-1B")

# One-liner: checkpoint → HF PEFT directory
bridge.export_adapter_ckpt(
peft_checkpoint="/path/to/finetune_ckpt",
output_path="./my_adapter",
)

# Or, if you already have a model in memory:
bridge.save_hf_adapter(
model=megatron_model,
path="./my_adapter",
peft_config=lora,
base_model_name_or_path="meta-llama/Llama-3.2-1B",
)
```
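
The keys written to `adapter_model.safetensors` follow PEFT's naming scheme, where each adapted module contributes a `lora_A` and a `lora_B` tensor under a `base_model.model.` prefix. A toy sketch of the pattern (the module path is an example, not output of the bridge's actual mapping logic):

```python
def to_peft_key(module_path: str, which: str) -> str:
    """Build a PEFT-style safetensors key for one adapted module.

    PEFT prefixes adapted-module paths with "base_model.model.".
    """
    assert which in ("A", "B")
    return f"base_model.model.{module_path}.lora_{which}.weight"

print(to_peft_key("model.layers.0.self_attn.q_proj", "A"))
# → base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight
```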
79 changes: 79 additions & 0 deletions examples/conversion/adapter/export_adapter.py
@@ -0,0 +1,79 @@
#!/usr/bin/env python3
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
Export LoRA adapter weights from a Megatron-Bridge PEFT checkpoint to
HuggingFace PEFT format (``adapter_config.json`` + ``adapter_model.safetensors``).

No GPU required -- runs entirely on CPU.

The output can be loaded directly with::

from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("<hf-model-path>")
model = PeftModel.from_pretrained(base, "./my_adapter")

Usage::

uv run python examples/conversion/adapter/export_adapter.py \\
--hf-model-path meta-llama/Llama-3.2-1B \\
--lora-checkpoint /path/to/finetune_ckpt \\
--output ./my_adapter
"""

from __future__ import annotations

import argparse
from pathlib import Path

from megatron.bridge import AutoBridge


def parse_args() -> argparse.Namespace:
"""Parse command-line arguments."""
parser = argparse.ArgumentParser(
description="Export Megatron-Bridge LoRA adapter to HuggingFace PEFT format",
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
)
parser.add_argument(
"--hf-model-path",
required=True,
help="HuggingFace model name or local path (architecture + base weights).",
)
parser.add_argument(
"--lora-checkpoint",
required=True,
help="Megatron-Bridge distributed checkpoint containing LoRA adapter weights.",
)
parser.add_argument("--output", type=Path, default=Path("./my_adapter"))
parser.add_argument("--trust-remote-code", action="store_true")
return parser.parse_args()


def main() -> None:
"""Export a Megatron-Bridge PEFT checkpoint to HuggingFace PEFT format."""
args = parse_args()

bridge = AutoBridge.from_hf_pretrained(args.hf_model_path, trust_remote_code=args.trust_remote_code)
bridge.export_adapter_ckpt(
peft_checkpoint=args.lora_checkpoint,
output_path=args.output,
)


if __name__ == "__main__":
main()
@@ -35,12 +35,12 @@

Run the example:

-    uv run python examples/conversion/stream_adapter_weights.py \
+    uv run python examples/conversion/adapter/stream_adapter_weights.py \
--output ./adapters/demo.safetensors

Multi-GPU launch (torchrun) with tensor/pipeline/expert parallelism:

-    uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/stream_adapter_weights.py \
+    uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/adapter/stream_adapter_weights.py \
--tensor-model-parallel-size 2 \
--pipeline-model-parallel-size 2 \
--expert-model-parallel-size 1 \