[peft][ckpt] feat: add HF PEFT adapter export for LoRA/DoRA checkpoints #2574
# Adapter Export & Verification

Scripts for exporting Megatron-Bridge LoRA/DoRA adapter weights to HuggingFace PEFT format and verifying the results.

## Overview

After fine-tuning a model with LoRA (or DoRA) in Megatron-Bridge, the adapter
weights live inside a Megatron distributed checkpoint. The scripts in this
directory let you:

1. **Export** the adapter to a HuggingFace PEFT-compatible directory
   (`adapter_config.json` + `adapter_model.safetensors`).
2. **Verify** the export by loading it with the `peft` library and comparing
   logits against the Megatron checkpoint.
3. **Stream** individual adapter tensors from a Megatron model for inspection
   or custom workflows.

The exported adapter can be loaded with standard HuggingFace tooling:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
model = PeftModel.from_pretrained(base, "./my_adapter")
```
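For intuition, the tensors being exported are the low-rank factors of the LoRA update, where the effective weight is W' = W + (alpha / r) · B · A. A minimal sketch of that merge on toy matrices (pure Python, illustrative only; this is not the library's merge code, and the shapes are deliberately tiny):

```python
# Illustrative LoRA weight merge: W' = W + (alpha / r) * B @ A.
# Toy shapes only; real adapters operate on full model weight matrices.

def matmul(X, Y):
    """Multiply two matrices represented as nested lists."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A)."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# 2x2 base weight, rank-1 adapter (B: 2x1, A: 1x2), alpha = 2, r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.5]]
merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)  # [[2.0, 1.0], [2.0, 3.0]]
```

This is also why verification can compare a merged Megatron model against the base model plus the exported adapter: both paths should produce the same effective weights.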
## Scripts

### 1. `export_adapter.py` — Checkpoint Export

Converts a Megatron-Bridge PEFT checkpoint to HuggingFace PEFT format. Runs
entirely on CPU; no GPU required.

```bash
uv run python examples/conversion/adapter/export_adapter.py \
    --hf-model-path meta-llama/Llama-3.2-1B \
    --lora-checkpoint /path/to/finetune_ckpt \
    --output ./my_adapter
```

| Argument | Description |
|---|---|
| `--hf-model-path` | HuggingFace model name or local path (architecture + base weights) |
| `--lora-checkpoint` | Path to the Megatron-Bridge distributed checkpoint containing LoRA adapter weights |
| `--output` | Output directory (default: `./my_adapter`) |
| `--trust-remote-code` | Allow custom code from the HuggingFace repository |

**Output structure:**

```text
my_adapter/
├── adapter_config.json
└── adapter_model.safetensors
```
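The `adapter_config.json` produced by the export is a standard PEFT LoRA config. A hedged sketch of the kind of fields such a file typically contains (the values below are hypothetical; the exporter derives the real ones from the checkpoint's PEFT settings, so inspect your own export):

```python
import json

# Hypothetical example of typical PEFT LoRA config fields. These values are
# illustrative only; the actual exporter fills them from the Megatron
# checkpoint's PEFT configuration.
config = {
    "peft_type": "LORA",
    "base_model_name_or_path": "meta-llama/Llama-3.2-1B",
    "r": 16,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",
}
print(json.dumps(config, indent=2))
```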
### 2. `verify_adapter.py` — Export Verification

Loads the exported adapter with the `peft` library and runs verification
checks:

- The PEFT model logits must differ from the base model (the adapter has an effect).
- When `--lora-checkpoint` is provided, the top-k predicted tokens
  from the PEFT model must match those from the Megatron model with merged
  weights.

Supports CPU-only, single-GPU, and multi-GPU (TP/PP) modes.
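The top-k cross-check above amounts to comparing the highest-scoring token ids from both models. A minimal sketch of that comparison on plain Python lists (not the script's actual implementation, which operates on model logit tensors):

```python
def top_k_ids(logits, k):
    """Return the indices of the k largest logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

# Toy next-token logits from the two models being compared.
peft_logits     = [0.1, 2.5, 0.3, 1.9, 0.0]
megatron_logits = [0.2, 2.4, 0.1, 2.0, 0.1]

k = 2
peft_top = top_k_ids(peft_logits, k)
meg_top = top_k_ids(megatron_logits, k)
print(peft_top, meg_top)  # [1, 3] [1, 3]

# The exact logit values may differ slightly across backends, but the
# top-k token ids should agree if the export is faithful.
assert set(peft_top) == set(meg_top), "top-k token mismatch"
```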
```bash
# Quick check (PEFT-only, no Megatron comparison, CPU)
uv run python examples/conversion/adapter/verify_adapter.py \
    --hf-model-path meta-llama/Llama-3.2-1B \
    --hf-adapter-path ./my_adapter \
    --cpu

# Full verification on a single GPU
uv run python examples/conversion/adapter/verify_adapter.py \
    --hf-model-path meta-llama/Llama-3.2-1B \
    --hf-adapter-path ./my_adapter \
    --lora-checkpoint /path/to/finetune_ckpt/iter_0000020

# Multi-GPU with TP=2
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/adapter/verify_adapter.py \
    --hf-model-path meta-llama/Llama-3.2-1B \
    --hf-adapter-path ./my_adapter \
    --lora-checkpoint /path/to/finetune_ckpt/iter_0000020 \
    --tp 2

# Multi-GPU with PP=4
uv run python -m torch.distributed.run --nproc_per_node=4 \
    examples/conversion/adapter/verify_adapter.py \
    --hf-model-path meta-llama/Llama-3.2-1B \
    --hf-adapter-path ./my_adapter \
    --lora-checkpoint /path/to/finetune_ckpt/iter_0000020 \
    --pp 4
```
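Note how `--nproc_per_node` tracks the parallel sizes in these commands: with no data parallelism, the launched world size must equal TP × PP (× EP). A quick sanity check of that relation (a simplifying assumption here, since adding data parallelism would multiply the world size further):

```python
def required_world_size(tp, pp, ep=1):
    """World size a TP/PP/EP layout needs, assuming no data parallelism."""
    return tp * pp * ep

print(required_world_size(tp=2, pp=1))  # 2 -> --nproc_per_node=2
print(required_world_size(tp=1, pp=4))  # 4 -> --nproc_per_node=4
```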
> **Review comment on lines +35 to +97** (Contributor):
>
> Align the command examples with the actual CLI flags. Suggested doc fix:
>
> ```diff
> -python examples/conversion/adapter/export_adapter.py \
> +uv run python examples/conversion/adapter/export_adapter.py \
>      --hf-model-id meta-llama/Llama-3.2-1B \
>      --megatron-peft-checkpoint /path/to/finetune_ckpt \
>      --output-hf-path ./my_adapter
> @@
> -python examples/conversion/adapter/verify_adapter.py \
> +uv run python examples/conversion/adapter/verify_adapter.py \
>      --hf-model-id meta-llama/Llama-3.2-1B \
>      --hf-adapter-path ./my_adapter
> @@
> -python examples/conversion/adapter/verify_adapter.py \
> +uv run python examples/conversion/adapter/verify_adapter.py \
>      --hf-model-id meta-llama/Llama-3.2-1B \
>      --hf-adapter-path ./my_adapter \
>      --megatron-peft-checkpoint /path/to/finetune_ckpt/iter_0000020
> ```
| Argument | Description |
|---|---|
| `--hf-model-path` | HuggingFace base model name or path |
| `--hf-adapter-path` | Exported HF PEFT adapter directory |
| `--lora-checkpoint` | *(optional)* Megatron checkpoint iter directory for cross-check |
| `--prompt` | Prompt for the forward pass (default: `"The capital of France is"`) |
| `--top-k` | Number of top tokens to compare (default: `5`) |
| `--tp` | Tensor parallel size (default: `1`) |
| `--pp` | Pipeline parallel size (default: `1`) |
| `--ep` | Expert parallel size (default: `1`) |
| `--cpu` | Run entirely on CPU (no GPU required; TP/PP/EP must be 1) |
### 3. `stream_adapter_weights.py` — Low-Level Adapter Streaming

Demonstrates how to use `AutoBridge.export_adapter_weights` to iterate through
adapter tensors one at a time. Useful for custom export pipelines or debugging.

Requires a GPU (uses the NCCL backend).

```bash
# Single GPU
uv run python examples/conversion/adapter/stream_adapter_weights.py \
    --output ./adapters/demo_lora.safetensors

# Multi-GPU (tensor + pipeline parallelism)
uv run python -m torch.distributed.run --nproc_per_node=4 \
    examples/conversion/adapter/stream_adapter_weights.py \
    --tensor-model-parallel-size 2 \
    --pipeline-model-parallel-size 2 \
    --output ./adapters/demo_tp2_pp2.safetensors
```
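The streaming pattern this script demonstrates — yielding one (name, tensor) pair at a time instead of materializing the whole state dict — can be sketched generically. This toy generator only mimics the shape of the pattern; it is not the real `export_adapter_weights` signature, and the state dict and key names are made up:

```python
def stream_adapter_weights(state):
    """Yield (name, tensor) pairs one at a time, keeping peak memory flat.

    Mimics the iterate-don't-materialize pattern; filtering on 'lora_A' /
    'lora_B' substrings is a simplification for illustration.
    """
    for name, tensor in state.items():
        if "lora_A" in name or "lora_B" in name:
            yield name, tensor

# Toy "checkpoint": only the lora_* entries are adapter weights.
state = {
    "layers.0.attn.q_proj.lora_A": [[0.1, 0.2]],
    "layers.0.attn.q_proj.lora_B": [[0.3], [0.4]],
    "layers.0.attn.q_proj.weight": [[1.0, 0.0], [0.0, 1.0]],
}

# A consumer can write each tensor as it arrives, or collect them:
collected = dict(stream_adapter_weights(state))
print(sorted(collected))
# ['layers.0.attn.q_proj.lora_A', 'layers.0.attn.q_proj.lora_B']
```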
## Programmatic API

The same functionality is available directly through `AutoBridge`:

```python
from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3.2-1B")

# One-liner: checkpoint → HF PEFT directory
bridge.export_adapter_ckpt(
    peft_checkpoint="/path/to/finetune_ckpt",
    output_path="./my_adapter",
)

# Or, if you already have a model in memory:
bridge.save_hf_adapter(
    model=megatron_model,
    path="./my_adapter",
    peft_config=lora,
    base_model_name_or_path="meta-llama/Llama-3.2-1B",
)
```
---

**`examples/conversion/adapter/export_adapter.py`** (new file):
```python
#!/usr/bin/env python3
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
Export LoRA adapter weights from a Megatron-Bridge PEFT checkpoint to
HuggingFace PEFT format (``adapter_config.json`` + ``adapter_model.safetensors``).

No GPU required -- runs entirely on CPU.

The output can be loaded directly with::

    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("<hf-model-path>")
    model = PeftModel.from_pretrained(base, "./my_adapter")

Usage::

    uv run python examples/conversion/adapter/export_adapter.py \\
        --hf-model-path meta-llama/Llama-3.2-1B \\
        --lora-checkpoint /path/to/finetune_ckpt \\
        --output ./my_adapter
"""

from __future__ import annotations

import argparse
from pathlib import Path

from megatron.bridge import AutoBridge


def parse_args() -> argparse.Namespace:
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser(
        description="Export Megatron-Bridge LoRA adapter to HuggingFace PEFT format",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument(
        "--hf-model-path",
        required=True,
        help="HuggingFace model name or local path (architecture + base weights).",
    )
    parser.add_argument(
        "--lora-checkpoint",
        required=True,
        help="Megatron-Bridge distributed checkpoint containing LoRA adapter weights.",
    )
    parser.add_argument("--output", type=Path, default=Path("./my_adapter"))
    parser.add_argument("--trust-remote-code", action="store_true")
    return parser.parse_args()


def main() -> None:
    """Export a Megatron-Bridge PEFT checkpoint to HuggingFace PEFT format."""
    args = parse_args()

    bridge = AutoBridge.from_hf_pretrained(args.hf_model_path, trust_remote_code=args.trust_remote_code)
    bridge.export_adapter_ckpt(
        peft_checkpoint=args.lora_checkpoint,
        output_path=args.output,
    )


if __name__ == "__main__":
    main()
```
> **Review comment** (on the README):
>
> Add a language to the fenced output block. The output structure block is
> missing a fence language, which markdownlint-cli2 (0.21.0) flags at line 51
> as MD040 (fenced-code-language).