Merged · 24 commits
fd6bd65  fix: recognize 'sft' and 'peft' keywords in train mode inference (yaoyu-33, Feb 26, 2026)
d7a20c8  feat: add HF PEFT adapter export for LoRA/DoRA checkpoints (yaoyu-33, Feb 26, 2026)
b551e45  test: add adapter export tests and README documentation (yaoyu-33, Feb 26, 2026)
6461edc  chore: revert unintended 3rdparty/Megatron-LM submodule change (yaoyu-33, Feb 26, 2026)
53a1d64  docs: use 'uv run python' in adapter example docstrings and READMEs (yaoyu-33, Feb 26, 2026)
eb0ed32  feat: align adapter CLI args with merge_lora, add GPU/TP/PP support t… (yaoyu-33, Mar 5, 2026)
0cfaf2b  fix: handle missing run_config.yaml and tied embeddings in verify_ada… (yaoyu-33, Mar 5, 2026)
1531e5f  Merge branch 'main' into yuya/add-hf-adapter-export (yaoyu-33, Mar 5, 2026)
40ca82b  build: add peft dependency and regenerate uv.lock (yaoyu-33, Mar 5, 2026)
325d1cb  build: regenerate uv.lock with peft>=0.18.1 dependency (yaoyu-33, Mar 5, 2026)
4d50f23  Merge branch 'main' into yuya/add-hf-adapter-export (yaoyu-33, Mar 6, 2026)
6ee1925  update uv.lock (yaoyu-33, Mar 6, 2026)
578403d  test: fix adapter export unit tests failing without parallel state init (yaoyu-33, Mar 7, 2026)
d0c76b2  ci: empty commit to re-trigger CI (yaoyu-33, Mar 9, 2026)
cd79d06  ci: re-trigger (nvrx/converter/nemotronh flaky again) (yaoyu-33, Mar 10, 2026)
e2f8cc7  Merge remote-tracking branch 'origin/main' into yuya/add-hf-adapter-e… (yaoyu-33, Mar 10, 2026)
6eb53b9  build: regenerate uv.lock after main merge (yaoyu-33, Mar 10, 2026)
eabcce3  test: add unit tests for export_adapter_ckpt coverage (yaoyu-33, Mar 11, 2026)
28aaf1a  Merge remote-tracking branch 'origin/main' into yuya/add-hf-adapter-e… (yaoyu-33, Mar 12, 2026)
8fa62fa  build: regenerate uv.lock after main merge (yaoyu-33, Mar 12, 2026)
1b02fa6  Merge remote-tracking branch 'origin/main' into yuya/add-hf-adapter-e… (yaoyu-33, Mar 13, 2026)
f2104df  build: regenerate uv.lock after main merge (yaoyu-33, Mar 13, 2026)
4e1fdd2  Merge branch 'main' into yuya/add-hf-adapter-export (yaoyu-33, Mar 14, 2026)
3ba210d  Fix uv.lock (ko3n1g, Mar 16, 2026)
24 changes: 24 additions & 0 deletions examples/conversion/README.md
@@ -449,3 +449,27 @@ Each log entry captures detailed tensor information for every module:
- **Model Verification**: Compare intermediate results between HuggingFace and Megatron models
- **Numerical Debugging**: Identify divergence points in model conversion

### 9. `adapter/` — LoRA/DoRA Adapter Export & Verification

Scripts for exporting Megatron-Bridge LoRA/DoRA adapter weights to HuggingFace PEFT format and verifying correctness. See [`adapter/README.md`](adapter/README.md) for full details.

| Script | Description |
|---|---|
| `adapter/export_adapter.py` | Export a Megatron PEFT checkpoint to HF PEFT format (CPU-only) |
| `adapter/verify_adapter.py` | Verify exported adapter via logit comparison |
| `adapter/stream_adapter_weights.py` | Stream individual adapter tensors for custom workflows |

**Quick start:**
```bash
# Export
uv run python examples/conversion/adapter/export_adapter.py \
--hf-model-path meta-llama/Llama-3.2-1B \
--lora-checkpoint /path/to/finetune_ckpt \
--output ./my_adapter

# Verify
uv run python examples/conversion/adapter/verify_adapter.py \
--hf-model-path meta-llama/Llama-3.2-1B \
--hf-adapter-path ./my_adapter
```

153 changes: 153 additions & 0 deletions examples/conversion/adapter/README.md
@@ -0,0 +1,153 @@
# Adapter Export & Verification

Scripts for exporting Megatron-Bridge LoRA/DoRA adapter weights to HuggingFace PEFT format and verifying the results.

## Overview

After fine-tuning a model with LoRA (or DoRA) in Megatron-Bridge, the adapter
weights live inside a Megatron distributed checkpoint. The scripts in this
directory let you:

1. **Export** the adapter to a HuggingFace PEFT-compatible directory
(`adapter_config.json` + `adapter_model.safetensors`).
2. **Verify** the export by loading it with the `peft` library and comparing
logits against the Megatron checkpoint.
3. **Stream** individual adapter tensors from a Megatron model for inspection
or custom workflows.

The exported adapter can be loaded with standard HuggingFace tooling:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
model = PeftModel.from_pretrained(base, "./my_adapter")
```

## Scripts

### 1. `export_adapter.py` — Checkpoint Export

Converts a Megatron-Bridge PEFT checkpoint to HuggingFace PEFT format. Runs
entirely on CPU — no GPU required.

```bash
uv run python examples/conversion/adapter/export_adapter.py \
--hf-model-path meta-llama/Llama-3.2-1B \
--lora-checkpoint /path/to/finetune_ckpt \
--output ./my_adapter
```

| Argument | Description |
|---|---|
| `--hf-model-path` | HuggingFace model name or local path (architecture + base weights) |
| `--lora-checkpoint` | Path to the Megatron-Bridge distributed checkpoint containing LoRA adapter weights |
| `--output` | Output directory (default: `./my_adapter`) |
| `--trust-remote-code` | Allow custom code from the HuggingFace repository |

**Output structure:**

```text
my_adapter/
├── adapter_config.json
└── adapter_model.safetensors
```
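
For reference, `adapter_config.json` is a standard PEFT configuration file. A typical file for a LoRA adapter might look like the following (all values are illustrative; your exported config will reflect the settings used during fine-tuning):

```json
{
  "peft_type": "LORA",
  "task_type": "CAUSAL_LM",
  "base_model_name_or_path": "meta-llama/Llama-3.2-1B",
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.0,
  "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"]
}
```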
Comment on lines +51 to +55 (Contributor)

⚠️ Potential issue | 🟡 Minor

**Add a language to the fenced output block.** The output structure block is missing a fence language and triggers markdownlint MD040 (fenced-code-language: "Fenced code blocks should have a language specified"). Suggested fix: open the block with ```` ```text ```` instead of a bare ```` ``` ````, and apply the same change to any other unlabeled file-tree fences in the README.

### 2. `verify_adapter.py` — Export Verification

Loads the exported adapter with the `peft` library and runs verification
checks:

- The PEFT model logits must differ from the base model (adapter has effect).
- When `--lora-checkpoint` is provided, the top-k predicted tokens
from the PEFT model must match those from the Megatron model with merged
weights.

Supports CPU-only, single-GPU, and multi-GPU (TP/PP) modes.

```bash
# Quick check (PEFT-only, no Megatron comparison, CPU)
uv run python examples/conversion/adapter/verify_adapter.py \
--hf-model-path meta-llama/Llama-3.2-1B \
--hf-adapter-path ./my_adapter \
--cpu

# Full verification on GPU (single GPU)
uv run python examples/conversion/adapter/verify_adapter.py \
--hf-model-path meta-llama/Llama-3.2-1B \
--hf-adapter-path ./my_adapter \
--lora-checkpoint /path/to/finetune_ckpt/iter_0000020

# Multi-GPU with TP=2
uv run python -m torch.distributed.run --nproc_per_node=2 \
examples/conversion/adapter/verify_adapter.py \
--hf-model-path meta-llama/Llama-3.2-1B \
--hf-adapter-path ./my_adapter \
--lora-checkpoint /path/to/finetune_ckpt/iter_0000020 \
--tp 2

# Multi-GPU with PP=4
uv run python -m torch.distributed.run --nproc_per_node=4 \
examples/conversion/adapter/verify_adapter.py \
--hf-model-path meta-llama/Llama-3.2-1B \
--hf-adapter-path ./my_adapter \
--lora-checkpoint /path/to/finetune_ckpt/iter_0000020 \
--pp 4
```
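
Conceptually, the top-k agreement check reduces to comparing the index sets of the k largest logits from the two models. A minimal stdlib-only sketch of that idea (toy logits, not the script's actual implementation):

```python
def top_k_ids(logits, k):
    """Return the indices of the k largest values, in descending order."""
    return [i for i, _ in sorted(enumerate(logits), key=lambda p: -p[1])[:k]]

# Toy next-token logits from the two model paths being compared.
peft_logits = [0.1, 2.5, 0.3, 1.9, 0.05]
megatron_logits = [0.12, 2.4, 0.28, 1.95, 0.06]

k = 3
a, b = top_k_ids(peft_logits, k), top_k_ids(megatron_logits, k)
print(a, b, a == b)  # → [1, 3, 2] [1, 3, 2] True
```

The exact logit values may differ slightly between backends, which is why the check compares top-k token identities rather than raw values.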
Comment on lines +35 to +97 (Contributor)

⚠️ Potential issue | 🟡 Minor

**Align command examples with `uv run` usage.** Use `uv run python ...` in this README so the commands match the rest of the conversion examples and avoid environment mismatch. Per the coding guidelines for `{**/*.sh,examples/**/*.py}`: use 'uv run' to execute scripts instead of activating a virtual environment and calling 'python' directly.

| Argument | Description |
|---|---|
| `--hf-model-path` | HuggingFace base model name or path |
| `--hf-adapter-path` | Exported HF PEFT adapter directory |
| `--lora-checkpoint` | *(optional)* Megatron checkpoint iter directory for cross-check |
| `--prompt` | Prompt for the forward pass (default: `"The capital of France is"`) |
| `--top-k` | Number of top tokens to compare (default: `5`) |
| `--tp` | Tensor parallel size (default: `1`) |
| `--pp` | Pipeline parallel size (default: `1`) |
| `--ep` | Expert parallel size (default: `1`) |
| `--cpu` | Run entirely on CPU (no GPU required, TP/PP/EP must be 1) |

### 3. `stream_adapter_weights.py` — Low-Level Adapter Streaming

Demonstrates how to use `AutoBridge.export_adapter_weights` to iterate through
adapter tensors one at a time. Useful for custom export pipelines or debugging.

Requires a GPU (uses NCCL backend).

```bash
# Single GPU
uv run python examples/conversion/adapter/stream_adapter_weights.py \
--output ./adapters/demo_lora.safetensors

# Multi-GPU (tensor + pipeline parallelism)
uv run python -m torch.distributed.run --nproc_per_node=4 \
examples/conversion/adapter/stream_adapter_weights.py \
--tensor-model-parallel-size 2 \
--pipeline-model-parallel-size 2 \
--output ./adapters/demo_tp2_pp2.safetensors
```
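
The control flow here is simply iterating named tensors and collecting them for a single save at the end. A toy stdlib sketch of that pattern (the generator below is a made-up stand-in for `AutoBridge.export_adapter_weights`; the key names and shapes are illustrative):

```python
def fake_adapter_stream():
    # Stand-in for the bridge's adapter-weight iterator: yields (key, tensor) pairs.
    # Nested lists stand in for real tensors in this sketch.
    yield ("base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight", [[0.0] * 32] * 8)
    yield ("base_model.model.model.layers.0.self_attn.q_proj.lora_B.weight", [[0.0] * 8] * 32)

collected = {}
for name, tensor in fake_adapter_stream():
    # In a real pipeline this dict would be handed to safetensors' save function.
    collected[name] = tensor

print(len(collected), "tensors collected")  # → 2 tensors collected
```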

## Programmatic API

The same functionality is available directly through `AutoBridge`:

```python
from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3.2-1B")

# One-liner: checkpoint → HF PEFT directory
bridge.export_adapter_ckpt(
peft_checkpoint="/path/to/finetune_ckpt",
output_path="./my_adapter",
)

# Or, if you already have a model in memory:
bridge.save_hf_adapter(
model=megatron_model,
path="./my_adapter",
peft_config=lora,
base_model_name_or_path="meta-llama/Llama-3.2-1B",
)
```
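
The keys written to `adapter_model.safetensors` follow PEFT's naming scheme, where each adapted module contributes a `lora_A` and a `lora_B` tensor under a `base_model.model.` prefix. A toy sketch of the pattern (the module path is an example, not output of the bridge's actual mapping logic):

```python
def to_peft_key(module_path: str, which: str) -> str:
    """Build a PEFT-style safetensors key for one adapted module.

    PEFT prefixes adapted-module paths with "base_model.model.".
    """
    assert which in ("A", "B")
    return f"base_model.model.{module_path}.lora_{which}.weight"

print(to_peft_key("model.layers.0.self_attn.q_proj", "A"))
# → base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight
```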
79 changes: 79 additions & 0 deletions examples/conversion/adapter/export_adapter.py
@@ -0,0 +1,79 @@
#!/usr/bin/env python3
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
Export LoRA adapter weights from a Megatron-Bridge PEFT checkpoint to
HuggingFace PEFT format (``adapter_config.json`` + ``adapter_model.safetensors``).

No GPU required -- runs entirely on CPU.

The output can be loaded directly with::

from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("<hf-model-path>")
model = PeftModel.from_pretrained(base, "./my_adapter")

Usage::

uv run python examples/conversion/adapter/export_adapter.py \\
--hf-model-path meta-llama/Llama-3.2-1B \\
--lora-checkpoint /path/to/finetune_ckpt \\
--output ./my_adapter
"""

from __future__ import annotations

import argparse
from pathlib import Path

from megatron.bridge import AutoBridge


def parse_args() -> argparse.Namespace:
"""Parse command-line arguments."""
parser = argparse.ArgumentParser(
description="Export Megatron-Bridge LoRA adapter to HuggingFace PEFT format",
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
)
parser.add_argument(
"--hf-model-path",
required=True,
help="HuggingFace model name or local path (architecture + base weights).",
)
parser.add_argument(
"--lora-checkpoint",
required=True,
help="Megatron-Bridge distributed checkpoint containing LoRA adapter weights.",
)
parser.add_argument("--output", type=Path, default=Path("./my_adapter"))
parser.add_argument("--trust-remote-code", action="store_true")
return parser.parse_args()


def main() -> None:
"""Export a Megatron-Bridge PEFT checkpoint to HuggingFace PEFT format."""
args = parse_args()

bridge = AutoBridge.from_hf_pretrained(args.hf_model_path, trust_remote_code=args.trust_remote_code)
bridge.export_adapter_ckpt(
peft_checkpoint=args.lora_checkpoint,
output_path=args.output,
)


if __name__ == "__main__":
main()
@@ -35,12 +35,12 @@

Run the example:

-    uv run python examples/conversion/stream_adapter_weights.py \
+    uv run python examples/conversion/adapter/stream_adapter_weights.py \
--output ./adapters/demo.safetensors

Multi-GPU launch (torchrun) with tensor/pipeline/expert parallelism:

-    uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/stream_adapter_weights.py \
+    uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/adapter/stream_adapter_weights.py \
--tensor-model-parallel-size 2 \
--pipeline-model-parallel-size 2 \
--expert-model-parallel-size 1 \