
[peft][ckpt] feat: add HF PEFT adapter export for LoRA/DoRA checkpoints#2574

Merged
yaoyu-33 merged 24 commits into main from yuya/add-hf-adapter-export
Mar 16, 2026
Conversation


@yaoyu-33 yaoyu-33 commented Feb 26, 2026

Summary

  • Add save_hf_adapter() and export_adapter_ckpt() to AutoBridge for exporting Megatron-Bridge LoRA/DoRA adapters to HuggingFace PEFT format (adapter_config.json + adapter_model.safetensors)
  • Fix LoRA merge precision: perform merge in float32 to avoid bf16 matmul precision loss, then cast back to original dtype
  • Add helper functions infer_target_modules_from_adapter_weights() and build_adapter_config_dict() in peft_bridge.py
  • Add example scripts under examples/conversion/adapter/ for export, verification, and streaming
  • Add unit tests (19 tests) for the new helpers and save_hf_adapter
  • Add functional test (7 tests) for end-to-end Qwen3 LoRA export with PEFT library verification
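For reference, `adapter_config.json` is a plain JSON file following HF PEFT's `LoraConfig` serialization. The sketch below shows the general shape of such a config; the concrete values (model name, rank, alpha, target modules) are hypothetical, and the real export derives them from the Megatron-Bridge LoRA config via `build_adapter_config_dict()`:

```python
import json

# Hypothetical values; a real export derives these from the training LoRA config.
adapter_config = {
    "peft_type": "LORA",
    "base_model_name_or_path": "Qwen/Qwen3-0.6B",  # hypothetical base model
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "bias": "none",
    "task_type": "CAUSAL_LM",
}

# PEFT expects this serialized as adapter_config.json next to
# adapter_model.safetensors in the adapter directory.
config_json = json.dumps(adapter_config, indent=2, sort_keys=True)
print(config_json)
```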

Test plan

  • Unit tests pass: pytest tests/unit_tests/models/test_adapter_export.py (19 passed)
  • Functional tests pass: pytest tests/functional_tests/models/qwen/test_qwen3_peft_export.py (7 passed)
  • Existing LoRA tests still pass: pytest tests/unit_tests/models/test_model_bridge_lora.py

Made with Cursor

Summary by CodeRabbit

  • New Features

    • Added LoRA/DoRA adapter export to HuggingFace PEFT format.
    • Added adapter verification functionality via logit comparison.
    • Added streaming adapter weights support.
  • Documentation

    • Added comprehensive adapter export and verification workflow guides with examples.
  • Tests

    • Added functional and unit tests for adapter export functionality.
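The verification idea, comparing top-k predicted tokens between the PEFT-loaded adapter and the Megatron model with merged weights, can be sketched in isolation (NumPy arrays stand in for real model logits; this is an illustration, not the code in `verify_adapter.py`):

```python
import numpy as np

def topk_ids(logits, k=5):
    """Indices of the k largest logits, highest first."""
    return list(np.argsort(logits)[::-1][:k])

def topk_match(logits_a, logits_b, k=5):
    """True when both models rank the same k tokens highest, in the same order."""
    return topk_ids(logits_a, k) == topk_ids(logits_b, k)

base = np.array([0.1, 2.0, 0.3, 1.5, 0.0])
peft = base + np.array([0.0, 0.5, 0.0, -0.1, 0.0])  # adapter shifts logits

assert not np.allclose(base, peft)  # check 1: the adapter has an effect
print(topk_match(peft, peft, k=3))  # check 2: top-k agreement between exports
```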

The infer_train_mode function only checked for 'finetune' in recipe
names. Recipes named with 'sft' or 'peft' were not recognized as
finetune mode, causing a ValueError. Add these keywords to the
has_finetune check.
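The fixed detection logic can be sketched as a standalone function (a simplified version; the actual function in `run_recipe.py` may differ in signature and error text):

```python
def infer_train_mode(recipe_name: str) -> str:
    """Infer 'pretrain' vs 'finetune' from a recipe name (simplified sketch)."""
    lowered = recipe_name.lower()
    has_pretrain = "pretrain" in lowered
    # The fix: also treat 'sft' and 'peft' recipe names as finetune mode.
    has_finetune = "finetune" in lowered or "sft" in lowered or "peft" in lowered
    if has_pretrain and not has_finetune:
        return "pretrain"
    if has_finetune and not has_pretrain:
        return "finetune"
    raise ValueError(f"Unable to infer training mode from recipe name: {recipe_name!r}")

print(infer_train_mode("qwen3_lora_peft"))  # finetune
```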

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
Add `save_hf_adapter()` and `export_adapter_ckpt()` to AutoBridge for
exporting Megatron-Bridge LoRA/DoRA adapters to HuggingFace PEFT format
(adapter_config.json + adapter_model.safetensors).

Key changes:
- peft_bridge: perform LoRA merge in float32 to avoid bf16 precision loss
- peft_bridge: add helpers to infer target_modules and build adapter config
- auto_bridge: add save_hf_adapter() for direct model export
- auto_bridge: add export_adapter_ckpt() for checkpoint-based export
- Move adapter examples to examples/conversion/adapter/ with export,
  stream, and verification scripts
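The float32 merge noted above can be illustrated with a small standalone sketch (NumPy stands in for torch here; this is not the code in `peft_bridge`, just the precision pattern it applies):

```python
import numpy as np

def merge_lora(base_w, lora_a, lora_b, alpha, r):
    """Merge a LoRA delta into a base weight, computing in float32.

    base_w: (out, in) weight in its training dtype (e.g. fp16/bf16)
    lora_a: (r, in), lora_b: (out, r)
    """
    orig_dtype = base_w.dtype
    # Upcast before the matmul: low-precision accumulation of B @ A
    # can lose small LoRA deltas entirely.
    delta = (lora_b.astype(np.float32) @ lora_a.astype(np.float32)) * (alpha / r)
    merged = base_w.astype(np.float32) + delta
    return merged.astype(orig_dtype)  # cast back to the original dtype

rng = np.random.default_rng(0)
base = rng.standard_normal((64, 32)).astype(np.float16)
a = (rng.standard_normal((8, 32)) * 1e-3).astype(np.float16)
b = (rng.standard_normal((64, 8)) * 1e-3).astype(np.float16)
merged = merge_lora(base, a, b, alpha=16, r=8)
print(merged.dtype)  # float16: the result keeps the original dtype
```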

Signed-off-by: Yu Yao <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
- Add unit tests for peft_bridge helpers (infer_target_modules,
  build_adapter_config_dict) and save_hf_adapter in auto_bridge
- Add functional test for Qwen3 LoRA adapter export end-to-end:
  creates toy model, attaches LoRA, exports via AutoBridge, verifies
  output files, config, weight shapes, and PEFT library loading
- Add README for examples/conversion/adapter/ with usage docs
- Update parent examples/conversion/README.md with adapter section

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
@copy-pr-bot

copy-pr-bot bot commented Feb 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Feb 26, 2026

📝 Walkthrough

This PR introduces adapter export and verification functionality for Megatron-Bridge LoRA/DoRA adapters, adding new public methods to AutoBridge and helper utilities, three example scripts demonstrating the workflow, and comprehensive tests validating the implementation.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Submodule & Build Configuration**<br>`3rdparty/Megatron-LM`, `scripts/training/run_recipe.py` | Updated Megatron-LM submodule pointer to a new commit; expanded finetune mode detection in the recipe runner to include "sft" and "peft" substrings. |
| **Documentation**<br>`examples/conversion/README.md`, `examples/conversion/adapter/README.md` | Added comprehensive documentation for the adapter export and verification workflow, covering export, verification, and streaming of LoRA/DoRA adapters with examples and script descriptions. |
| **Example Scripts**<br>`examples/conversion/adapter/export_adapter.py`, `examples/conversion/adapter/verify_adapter.py`, `examples/conversion/adapter/stream_adapter_weights.py` | New scripts for exporting Megatron PEFT checkpoints to HuggingFace format, verifying exports via logit comparison, and streaming adapter tensors; stream script location references updated. |
| **Core Bridge Implementation**<br>`src/megatron/bridge/models/conversion/auto_bridge.py`, `src/megatron/bridge/models/conversion/peft_bridge.py` | Added public methods `save_hf_adapter` and `export_adapter_ckpt` to `AutoBridge` for adapter export; added helper functions `infer_target_modules_from_adapter_weights` and `build_adapter_config_dict` to `peft_bridge`; enhanced adapter weight merging with float32 precision handling. |
| **Tests**<br>`tests/functional_tests/models/qwen/test_qwen3_peft_export.py`, `tests/unit_tests/models/test_adapter_export.py` | Comprehensive functional test suite validating Qwen3 adapter export artifacts and PEFT library compatibility; unit tests covering adapter config building, target module inference, precision handling, and the end-to-end export flow. |
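Target-module inference from adapter weight names can be sketched as follows (a simplified standalone version; the real `infer_target_modules_from_adapter_weights` may use different key layouts):

```python
def infer_target_modules(adapter_keys):
    """Infer PEFT target_modules from adapter state-dict key names.

    Assumes keys of the form '<prefix>.<module_name>.lora_A.weight' /
    '...lora_B.weight', which is how PEFT names LoRA tensors.
    """
    targets = set()
    for key in adapter_keys:
        parts = key.split(".")
        for i, part in enumerate(parts):
            if part in ("lora_A", "lora_B") and i > 0:
                targets.add(parts[i - 1])  # the module wrapped by the adapter
    return sorted(targets)

keys = [
    "base_model.model.layers.0.self_attn.q_proj.lora_A.weight",
    "base_model.model.layers.0.self_attn.q_proj.lora_B.weight",
    "base_model.model.layers.0.self_attn.v_proj.lora_A.weight",
]
print(infer_target_modules(keys))  # ['q_proj', 'v_proj']
```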

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant export_adapter.py
    participant AutoBridge
    participant MegatronModel
    participant HFAdapter as HuggingFace PEFT
    participant FileSystem

    User->>export_adapter.py: Run export script
    export_adapter.py->>AutoBridge: Load from pretrained HF model
    AutoBridge->>HFAdapter: Initialize with HF model
    export_adapter.py->>AutoBridge: export_adapter_ckpt(megatron_ckpt)
    AutoBridge->>MegatronModel: Load LoRA config from run_config.yaml
    MegatronModel->>MegatronModel: Materialize base + LoRA weights
    AutoBridge->>MegatronModel: Load adapter weights from checkpoint
    AutoBridge->>AutoBridge: save_hf_adapter()
    AutoBridge->>HFAdapter: Extract adapter weights
    AutoBridge->>FileSystem: Write adapter_config.json
    AutoBridge->>FileSystem: Write adapter_model.safetensors
    FileSystem-->>User: Adapter artifacts saved
```
```mermaid
sequenceDiagram
    participant User
    participant verify_adapter.py
    participant HFModel as HF Base Model
    participant PEFTAdapter as PEFT Adapter
    participant MegatronModel
    participant Comparison

    User->>verify_adapter.py: Run verification script
    verify_adapter.py->>HFModel: Load base model
    verify_adapter.py->>HFModel: Compute logits for prompt
    HFModel-->>verify_adapter.py: Base logits
    verify_adapter.py->>PEFTAdapter: Load via PEFT library
    verify_adapter.py->>PEFTAdapter: Compute logits
    PEFTAdapter-->>verify_adapter.py: PEFT logits
    verify_adapter.py->>Comparison: Compare top-k logits
    Comparison-->>verify_adapter.py: PEFT verification result

    alt Megatron verification requested
        verify_adapter.py->>MegatronModel: Load from checkpoint
        MegatronModel->>MegatronModel: Construct LoRA model
        MegatronModel->>MegatronModel: Load merged weights
        verify_adapter.py->>MegatronModel: Compute logits
        MegatronModel-->>verify_adapter.py: Megatron logits
        verify_adapter.py->>Comparison: Compare PEFT vs Megatron
        Comparison-->>User: Final PASSED/FAILED
    else
        verify_adapter.py-->>User: PEFT verification result
    end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

Run CICD

Suggested reviewers

  • ananthsub
  • ko3n1g
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 56.10%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding HF PEFT adapter export functionality for LoRA/DoRA checkpoints, which is the central feature of this pull request. |
| Test Results For Major Changes | ✅ Passed | PR documentation claims 19 unit tests and 7 functional tests, matching actual test file counts with comprehensive coverage of the new adapter export features and the float32 merge fix. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🧹 Nitpick comments (3)
tests/unit_tests/models/test_adapter_export.py (1)

39-297: Add a module-level test category marker and reuse fixtures for common setup.

The file repeats the same distributed patch context and does not label tests with a pytest category.

Suggested refactor
```diff
 import pytest
 import torch
@@
 from megatron.bridge.models.conversion.peft_bridge import (
@@
 )

+pytestmark = pytest.mark.unit
+
+@pytest.fixture
+def no_distributed():
+    with (
+        patch("torch.distributed.is_available", return_value=False),
+        patch("torch.distributed.is_initialized", return_value=False),
+    ):
+        yield
+
@@
-    def test_save_creates_files(self, tmp_path):
+    def test_save_creates_files(self, tmp_path, no_distributed):
@@
-        with (
-            patch("torch.distributed.is_available", return_value=False),
-            patch("torch.distributed.is_initialized", return_value=False),
-        ):
-            from megatron.bridge.models.conversion.auto_bridge import AutoBridge
-
-            AutoBridge.save_hf_adapter(
-                mock_bridge,
-                model=[MagicMock()],
-                path=output_dir,
-                peft_config=lora,
-                base_model_name_or_path="test/model",
-            )
+        from megatron.bridge.models.conversion.auto_bridge import AutoBridge
+        AutoBridge.save_hf_adapter(
+            mock_bridge,
+            model=[MagicMock()],
+            path=output_dir,
+            peft_config=lora,
+            base_model_name_or_path="test/model",
+        )
```
As per coding guidelines, `tests/**/*.py: Use pytest fixtures for common setup in unit tests` and `Use 'pytest.mark' to categorize tests (unit, integration, system)`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit_tests/models/test_adapter_export.py` around lines 39 - 297, Add a
module-level pytest marker (e.g., pytestmark = pytest.mark.unit) at top of the
file and refactor repeated setup into fixtures: create a fixture (e.g.,
mock_distributed) that yields the patched
torch.distributed.is_available/is_initialized context and another fixture (e.g.,
mock_bridge_with_weights) that returns a MagicMock configured similarly to how
mock_bridge is created in TestSaveHfAdapter tests; update tests like
TestSaveHfAdapter.test_save_creates_files, test_save_raises_on_empty_adapter,
and test_save_infers_base_model_path to accept and use these fixtures instead of
repeating patch/context creation and mock_bridge construction, and keep
references to AutoBridge.save_hf_adapter and the MagicMock
export_adapter_weights behavior unchanged.
tests/functional_tests/models/qwen/test_qwen3_peft_export.py (2)

131-133: Add pytest markers to categorize the test class.

Per coding guidelines, tests should use pytest.mark for categorization. Consider adding markers to indicate this is a functional test and document any hardware/environment requirements.

♻️ Proposed fix
```diff
+@pytest.mark.functional
+@pytest.mark.slow  # if applicable
 class TestQwen3PeftExport:
     """Functional tests for Qwen3 LoRA adapter export to HuggingFace PEFT format."""
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/functional_tests/models/qwen/test_qwen3_peft_export.py` around lines
131 - 133, Add pytest markers to the TestQwen3PeftExport class by decorating it
with appropriate pytest.mark annotations (e.g., pytest.mark.functional and any
required hardware/environment markers such as pytest.mark.gpu or
pytest.mark.requires_internet) so the test suite can categorize and selectively
run it; update the class definition (TestQwen3PeftExport) to include these
markers and document specific environment needs in the marker names or a short
docstring comment immediately above the class.

158-179: Consider moving repeated import to module level.

The from safetensors.torch import load_file import appears in three test methods (lines 160, 183, 195). Moving it to the module-level imports would be cleaner.

♻️ Proposed refactor

Add to imports section:

```diff
 from megatron.bridge.training.model_load_save import temporary_distributed_context
+from safetensors.torch import load_file
```

Then remove the local imports from test methods.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/functional_tests/models/qwen/test_qwen3_peft_export.py` around lines
158 - 179, The repeated local import "from safetensors.torch import load_file"
used in tests like test_safetensors_weight_pairs should be moved to the
module-level imports: add a single "from safetensors.torch import load_file" at
the top of the test file and remove the in-function imports from each test
method (e.g., test_safetensors_weight_pairs and the other safetensors-related
test functions) so all tests reuse the same module-level symbol.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/conversion/adapter/README.md`:
- Around line 51-55: The fenced output block showing the adapter file tree is
missing a language label (triggers MD040); update the opening fence from ``` to
```text for the block containing "my_adapter/ ├── adapter_config.json └──
adapter_model.safetensors" (in the README's fenced code block) so the snippet is
marked as plain text—apply the same change to any other similar unlabeled
file-tree fences in the README.
- Around line 35-78: Update the example command invocations in README.md to use
the project's recommended runner by prefixing each "python" call with "uv run"
for both export_adapter.py and verify_adapter.py examples (e.g., change the
export command invoking export_adapter.py with --hf-model-id,
--megatron-peft-checkpoint, --output-hf-path and the verify commands invoking
verify_adapter.py with --hf-model-id, --hf-adapter-path,
--megatron-peft-checkpoint to use "uv run python ..."). Ensure every occurrence
of "python examples/conversion/adapter/export_adapter.py" and "python
examples/conversion/adapter/verify_adapter.py" in the README is replaced so the
shown CLI examples match the rest of the conversion docs.

In `@examples/conversion/adapter/verify_adapter.py`:
- Around line 163-177: The block that enters a distributed context with
temporary_distributed_context(ctx.__enter__()) can leak the context on
exceptions and currently only loads weights into model[0]; wrap the code after
ctx.__enter__() in try/finally (or try/except/finally) to ensure ctx.__exit__()
is always called, and when loading the checkpoint (dist_checkpointing.load)
determine whether keys are "model" or "model0"/"model1"/... then iterate over
the list returned by provide_distributed_model and call load_state_dict on each
chunk (e.g., for i, m in enumerate(model): select loaded_sd[f"model{i}"] if
per-chunk keys exist, otherwise use loaded_sd["model"]) so all model chunks
receive their corresponding weights; reference functions:
temporary_distributed_context, provide_distributed_model,
_generate_model_state_dict, apply_peft_adapter_filter_to_state_dict,
dist_checkpointing.load, and model[0].load_state_dict.

In `@scripts/training/run_recipe.py`:
- Line 212: Update the inference failure message to list the current accepted
finetune keywords to match the detection logic: where the code computes
has_finetune (variable name has_finetune) from lowered including "finetune",
"sft", or "peft", change the error/help text emitted on inference failure to
mention "finetune", "sft", and "peft" (and still include "pretrain" if relevant)
so the guidance matches the logic in run_recipe.py around has_finetune.

In `@src/megatron/bridge/models/conversion/auto_bridge.py`:
- Around line 1051-1056: The code only calls model[0].load_state_dict(...),
leaving other chunks uninitialized; change this to iterate over all model chunks
and load each chunk's state dict: after computing sharded_state_dict,
apply_peft_adapter_filter_to_state_dict, and loaded_sd/model_key, loop over
enumerate(model) and for each index i determine a chunk-specific key by
preferring f"{model_key}.{i}" (or search for the first loaded_sd key that
startswith f"{model_key}.{i}") and call
model[i].load_state_dict(loaded_sd[chunk_key], strict=False); if no per-index
keys exist, fall back to loading the same loaded_sd[model_key] into every
model[i].load_state_dict(...) so all shards are populated.

In `@src/megatron/bridge/models/conversion/peft_bridge.py`:
- Around line 839-851: Add explicit shape validation before performing the LoRA
merge: compute or infer the expected delta tensor shape (based on
base_weight.shape and dim) and validate that the transformed tensors used by
LoRAMerge (e.g., linear_out_weight, linear_in_weight after casting/moving:
linear_out_on_base, linear_in_on_base) will produce a delta whose shape matches
base_weight.shape; if the shapes mismatch, raise a clear exception. Update the
merge call site in the function that uses orig_dtype, base_weight, LoRAMerge,
linear_out_weight, linear_in_weight, alpha, and dim to perform this check right
before calling merger.merge and only proceed to return merged.to(orig_dtype)
when the validation passes.

In `@tests/functional_tests/models/qwen/test_qwen3_peft_export.py`:
- Around line 73-89: The fixture qwen3_toy_model_dir currently pulls a tokenizer
from the network via AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B"); replace
this network dependency by creating and saving a minimal local tokenizer into
model_dir (e.g., construct a simple tokenizer instance with minimal vocab/config
and call tokenizer.save_pretrained(model_dir) instead of
AutoTokenizer.from_pretrained) so tests don’t require HF Hub access, or
alternatively annotate the fixture with `@pytest.mark.network` if you intend to
allow external network in CI; update the code references around the
qwen3_toy_model_dir fixture and remove the AutoTokenizer.from_pretrained call.


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3e44b4 and b551e45.

📒 Files selected for processing (11)
  • 3rdparty/Megatron-LM
  • examples/conversion/README.md
  • examples/conversion/adapter/README.md
  • examples/conversion/adapter/export_adapter.py
  • examples/conversion/adapter/stream_adapter_weights.py
  • examples/conversion/adapter/verify_adapter.py
  • scripts/training/run_recipe.py
  • src/megatron/bridge/models/conversion/auto_bridge.py
  • src/megatron/bridge/models/conversion/peft_bridge.py
  • tests/functional_tests/models/qwen/test_qwen3_peft_export.py
  • tests/unit_tests/models/test_adapter_export.py

Comment on lines +35 to +78
```bash
python examples/conversion/adapter/export_adapter.py \
--hf-model-id meta-llama/Llama-3.2-1B \
--megatron-peft-checkpoint /path/to/finetune_ckpt \
--output-hf-path ./my_adapter
```

| Argument | Description |
|---|---|
| `--hf-model-id` | HuggingFace model name or local path (architecture + base weights) |
| `--megatron-peft-checkpoint` | Path to the Megatron-Bridge distributed checkpoint containing LoRA adapter weights |
| `--output-hf-path` | Output directory (default: `./my_adapter`) |
| `--trust-remote-code` | Allow custom code from the HuggingFace repository |

**Output structure:**

```
my_adapter/
├── adapter_config.json
└── adapter_model.safetensors
```

### 2. `verify_adapter.py` — Export Verification

Loads the exported adapter with the `peft` library and runs verification
checks:

- The PEFT model logits must differ from the base model (adapter has effect).
- When `--megatron-peft-checkpoint` is provided, the top-k predicted tokens
from the PEFT model must match those from the Megatron model with merged
weights.

```bash
# Quick check (PEFT-only, no Megatron comparison)
python examples/conversion/adapter/verify_adapter.py \
--hf-model-id meta-llama/Llama-3.2-1B \
--hf-adapter-path ./my_adapter

# Full verification (compares against Megatron checkpoint)
python examples/conversion/adapter/verify_adapter.py \
--hf-model-id meta-llama/Llama-3.2-1B \
--hf-adapter-path ./my_adapter \
--megatron-peft-checkpoint /path/to/finetune_ckpt/iter_0000020
```

⚠️ Potential issue | 🟡 Minor

Align command examples with uv run usage.

Use uv run python ... in this README to match the rest of the conversion examples and avoid environment mismatch.

Suggested doc fix
```diff
-python examples/conversion/adapter/export_adapter.py \
+uv run python examples/conversion/adapter/export_adapter.py \
     --hf-model-id meta-llama/Llama-3.2-1B \
     --megatron-peft-checkpoint /path/to/finetune_ckpt \
     --output-hf-path ./my_adapter
@@
-python examples/conversion/adapter/verify_adapter.py \
+uv run python examples/conversion/adapter/verify_adapter.py \
     --hf-model-id meta-llama/Llama-3.2-1B \
     --hf-adapter-path ./my_adapter
@@
-python examples/conversion/adapter/verify_adapter.py \
+uv run python examples/conversion/adapter/verify_adapter.py \
     --hf-model-id meta-llama/Llama-3.2-1B \
     --hf-adapter-path ./my_adapter \
     --megatron-peft-checkpoint /path/to/finetune_ckpt/iter_0000020
```
As per coding guidelines, `{**/*.sh,examples/**/*.py}: Use 'uv run' to execute scripts instead of activating a virtual environment and calling 'python' directly`.
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 51-51: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


Comment on lines +51 to +55
```
my_adapter/
├── adapter_config.json
└── adapter_model.safetensors
```

⚠️ Potential issue | 🟡 Minor

Add a language to the fenced output block.

The output structure block is missing a fence language and triggers MD040.

Suggested markdownlint fix
````diff
-```
+```text
 my_adapter/
 ├── adapter_config.json
 └── adapter_model.safetensors
 ```
````
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 51-51: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


```diff
 lowered = recipe_name.lower()
 has_pretrain = "pretrain" in lowered
-has_finetune = "finetune" in lowered
+has_finetune = "finetune" in lowered or "sft" in lowered or "peft" in lowered
```

⚠️ Potential issue | 🟡 Minor

Update the inference error text to reflect new accepted finetune keywords.

Line 212 now recognizes "sft" and "peft" for finetune inference, but the failure message still instructs only "pretrain"/"finetune". Please align the message so failed inference guidance is accurate.

Suggested patch
```diff
 ERR_INFER_MODE_FAILED = (
     "Unable to infer training mode from recipe name. "
-    "Please include 'pretrain' or 'finetune' in the recipe name or pass --mode explicitly."
+    "Please include 'pretrain', 'finetune', 'sft', or 'peft' in the recipe name, "
+    "or pass --mode explicitly."
 )
```

@yaoyu-33
Contributor Author

/ok to test 53a1d64

Patch get_tensor_model_parallel_world_size to return 1 in
TestMergeSingleAdapterWeight so LoRAMerge.merge() takes the tp_size==1
path and avoids requiring an initialized tensor model parallel group.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
@yaoyu-33
Contributor Author

yaoyu-33 commented Mar 9, 2026

/ok to test 578403d


Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33
Contributor Author

yaoyu-33 commented Mar 9, 2026

/ok to test d0c76b2

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33
Contributor Author

/ok to test cd79d06


…xport

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor

# Conflicts:
#	pyproject.toml
#	uv.lock
@yaoyu-33
Contributor Author

/ok to test 3372000

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
@yaoyu-33
Contributor Author

/ok to test 6eb53b9

Add TestExportAdapterCkpt with 10 tests covering the orchestrator method
that was previously uncovered by Codecov. Tests mock heavy infrastructure
(dist_checkpointing, distributed context, model materialisation) and
exercise config parsing, VLMLoRA selection, error paths, float32 dtype
enforcement, and base_model_name fallback logic.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
@yaoyu-33
Contributor Author

/ok to test eabcce3

cuichenx
cuichenx previously approved these changes Mar 11, 2026
…xport

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
@yaoyu-33
Contributor Author

/ok to test 8fa62fa

…xport

Made-with: Cursor

# Conflicts:
#	uv.lock
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
@yaoyu-33
Contributor Author

/ok to test f2104df

cuichenx
cuichenx previously approved these changes Mar 14, 2026
@yaoyu-33
Contributor Author

/ok to test 4e1fdd2

Signed-off-by: oliver könig <okoenig@nvidia.com>
@ko3n1g
Contributor

ko3n1g commented Mar 16, 2026

/ok to test 3ba210d


Labels

  • docs-only: With great power comes great responsibility.
  • ready-to-merge: PR is approved, current, and only waiting for CI to pass before merge.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants