
[peft][ckpt] feat: add HF PEFT adapter export for LoRA/DoRA checkpoints#2574

Merged
yaoyu-33 merged 24 commits into main from yuya/add-hf-adapter-export
Mar 16, 2026
Conversation


@yaoyu-33 yaoyu-33 commented Feb 26, 2026

Summary

  • Add save_hf_adapter() and export_adapter_ckpt() to AutoBridge for exporting Megatron-Bridge LoRA/DoRA adapters to HuggingFace PEFT format (adapter_config.json + adapter_model.safetensors)
  • Fix LoRA merge precision: perform merge in float32 to avoid bf16 matmul precision loss, then cast back to original dtype
  • Add helper functions infer_target_modules_from_adapter_weights() and build_adapter_config_dict() in peft_bridge.py
  • Add example scripts under examples/conversion/adapter/ for export, verification, and streaming
  • Add unit tests (19 tests) for the new helpers and save_hf_adapter
  • Add functional test (7 tests) for end-to-end Qwen3 LoRA export with PEFT library verification
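For reference, `adapter_config.json` is a plain JSON file following HF PEFT's `LoraConfig` serialization. The sketch below shows the general shape of such a config; the concrete values (model name, rank, alpha, target modules) are hypothetical, and the real export derives them from the Megatron-Bridge LoRA config via `build_adapter_config_dict()`:

```python
import json

# Hypothetical values; a real export derives these from the training LoRA config.
adapter_config = {
    "peft_type": "LORA",
    "base_model_name_or_path": "Qwen/Qwen3-0.6B",  # hypothetical base model
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "bias": "none",
    "task_type": "CAUSAL_LM",
}

# PEFT expects this serialized as adapter_config.json next to
# adapter_model.safetensors in the adapter directory.
config_json = json.dumps(adapter_config, indent=2, sort_keys=True)
print(config_json)
```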

Test plan

  • Unit tests pass: pytest tests/unit_tests/models/test_adapter_export.py (19 passed)
  • Functional tests pass: pytest tests/functional_tests/models/qwen/test_qwen3_peft_export.py (7 passed)
  • Existing LoRA tests still pass: pytest tests/unit_tests/models/test_model_bridge_lora.py

Made with Cursor

Summary by CodeRabbit

  • New Features

    • Added LoRA/DoRA adapter export to HuggingFace PEFT format.
    • Added adapter verification functionality via logit comparison.
    • Added streaming adapter weights support.
  • Documentation

    • Added comprehensive adapter export and verification workflow guides with examples.
  • Tests

    • Added functional and unit tests for adapter export functionality.
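The verification idea, comparing top-k predicted tokens between the PEFT-loaded adapter and the Megatron model with merged weights, can be sketched in isolation (NumPy arrays stand in for real model logits; this is an illustration, not the code in `verify_adapter.py`):

```python
import numpy as np

def topk_ids(logits, k=5):
    """Indices of the k largest logits, highest first."""
    return list(np.argsort(logits)[::-1][:k])

def topk_match(logits_a, logits_b, k=5):
    """True when both models rank the same k tokens highest, in the same order."""
    return topk_ids(logits_a, k) == topk_ids(logits_b, k)

base = np.array([0.1, 2.0, 0.3, 1.5, 0.0])
peft = base + np.array([0.0, 0.5, 0.0, -0.1, 0.0])  # adapter shifts logits

assert not np.allclose(base, peft)  # check 1: the adapter has an effect
print(topk_match(peft, peft, k=3))  # check 2: top-k agreement between exports
```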

The infer_train_mode function only checked for 'finetune' in recipe
names. Recipes named with 'sft' or 'peft' were not recognized as
finetune mode, causing a ValueError. Add these keywords to the
has_finetune check.
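The fixed detection logic can be sketched as a standalone function (a simplified version; the actual function in `run_recipe.py` may differ in signature and error text):

```python
def infer_train_mode(recipe_name: str) -> str:
    """Infer 'pretrain' vs 'finetune' from a recipe name (simplified sketch)."""
    lowered = recipe_name.lower()
    has_pretrain = "pretrain" in lowered
    # The fix: also treat 'sft' and 'peft' recipe names as finetune mode.
    has_finetune = "finetune" in lowered or "sft" in lowered or "peft" in lowered
    if has_pretrain and not has_finetune:
        return "pretrain"
    if has_finetune and not has_pretrain:
        return "finetune"
    raise ValueError(f"Unable to infer training mode from recipe name: {recipe_name!r}")

print(infer_train_mode("qwen3_lora_peft"))  # finetune
```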

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
Add `save_hf_adapter()` and `export_adapter_ckpt()` to AutoBridge for
exporting Megatron-Bridge LoRA/DoRA adapters to HuggingFace PEFT format
(adapter_config.json + adapter_model.safetensors).

Key changes:
- peft_bridge: perform LoRA merge in float32 to avoid bf16 precision loss
- peft_bridge: add helpers to infer target_modules and build adapter config
- auto_bridge: add save_hf_adapter() for direct model export
- auto_bridge: add export_adapter_ckpt() for checkpoint-based export
- Move adapter examples to examples/conversion/adapter/ with export,
  stream, and verification scripts
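The float32 merge noted above can be illustrated with a small standalone sketch (NumPy stands in for torch here; this is not the code in `peft_bridge`, just the precision pattern it applies):

```python
import numpy as np

def merge_lora(base_w, lora_a, lora_b, alpha, r):
    """Merge a LoRA delta into a base weight, computing in float32.

    base_w: (out, in) weight in its training dtype (e.g. fp16/bf16)
    lora_a: (r, in), lora_b: (out, r)
    """
    orig_dtype = base_w.dtype
    # Upcast before the matmul: low-precision accumulation of B @ A
    # can lose small LoRA deltas entirely.
    delta = (lora_b.astype(np.float32) @ lora_a.astype(np.float32)) * (alpha / r)
    merged = base_w.astype(np.float32) + delta
    return merged.astype(orig_dtype)  # cast back to the original dtype

rng = np.random.default_rng(0)
base = rng.standard_normal((64, 32)).astype(np.float16)
a = (rng.standard_normal((8, 32)) * 1e-3).astype(np.float16)
b = (rng.standard_normal((64, 8)) * 1e-3).astype(np.float16)
merged = merge_lora(base, a, b, alpha=16, r=8)
print(merged.dtype)  # float16: the result keeps the original dtype
```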

Signed-off-by: Yu Yao <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
- Add unit tests for peft_bridge helpers (infer_target_modules,
  build_adapter_config_dict) and save_hf_adapter in auto_bridge
- Add functional test for Qwen3 LoRA adapter export end-to-end:
  creates toy model, attaches LoRA, exports via AutoBridge, verifies
  output files, config, weight shapes, and PEFT library loading
- Add README for examples/conversion/adapter/ with usage docs
- Update parent examples/conversion/README.md with adapter section

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
@copy-pr-bot

copy-pr-bot bot commented Feb 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Feb 26, 2026

📝 Walkthrough

This PR introduces adapter export and verification functionality for Megatron-Bridge LoRA/DoRA adapters, adding new public methods to AutoBridge and helper utilities, three example scripts demonstrating the workflow, and comprehensive tests validating the implementation.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Submodule & Build Configuration**<br>`3rdparty/Megatron-LM`, `scripts/training/run_recipe.py` | Updated Megatron-LM submodule pointer to a new commit; expanded finetune mode detection in the recipe runner to include "sft" and "peft" substrings. |
| **Documentation**<br>`examples/conversion/README.md`, `examples/conversion/adapter/README.md` | Added comprehensive documentation for the adapter export and verification workflow, covering export, verification, and streaming of LoRA/DoRA adapters with examples and script descriptions. |
| **Example Scripts**<br>`examples/conversion/adapter/export_adapter.py`, `examples/conversion/adapter/verify_adapter.py`, `examples/conversion/adapter/stream_adapter_weights.py` | New scripts for exporting Megatron PEFT checkpoints to HuggingFace format, verifying exports via logit comparison, and streaming adapter tensors; stream script location references updated. |
| **Core Bridge Implementation**<br>`src/megatron/bridge/models/conversion/auto_bridge.py`, `src/megatron/bridge/models/conversion/peft_bridge.py` | Added public methods `save_hf_adapter` and `export_adapter_ckpt` to `AutoBridge` for adapter export; added helper functions `infer_target_modules_from_adapter_weights` and `build_adapter_config_dict` to `peft_bridge`; enhanced adapter weight merging with float32 precision handling. |
| **Tests**<br>`tests/functional_tests/models/qwen/test_qwen3_peft_export.py`, `tests/unit_tests/models/test_adapter_export.py` | Comprehensive functional test suite validating Qwen3 adapter export artifacts and PEFT library compatibility; unit tests covering adapter config building, target module inference, precision handling, and the end-to-end export flow. |
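Target-module inference from adapter weight names can be sketched as follows (a simplified standalone version; the real `infer_target_modules_from_adapter_weights` may use different key layouts):

```python
def infer_target_modules(adapter_keys):
    """Infer PEFT target_modules from adapter state-dict key names.

    Assumes keys of the form '<prefix>.<module_name>.lora_A.weight' /
    '...lora_B.weight', which is how PEFT names LoRA tensors.
    """
    targets = set()
    for key in adapter_keys:
        parts = key.split(".")
        for i, part in enumerate(parts):
            if part in ("lora_A", "lora_B") and i > 0:
                targets.add(parts[i - 1])  # the module wrapped by the adapter
    return sorted(targets)

keys = [
    "base_model.model.layers.0.self_attn.q_proj.lora_A.weight",
    "base_model.model.layers.0.self_attn.q_proj.lora_B.weight",
    "base_model.model.layers.0.self_attn.v_proj.lora_A.weight",
]
print(infer_target_modules(keys))  # ['q_proj', 'v_proj']
```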

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant export_adapter.py
    participant AutoBridge
    participant MegatronModel
    participant HFAdapter as HuggingFace PEFT
    participant FileSystem

    User->>export_adapter.py: Run export script
    export_adapter.py->>AutoBridge: Load from pretrained HF model
    AutoBridge->>HFAdapter: Initialize with HF model
    export_adapter.py->>AutoBridge: export_adapter_ckpt(megatron_ckpt)
    AutoBridge->>MegatronModel: Load LoRA config from run_config.yaml
    MegatronModel->>MegatronModel: Materialize base + LoRA weights
    AutoBridge->>MegatronModel: Load adapter weights from checkpoint
    AutoBridge->>AutoBridge: save_hf_adapter()
    AutoBridge->>HFAdapter: Extract adapter weights
    AutoBridge->>FileSystem: Write adapter_config.json
    AutoBridge->>FileSystem: Write adapter_model.safetensors
    FileSystem-->>User: Adapter artifacts saved
```
```mermaid
sequenceDiagram
    participant User
    participant verify_adapter.py
    participant HFModel as HF Base Model
    participant PEFTAdapter as PEFT Adapter
    participant MegatronModel
    participant Comparison

    User->>verify_adapter.py: Run verification script
    verify_adapter.py->>HFModel: Load base model
    verify_adapter.py->>HFModel: Compute logits for prompt
    HFModel-->>verify_adapter.py: Base logits
    verify_adapter.py->>PEFTAdapter: Load via PEFT library
    verify_adapter.py->>PEFTAdapter: Compute logits
    PEFTAdapter-->>verify_adapter.py: PEFT logits
    verify_adapter.py->>Comparison: Compare top-k logits
    Comparison-->>verify_adapter.py: PEFT verification result

    alt Megatron verification requested
        verify_adapter.py->>MegatronModel: Load from checkpoint
        MegatronModel->>MegatronModel: Construct LoRA model
        MegatronModel->>MegatronModel: Load merged weights
        verify_adapter.py->>MegatronModel: Compute logits
        MegatronModel-->>verify_adapter.py: Megatron logits
        verify_adapter.py->>Comparison: Compare PEFT vs Megatron
        Comparison-->>User: Final PASSED/FAILED
    else
        verify_adapter.py-->>User: PEFT verification result
    end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

Run CICD

Suggested reviewers

  • ananthsub
  • ko3n1g
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 56.10%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding HF PEFT adapter export functionality for LoRA/DoRA checkpoints, which is the central feature of this pull request. |
| Test Results For Major Changes | ✅ Passed | PR documentation claims 19 unit tests and 7 functional tests, matching actual test file counts with comprehensive coverage of the new adapter export features and the float32 merge fix. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🧹 Nitpick comments (3)
tests/unit_tests/models/test_adapter_export.py (1)

39-297: Add a module-level test category marker and reuse fixtures for common setup.

The file repeats the same distributed patch context and does not label tests with a pytest category.

Suggested refactor
```diff
 import pytest
 import torch
@@
 from megatron.bridge.models.conversion.peft_bridge import (
@@
 )

+pytestmark = pytest.mark.unit
+
+@pytest.fixture
+def no_distributed():
+    with (
+        patch("torch.distributed.is_available", return_value=False),
+        patch("torch.distributed.is_initialized", return_value=False),
+    ):
+        yield
+
@@
-    def test_save_creates_files(self, tmp_path):
+    def test_save_creates_files(self, tmp_path, no_distributed):
@@
-        with (
-            patch("torch.distributed.is_available", return_value=False),
-            patch("torch.distributed.is_initialized", return_value=False),
-        ):
-            from megatron.bridge.models.conversion.auto_bridge import AutoBridge
-
-            AutoBridge.save_hf_adapter(
-                mock_bridge,
-                model=[MagicMock()],
-                path=output_dir,
-                peft_config=lora,
-                base_model_name_or_path="test/model",
-            )
+        from megatron.bridge.models.conversion.auto_bridge import AutoBridge
+        AutoBridge.save_hf_adapter(
+            mock_bridge,
+            model=[MagicMock()],
+            path=output_dir,
+            peft_config=lora,
+            base_model_name_or_path="test/model",
+        )
```
As per coding guidelines, `tests/**/*.py: Use pytest fixtures for common setup in unit tests` and `Use 'pytest.mark' to categorize tests (unit, integration, system)`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit_tests/models/test_adapter_export.py` around lines 39 - 297, Add a
module-level pytest marker (e.g., pytestmark = pytest.mark.unit) at top of the
file and refactor repeated setup into fixtures: create a fixture (e.g.,
mock_distributed) that yields the patched
torch.distributed.is_available/is_initialized context and another fixture (e.g.,
mock_bridge_with_weights) that returns a MagicMock configured similarly to how
mock_bridge is created in TestSaveHfAdapter tests; update tests like
TestSaveHfAdapter.test_save_creates_files, test_save_raises_on_empty_adapter,
and test_save_infers_base_model_path to accept and use these fixtures instead of
repeating patch/context creation and mock_bridge construction, and keep
references to AutoBridge.save_hf_adapter and the MagicMock
export_adapter_weights behavior unchanged.
tests/functional_tests/models/qwen/test_qwen3_peft_export.py (2)

131-133: Add pytest markers to categorize the test class.

Per coding guidelines, tests should use pytest.mark for categorization. Consider adding markers to indicate this is a functional test and document any hardware/environment requirements.

♻️ Proposed fix
```diff
+@pytest.mark.functional
+@pytest.mark.slow  # if applicable
 class TestQwen3PeftExport:
     """Functional tests for Qwen3 LoRA adapter export to HuggingFace PEFT format."""
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/functional_tests/models/qwen/test_qwen3_peft_export.py` around lines
131 - 133, Add pytest markers to the TestQwen3PeftExport class by decorating it
with appropriate pytest.mark annotations (e.g., pytest.mark.functional and any
required hardware/environment markers such as pytest.mark.gpu or
pytest.mark.requires_internet) so the test suite can categorize and selectively
run it; update the class definition (TestQwen3PeftExport) to include these
markers and document specific environment needs in the marker names or a short
docstring comment immediately above the class.

158-179: Consider moving repeated import to module level.

The from safetensors.torch import load_file import appears in three test methods (lines 160, 183, 195). Moving it to the module-level imports would be cleaner.

♻️ Proposed refactor

Add to imports section:

```diff
 from megatron.bridge.training.model_load_save import temporary_distributed_context
+from safetensors.torch import load_file
```

Then remove the local imports from test methods.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/functional_tests/models/qwen/test_qwen3_peft_export.py` around lines
158 - 179, The repeated local import "from safetensors.torch import load_file"
used in tests like test_safetensors_weight_pairs should be moved to the
module-level imports: add a single "from safetensors.torch import load_file" at
the top of the test file and remove the in-function imports from each test
method (e.g., test_safetensors_weight_pairs and the other safetensors-related
test functions) so all tests reuse the same module-level symbol.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/conversion/adapter/README.md`:
- Around line 51-55: The fenced output block showing the adapter file tree is
missing a language label (triggers MD040); update the opening fence from ``` to
```text for the block containing "my_adapter/ ├── adapter_config.json └──
adapter_model.safetensors" (in the README's fenced code block) so the snippet is
marked as plain text—apply the same change to any other similar unlabeled
file-tree fences in the README.
- Around line 35-78: Update the example command invocations in README.md to use
the project's recommended runner by prefixing each "python" call with "uv run"
for both export_adapter.py and verify_adapter.py examples (e.g., change the
export command invoking export_adapter.py with --hf-model-id,
--megatron-peft-checkpoint, --output-hf-path and the verify commands invoking
verify_adapter.py with --hf-model-id, --hf-adapter-path,
--megatron-peft-checkpoint to use "uv run python ..."). Ensure every occurrence
of "python examples/conversion/adapter/export_adapter.py" and "python
examples/conversion/adapter/verify_adapter.py" in the README is replaced so the
shown CLI examples match the rest of the conversion docs.

In `@examples/conversion/adapter/verify_adapter.py`:
- Around line 163-177: The block that enters a distributed context with
temporary_distributed_context(ctx.__enter__()) can leak the context on
exceptions and currently only loads weights into model[0]; wrap the code after
ctx.__enter__() in try/finally (or try/except/finally) to ensure ctx.__exit__()
is always called, and when loading the checkpoint (dist_checkpointing.load)
determine whether keys are "model" or "model0"/"model1"/... then iterate over
the list returned by provide_distributed_model and call load_state_dict on each
chunk (e.g., for i, m in enumerate(model): select loaded_sd[f"model{i}"] if
per-chunk keys exist, otherwise use loaded_sd["model"]) so all model chunks
receive their corresponding weights; reference functions:
temporary_distributed_context, provide_distributed_model,
_generate_model_state_dict, apply_peft_adapter_filter_to_state_dict,
dist_checkpointing.load, and model[0].load_state_dict.

In `@scripts/training/run_recipe.py`:
- Line 212: Update the inference failure message to list the current accepted
finetune keywords to match the detection logic: where the code computes
has_finetune (variable name has_finetune) from lowered including "finetune",
"sft", or "peft", change the error/help text emitted on inference failure to
mention "finetune", "sft", and "peft" (and still include "pretrain" if relevant)
so the guidance matches the logic in run_recipe.py around has_finetune.

In `@src/megatron/bridge/models/conversion/auto_bridge.py`:
- Around line 1051-1056: The code only calls model[0].load_state_dict(...),
leaving other chunks uninitialized; change this to iterate over all model chunks
and load each chunk's state dict: after computing sharded_state_dict,
apply_peft_adapter_filter_to_state_dict, and loaded_sd/model_key, loop over
enumerate(model) and for each index i determine a chunk-specific key by
preferring f"{model_key}.{i}" (or search for the first loaded_sd key that
startswith f"{model_key}.{i}") and call
model[i].load_state_dict(loaded_sd[chunk_key], strict=False); if no per-index
keys exist, fall back to loading the same loaded_sd[model_key] into every
model[i].load_state_dict(...) so all shards are populated.

In `@src/megatron/bridge/models/conversion/peft_bridge.py`:
- Around line 839-851: Add explicit shape validation before performing the LoRA
merge: compute or infer the expected delta tensor shape (based on
base_weight.shape and dim) and validate that the transformed tensors used by
LoRAMerge (e.g., linear_out_weight, linear_in_weight after casting/moving:
linear_out_on_base, linear_in_on_base) will produce a delta whose shape matches
base_weight.shape; if the shapes mismatch, raise a clear exception. Update the
merge call site in the function that uses orig_dtype, base_weight, LoRAMerge,
linear_out_weight, linear_in_weight, alpha, and dim to perform this check right
before calling merger.merge and only proceed to return merged.to(orig_dtype)
when the validation passes.

In `@tests/functional_tests/models/qwen/test_qwen3_peft_export.py`:
- Around line 73-89: The fixture qwen3_toy_model_dir currently pulls a tokenizer
from the network via AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B"); replace
this network dependency by creating and saving a minimal local tokenizer into
model_dir (e.g., construct a simple tokenizer instance with minimal vocab/config
and call tokenizer.save_pretrained(model_dir) instead of
AutoTokenizer.from_pretrained) so tests don’t require HF Hub access, or
alternatively annotate the fixture with `@pytest.mark.network` if you intend to
allow external network in CI; update the code references around the
qwen3_toy_model_dir fixture and remove the AutoTokenizer.from_pretrained call.


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3e44b4 and b551e45.

📒 Files selected for processing (11)
  • 3rdparty/Megatron-LM
  • examples/conversion/README.md
  • examples/conversion/adapter/README.md
  • examples/conversion/adapter/export_adapter.py
  • examples/conversion/adapter/stream_adapter_weights.py
  • examples/conversion/adapter/verify_adapter.py
  • scripts/training/run_recipe.py
  • src/megatron/bridge/models/conversion/auto_bridge.py
  • src/megatron/bridge/models/conversion/peft_bridge.py
  • tests/functional_tests/models/qwen/test_qwen3_peft_export.py
  • tests/unit_tests/models/test_adapter_export.py

Comment on lines +35 to +78
```bash
python examples/conversion/adapter/export_adapter.py \
--hf-model-id meta-llama/Llama-3.2-1B \
--megatron-peft-checkpoint /path/to/finetune_ckpt \
--output-hf-path ./my_adapter
```

| Argument | Description |
|---|---|
| `--hf-model-id` | HuggingFace model name or local path (architecture + base weights) |
| `--megatron-peft-checkpoint` | Path to the Megatron-Bridge distributed checkpoint containing LoRA adapter weights |
| `--output-hf-path` | Output directory (default: `./my_adapter`) |
| `--trust-remote-code` | Allow custom code from the HuggingFace repository |

**Output structure:**

```
my_adapter/
├── adapter_config.json
└── adapter_model.safetensors
```

### 2. `verify_adapter.py` — Export Verification

Loads the exported adapter with the `peft` library and runs verification
checks:

- The PEFT model logits must differ from the base model (adapter has effect).
- When `--megatron-peft-checkpoint` is provided, the top-k predicted tokens
from the PEFT model must match those from the Megatron model with merged
weights.

```bash
# Quick check (PEFT-only, no Megatron comparison)
python examples/conversion/adapter/verify_adapter.py \
--hf-model-id meta-llama/Llama-3.2-1B \
--hf-adapter-path ./my_adapter

# Full verification (compares against Megatron checkpoint)
python examples/conversion/adapter/verify_adapter.py \
--hf-model-id meta-llama/Llama-3.2-1B \
--hf-adapter-path ./my_adapter \
--megatron-peft-checkpoint /path/to/finetune_ckpt/iter_0000020
```

⚠️ Potential issue | 🟡 Minor

Align command examples with uv run usage.

Use uv run python ... in this README to match the rest of the conversion examples and avoid environment mismatch.

Suggested doc fix
```diff
-python examples/conversion/adapter/export_adapter.py \
+uv run python examples/conversion/adapter/export_adapter.py \
     --hf-model-id meta-llama/Llama-3.2-1B \
     --megatron-peft-checkpoint /path/to/finetune_ckpt \
     --output-hf-path ./my_adapter
@@
-python examples/conversion/adapter/verify_adapter.py \
+uv run python examples/conversion/adapter/verify_adapter.py \
     --hf-model-id meta-llama/Llama-3.2-1B \
     --hf-adapter-path ./my_adapter
@@
-python examples/conversion/adapter/verify_adapter.py \
+uv run python examples/conversion/adapter/verify_adapter.py \
     --hf-model-id meta-llama/Llama-3.2-1B \
     --hf-adapter-path ./my_adapter \
     --megatron-peft-checkpoint /path/to/finetune_ckpt/iter_0000020
```
As per coding guidelines, `{**/*.sh,examples/**/*.py}: Use 'uv run' to execute scripts instead of activating a virtual environment and calling 'python' directly`.
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 51-51: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


Comment on lines +51 to +55
```
my_adapter/
├── adapter_config.json
└── adapter_model.safetensors
```

⚠️ Potential issue | 🟡 Minor

Add a language to the fenced output block.

The output structure block is missing a fence language and triggers MD040.

Suggested markdownlint fix
````diff
-```
+```text
 my_adapter/
 ├── adapter_config.json
 └── adapter_model.safetensors
 ```
````
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 51-51: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


```diff
 lowered = recipe_name.lower()
 has_pretrain = "pretrain" in lowered
-has_finetune = "finetune" in lowered
+has_finetune = "finetune" in lowered or "sft" in lowered or "peft" in lowered
```

⚠️ Potential issue | 🟡 Minor

Update the inference error text to reflect new accepted finetune keywords.

Line 212 now recognizes "sft" and "peft" for finetune inference, but the failure message still instructs only "pretrain"/"finetune". Please align the message so failed inference guidance is accurate.

Suggested patch
```diff
 ERR_INFER_MODE_FAILED = (
     "Unable to infer training mode from recipe name. "
-    "Please include 'pretrain' or 'finetune' in the recipe name or pass --mode explicitly."
+    "Please include 'pretrain', 'finetune', 'sft', or 'peft' in the recipe name, "
+    "or pass --mode explicitly."
 )
```

@yaoyu-33
Contributor Author

/ok to test 53a1d64

Patch get_tensor_model_parallel_world_size to return 1 in
TestMergeSingleAdapterWeight so LoRAMerge.merge() takes the tp_size==1
path and avoids requiring an initialized tensor model parallel group.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
@yaoyu-33
Contributor Author

yaoyu-33 commented Mar 9, 2026

/ok to test 578403d


Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33
Contributor Author

yaoyu-33 commented Mar 9, 2026

/ok to test d0c76b2

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33
Contributor Author

/ok to test cd79d06


…xport

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor

# Conflicts:
#	pyproject.toml
#	uv.lock
@yaoyu-33
Contributor Author

/ok to test 3372000

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
@yaoyu-33
Contributor Author

/ok to test 6eb53b9

Add TestExportAdapterCkpt with 10 tests covering the orchestrator method
that was previously uncovered by Codecov. Tests mock heavy infrastructure
(dist_checkpointing, distributed context, model materialisation) and
exercise config parsing, VLMLoRA selection, error paths, float32 dtype
enforcement, and base_model_name fallback logic.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
@yaoyu-33
Contributor Author

/ok to test eabcce3

cuichenx
cuichenx previously approved these changes Mar 11, 2026
…xport

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
@yaoyu-33
Contributor Author

/ok to test 8fa62fa

…xport

Made-with: Cursor

# Conflicts:
#	uv.lock
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
@yaoyu-33
Contributor Author

/ok to test f2104df

cuichenx
cuichenx previously approved these changes Mar 14, 2026
@yaoyu-33
Contributor Author

/ok to test 4e1fdd2

Signed-off-by: oliver könig <okoenig@nvidia.com>
@ko3n1g
Contributor

ko3n1g commented Mar 16, 2026

/ok to test 3ba210d


Labels

  • docs-only: With great power comes great responsibility.
  • ready-to-merge: PR is approved, current, and only waiting for CI to pass before merge.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants