
cp: feat: Megatron SFT LoRA (1629) into r0.5.0#1741

Merged
yuki-97 merged 1 commit into r0.5.0 from cherry-pick-1629-r0.5.0 on Jan 8, 2026

Conversation

@chtruong814
Contributor

@chtruong814 chtruong814 commented Jan 8, 2026

beep boop [🤖]: Hi @arendu 👋,

we've cherry-picked #1629 into r0.5.0 for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

Release Notes

  • New Features

    • Added LoRA (Low-Rank Adaptation) support for Megatron-based fine-tuning
    • Introduced example configurations for SFT with Megatron-LoRA
  • Documentation

    • Comprehensive guide covering LoRA configuration parameters for DTensor and Megatron approaches with detailed parameter explanations and usage examples
  • Tests

    • Added functional and integration tests for LoRA with automodel and Megatron backends


Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@chtruong814 chtruong814 requested review from a team as code owners January 8, 2026 12:58
@chtruong814 chtruong814 requested a review from arendu January 8, 2026 12:58
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jan 8, 2026
@yuki-97 yuki-97 added the CI:L1 Run doctests, unit tests, and functional tests label Jan 8, 2026
@yuki-97 yuki-97 enabled auto-merge (squash) January 8, 2026 13:00
@coderabbitai
Contributor

coderabbitai bot commented Jan 8, 2026

📝 Walkthrough

Walkthrough

This pull request adds comprehensive PEFT (LoRA) support for the Megatron backend in supervised fine-tuning workflows. Changes include updated documentation and configuration specifications for Megatron LoRA parameters, PEFT initialization and checkpoint handling in the Megatron policy worker, new example configurations for Megatron LoRA setups, and corresponding functional and integration test scripts.

Changes

  • **Documentation & Config Schema** (`docs/guides/sft.md`, `examples/configs/sft.yaml`): Added Megatron PEFT (LoRA) configuration blocks with detailed parameter definitions, inline comments on optimizer behavior, and parameter documentation. Updated DTensor LoRA notes to reflect v2 as default backend.
  • **Example Configurations** (`examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml`): New YAML configuration file demonstrating full SFT run setup for Llama-3.1-8B with Megatron LoRA on a 1N8G cluster, including policy, optimizer, data, and logging configurations.
  • **PEFT Integration** (`nemo_rl/models/policy/workers/megatron_policy_worker.py`): Added PEFT/LoRA initialization from config, pre-wrap hook registration for PEFT, validation to prevent MOE router freezing with PEFT, and checkpoint loading behavior adjustment when PEFT is enabled.
  • **Functional Tests** (`tests/functional/L1_Functional_Tests_GPU.sh`, `tests/functional/sft_automodel_lora.sh`, `tests/functional/sft_megatron_lora.sh`): New Megatron LoRA functional test script with setup, experiment execution, and metrics validation. Updated DTensor LoRA test config key reference. Added test invocations to the GPU test suite.
  • **Integration Test Suite** (`tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh`, `tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh`, `tests/test_suites/nightly.txt`): New Megatron LoRA integration test script with environment setup, experiment execution, TensorBoard conversion, and conditional metrics validation. Updated the W&B project name, added EOF newlines, and registered the new test in the nightly suite.
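The "pre-wrap hook registration for PEFT" mentioned for the policy worker can be illustrated with a minimal sketch. All names here (`register_pre_wrap_hook`, `apply_lora`, `wrap_model`) are hypothetical stand-ins, not the actual nemo_rl or Megatron API:

```python
# Toy sketch of the pre-wrap hook pattern: each registered hook transforms the
# model before it is handed to the distributed wrapper. Names are hypothetical.
from typing import Callable, Dict, List

Hook = Callable[[Dict], Dict]  # a hook takes a "model" and returns it, possibly modified

_pre_wrap_hooks: List[Hook] = []

def register_pre_wrap_hook(hook: Hook) -> None:
    _pre_wrap_hooks.append(hook)

def apply_lora(model: Dict) -> Dict:
    # Stand-in for PEFT initialization: mark adapter state on the model.
    model["lora_enabled"] = True
    return model

def wrap_model(model: Dict) -> Dict:
    # Run every registered hook in order before "wrapping" the model.
    for hook in _pre_wrap_hooks:
        model = hook(model)
    model["wrapped"] = True
    return model

register_pre_wrap_hook(apply_lora)
wrapped = wrap_model({"name": "llama3.1-8b"})
```

The design point is that PEFT setup composes with, rather than replaces, the existing model-wrapping path.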

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Suggested labels

r0.4.0, CI:L1

Suggested reviewers

  • terrykong
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • **Docstring Coverage** ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

  • **Description Check** ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • **Title Check** ✅ Passed: The title clearly describes the main change as a Megatron SFT LoRA feature addition, directly corresponding to the changeset's primary objective of implementing PEFT (LoRA) integration for the Megatron policy worker and related documentation.
  • **Test Results For Major Changes** ✅ Passed: The PR adds Megatron SFT LoRA support with comprehensive functional tests including metric validation, an example configuration, and nightly test suite entries ensuring continuous feature validation.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @docs/guides/sft.md:
- Around line 212-260: The docs have inconsistent parameter names: the YAML uses
lora_A_init_method and lora_B_init_method but the "Megatron Parameter Details"
uses lora_A_init and lora_B_init; update the parameter names in the details
section to lora_A_init_method and lora_B_init_method (and keep their allowed
values/descriptions unchanged) so they match the config block and examples
(reference the symbols lora_A_init_method and lora_B_init_method).

In @nemo_rl/models/policy/workers/megatron_policy_worker.py:
- Around line 340-354: The comment about toggling finetune is contradictory;
update the comment near the checkpoint-loading logic in megatron_policy_worker
(around the block that sets should_load_checkpoint when cfg.peft is not None) to
clearly state that setting cfg.checkpoint.finetune = False causes optimizer and
RNG states to be loaded (i.e., resumes training) when a PEFT checkpoint is
present; replace the confusing lines with a concise note like "When resuming
PEFT training from a checkpoint, set cfg.checkpoint.finetune = False to enable
loading optimizer and RNG states from the checkpoint."
🧹 Nitpick comments (1)
examples/configs/sft.yaml (1)

132-135: Consider more precise optimizer comment.

The comment "When weight decay is set, it actually uses AdamW" appears on both the optimizer name and weight_decay fields. While technically correct, it might be clearer to note that:

  • Line 132: The "adam" optimizer with non-zero weight_decay automatically becomes AdamW in Megatron
  • Line 135: Non-zero weight_decay triggers AdamW behavior

This would help users understand that the behavior is conditional on the weight_decay value.

📝 Suggested comment improvements

```diff
-    optimizer: "adam" # When weight decay is set, it actually uses AdamW
+    optimizer: "adam" # Automatically becomes AdamW when weight_decay > 0
     lr: 5.0e-6
     min_lr: 4.9999e-6
-    weight_decay: 0.1 # When weight decay is set, it actually uses AdamW
+    weight_decay: 0.1 # Non-zero value triggers AdamW behavior (decoupled weight decay)
```
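The Adam-vs-AdamW distinction behind this comment can be sketched with a toy, single-step illustration: with coupled L2 regularization the decay term passes through the adaptive preconditioner, while AdamW's decoupled decay bypasses it. This is a simplified sketch with arbitrary values, not Megatron's optimizer code:

```python
import math

# Toy single-step comparison of coupled vs decoupled weight decay.
lr, wd = 0.1, 0.01
w, grad = 1.0, 0.5
v = 0.25  # second-moment estimate (running average of grad**2), toy value
denom = math.sqrt(v) + 1e-8

# Coupled (classic Adam + L2): decay is folded into the gradient, so it is
# rescaled by the adaptive preconditioner.
w_coupled = w - lr * (grad + wd * w) / denom

# Decoupled (AdamW): decay is applied to the weight directly, independent of
# the gradient-based update.
w_decoupled = w - lr * grad / denom - lr * wd * w
```

With plain SGD the two forms coincide; it is the adaptive denominator that makes them diverge, which is why a non-zero `weight_decay` effectively switches "adam" to AdamW behavior.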
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e576a6 and 2b974b4.

📒 Files selected for processing (10)
  • docs/guides/sft.md
  • examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml
  • examples/configs/sft.yaml
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • tests/functional/L1_Functional_Tests_GPU.sh
  • tests/functional/sft_automodel_lora.sh
  • tests/functional/sft_megatron_lora.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
  • tests/test_suites/nightly.txt
🧰 Additional context used
📓 Path-based instructions (10)
examples/configs/recipes/**/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When adding support for a new model, create a recipe YAML under examples/configs/recipes/ in the appropriate domain subdirectory (llm, vlm, etc.)

Files:

  • examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml
examples/configs/recipes/llm/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Recipe YAML files should follow the naming pattern: --ng-[-modifiers][-long][.vN].yaml for LLM recipes

Files:

  • examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml
!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

  • examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml
  • tests/functional/sft_megatron_lora.sh
  • tests/test_suites/nightly.txt
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
  • tests/functional/sft_automodel_lora.sh
  • examples/configs/sft.yaml
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • tests/functional/L1_Functional_Tests_GPU.sh
  • docs/guides/sft.md
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Use uv run instead of python to execute scripts
Follow the Google Shell Style Guide for shell scripts

Files:

  • tests/functional/sft_megatron_lora.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
  • tests/functional/sft_automodel_lora.sh
  • tests/functional/L1_Functional_Tests_GPU.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh
**/*.{py,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)

Files:

  • tests/functional/sft_megatron_lora.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
  • tests/functional/sft_automodel_lora.sh
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • tests/functional/L1_Functional_Tests_GPU.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh
tests/test_suites/nightly.txt

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When adding a nightly test for a new model, append the driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt

Files:

  • tests/test_suites/nightly.txt
tests/test_suites/**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

tests/test_suites/**/*.sh: When adding support for a new model, create a corresponding driver shell script under tests/test_suites/ in the matching domain
Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Files:

  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Conform code to Python 3.12+
Indent code with 4 spaces. Do not use tabs
Use snake_case for file names
Use PascalCase for class names
Use snake_case for function and method names
Use snake_case for local variables
Prefix variable names that start with a number with 'k' (e.g., k_99th_percentile)
Use upper snake_case with 'G' prefix for global variables (e.g., G_MY_GLOBAL)
Use upper snake_case for constants
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
Prefer docstrings over comments for interfaces that may be used outside a file
Reserve comments for code within a function or interfaces that are local to a file
If a piece of code is commented out, include a comment describing its usage and why it's commented out. Remove debug comments before merging
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx
Avoid using reflection when functionality can be easily achieved without reflection
When using try-except blocks, limit the except clause to the smallest set of specific errors possible
When using try-except blocks for duck-typing, keep the body of the try as small as possible and use the else block for logic
YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values
For required configuration attributes, access config directly and expect presence (e.g., policy_cfg['precision']) without hidden defaults
Use typing.NotRequired to mark optional attributes in TypedDict for configuration
When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
Follow the Google Python Style Guide for Python code

Files:

  • nemo_rl/models/policy/workers/megatron_policy_worker.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

For any source file under nemo_rl/*.py that defines a class or function decorated with @ray.remote, add a coverage pragma (# pragma: no cover) because these run in separate Ray processes

Files:

  • nemo_rl/models/policy/workers/megatron_policy_worker.py
docs/**/*.md

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Update docs/index.md when a new markdown doc is added under docs/**/*.md or a markdown file is renamed, ensuring the document appears in the most appropriate section

Files:

  • docs/guides/sft.md
🧠 Learnings (6)
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : Recipe YAML files should follow the naming pattern: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml for LLM recipes

Applied to files:

  • examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml
📚 Learning: 2025-10-12T14:46:55.513Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.

Applied to files:

  • tests/functional/sft_megatron_lora.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Applied to files:

  • tests/functional/sft_megatron_lora.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
  • tests/functional/L1_Functional_Tests_GPU.sh
📚 Learning: 2025-09-19T07:28:29.887Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh:1-4
Timestamp: 2025-09-19T07:28:29.887Z
Learning: The NVIDIA-NeMo/RL project prefers to maintain consistent formatting across test scripts rather than applying individual bash hardening improvements like `set -euo pipefail` or proper quoting for sourcing files.

Applied to files:

  • tests/functional/sft_megatron_lora.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/nightly.txt : When adding a nightly test for a new model, append the driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt

Applied to files:

  • tests/test_suites/nightly.txt
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.

Applied to files:

  • tests/test_suites/nightly.txt
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
  • tests/functional/L1_Functional_Tests_GPU.sh
🪛 Ruff (0.14.10)
nemo_rl/models/policy/workers/megatron_policy_worker.py

267-269: Avoid specifying long messages outside the exception class

(TRY003)

🪛 Shellcheck (0.11.0)
tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (13)
tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh (1)

22-22: LGTM!

The wandb project name update from fruit_personal_debug to nemo-rl is appropriate for a shared testing environment.

nemo_rl/models/policy/workers/megatron_policy_worker.py (3)

28-28: LGTM!

The new imports appropriately support the PEFT/LoRA integration introduced in this PR.

Also applies to: 53-53, 90-90


263-269: LGTM!

The validation correctly prevents the unsupported combination of MOE router freezing with PEFT. The error message is clear and actionable.


288-314: LGTM!

The PEFT/LoRA object creation and hook registration logic is well-structured:

  • LoRA configuration is properly extracted and validated
  • The pre-wrap hook composition ensures PEFT initialization integrates correctly with the model wrapping process
tests/test_suites/nightly.txt (1)

75-75: LGTM!

The new Megatron LoRA nightly test entry follows the established convention for adding nightly tests.

tests/functional/sft_automodel_lora.sh (1)

31-31: LGTM!

The configuration key path update from policy.dtensor_cfg.lora.enabled to policy.dtensor_cfg.lora_cfg.enabled correctly reflects the schema change for LoRA configuration introduced in this PR.

tests/functional/L1_Functional_Tests_GPU.sh (1)

43-44: LGTM!

The new LoRA functional test invocations follow the established pattern using uv run --no-sync and are appropriately placed within the test suite.

examples/configs/sft.yaml (1)

117-129: LGTM! Well-structured PEFT configuration block.

The PEFT configuration block includes all necessary parameters with sensible defaults. The structure aligns with the documentation updates and the Megatron policy worker implementation.

tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh (1)

1-47: LGTM! Test script follows repository conventions.

The script adheres to established patterns for test suite scripts in this repository:

  • Standard configuration variables (NUM_NODES, NUM_RUNS, NUM_MINUTES) are correctly defined for external test infrastructure
  • Uses uv run as required by coding guidelines
  • Follows consistent cd and argument passing patterns
  • Includes conditional metric validation only when MAX_STEPS is reached
  • Properly cleans up checkpoint directory on success

Based on learnings, the shellcheck warnings about unused variables and unquoted expansions can be safely ignored as they follow repository conventions.

examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml (1)

1-44: LGTM! Well-structured recipe configuration.

The recipe file follows the naming convention and provides a complete, sensible configuration for Megatron LoRA fine-tuning:

  • Correctly disables DTensor and enables Megatron with PEFT
  • Uses appropriate LoRA parameters (dim=128, alpha=128)
  • Separates model and tokenizer (using instruct tokenizer for base model)
  • Configures appropriate training parameters and dataset
tests/functional/sft_megatron_lora.sh (1)

1-51: LGTM! Well-structured functional test.

The functional test script is properly structured with:

  • Cleanup trap to ensure checkpoint directory removal
  • Proper environment setup including git safe.directory configuration
  • Coverage tracking for code coverage metrics
  • Appropriate test configuration with small model (Qwen3-0.6B) for fast testing
  • Metric validation to ensure training loss decreases as expected
  • Follows repository conventions for cd and argument passing
docs/guides/sft.md (2)

171-171: LGTM! Clear backend support statement.

The updated note correctly clarifies that LoRA is supported with DTensor v2 and Megatron backends, with DTensor as the default. The requirement for _v2=true is important for users to know.


174-210: LGTM! Comprehensive DTensor configuration documentation.

The DTensor configuration section provides:

  • Clear YAML configuration example with all parameters
  • Detailed parameter descriptions with typical values
  • Important note about Triton kernel limitation with TP > 1
  • Simple usage example

This will help users configure DTensor LoRA correctly.

Comment on lines +212 to +260
### Megatron Configuration Parameters

The LoRA configuration is specified under the `policy.megatron_cfg.peft` section:

```yaml
policy:
  megatron_cfg:
    peft:
      enabled: false # Set to True to enable LoRA fine-tuning
      target_modules: [] # List of module names to apply LoRA, defaults to all linear layers
      exclude_modules: [] # List of module names not to apply LoRA.
      dim: 32 # LoRA rank (r): controls adaptation capacity
      alpha: 32 # LoRA scaling factor (effective lr = alpha/dim)
      dropout: 0.0 # Dropout probability for LoRA layers
      dropout_position: "pre" # Dropout position: "pre" or "post"
      lora_A_init_method: "xavier" # Initialization method for lora A: "xavier" or "uniform"
      lora_B_init_method: "zero" # Initialization method for lora B: "zero"
      a2a_experimental: false # Enables the experimental All-to-All (A2A) communication strategy.
      lora_dtype: None # Weight's dtype
```

### Megatron Parameter Details
- **`enabled`** (bool): Whether to enable LoRA training
- **`target_modules`** (list): Specific module names to apply LoRA. Defaults to all linear layers if the list is left empty. Example: ['linear_qkv', 'linear_proj', 'linear_fc1', 'linear_fc2'].
- 'linear_qkv': Apply LoRA to the fused linear layer used for query, key, and value projections in self-attention.
- 'linear_proj': Apply LoRA to the linear layer used for projecting the output of self-attention.
- 'linear_fc1': Apply LoRA to the first fully-connected layer in MLP.
- 'linear_fc2': Apply LoRA to the second fully-connected layer in MLP.
Target modules can also contain wildcards. For example, you can specify target_modules=['*.layers.0.*.linear_qkv', '*.layers.1.*.linear_qkv'] to add LoRA to only linear_qkv on the first two layers.
- **`exclude_modules`** (List[str], optional): A list of module names not to apply LoRA. LoRA is applied to all nn.Linear and nn.Linear-adjacent modules whose names do not match any string in exclude_modules. If used, target_modules must be an empty list or None.
- **`dim`** (int): LoRA rank (r). Lower values = fewer parameters but less capacity. Typical: 4, 8, 16, 32, 64
- **`alpha`** (int): LoRA scaling factor. Effective learning rate multiplier = `alpha/dim`. Typical: 16, 32, 64
- **`dropout`** (float): Dropout probability for regularization, defaults to 0.0
- **`dropout_position`** (str): Apply dropout before ("pre") or after ("post") LoRA
- **`lora_A_init`** (str): Initialization method for lora_A (choices: ['xavier', 'uniform']), defaults to xavier.
- **`lora_B_init`** (str): Initialization method for the low-rank matrix B. Defaults to "zero".
- **`a2a_experimental`** (bool): Enables the experimental All-to-All (A2A) communication strategy. Defaults to False.
- **`lora_dtype`** (torch.dtype): Weight's dtype, by default will use orig_linear's but if they are quantized weights (e.g. 4bit) needs to be specified explicitly.

### Megatron Example Usage
The config uses DTensor by default, so the megatron backend needs to be explicitly enabled.
```sh
uv run examples/run_sft.py \
    --config examples/configs/sft.yaml \
    policy.dtensor_cfg.enabled=false \
    policy.megatron_cfg.enabled=true \
    policy.megatron_cfg.peft.enabled=true
```


⚠️ Potential issue | 🟡 Minor

Fix parameter name inconsistency in documentation.

On Line 246, the parameter details section refers to lora_A_init but the configuration block above (line 227) correctly uses lora_A_init_method. The documentation should use consistent naming.

📝 Proposed fix for parameter name consistency

```diff
 - **`dropout`** (float): Dropout probability for regularization, defaults to 0.0
 - **`dropout_position`** (str): Apply dropout before ("pre") or after ("post") LoRA
-- **`lora_A_init`** (str): Initialization method for lora_A (choices: ['xavier', 'uniform']), defaults to xavier.
+- **`lora_A_init_method`** (str): Initialization method for lora_A (choices: ['xavier', 'uniform']), defaults to xavier.
 - **`lora_B_init`** (str): Initialization method for the low-rank matrix B. Defaults to "zero".
```

Note: Line 247 also uses lora_B_init which is consistent with the config block (line 228: lora_B_init_method). For full consistency, consider:

```diff
-- **`lora_B_init`** (str): Initialization method for the low-rank matrix B. Defaults to "zero".
+- **`lora_B_init_method`** (str): Initialization method for the low-rank matrix B. Defaults to "zero".
```
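The wildcard `target_modules` syntax described in this doc section (e.g. `'*.layers.0.*.linear_qkv'`) can be illustrated with Python's `fnmatch`; the module names below are hypothetical and the actual matching logic in Megatron may differ:

```python
from fnmatch import fnmatch

# Hypothetical dotted module names, in the style Megatron uses for layers.
module_names = [
    "decoder.layers.0.self_attention.linear_qkv",
    "decoder.layers.0.self_attention.linear_proj",
    "decoder.layers.1.self_attention.linear_qkv",
    "decoder.layers.2.self_attention.linear_qkv",
]

# Apply LoRA only to linear_qkv on the first two layers.
target_modules = ["*.layers.0.*.linear_qkv", "*.layers.1.*.linear_qkv"]

selected = [
    name for name in module_names
    if any(fnmatch(name, pattern) for pattern in target_modules)
]
```

Here `linear_proj` and layer 2's `linear_qkv` fall outside the patterns and keep their frozen base weights.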

Comment on lines +340 to +354

```python
if cfg.peft is not None:
    should_load_checkpoint = cfg.checkpoint.load is not None and checkpoint_exists(
        cfg.checkpoint.load
    )
    if should_load_checkpoint:
        # The finetune toggle is explicitly set to True in order to avoid loading optimizer and RNG states
        # This is switched off here in order to load these states from the checkpoint
        cfg.checkpoint.finetune = False
else:
    should_load_checkpoint = (
        cfg.checkpoint.load is not None and checkpoint_exists(cfg.checkpoint.load)
    ) or (
        cfg.checkpoint.pretrained_checkpoint is not None
        and checkpoint_exists(cfg.checkpoint.pretrained_checkpoint)
    )
```

⚠️ Potential issue | 🟡 Minor

Clarify the checkpoint loading comment.

The comment on lines 345-347 is confusing:

```python
# The finetune toggle is explicitly set to True in order to avoid loading optimizer and RNG states
# This is switched off here in order to load these states from the checkpoint
cfg.checkpoint.finetune = False
```

The comment states "set to True to avoid loading" but then "switched off to load," which seems contradictory. Typically:

  • finetune=True means skip loading optimizer/RNG states (starting fresh fine-tuning)
  • finetune=False means load optimizer/RNG states (resuming training)

The code sets finetune=False, which should enable loading of optimizer/RNG states when resuming PEFT training. Consider clarifying the comment to something like:

```python
# When resuming PEFT training from a checkpoint, we need to load optimizer and RNG states.
# Set finetune=False to enable loading these states from the checkpoint.
cfg.checkpoint.finetune = False
```

@yuki-97 yuki-97 merged commit 53342b1 into r0.5.0 Jan 8, 2026
82 of 88 checks passed
@yuki-97 yuki-97 deleted the cherry-pick-1629-r0.5.0 branch January 8, 2026 19:15

Labels

cherry-pick · CI:L1 (Run doctests, unit tests, and functional tests) · documentation (Improvements or additions to documentation) · Run CICD

3 participants