
cp: feat: Megatron SFT LoRA (1629) into r0.5.0#1741

Merged
yuki-97 merged 1 commit into r0.5.0 from cherry-pick-1629-r0.5.0 on Jan 8, 2026

Conversation

@chtruong814
Contributor

@chtruong814 chtruong814 commented Jan 8, 2026

beep boop [🤖]: Hi @arendu 👋,

we've cherry-picked #1629 into r0.5.0 for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

Release Notes

  • New Features

    • Added LoRA (Low-Rank Adaptation) support for Megatron-based fine-tuning
    • Introduced example configurations for SFT with Megatron-LoRA
  • Documentation

    • Comprehensive guide covering LoRA configuration parameters for DTensor and Megatron approaches with detailed parameter explanations and usage examples
  • Tests

    • Added functional and integration tests for LoRA with automodel and Megatron backends


Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@chtruong814 chtruong814 requested review from a team as code owners January 8, 2026 12:58
@chtruong814 chtruong814 requested a review from arendu January 8, 2026 12:58
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jan 8, 2026
@yuki-97 yuki-97 added the CI:L1 Run doctests, unit tests, and functional tests label Jan 8, 2026
@yuki-97 yuki-97 enabled auto-merge (squash) January 8, 2026 13:00
@coderabbitai
Contributor

coderabbitai bot commented Jan 8, 2026

📝 Walkthrough

Walkthrough

This pull request adds comprehensive PEFT (LoRA) support for the Megatron backend in supervised fine-tuning workflows. Changes include updated documentation and configuration specifications for Megatron LoRA parameters, PEFT initialization and checkpoint handling in the Megatron policy worker, new example configurations for Megatron LoRA setups, and corresponding functional and integration test scripts.

Changes

  • **Documentation & Config Schema** (`docs/guides/sft.md`, `examples/configs/sft.yaml`): Added Megatron PEFT (LoRA) configuration blocks with detailed parameter definitions, inline comments on optimizer behavior, and parameter documentation. Updated DTensor LoRA notes to reflect v2 as default backend.
  • **Example Configurations** (`examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml`): New YAML configuration file demonstrating full SFT run setup for Llama-3.1-8B with Megatron LoRA on a 1N8G cluster, including policy, optimizer, data, and logging configurations.
  • **PEFT Integration** (`nemo_rl/models/policy/workers/megatron_policy_worker.py`): Added PEFT/LoRA initialization from config, pre-wrap hook registration for PEFT, validation to prevent MOE router freezing with PEFT, and checkpoint loading behavior adjustment when PEFT is enabled.
  • **Functional Tests** (`tests/functional/L1_Functional_Tests_GPU.sh`, `tests/functional/sft_automodel_lora.sh`, `tests/functional/sft_megatron_lora.sh`): New Megatron LoRA functional test script with setup, experiment execution, and metrics validation. Updated DTensor LoRA test config key reference. Added test invocations to the GPU test suite.
  • **Integration Test Suite** (`tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh`, `tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh`, `tests/test_suites/nightly.txt`): New Megatron LoRA integration test script with environment setup, experiment execution, TensorBoard conversion, and conditional metrics validation. Updated the W&B project name, added EOF newlines, and registered the new test in the nightly suite.
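The "pre-wrap hook registration for PEFT" mentioned for the policy worker can be illustrated with a minimal sketch. All names here (`register_pre_wrap_hook`, `apply_lora`, `wrap_model`) are hypothetical stand-ins, not the actual nemo_rl or Megatron API:

```python
# Toy sketch of the pre-wrap hook pattern: each registered hook transforms the
# model before it is handed to the distributed wrapper. Names are hypothetical.
from typing import Callable, Dict, List

Hook = Callable[[Dict], Dict]  # a hook takes a "model" and returns it, possibly modified

_pre_wrap_hooks: List[Hook] = []

def register_pre_wrap_hook(hook: Hook) -> None:
    _pre_wrap_hooks.append(hook)

def apply_lora(model: Dict) -> Dict:
    # Stand-in for PEFT initialization: mark adapter state on the model.
    model["lora_enabled"] = True
    return model

def wrap_model(model: Dict) -> Dict:
    # Run every registered hook in order before "wrapping" the model.
    for hook in _pre_wrap_hooks:
        model = hook(model)
    model["wrapped"] = True
    return model

register_pre_wrap_hook(apply_lora)
wrapped = wrap_model({"name": "llama3.1-8b"})
```

The design point is that PEFT setup composes with, rather than replaces, the existing model-wrapping path.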

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Suggested labels

r0.4.0, CI:L1

Suggested reviewers

  • terrykong
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • **Docstring Coverage** ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

  • **Description Check** ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • **Title Check** ✅ Passed: The title clearly describes the main change as a Megatron SFT LoRA feature addition, directly corresponding to the changeset's primary objective of implementing PEFT (LoRA) integration for the Megatron policy worker and related documentation.
  • **Test Results For Major Changes** ✅ Passed: The PR adds Megatron SFT LoRA support with comprehensive functional tests including metric validation, an example configuration, and nightly test suite entries ensuring continuous feature validation.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @docs/guides/sft.md:
- Around line 212-260: The docs have inconsistent parameter names: the YAML uses
lora_A_init_method and lora_B_init_method but the "Megatron Parameter Details"
uses lora_A_init and lora_B_init; update the parameter names in the details
section to lora_A_init_method and lora_B_init_method (and keep their allowed
values/descriptions unchanged) so they match the config block and examples
(reference the symbols lora_A_init_method and lora_B_init_method).

In @nemo_rl/models/policy/workers/megatron_policy_worker.py:
- Around line 340-354: The comment about toggling finetune is contradictory;
update the comment near the checkpoint-loading logic in megatron_policy_worker
(around the block that sets should_load_checkpoint when cfg.peft is not None) to
clearly state that setting cfg.checkpoint.finetune = False causes optimizer and
RNG states to be loaded (i.e., resumes training) when a PEFT checkpoint is
present; replace the confusing lines with a concise note like "When resuming
PEFT training from a checkpoint, set cfg.checkpoint.finetune = False to enable
loading optimizer and RNG states from the checkpoint."
🧹 Nitpick comments (1)
examples/configs/sft.yaml (1)

132-135: Consider more precise optimizer comment.

The comment "When weight decay is set, it actually uses AdamW" appears on both the optimizer name and weight_decay fields. While technically correct, it might be clearer to note that:

  • Line 132: The "adam" optimizer with non-zero weight_decay automatically becomes AdamW in Megatron
  • Line 135: Non-zero weight_decay triggers AdamW behavior

This would help users understand that the behavior is conditional on the weight_decay value.

📝 Suggested comment improvements

```diff
-    optimizer: "adam" # When weight decay is set, it actually uses AdamW
+    optimizer: "adam" # Automatically becomes AdamW when weight_decay > 0
     lr: 5.0e-6
     min_lr: 4.9999e-6
-    weight_decay: 0.1 # When weight decay is set, it actually uses AdamW
+    weight_decay: 0.1 # Non-zero value triggers AdamW behavior (decoupled weight decay)
```
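The Adam-vs-AdamW distinction behind this comment can be sketched with a toy, single-step illustration: with coupled L2 regularization the decay term passes through the adaptive preconditioner, while AdamW's decoupled decay bypasses it. This is a simplified sketch with arbitrary values, not Megatron's optimizer code:

```python
import math

# Toy single-step comparison of coupled vs decoupled weight decay.
lr, wd = 0.1, 0.01
w, grad = 1.0, 0.5
v = 0.25  # second-moment estimate (running average of grad**2), toy value
denom = math.sqrt(v) + 1e-8

# Coupled (classic Adam + L2): decay is folded into the gradient, so it is
# rescaled by the adaptive preconditioner.
w_coupled = w - lr * (grad + wd * w) / denom

# Decoupled (AdamW): decay is applied to the weight directly, independent of
# the gradient-based update.
w_decoupled = w - lr * grad / denom - lr * wd * w
```

With plain SGD the two forms coincide; it is the adaptive denominator that makes them diverge, which is why a non-zero `weight_decay` effectively switches "adam" to AdamW behavior.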
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e576a6 and 2b974b4.

📒 Files selected for processing (10)
  • docs/guides/sft.md
  • examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml
  • examples/configs/sft.yaml
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • tests/functional/L1_Functional_Tests_GPU.sh
  • tests/functional/sft_automodel_lora.sh
  • tests/functional/sft_megatron_lora.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
  • tests/test_suites/nightly.txt
🧰 Additional context used
📓 Path-based instructions (10)
examples/configs/recipes/**/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When adding support for a new model, create a recipe YAML under examples/configs/recipes/ in the appropriate domain subdirectory (llm, vlm, etc.)

Files:

  • examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml
examples/configs/recipes/llm/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Recipe YAML files should follow the naming pattern: --ng-[-modifiers][-long][.vN].yaml for LLM recipes

Files:

  • examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml
!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

  • examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml
  • tests/functional/sft_megatron_lora.sh
  • tests/test_suites/nightly.txt
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
  • tests/functional/sft_automodel_lora.sh
  • examples/configs/sft.yaml
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • tests/functional/L1_Functional_Tests_GPU.sh
  • docs/guides/sft.md
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Use uv run instead of python to execute scripts
Follow the Google Shell Style Guide for shell scripts

Files:

  • tests/functional/sft_megatron_lora.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
  • tests/functional/sft_automodel_lora.sh
  • tests/functional/L1_Functional_Tests_GPU.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh
**/*.{py,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)

Files:

  • tests/functional/sft_megatron_lora.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
  • tests/functional/sft_automodel_lora.sh
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • tests/functional/L1_Functional_Tests_GPU.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh
tests/test_suites/nightly.txt

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When adding a nightly test for a new model, append the driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt

Files:

  • tests/test_suites/nightly.txt
tests/test_suites/**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

tests/test_suites/**/*.sh: When adding support for a new model, create a corresponding driver shell script under tests/test_suites/ in the matching domain
Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Files:

  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Conform code to Python 3.12+
Indent code with 4 spaces. Do not use tabs
Use snake_case for file names
Use PascalCase for class names
Use snake_case for function and method names
Use snake_case for local variables
Prefix variable names that start with a number with 'k' (e.g., k_99th_percentile)
Use upper snake_case with 'G' prefix for global variables (e.g., G_MY_GLOBAL)
Use upper snake_case for constants
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
Prefer docstrings over comments for interfaces that may be used outside a file
Reserve comments for code within a function or interfaces that are local to a file
If a piece of code is commented out, include a comment describing its usage and why it's commented out. Remove debug comments before merging
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx
Avoid using reflection when functionality can be easily achieved without reflection
When using try-except blocks, limit the except clause to the smallest set of specific errors possible
When using try-except blocks for duck-typing, keep the body of the try as small as possible and use the else block for logic
YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values
For required configuration attributes, access config directly and expect presence (e.g., policy_cfg['precision']) without hidden defaults
Use typing.NotRequired to mark optional attributes in TypedDict for configuration
When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
Follow the Google Python Style Guide for Python code

Files:

  • nemo_rl/models/policy/workers/megatron_policy_worker.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

For any source file under nemo_rl/*.py that defines a class or function decorated with @ray.remote, add a coverage pragma (# pragma: no cover) because these run in separate Ray processes

Files:

  • nemo_rl/models/policy/workers/megatron_policy_worker.py
docs/**/*.md

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Update docs/index.md when a new markdown doc is added under docs/**/*.md or a markdown file is renamed, ensuring the document appears in the most appropriate section

Files:

  • docs/guides/sft.md
🧠 Learnings (6)
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : Recipe YAML files should follow the naming pattern: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml for LLM recipes

Applied to files:

  • examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml
📚 Learning: 2025-10-12T14:46:55.513Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.

Applied to files:

  • tests/functional/sft_megatron_lora.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Applied to files:

  • tests/functional/sft_megatron_lora.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
  • tests/functional/L1_Functional_Tests_GPU.sh
📚 Learning: 2025-09-19T07:28:29.887Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh:1-4
Timestamp: 2025-09-19T07:28:29.887Z
Learning: The NVIDIA-NeMo/RL project prefers to maintain consistent formatting across test scripts rather than applying individual bash hardening improvements like `set -euo pipefail` or proper quoting for sourcing files.

Applied to files:

  • tests/functional/sft_megatron_lora.sh
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/nightly.txt : When adding a nightly test for a new model, append the driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt

Applied to files:

  • tests/test_suites/nightly.txt
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.

Applied to files:

  • tests/test_suites/nightly.txt
  • tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
  • tests/functional/L1_Functional_Tests_GPU.sh
🪛 Ruff (0.14.10)
nemo_rl/models/policy/workers/megatron_policy_worker.py

267-269: Avoid specifying long messages outside the exception class

(TRY003)

🪛 Shellcheck (0.11.0)
tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (13)
tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh (1)

22-22: LGTM!

The wandb project name update from fruit_personal_debug to nemo-rl is appropriate for a shared testing environment.

nemo_rl/models/policy/workers/megatron_policy_worker.py (3)

28-28: LGTM!

The new imports appropriately support the PEFT/LoRA integration introduced in this PR.

Also applies to: 53-53, 90-90


263-269: LGTM!

The validation correctly prevents the unsupported combination of MOE router freezing with PEFT. The error message is clear and actionable.


288-314: LGTM!

The PEFT/LoRA object creation and hook registration logic is well-structured:

  • LoRA configuration is properly extracted and validated
  • The pre-wrap hook composition ensures PEFT initialization integrates correctly with the model wrapping process
tests/test_suites/nightly.txt (1)

75-75: LGTM!

The new Megatron LoRA nightly test entry follows the established convention for adding nightly tests.

tests/functional/sft_automodel_lora.sh (1)

31-31: LGTM!

The configuration key path update from policy.dtensor_cfg.lora.enabled to policy.dtensor_cfg.lora_cfg.enabled correctly reflects the schema change for LoRA configuration introduced in this PR.

tests/functional/L1_Functional_Tests_GPU.sh (1)

43-44: LGTM!

The new LoRA functional test invocations follow the established pattern using uv run --no-sync and are appropriately placed within the test suite.

examples/configs/sft.yaml (1)

117-129: LGTM! Well-structured PEFT configuration block.

The PEFT configuration block includes all necessary parameters with sensible defaults. The structure aligns with the documentation updates and the Megatron policy worker implementation.

tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh (1)

1-47: LGTM! Test script follows repository conventions.

The script adheres to established patterns for test suite scripts in this repository:

  • Standard configuration variables (NUM_NODES, NUM_RUNS, NUM_MINUTES) are correctly defined for external test infrastructure
  • Uses uv run as required by coding guidelines
  • Follows consistent cd and argument passing patterns
  • Includes conditional metric validation only when MAX_STEPS is reached
  • Properly cleans up checkpoint directory on success

Based on learnings, the shellcheck warnings about unused variables and unquoted expansions can be safely ignored as they follow repository conventions.

examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml (1)

1-44: LGTM! Well-structured recipe configuration.

The recipe file follows the naming convention and provides a complete, sensible configuration for Megatron LoRA fine-tuning:

  • Correctly disables DTensor and enables Megatron with PEFT
  • Uses appropriate LoRA parameters (dim=128, alpha=128)
  • Separates model and tokenizer (using instruct tokenizer for base model)
  • Configures appropriate training parameters and dataset
tests/functional/sft_megatron_lora.sh (1)

1-51: LGTM! Well-structured functional test.

The functional test script is properly structured with:

  • Cleanup trap to ensure checkpoint directory removal
  • Proper environment setup including git safe.directory configuration
  • Coverage tracking for code coverage metrics
  • Appropriate test configuration with small model (Qwen3-0.6B) for fast testing
  • Metric validation to ensure training loss decreases as expected
  • Follows repository conventions for cd and argument passing
docs/guides/sft.md (2)

171-171: LGTM! Clear backend support statement.

The updated note correctly clarifies that LoRA is supported with DTensor v2 and Megatron backends, with DTensor as the default. The requirement for _v2=true is important for users to know.


174-210: LGTM! Comprehensive DTensor configuration documentation.

The DTensor configuration section provides:

  • Clear YAML configuration example with all parameters
  • Detailed parameter descriptions with typical values
  • Important note about Triton kernel limitation with TP > 1
  • Simple usage example

This will help users configure DTensor LoRA correctly.

Comment on lines +212 to +260
### Megatron Configuration Parameters

The LoRA configuration is specified under the `policy.megatron_cfg.peft` section:

```yaml
policy:
  megatron_cfg:
    peft:
      enabled: false # Set to True to enable LoRA fine-tuning
      target_modules: [] # List of module names to apply LoRA, defaults to all linear layers
      exclude_modules: [] # List of module names not to apply LoRA.
      dim: 32 # LoRA rank (r): controls adaptation capacity
      alpha: 32 # LoRA scaling factor (effective lr = alpha/dim)
      dropout: 0.0 # Dropout probability for LoRA layers
      dropout_position: "pre" # Dropout position: "pre" or "post"
      lora_A_init_method: "xavier" # Initialization method for lora A: "xavier" or "uniform"
      lora_B_init_method: "zero" # Initialization method for lora B: "zero"
      a2a_experimental: false # Enables the experimental All-to-All (A2A) communication strategy.
      lora_dtype: None # Weight's dtype
```

### Megatron Parameter Details
- **`enabled`** (bool): Whether to enable LoRA training
- **`target_modules`** (list): Specific module names to apply LoRA. Defaults to all linear layers if the list is left empty. Example: ['linear_qkv', 'linear_proj', 'linear_fc1', 'linear_fc2'].
- 'linear_qkv': Apply LoRA to the fused linear layer used for query, key, and value projections in self-attention.
- 'linear_proj': Apply LoRA to the linear layer used for projecting the output of self-attention.
- 'linear_fc1': Apply LoRA to the first fully-connected layer in MLP.
- 'linear_fc2': Apply LoRA to the second fully-connected layer in MLP.
Target modules can also contain wildcards. For example, you can specify target_modules=['*.layers.0.*.linear_qkv', '*.layers.1.*.linear_qkv'] to add LoRA to only linear_qkv on the first two layers.
- **`exclude_modules`** (List[str], optional): A list of module names not to apply LoRA. LoRA is applied to all nn.Linear and nn.Linear-adjacent modules whose names do not match any string in exclude_modules. If used, target_modules must be an empty list or None.
- **`dim`** (int): LoRA rank (r). Lower values = fewer parameters but less capacity. Typical: 4, 8, 16, 32, 64
- **`alpha`** (int): LoRA scaling factor. Effective learning rate multiplier = `alpha/dim`. Typical: 16, 32, 64
- **`dropout`** (float): Dropout probability for regularization, defaults to 0.0
- **`dropout_position`** (str): Apply dropout before ("pre") or after ("post") LoRA
- **`lora_A_init`** (str): Initialization method for lora_A (choices: ['xavier', 'uniform']), defaults to xavier.
- **`lora_B_init`** (str): Initialization method for the low-rank matrix B. Defaults to "zero".
- **`a2a_experimental`** (bool): Enables the experimental All-to-All (A2A) communication strategy. Defaults to False.
- **`lora_dtype`** (torch.dtype): Weight's dtype, by default will use orig_linear's but if they are quantized weights (e.g. 4bit) needs to be specified explicitly.

### Megatron Example Usage
The config uses DTensor by default, so the megatron backend needs to be explicitly enabled.
```sh
uv run examples/run_sft.py \
    --config examples/configs/sft.yaml \
    policy.dtensor_cfg.enabled=false \
    policy.megatron_cfg.enabled=true \
    policy.megatron_cfg.peft.enabled=true
```


⚠️ Potential issue | 🟡 Minor

Fix parameter name inconsistency in documentation.

On Line 246, the parameter details section refers to lora_A_init but the configuration block above (line 227) correctly uses lora_A_init_method. The documentation should use consistent naming.

📝 Proposed fix for parameter name consistency

```diff
 - **`dropout`** (float): Dropout probability for regularization, defaults to 0.0
 - **`dropout_position`** (str): Apply dropout before ("pre") or after ("post") LoRA
-- **`lora_A_init`** (str): Initialization method for lora_A (choices: ['xavier', 'uniform']), defaults to xavier.
+- **`lora_A_init_method`** (str): Initialization method for lora_A (choices: ['xavier', 'uniform']), defaults to xavier.
 - **`lora_B_init`** (str): Initialization method for the low-rank matrix B. Defaults to "zero".
```

Note: Line 247 also uses lora_B_init which is consistent with the config block (line 228: lora_B_init_method). For full consistency, consider:

```diff
-- **`lora_B_init`** (str): Initialization method for the low-rank matrix B. Defaults to "zero".
+- **`lora_B_init_method`** (str): Initialization method for the low-rank matrix B. Defaults to "zero".
```
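The wildcard `target_modules` syntax described in this doc section (e.g. `'*.layers.0.*.linear_qkv'`) can be illustrated with Python's `fnmatch`; the module names below are hypothetical and the actual matching logic in Megatron may differ:

```python
from fnmatch import fnmatch

# Hypothetical dotted module names, in the style Megatron uses for layers.
module_names = [
    "decoder.layers.0.self_attention.linear_qkv",
    "decoder.layers.0.self_attention.linear_proj",
    "decoder.layers.1.self_attention.linear_qkv",
    "decoder.layers.2.self_attention.linear_qkv",
]

# Apply LoRA only to linear_qkv on the first two layers.
target_modules = ["*.layers.0.*.linear_qkv", "*.layers.1.*.linear_qkv"]

selected = [
    name for name in module_names
    if any(fnmatch(name, pattern) for pattern in target_modules)
]
```

Here `linear_proj` and layer 2's `linear_qkv` fall outside the patterns and keep their frozen base weights.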

Comment on lines +340 to +354

```python
if cfg.peft is not None:
    should_load_checkpoint = cfg.checkpoint.load is not None and checkpoint_exists(
        cfg.checkpoint.load
    )
    if should_load_checkpoint:
        # The finetune toggle is explicitly set to True in order to avoid loading optimizer and RNG states
        # This is switched off here in order to load these states from the checkpoint
        cfg.checkpoint.finetune = False
else:
    should_load_checkpoint = (
        cfg.checkpoint.load is not None and checkpoint_exists(cfg.checkpoint.load)
    ) or (
        cfg.checkpoint.pretrained_checkpoint is not None
        and checkpoint_exists(cfg.checkpoint.pretrained_checkpoint)
    )
```

⚠️ Potential issue | 🟡 Minor

Clarify the checkpoint loading comment.

The comment on lines 345-347 is confusing:

```python
# The finetune toggle is explicitly set to True in order to avoid loading optimizer and RNG states
# This is switched off here in order to load these states from the checkpoint
cfg.checkpoint.finetune = False
```

The comment states "set to True to avoid loading" but then "switched off to load," which seems contradictory. Typically:

  • finetune=True means skip loading optimizer/RNG states (starting fresh fine-tuning)
  • finetune=False means load optimizer/RNG states (resuming training)

The code sets finetune=False, which should enable loading of optimizer/RNG states when resuming PEFT training. Consider clarifying the comment to something like:

```python
# When resuming PEFT training from a checkpoint, we need to load optimizer and RNG states.
# Set finetune=False to enable loading these states from the checkpoint.
cfg.checkpoint.finetune = False
```

@yuki-97 yuki-97 merged commit 53342b1 into r0.5.0 Jan 8, 2026
82 of 88 checks passed
@yuki-97 yuki-97 deleted the cherry-pick-1629-r0.5.0 branch January 8, 2026 19:15

Labels

cherry-pick · CI:L1 (Run doctests, unit tests, and functional tests) · documentation (Improvements or additions to documentation) · Run CICD

3 participants