cp: feat: Megatron SFT LoRA (1629) into r0.5.0 #1741
Conversation
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
📝 Walkthrough

This pull request adds comprehensive PEFT (LoRA) support for the Megatron backend in supervised fine-tuning workflows. Changes include updated documentation and configuration specifications for Megatron LoRA parameters, implementation of PEFT initialization and checkpoint handling in the Megatron policy worker, new example configurations for Megatron LoRA setups, and corresponding functional and integration test scripts.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 3 passed
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @docs/guides/sft.md:
- Around line 212-260: The docs have inconsistent parameter names: the YAML uses
lora_A_init_method and lora_B_init_method but the "Megatron Parameter Details"
uses lora_A_init and lora_B_init; update the parameter names in the details
section to lora_A_init_method and lora_B_init_method (and keep their allowed
values/descriptions unchanged) so they match the config block and examples
(reference the symbols lora_A_init_method and lora_B_init_method).
In @nemo_rl/models/policy/workers/megatron_policy_worker.py:
- Around line 340-354: The comment about toggling finetune is contradictory;
update the comment near the checkpoint-loading logic in megatron_policy_worker
(around the block that sets should_load_checkpoint when cfg.peft is not None) to
clearly state that setting cfg.checkpoint.finetune = False causes optimizer and
RNG states to be loaded (i.e., resumes training) when a PEFT checkpoint is
present; replace the confusing lines with a concise note like "When resuming
PEFT training from a checkpoint, set cfg.checkpoint.finetune = False to enable
loading optimizer and RNG states from the checkpoint."
🧹 Nitpick comments (1)
examples/configs/sft.yaml (1)
132-135: Consider a more precise optimizer comment.

The comment "When weight decay is set, it actually uses AdamW" appears on both the optimizer name and weight_decay fields. While technically correct, it might be clearer to note that:
- Line 132: the `"adam"` optimizer with non-zero weight_decay automatically becomes AdamW in Megatron
- Line 135: non-zero weight_decay triggers AdamW behavior
This would help users understand that the behavior is conditional on the weight_decay value.
📝 Suggested comment improvements

```diff
-optimizer: "adam" # When weight decay is set, it actually uses AdamW
+optimizer: "adam" # Automatically becomes AdamW when weight_decay > 0
 lr: 5.0e-6
 min_lr: 4.9999e-6
-weight_decay: 0.1 # When weight decay is set, it actually uses AdamW
+weight_decay: 0.1 # Non-zero value triggers AdamW behavior (decoupled weight decay)
```
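The conditional AdamW behavior this nitpick describes can be illustrated with a minimal sketch. This is plain-Python update rules for a single scalar parameter, ignoring bias correction for brevity; it is not Megatron's actual optimizer code:

```python
# Adam with L2 regularization folds weight decay into the gradient;
# AdamW applies decay directly to the weight ("decoupled"). With wd=0
# the two are identical, which is why the YAML comment is conditional.
def adam_l2_step(w, grad, m, v, lr=1e-3, wd=0.1, b1=0.9, b2=0.999, eps=1e-8):
    grad = grad + wd * w              # L2 penalty enters the moment estimates
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    return w - lr * m / (v**0.5 + eps), m, v

def adamw_step(w, grad, m, v, lr=1e-3, wd=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad      # moments see only the raw gradient
    v = b2 * v + (1 - b2) * grad**2
    w = w - lr * wd * w               # decoupled decay applied to the weight
    return w - lr * m / (v**0.5 + eps), m, v
```

With `wd=0` both functions produce identical updates; any non-zero `weight_decay` makes them diverge, which is the behavior the suggested comment wording calls out.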
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)

- docs/guides/sft.md
- examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml
- examples/configs/sft.yaml
- nemo_rl/models/policy/workers/megatron_policy_worker.py
- tests/functional/L1_Functional_Tests_GPU.sh
- tests/functional/sft_automodel_lora.sh
- tests/functional/sft_megatron_lora.sh
- tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh
- tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
- tests/test_suites/nightly.txt
🧰 Additional context used
📓 Path-based instructions (10)
examples/configs/recipes/**/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
When adding support for a new model, create a recipe YAML under examples/configs/recipes/ in the appropriate domain subdirectory (llm, vlm, etc.)
Files:
examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml
examples/configs/recipes/llm/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Recipe YAML files should follow the naming pattern: `<algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml` for LLM recipes
Files:
examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml
!(**/tests/**|**/test_*.py|**/test_*.sh)
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year
Files:
- examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml
- tests/functional/sft_megatron_lora.sh
- tests/test_suites/nightly.txt
- tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
- tests/functional/sft_automodel_lora.sh
- examples/configs/sft.yaml
- nemo_rl/models/policy/workers/megatron_policy_worker.py
- tests/functional/L1_Functional_Tests_GPU.sh
- docs/guides/sft.md
- tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh
**/*.sh
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.sh: Use uv run instead of python to execute scripts
Follow the Google Shell Style Guide for shell scripts
Files:
- tests/functional/sft_megatron_lora.sh
- tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
- tests/functional/sft_automodel_lora.sh
- tests/functional/L1_Functional_Tests_GPU.sh
- tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh
**/*.{py,sh}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)
Files:
- tests/functional/sft_megatron_lora.sh
- tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
- tests/functional/sft_automodel_lora.sh
- nemo_rl/models/policy/workers/megatron_policy_worker.py
- tests/functional/L1_Functional_Tests_GPU.sh
- tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh
tests/test_suites/nightly.txt
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
When adding a nightly test for a new model, append the driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt
Files:
tests/test_suites/nightly.txt
tests/test_suites/**/*.sh
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
tests/test_suites/**/*.sh: When adding support for a new model, create a corresponding driver shell script under tests/test_suites/ in the matching domain
Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run
Files:
- tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
- tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Conform code to Python 3.12+
Indent code with 4 spaces. Do not use tabs
Use snake_case for file names
Use PascalCase for class names
Use snake_case for function and method names
Use snake_case for local variables
Prefix variable names that start with a number with 'k' (e.g., k_99th_percentile)
Use upper snake_case with 'G' prefix for global variables (e.g., G_MY_GLOBAL)
Use upper snake_case for constants
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
Prefer docstrings over comments for interfaces that may be used outside a file
Reserve comments for code within a function or interfaces that are local to a file
If a piece of code is commented out, include a comment describing its usage and why it's commented out. Remove debug comments before merging
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx
Avoid using reflection when functionality can be easily achieved without reflection
When using try-except blocks, limit the except clause to the smallest set of specific errors possible
When using try-except blocks for duck-typing, keep the body of the try as small as possible and use the else block for logic
YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values
For required configuration attributes, access config directly and expect presence (e.g., policy_cfg['precision']) without hidden defaults
Use typing.NotRequired to mark optional attributes in TypedDict for configuration
When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
Follow the Google Python Style Guide for Python code
Files:
nemo_rl/models/policy/workers/megatron_policy_worker.py
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
For any source file under nemo_rl/*.py that defines a class or function decorated with @ray.remote, add a coverage pragma (# pragma: no cover) because these run in separate Ray processes
Files:
nemo_rl/models/policy/workers/megatron_policy_worker.py
docs/**/*.md
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Update docs/index.md when a new markdown doc is added under docs/**/*.md or a markdown file is renamed, ensuring the document appears in the most appropriate section
Files:
docs/guides/sft.md
🧠 Learnings (6)
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : Recipe YAML files should follow the naming pattern: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml for LLM recipes
Applied to files:
examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml
📚 Learning: 2025-10-12T14:46:55.513Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.
Applied to files:
- tests/functional/sft_megatron_lora.sh
- tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run
Applied to files:
- tests/functional/sft_megatron_lora.sh
- tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
- tests/functional/L1_Functional_Tests_GPU.sh
📚 Learning: 2025-09-19T07:28:29.887Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh:1-4
Timestamp: 2025-09-19T07:28:29.887Z
Learning: The NVIDIA-NeMo/RL project prefers to maintain consistent formatting across test scripts rather than applying individual bash hardening improvements like `set -euo pipefail` or proper quoting for sourcing files.
Applied to files:
- tests/functional/sft_megatron_lora.sh
- tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/nightly.txt : When adding a nightly test for a new model, append the driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt
Applied to files:
tests/test_suites/nightly.txt
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.
Applied to files:
- tests/test_suites/nightly.txt
- tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
- tests/functional/L1_Functional_Tests_GPU.sh
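The NUM_RUNS convention from the learning above is a ceiling division. A minimal sketch with illustrative values (not taken from any actual test script):

```shell
# Standard test-suite variables; NUM_RUNS is the ceiling of MAX_STEPS / STEPS_PER_RUN,
# computed with the integer-arithmetic trick (a + b - 1) / b.
STEPS_PER_RUN=50
MAX_STEPS=120
NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))
echo "NUM_RUNS=$NUM_RUNS"
```

These variables are consumed by external launch tooling, which is why shellcheck's "unused variable" warnings are expected for scripts following this pattern.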
🪛 Ruff (0.14.10)
nemo_rl/models/policy/workers/megatron_policy_worker.py
267-269: Avoid specifying long messages outside the exception class
(TRY003)
🪛 Shellcheck (0.11.0)
tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh
[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 28-28: Double quote array expansions to avoid re-splitting elements.
(SC2068)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Lint check
- GitHub Check: Lint check
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (13)
tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh (1)
22-22: LGTM! The wandb project name update from `fruit_personal_debug` to `nemo-rl` is appropriate for a shared testing environment.

nemo_rl/models/policy/workers/megatron_policy_worker.py (3)
28-28: LGTM! The new imports appropriately support the PEFT/LoRA integration introduced in this PR.
Also applies to: 53-53, 90-90
263-269: LGTM! The validation correctly prevents the unsupported combination of MOE router freezing with PEFT. The error message is clear and actionable.
288-314: LGTM! The PEFT/LoRA object creation and hook registration logic is well-structured:
- LoRA configuration is properly extracted and validated
- The pre-wrap hook composition ensures PEFT initialization integrates correctly with the model wrapping process
tests/test_suites/nightly.txt (1)
75-75: LGTM! The new Megatron LoRA nightly test entry follows the established convention for adding nightly tests.
tests/functional/sft_automodel_lora.sh (1)
31-31: LGTM! The configuration key path update from `policy.dtensor_cfg.lora.enabled` to `policy.dtensor_cfg.lora_cfg.enabled` correctly reflects the schema change for LoRA configuration introduced in this PR.

tests/functional/L1_Functional_Tests_GPU.sh (1)
43-44: LGTM! The new LoRA functional test invocations follow the established pattern using `uv run --no-sync` and are appropriately placed within the test suite.

examples/configs/sft.yaml (1)
117-129: LGTM! Well-structured PEFT configuration block. The PEFT configuration block includes all necessary parameters with sensible defaults. The structure aligns with the documentation updates and the Megatron policy worker implementation.
tests/test_suites/llm/sft-llama3.1-8b-1n8g-megatron-lora.sh (1)
1-47: LGTM! Test script follows repository conventions. The script adheres to established patterns for test suite scripts in this repository:
- Standard configuration variables (NUM_NODES, NUM_RUNS, NUM_MINUTES) are correctly defined for external test infrastructure
- Uses `uv run` as required by coding guidelines
- Follows consistent cd and argument passing patterns
- Includes conditional metric validation only when MAX_STEPS is reached
- Properly cleans up checkpoint directory on success
Based on learnings, the shellcheck warnings about unused variables and unquoted expansions can be safely ignored as they follow repository conventions.
examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-megatron-lora.yaml (1)
1-44: LGTM! Well-structured recipe configuration. The recipe file follows the naming convention and provides a complete, sensible configuration for Megatron LoRA fine-tuning:
- Correctly disables DTensor and enables Megatron with PEFT
- Uses appropriate LoRA parameters (dim=128, alpha=128)
- Separates model and tokenizer (using instruct tokenizer for base model)
- Configures appropriate training parameters and dataset
tests/functional/sft_megatron_lora.sh (1)
1-51: LGTM! Well-structured functional test. The functional test script is properly structured with:
- Cleanup trap to ensure checkpoint directory removal
- Proper environment setup including git safe.directory configuration
- Coverage tracking for code coverage metrics
- Appropriate test configuration with small model (Qwen3-0.6B) for fast testing
- Metric validation to ensure training loss decreases as expected
- Follows repository conventions for cd and argument passing
docs/guides/sft.md (2)
171-171: LGTM! Clear backend support statement. The updated note correctly clarifies that LoRA is supported with DTensor v2 and Megatron backends, with DTensor as the default. The requirement for `_v2=true` is important for users to know.
174-210: LGTM! Comprehensive DTensor configuration documentation. The DTensor configuration section provides:
- Clear YAML configuration example with all parameters
- Detailed parameter descriptions with typical values
- Important note about Triton kernel limitation with TP > 1
- Simple usage example
This will help users configure DTensor LoRA correctly.
### Megatron Configuration Parameters

The LoRA configuration is specified under the `policy.megatron_cfg.peft` section:

```yaml
policy:
  megatron_cfg:
    peft:
      enabled: false # Set to True to enable LoRA fine-tuning
      target_modules: [] # List of module names to apply LoRA, defaults to all linear layers
      exclude_modules: [] # List of module names not to apply LoRA
      dim: 32 # LoRA rank (r): controls adaptation capacity
      alpha: 32 # LoRA scaling factor (effective lr = alpha/dim)
      dropout: 0.0 # Dropout probability for LoRA layers
      dropout_position: "pre" # Dropout position: "pre" or "post"
      lora_A_init_method: "xavier" # Initialization method for lora A: "xavier" or "uniform"
      lora_B_init_method: "zero" # Initialization method for lora B: "zero"
      a2a_experimental: false # Enables the experimental All-to-All (A2A) communication strategy
      lora_dtype: None # Weight's dtype
```

### Megatron Parameter Details
- **`enabled`** (bool): Whether to enable LoRA training
- **`target_modules`** (list): Specific module names to apply LoRA. Defaults to all linear layers if the list is left empty. Example: ['linear_qkv', 'linear_proj', 'linear_fc1', 'linear_fc2'].
  - 'linear_qkv': Apply LoRA to the fused linear layer used for query, key, and value projections in self-attention.
  - 'linear_proj': Apply LoRA to the linear layer used for projecting the output of self-attention.
  - 'linear_fc1': Apply LoRA to the first fully-connected layer in the MLP.
  - 'linear_fc2': Apply LoRA to the second fully-connected layer in the MLP.

  Target modules can also contain wildcards. For example, you can specify target_modules=['*.layers.0.*.linear_qkv', '*.layers.1.*.linear_qkv'] to add LoRA to only linear_qkv on the first two layers.
- **`exclude_modules`** (List[str], optional): A list of module names not to apply LoRA. It will match all nn.Linear and nn.Linear-adjacent modules whose name does not match any string in exclude_modules. If used, requires target_modules to be an empty list or None.
- **`dim`** (int): LoRA rank (r). Lower values = fewer parameters but less capacity. Typical: 4, 8, 16, 32, 64
- **`alpha`** (int): LoRA scaling factor. Effective learning rate multiplier = `alpha/dim`. Typical: 16, 32, 64
- **`dropout`** (float): Dropout probability for regularization, defaults to 0.0
- **`dropout_position`** (str): Apply dropout before ("pre") or after ("post") LoRA
- **`lora_A_init`** (str): Initialization method for lora_A (choices: ['xavier', 'uniform']), defaults to xavier.
- **`lora_B_init`** (str): Initialization method for the low-rank matrix B. Defaults to "zero".
- **`a2a_experimental`** (bool): Enables the experimental All-to-All (A2A) communication strategy. Defaults to False.
- **`lora_dtype`** (torch.dtype): Weight's dtype; by default uses orig_linear's dtype, but quantized weights (e.g. 4-bit) need it specified explicitly.
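The `dim`, `alpha`, and init parameters above fit together as follows. This is a small NumPy sketch of the standard LoRA forward pass, not Megatron's implementation: with B zero-initialized, the adapter contributes nothing at step 0, and its contribution is scaled by `alpha/dim`.

```python
import numpy as np

# y = x @ W.T + (alpha / dim) * (x @ A.T) @ B.T
# A: (r, d_in) xavier-style init; B: (d_out, r) zeros so the adapter starts inert.
def lora_linear(x, W, A, B, alpha):
    dim = A.shape[0]            # LoRA rank r
    scaling = alpha / dim       # effective multiplier alpha/dim
    return x @ W.T + scaling * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 8, 4, 32
W = rng.normal(size=(d_out, d_in))                          # frozen base weight
A = rng.normal(scale=(1.0 / d_in) ** 0.5, size=(r, d_in))   # "xavier"-style init
B = np.zeros((d_out, r))                                    # "zero" init
x = rng.normal(size=(2, d_in))
```

Because B starts at zero, the first forward pass equals the frozen base layer's output; training then moves only A and B while W stays fixed.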
### Megatron Example Usage
The config uses DTensor by default, so the Megatron backend needs to be explicitly enabled.
```sh
uv run examples/run_sft.py \
    --config examples/configs/sft.yaml \
    policy.dtensor_cfg.enabled=false \
    policy.megatron_cfg.enabled=true \
    policy.megatron_cfg.peft.enabled=true
```
Fix parameter name inconsistency in documentation.

On line 246, the parameter details section refers to `lora_A_init`, but the configuration block above (line 227) correctly uses `lora_A_init_method`. The documentation should use consistent naming.

📝 Proposed fix for parameter name consistency

```diff
 - **`dropout`** (float): Dropout probability for regularization, defaults to 0.0
 - **`dropout_position`** (str): Apply dropout before ("pre") or after ("post") LoRA
-- **`lora_A_init`** (str): Initialization method for lora_A (choices: ['xavier', 'uniform']), defaults to xavier.
+- **`lora_A_init_method`** (str): Initialization method for lora_A (choices: ['xavier', 'uniform']), defaults to xavier.
 - **`lora_B_init`** (str): Initialization method for the low-rank matrix B. Defaults to "zero".
```

Note: Line 247 also uses `lora_B_init`, which is likewise inconsistent with the config block (line 228: `lora_B_init_method`). For full consistency, consider:

```diff
-- **`lora_B_init`** (str): Initialization method for the low-rank matrix B. Defaults to "zero".
+- **`lora_B_init_method`** (str): Initialization method for the low-rank matrix B. Defaults to "zero".
```
+- **`lora_B_init_method`** (str): Initialization method for the low-rank matrix B. Defaults to "zero".🤖 Prompt for AI Agents
In @docs/guides/sft.md around lines 212 - 260, The docs have inconsistent
parameter names: the YAML uses lora_A_init_method and lora_B_init_method but the
"Megatron Parameter Details" uses lora_A_init and lora_B_init; update the
parameter names in the details section to lora_A_init_method and
lora_B_init_method (and keep their allowed values/descriptions unchanged) so
they match the config block and examples (reference the symbols
lora_A_init_method and lora_B_init_method).
```python
if cfg.peft is not None:
    should_load_checkpoint = cfg.checkpoint.load is not None and checkpoint_exists(
        cfg.checkpoint.load
    )
    if should_load_checkpoint:
        # The finetune toggle is explicitly set to True in order to avoid loading optimizer and RNG states
        # This is switched off here in order to load these states from the checkpoint
        cfg.checkpoint.finetune = False
else:
    should_load_checkpoint = (
        cfg.checkpoint.load is not None and checkpoint_exists(cfg.checkpoint.load)
    ) or (
        cfg.checkpoint.pretrained_checkpoint is not None
        and checkpoint_exists(cfg.checkpoint.pretrained_checkpoint)
    )
```
Clarify the checkpoint loading comment.

The comment on lines 345-347 is confusing:

```python
# The finetune toggle is explicitly set to True in order to avoid loading optimizer and RNG states
# This is switched off here in order to load these states from the checkpoint
cfg.checkpoint.finetune = False
```

The comment states "set to True to avoid loading" but then "switched off to load," which seems contradictory. Typically:
- `finetune=True` means skip loading optimizer/RNG states (starting fresh fine-tuning)
- `finetune=False` means load optimizer/RNG states (resuming training)

The code sets `finetune=False`, which should enable loading of optimizer/RNG states when resuming PEFT training. Consider clarifying the comment to something like:

```python
# When resuming PEFT training from a checkpoint, we need to load optimizer and RNG states.
# Set finetune=False to enable loading these states from the checkpoint.
cfg.checkpoint.finetune = False
```
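The `finetune` semantics at issue in this comment can be sketched as a tiny helper. The function and names below are illustrative only, not the actual Megatron checkpoint API:

```python
# Hypothetical sketch: finetune=True starts a fresh fine-tune and skips
# optimizer/RNG state; finetune=False resumes training, so those states
# are loaded alongside the model weights.
def states_to_load(finetune: bool) -> set[str]:
    states = {"model"}
    if not finetune:
        states |= {"optimizer", "rng"}
    return states
```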
beep boop [🤖]: Hi @arendu 👋,
Summary by CodeRabbit

Release Notes

- New Features: PEFT (LoRA) support for the Megatron backend in supervised fine-tuning.
- Documentation: SFT guide updated with Megatron LoRA configuration parameters and example usage.
- Tests: New functional and nightly test scripts for Megatron LoRA.