
cp: feat: DTensorPolicyV2 GPT-OSS SFT support (1470) into r0.5.0 #1690

Merged
yuki-97 merged 1 commit into r0.5.0 from cherry-pick-1470-r0.5.0
Dec 23, 2025
Conversation

Contributor

@chtruong814 chtruong814 commented Dec 23, 2025

beep boop [🤖]: Hi @adil-a 👋,

we've cherry-picked #1470 into r0.5.0 for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

Release Notes

  • New Features

    • Added checkpoint management system for distributed model training with support for LORA/PEFT configurations
    • Enhanced DTensor v2 with dynamic backend selection and CPU offload support
    • Added SFT training configuration for GPT-OSS 20B with expert parallelism
    • Added Transformer Engine runtime patching utilities
  • Bug Fixes

    • Fixed LoRA initialization to use standardized methods
    • Corrected NeMo automodel import paths
  • Chores

    • Updated dependencies: transformer-engine, deep_ep, and GPU acceleration libraries
    • Reorganized distributed test suite with improved class-based structure
    • Expanded test coverage for checkpoint management and configuration validation


Signed-off-by: Hemil Desai <hemild@nvidia.com>
Co-authored-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@chtruong814 chtruong814 requested review from a team as code owners December 23, 2025 05:12
@chtruong814 chtruong814 requested review from adil-a and removed request for a team December 23, 2025 05:12
@github-actions

⚠️ File Consistency Check

Check based on commit: 0f41577 (PR #1690 from cherry-pick-1470-r0.5.0)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@github-actions

❌ Submodule Fast-Forward Check Failed

Check based on commit: 0f41577 (PR #1690 from cherry-pick-1470-r0.5.0)

❌ Submodules that need attention:

Automodel: ❌ Commits have DIVERGED from a common ancestor
TARGET (r0.5.0 branch): https://github.com/NVIDIA-NeMo/Automodel/commits/910f4e0402ec3af0c3b8642639f0347732067630/
CURRENT (PR #1690 from cherry-pick-1470-r0.5.0): https://github.com/NVIDIA-NeMo/Automodel/commits/1d42deb98169fd94b54c714c0fe4bf308fe7115a/

Please ensure all submodule commits are fast-forwards of the r0.5.0 branch before merging.

@coderabbitai
Contributor

coderabbitai bot commented Dec 23, 2025

📝 Walkthrough

This PR introduces Automodel and DeepEP integration for DTensor-based LLM training, including a new checkpoint management system, refactored Transformer-Engine patching, and dynamic attention implementation selection. Configuration structures for Automodel backends and new test coverage for checkpoint management and policy worker flows are added.

Changes

Cohort / File(s) Summary
Automodel Configuration & Types
nemo_rl/models/policy/__init__.py, examples/configs/recipes/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.yaml
Introduces TypedDict structures for AutomodelBackendConfig and AutomodelKwargs; extends DTensorConfig with optional automodel_kwargs field. New YAML recipe specifies Automodel training policy with FSDP8-EP8 configuration for GPT-OSS 20B.
Policy Worker Initialization & Warnings
nemo_rl/models/policy/lm_policy.py
Adds runtime warning when TORCH_CUDA_ARCH_LIST environment variable is absent, noting requirement for DeepEP in DTensorPolicyWorker V2.
Automodel Import Path Updates
nemo_rl/models/policy/utils.py
Updates import path for NeMo Automodel classes from nemo_automodel.components._transformers.auto_model to nemo_automodel._transformers.auto_model.
Transformer-Engine Runtime Patching
nemo_rl/models/policy/workers/patches.py, nemo_rl/models/policy/workers/megatron_policy_worker.py
Introduces new patches module with _get_transformer_engine_file and apply_transformer_engine_patch utilities. Refactors MegatronPolicyWorker to use externalized patching instead of internal implementation.
DTensor Policy Worker V2 Major Refactor
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
Substantial changes: (1) Early transformer-engine patching via apply_transformer_engine_patch; (2) Dynamic attention implementation selection based on sequence packing and context-parallel size; (3) Automodel kwargs augmentation with backend configuration and use_liger_kernel flag; (4) Integration with new AutomodelCheckpointManager for checkpoint operations; (5) FSDP2Manager-based device mesh and parallelization flow; (6) Gradient scaling and clipping with scale_grads_and_clip_grad_norm wrapper; (7) Precision handling via STRING_TO_DTYPE mapping.
Checkpoint Management System
nemo_rl/utils/automodel_checkpoint.py
New AutomodelCheckpointManager class wrapping nemo_automodel's Checkpointer. Provides object-oriented checkpoint interface with rank-aware initialization, PEFT/LoRA configuration support, model state dict key tracking, and checkpoint addon management (ConsolidatedHFAddon, PeftAddon). Replaces functional checkpoint API.
Dependencies & Build Configuration
pyproject.toml, pyrefly.toml, nemo_rl/utils/venvs.py
Adds transformer-engine[pytorch]==2.8.0, nv-grouped-gemm, and deep_ep dependencies to automodel group. Updates deep_ep git revision in vllm block. Moves shutil import to top-level in venvs.py. Updates pyrefly includes for patches.py and automodel_checkpoint.py modules.
Test Framework Updates
tests/unit/models/policy/test_dtensor_worker.py, tests/unit/models/policy/test_dtensor_worker_v2.py
Reorganizes test_dtensor_worker.py with class-based test organization (TestSingleGPUCluster, TestTwoGPUCluster) and centralized _base_setup_impl helper. Expands test_dtensor_worker_v2.py with enhanced create_test_config signature (precision, expert_parallel_size, automodel_kwargs, checkpointing) and new create_test_batch helper.
New Test Suites
tests/unit/models/policy/test_automodel_types.py, tests/unit/models/policy/test_patches.py, tests/unit/utils/test_automodel_checkpoint.py, tests/test_suites/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.sh, tests/test_suites/nightly.txt
Introduces unit tests for AutomodelBackendConfig TypedDict validation, Transformer-Engine patching logic (path resolution, patching application, module reload), and comprehensive AutomodelCheckpointManager functionality (distributed checkpoint save/load, format detection, PEFT handling). Adds integration test script for GPT-OSS 20B DeepEP training with metric checks.
LoRA Test Migration
tests/unit/models/dtensor/test_lora.py
Removes internal _patched_init_lora_weights import and replaces with LinearLoRA.init_lora_weights method calls. Deletes test_lora_init_differs_from_upstream_buggy_version test.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • NVIDIA-NeMo/RL#1023 — Introduces nemo-automodel checkpointing support and automodel_checkpoint module with wiring into DTensor/Policy worker workflows.
  • NVIDIA-NeMo/RL#1470 — Modifies DTensorPolicyWorkerV2, automodel checkpoint utilities, automodel import paths, and transformer-engine patching integration.
  • NVIDIA-NeMo/RL#1665 — Implements SDPA/attention backend selection (attn_impl/sdpa_method) computation in dtensor_policy_worker_v2.py for context-parallel handling.

Suggested labels

r0.5.0, CI:L1

Suggested reviewers

  • adil-a
  • terrykong
  • joyang-nv

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Test Results For Major Changes: ⚠️ Warning. The PR description lacks test results, metrics, and convergence validation despite substantial changes to gradient scaling, loss scaling, attention selection, and checkpoint management logic. Resolution: update the PR description with test results, convergence validation, performance metrics, and the configuration used for testing.
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The PR title clearly indicates this is a cherry-pick of DTensorPolicyV2 GPT-OSS SFT support (PR #1470) into the r0.5.0 branch, which accurately summarizes the main purpose of the changeset.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 81.65%, above the required threshold of 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/models/policy/test_dtensor_worker.py (1)

192-203: Same device inconsistency as in v2 tests.

sample_mask is created on CUDA (line 195) before the entire batch is moved to CPU (line 202). Consider creating it without specifying device.

🔎 Proposed fix
             **(
                 {
                     "labels": torch.randint(0, vocab_size, (batch_size, seq_len)),
-                    "sample_mask": torch.ones(batch_size).cuda(),
+                    "sample_mask": torch.ones(batch_size),
                 }
                 if mode == "train"
                 else {}
             ),
🧹 Nitpick comments (8)
nemo_rl/models/policy/lm_policy.py (1)

115-119: Add stacklevel parameter to warning for better debugging.

The warning correctly checks for TORCH_CUDA_ARCH_LIST and provides helpful guidance. However, adding stacklevel=2 will ensure the warning points to the caller's location rather than this line, making it easier to debug.

🔎 Proposed fix
 if "TORCH_CUDA_ARCH_LIST" not in os.environ:
     warnings.warn(
         "TORCH_CUDA_ARCH_LIST is not set. This is needed if using DeepEP in DTensorPolicyWorker V2. This variable is set in our container, but "
         "if you are running a custom container or baremetal, you may need to set this variable manually. Example: export TORCH_CUDA_ARCH_LIST='9.0 10.0'",
+        stacklevel=2,
     )

Based on static analysis hints.
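To see why the suggestion matters, here is a small self-contained sketch (not the actual lm_policy.py code) showing that stacklevel=2 attributes the warning to the caller's line rather than to the warnings.warn() call itself:

```python
import warnings


def check_cuda_arch(env: dict) -> None:
    # stacklevel=2 makes the reported source line the caller of
    # check_cuda_arch, which is more useful when debugging config issues.
    if "TORCH_CUDA_ARCH_LIST" not in env:
        warnings.warn("TORCH_CUDA_ARCH_LIST is not set.", stacklevel=2)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_cuda_arch({})  # the warning is reported against this line

print(str(caught[0].message))
```

Without stacklevel, the traceback location would always point inside check_cuda_arch, hiding which call site triggered it.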

tests/unit/models/policy/test_automodel_types.py (1)

20-25: Remove unnecessary noqa directive.

The static analysis indicates F401 (unused import) rule is not enabled, making the noqa: F401 directive unnecessary. The BackendConfig import is actually used at line 65, so even if the rule were enabled, this wouldn't be flagged.

🔎 Proposed fix
 try:
-    from nemo_automodel.components.moe.utils import BackendConfig  # noqa: F401
+    from nemo_automodel.components.moe.utils import BackendConfig

     NEMO_AUTOMODEL_AVAILABLE = True
tests/unit/utils/test_automodel_checkpoint.py (1)

92-115: Intentional exception swallowing for cleanup resilience.

The broad exception handling in _cleanup_dcp_planner_cache is appropriate here since this is a test cleanup helper that should not cause test failures. Consider adding a brief comment explaining this is intentional for test isolation.

🔎 Optional: Add explanatory comment
     except Exception:
-        pass
+        pass  # Cleanup should not fail tests; errors are non-critical
tests/unit/models/policy/test_patches.py (1)

185-216: Consider using _ for intentionally unused parameter.

The path parameter in mock_open_func is unused since the mock only needs to differentiate by mode. Using _ or _path would clarify intent.

🔎 Proposed fix
-        def mock_open_func(path, mode="r"):
+        def mock_open_func(_path, mode="r"):
             call_count[0] += 1
             if mode == "r":
                 mock_file_handle.read.return_value = self.UNPATCHED_CONTENT
             return mock_file_handle
nemo_rl/models/policy/workers/patches.py (1)

96-103: Consider moving imports to module level.

The importlib and sys imports inside the function could be moved to the top of the file for consistency with other imports.

🔎 Proposed fix
 import os
+import sys
+import importlib
 from importlib.util import find_spec

Then remove the local imports at lines 98-99.

nemo_rl/utils/automodel_checkpoint.py (1)

171-190: Accessing private _addons attribute is fragile.

This method directly manipulates self.checkpointer._addons, which is a private implementation detail of the Checkpointer class. If the underlying library changes this internal structure, this code will break silently.

Consider adding a comment explaining why this is necessary and/or wrapping in a try-except to handle potential API changes gracefully.

🔎 Proposed documentation
     def _rebuild_checkpointer_addons(self) -> None:
         """Rebuild the checkpointer's _addons list based on current config.

         The Checkpointer's _addons list is populated during __init__ based on config.
         When config changes (e.g., model_save_format or is_peft), we need to rebuild
         the addons list to match the new config.
+
+        Note: This accesses the private _addons attribute of Checkpointer.
+        This coupling is necessary because the Checkpointer doesn't expose
+        a public API to update addons after initialization.
         """
tests/unit/models/policy/test_dtensor_worker.py (2)

841-854: Use next(iter(...)) for cleaner single-element access.

Static analysis correctly suggests using next(iter(...)) instead of list(...)[0] for accessing the first element.

🔎 Proposed fix
-            param_sample = list(info["parameter_sample"].values())[0]
+            param_sample = next(iter(info["parameter_sample"].values()))
...
-        param_names = [list(info["parameter_sample"].keys())[0] for info in gpu_infos]
+        param_names = [next(iter(info["parameter_sample"].keys())) for info in gpu_infos]
...
-            param_device = list(info["parameter_sample"].values())[0]["device"]
+            param_device = next(iter(info["parameter_sample"].values()))["device"]
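The RUF015 rewrite behaves identically for non-empty dicts; a quick sketch with a made-up parameter_sample payload:

```python
info = {"parameter_sample": {"embed_tokens.weight": {"device": "cuda:0"}}}

# list(...)[0] materializes every value before indexing;
# next(iter(...)) stops after producing the first one.
first_via_list = list(info["parameter_sample"].values())[0]
first_via_iter = next(iter(info["parameter_sample"].values()))
first_key = next(iter(info["parameter_sample"].keys()))
```

For an empty dict the two differ only in the exception raised (IndexError vs. StopIteration), which is irrelevant here since the tests always populate parameter_sample.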

1093-1103: Rename unused loop variables to indicate intent.

Per static analysis, rename unused loop variables to underscore-prefixed names.

🔎 Proposed fix
-            for warmup_step in range(2):
+            for _warmup_step in range(2):
                 results = policy.train(data, loss_fn)

             # Measure FLOPS on 3 iterations
             print("Measuring FLOPS on 3 iterations...")
             time_begin = time.time()
-            for train_step in range(3):
+            for _train_step in range(3):
                 results = policy.train(data, loss_fn)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b1a1e73 and 0f41577.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (21)
  • 3rdparty/Automodel-workspace/Automodel
  • examples/configs/recipes/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.yaml
  • nemo_rl/models/policy/__init__.py
  • nemo_rl/models/policy/lm_policy.py
  • nemo_rl/models/policy/utils.py
  • nemo_rl/models/policy/workers/__init__.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • nemo_rl/models/policy/workers/patches.py
  • nemo_rl/utils/automodel_checkpoint.py
  • nemo_rl/utils/venvs.py
  • pyproject.toml
  • pyrefly.toml
  • tests/test_suites/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.sh
  • tests/test_suites/nightly.txt
  • tests/unit/models/dtensor/test_lora.py
  • tests/unit/models/policy/test_automodel_types.py
  • tests/unit/models/policy/test_dtensor_worker.py
  • tests/unit/models/policy/test_dtensor_worker_v2.py
  • tests/unit/models/policy/test_patches.py
  • tests/unit/utils/test_automodel_checkpoint.py
🧰 Additional context used
📓 Path-based instructions (9)
examples/configs/recipes/**/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When adding support for a new model, create a recipe YAML under examples/configs/recipes/ in the appropriate domain subdirectory (llm, vlm, etc.)

Files:

  • examples/configs/recipes/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.yaml
examples/configs/recipes/llm/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Recipe YAML files should follow the naming pattern: --ng-[-modifiers][-long][.vN].yaml for LLM recipes

Files:

  • examples/configs/recipes/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.yaml
!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

  • examples/configs/recipes/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.yaml
  • tests/test_suites/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.sh
  • nemo_rl/models/policy/workers/patches.py
  • nemo_rl/models/policy/lm_policy.py
  • pyrefly.toml
  • tests/test_suites/nightly.txt
  • tests/unit/models/policy/test_patches.py
  • tests/unit/models/policy/test_automodel_types.py
  • tests/unit/models/policy/test_dtensor_worker_v2.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • tests/unit/models/policy/test_dtensor_worker.py
  • tests/unit/models/dtensor/test_lora.py
  • nemo_rl/models/policy/utils.py
  • 3rdparty/Automodel-workspace/Automodel
  • nemo_rl/utils/venvs.py
  • nemo_rl/models/policy/__init__.py
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • pyproject.toml
  • nemo_rl/utils/automodel_checkpoint.py
  • tests/unit/utils/test_automodel_checkpoint.py
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Use uv run instead of python to execute scripts
Follow the Google Shell Style Guide for shell scripts

Files:

  • tests/test_suites/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.sh
tests/test_suites/**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

tests/test_suites/**/*.sh: When adding support for a new model, create a corresponding driver shell script under tests/test_suites/ in the matching domain
Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Files:

  • tests/test_suites/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.sh
**/*.{py,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)

Files:

  • tests/test_suites/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.sh
  • nemo_rl/models/policy/workers/patches.py
  • nemo_rl/models/policy/lm_policy.py
  • tests/unit/models/policy/test_patches.py
  • tests/unit/models/policy/test_automodel_types.py
  • tests/unit/models/policy/test_dtensor_worker_v2.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • tests/unit/models/policy/test_dtensor_worker.py
  • tests/unit/models/dtensor/test_lora.py
  • nemo_rl/models/policy/utils.py
  • nemo_rl/utils/venvs.py
  • nemo_rl/models/policy/__init__.py
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • nemo_rl/utils/automodel_checkpoint.py
  • tests/unit/utils/test_automodel_checkpoint.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Conform code to Python 3.12+
Indent code with 4 spaces. Do not use tabs
Use snake_case for file names
Use PascalCase for class names
Use snake_case for function and method names
Use snake_case for local variables
Prefix variable names that start with a number with 'k' (e.g., k_99th_percentile)
Use upper snake_case with 'G' prefix for global variables (e.g., G_MY_GLOBAL)
Use upper snake_case for constants
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
Prefer docstrings over comments for interfaces that may be used outside a file
Reserve comments for code within a function or interfaces that are local to a file
If a piece of code is commented out, include a comment describing its usage and why it's commented out. Remove debug comments before merging
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx
Avoid using reflection when functionality can be easily achieved without reflection
When using try-except blocks, limit the except clause to the smallest set of specific errors possible
When using try-except blocks for duck-typing, keep the body of the try as small as possible and use the else block for logic
YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values
For required configuration attributes, access config directly and expect presence (e.g., policy_cfg['precision']) without hidden defaults
Use typing.NotRequired to mark optional attributes in TypedDict for configuration
When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
Follow the Google Python Style Guide for Python code

Files:

  • nemo_rl/models/policy/workers/patches.py
  • nemo_rl/models/policy/lm_policy.py
  • tests/unit/models/policy/test_patches.py
  • tests/unit/models/policy/test_automodel_types.py
  • tests/unit/models/policy/test_dtensor_worker_v2.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • tests/unit/models/policy/test_dtensor_worker.py
  • tests/unit/models/dtensor/test_lora.py
  • nemo_rl/models/policy/utils.py
  • nemo_rl/utils/venvs.py
  • nemo_rl/models/policy/__init__.py
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • nemo_rl/utils/automodel_checkpoint.py
  • tests/unit/utils/test_automodel_checkpoint.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

For any source file under nemo_rl/*.py that defines a class or function decorated with @ray.remote, add a coverage pragma (# pragma: no cover) because these run in separate Ray processes

Files:

  • nemo_rl/models/policy/workers/patches.py
  • nemo_rl/models/policy/lm_policy.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • nemo_rl/models/policy/utils.py
  • nemo_rl/utils/venvs.py
  • nemo_rl/models/policy/__init__.py
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • nemo_rl/utils/automodel_checkpoint.py
tests/test_suites/nightly.txt

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When adding a nightly test for a new model, append the driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt

Files:

  • tests/test_suites/nightly.txt
🧠 Learnings (9)
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : Recipe YAML files should follow the naming pattern: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml for LLM recipes

Applied to files:

  • examples/configs/recipes/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.yaml
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/**/*.yaml : When adding support for a new model, create a recipe YAML under examples/configs/recipes/ in the appropriate domain subdirectory (llm, vlm, etc.)

Applied to files:

  • examples/configs/recipes/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.yaml
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.

Applied to files:

  • tests/test_suites/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : When adding support for a new model, create a corresponding driver shell script under tests/test_suites/ in the matching domain

Applied to files:

  • tests/test_suites/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/nightly.txt : When adding a nightly test for a new model, append the driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt

Applied to files:

  • tests/test_suites/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.sh
  • tests/test_suites/nightly.txt
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Applied to files:

  • tests/test_suites/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to **/*.py : When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml

Applied to files:

  • tests/unit/models/policy/test_automodel_types.py
  • nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-17T01:52:21.399Z
Learnt from: ffrujeri
Repo: NVIDIA-NeMo/RL PR: 1023
File: nemo_rl/utils/checkpoint.py:58-65
Timestamp: 2025-09-17T01:52:21.399Z
Learning: model_state_dict_keys is not intended to be part of the nemo-rl CheckpointingConfig TypedDict - it's handled at the automodel implementation layer, not as a general checkpointing configuration parameter.

Applied to files:

  • tests/unit/models/policy/test_automodel_types.py
  • nemo_rl/models/policy/__init__.py
  • nemo_rl/utils/automodel_checkpoint.py
  • tests/unit/utils/test_automodel_checkpoint.py
📚 Learning: 2025-10-30T20:50:44.126Z
Learnt from: adil-a
Repo: NVIDIA-NeMo/RL PR: 1440
File: examples/configs/sft_automodel.yaml:48-58
Timestamp: 2025-10-30T20:50:44.126Z
Learning: In DTensor configurations for MoE (Mixture of Experts) models, expert_parallel_size and data_parallel_size can be applied together without multiplying the GPU requirements. Expert Parallelism (EP) only applies to MoE layers, while Data Parallelism/FSDP applies to non-MoE layers. Therefore, configurations like expert_parallel_size: 8 and data_parallel_size: 8 are valid on an 8-GPU cluster for MoE models.

Applied to files:

  • tests/unit/models/policy/test_dtensor_worker_v2.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
🧬 Code graph analysis (5)
tests/unit/models/policy/test_patches.py (1)
nemo_rl/models/policy/workers/patches.py (2)
  • _get_transformer_engine_file (19-44)
  • apply_transformer_engine_patch (47-106)
tests/unit/models/policy/test_automodel_types.py (1)
nemo_rl/models/policy/__init__.py (1)
  • AutomodelBackendConfig (37-55)
nemo_rl/models/policy/workers/megatron_policy_worker.py (1)
nemo_rl/models/policy/workers/patches.py (1)
  • apply_transformer_engine_patch (47-106)
nemo_rl/utils/automodel_checkpoint.py (2)
nemo_rl/utils/checkpoint.py (1)
  • CheckpointingConfig (36-67)
tests/unit/utils/test_checkpoint.py (2)
  • checkpoint_config (31-38)
  • checkpoint_dir (26-27)
tests/unit/utils/test_automodel_checkpoint.py (1)
nemo_rl/utils/automodel_checkpoint.py (9)
  • AutomodelCheckpointManager (42-390)
  • _infer_checkpoint_root (428-443)
  • detect_checkpoint_format (393-425)
  • set_model_state_dict_keys (192-200)
  • save_checkpoint (246-329)
  • load_checkpoint (331-390)
  • load_base_model (202-244)
  • init_checkpointer (84-130)
  • update_checkpointer_config (132-169)
🪛 GitHub Actions: Automodel Integration and Submodule Checks
3rdparty/Automodel-workspace/Automodel

[error] 1-1: Submodule is BEHIND the r0.5.0 branch. PR commits are missing from the target branch; submodule needs to be updated to include recent changes from r0.5.0.

🪛 Ruff (0.14.10)
nemo_rl/models/policy/workers/patches.py

27-31: Avoid specifying long messages outside the exception class

(TRY003)


37-42: Avoid specifying long messages outside the exception class

(TRY003)


105-105: Do not catch blind exception: Exception

(BLE001)

nemo_rl/models/policy/lm_policy.py

116-116: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)

tests/unit/models/policy/test_patches.py

187-187: Unused function argument: path

(ARG001)

tests/unit/models/policy/test_automodel_types.py

21-21: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

tests/unit/models/policy/test_dtensor_worker.py

841-841: Prefer next(iter(info["parameter_sample"].values())) over single element slice

Replace with next(iter(info["parameter_sample"].values()))

(RUF015)


847-847: Prefer next(iter(info["parameter_sample"].keys())) over single element slice

Replace with next(iter(info["parameter_sample"].keys()))

(RUF015)


854-854: Prefer next(iter(info["parameter_sample"].values())) over single element slice

Replace with next(iter(info["parameter_sample"].values()))

(RUF015)


861-861: Unused method argument: use_v2

(ARG002)


873-873: Unused method argument: use_v2

(ARG002)


1095-1095: Loop control variable warmup_step not used within loop body

Rename unused warmup_step to _warmup_step

(B007)


1101-1101: Loop control variable train_step not used within loop body

Rename unused train_step to _train_step

(B007)

nemo_rl/utils/automodel_checkpoint.py

281-283: Avoid specifying long messages outside the exception class

(TRY003)

tests/unit/utils/test_automodel_checkpoint.py

114-115: try-except-pass detected, consider logging the exception

(S110)


114-114: Do not catch blind exception: Exception

(BLE001)


129-130: try-except-pass detected, consider logging the exception

(S110)


129-129: Do not catch blind exception: Exception

(BLE001)


760-760: Unused method argument: init_distributed

(ARG002)


811-811: Unused method argument: init_distributed

(ARG002)


861-861: Unused method argument: init_distributed

(ARG002)


939-939: Unused method argument: init_distributed

(ARG002)

🪛 Shellcheck (0.11.0)
tests/test_suites/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Lint check
🔇 Additional comments (30)
tests/unit/models/dtensor/test_lora.py (2)

73-73: Migration to LinearLoRA.init_lora_weights is complete.

All usages of the private function _patched_init_lora_weights have been successfully removed from the codebase. The code at line 73 correctly uses the public API LinearLoRA.init_lora_weights(lora2, "kaiming") for kaiming initialization testing.


55-55: The migration to LinearLoRA.init_lora_weights is verified and complete.

LinearLoRA's static methods provide the LoRA functionality for both instance initialization and monkey-patching. The codebase confirms that LinearLoRA.init_lora_weights(module, init_method) exists in nemo_automodel and is properly imported and used at lines 55 and 73. The old _patched_init_lora_weights function has been completely removed with no remaining references. The test correctly validates initialization behavior including mean properties, standard deviation expectations, and zero-initialization of LoRA B weights.

tests/test_suites/nightly.txt (1)

90-91: No issues found. The test entry tests/test_suites/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.sh appears only once in the nightly test suite at line 91. The addition correctly follows the coding guideline format for appending nightly test driver script paths.

nemo_rl/models/policy/utils.py (1)

32-36: No action required—the import path is correct and error handling is already in place.

The new import path nemo_automodel._transformers.auto_model is documented in the official NeMo-AutoModel API, and the code already implements a try-except wrapper (lines 31–44) that gracefully handles ImportError by setting the NeMo model classes to None with a NEMO_AUTOMODEL_AVAILABLE flag. This design ensures the code will not fail at runtime if the import fails or if nemo_automodel is not installed.

3rdparty/Automodel-workspace/Automodel (1)

1-1: This review comment is based on incorrect context and cannot be verified.

The claimed cherry-pick to r0.5.0 branch does not exist—this PR #1470 targets the main branch. No r0.5.0 branch is present in the repository. Additionally, the referenced old commit hash cannot be found in git history, making the regression comparison unverifiable.

The file 3rdparty/Automodel-workspace/Automodel is a git submodule pointer (160000 mode), not a Python or shell script, so the NVIDIA copyright header guideline does not apply.

Likely an incorrect or invalid review comment.

nemo_rl/utils/venvs.py (1)

17-17: LGTM - Import organization improvement.

Moving the shutil import to the top-level follows Python import conventions and improves readability.

pyrefly.toml (1)

112-116: LGTM - Pyrefly configuration updates.

The new project-includes entries are correctly added for the new modules introduced in this PR: workers __init__.py, patches.py, and automodel_checkpoint.py.

nemo_rl/models/policy/workers/megatron_policy_worker.py (2)

132-132: LGTM - Refactored patch import.

The Transformer Engine patching logic is now properly delegated to the external patches module, improving code organization and maintainability.


451-451: LGTM - Early patch application.

Applying the Transformer Engine patch early in __init__ before other initialization is appropriate to ensure the patched code is in effect before any TE imports occur.

examples/configs/recipes/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.yaml (1)

1-22: LGTM - Well-structured recipe configuration.

The YAML follows the naming convention <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers].yaml as per coding guidelines. The automodel backend configuration aligns with the AutomodelBackendConfig TypedDict structure. Based on learnings from coding guidelines.

Please verify that openai/gpt-oss-20b is the intended model identifier, as this appears to be an internal or placeholder model name.

tests/unit/models/policy/test_automodel_types.py (1)

35-71: LGTM - Comprehensive TypedDict validation tests.

The test class properly validates that AutomodelBackendConfig keys are defined and verifies instantiation compatibility with the actual BackendConfig class when nemo_automodel is available.

nemo_rl/models/policy/__init__.py (2)

37-62: LGTM - Well-documented TypedDict definitions.

The AutomodelBackendConfig and AutomodelKwargs TypedDicts are properly documented with inline comments explaining each key's purpose, valid values, and types. The use of NotRequired for optional fields follows Python 3.12+ patterns. As per coding guidelines, TypedDict keys are documented.


81-81: LGTM - Clean DTensorConfig extension.

Adding automodel_kwargs as an optional field to DTensorConfig provides a clean integration path for automodel configuration without breaking existing usage.

tests/unit/utils/test_automodel_checkpoint.py (4)

135-163: LGTM - Thorough distributed test fixture.

The init_distributed fixture properly handles cleanup of DCP planner caches and device mesh caches before and after each test, ensuring test isolation. The fixture's side-effect-only usage (appearing as unused in test signatures) is a standard pytest pattern.


186-295: LGTM - Comprehensive checkpoint format detection tests.

Good coverage of various checkpoint formats (safetensors, torch_save with distcp/bin/pt), PEFT adapter detection, and edge cases (empty/nonexistent directories, nested structures).


353-616: LGTM - Well-structured AutomodelCheckpointManager tests.

The test class properly validates manager initialization, state dict key setting, error handling for uninitialized checkpointer, and configuration updates. Mock usage is appropriate for unit testing without requiring full distributed setup.


755-1032: LGTM - Valuable integration tests.

The TestSaveLoadIntegration class provides end-to-end validation of save/load operations with both safetensors and torch_save formats, including optimizer state and LoRA weights. These integration tests verify the full checkpoint workflow.

tests/unit/models/policy/test_patches.py (2)

28-123: LGTM - Comprehensive path resolution tests.

Excellent coverage of edge cases for _get_transformer_engine_file: package not found, no submodule locations, empty locations, file not found, and successful lookup with various path segments.


357-447: LGTM - Valuable integration tests with real files.

The integration tests using real temporary files validate actual file operations and verify patch idempotency. The test_patch_idempotent test correctly verifies that applying the patch twice produces identical content with only one success message.

pyproject.toml (1)

61-67: LGTM - Automodel dependencies properly configured.

The new dependencies for automodel support (nv-grouped-gemm, transformer-engine, deep_ep) are appropriately pinned with specific versions/git revisions for reproducibility.

nemo_rl/models/policy/workers/patches.py (2)

19-44: LGTM - Robust path resolution for TE file location.

The function properly handles missing package and file scenarios with clear error messages.


105-106: Broad exception catch is acceptable here.

The static analysis flags this as BLE001, but in this context catching all exceptions is intentional to ensure patch failures don't crash the main application. The error is logged for debugging purposes.
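A minimal sketch of that pattern (the helper name is invented; the actual `patches.py` code differs): catch broadly, log, and continue, so a failed best-effort patch never takes down the worker.

```python
import logging

logger = logging.getLogger(__name__)


def apply_patch_best_effort(patch_fn) -> bool:
    """Run a best-effort patch; a failure is logged instead of propagated."""
    try:
        patch_fn()
        return True
    except Exception:  # noqa: BLE001 - intentional: patching must not crash the app
        logger.exception("patch failed; continuing without it")
        return False
```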

tests/unit/models/policy/test_dtensor_worker_v2.py (2)

299-381: Well-structured checkpoint save/load test with proper cleanup.

Good use of tempfile.TemporaryDirectory, proper GPU memory cleanup by shutting down the first policy before creating the second, and verification of loaded policy state.


384-457: Comprehensive mixed precision test covering training and logprobs.

Good coverage of both bfloat16 and float16 precisions, with appropriate assertions for NaN/Inf checks and dtype verification.

nemo_rl/utils/automodel_checkpoint.py (1)

246-329: Solid save_checkpoint implementation with PEFT support.

Good handling of optional components (optimizer, scheduler, tokenizer) and proper integration with the checkpointing configuration.

tests/unit/models/policy/test_dtensor_worker.py (1)

859-876: The use_v2 parameter is used indirectly via _get_use_v2.

The static analysis flags use_v2 as unused, but it's actually accessed through pytest's parametrization mechanism via _get_use_v2(request) in the fixture chain. The parameter name triggers the correct test configuration.

nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py (4)

128-129: TE patch applied early in initialization - good placement.

Applying the Transformer Engine patch before any model setup ensures compatibility is established early.


95-99: Clean precision mapping with explicit dtype lookup.

The STRING_TO_DTYPE dictionary provides a clear and maintainable way to convert precision strings to torch dtypes, with proper error handling at lines 165-168.


351-425: Well-structured FSDP2Manager integration and model parallelization.

Good separation between MoE model parallelization (using moe_parallelize_model) and standard HF model parallelization (using manager.parallelize). The device mesh references are properly stored for downstream usage.


1933-1957: Clean checkpoint manager initialization with lazy instantiation.

The _init_checkpoint_manager method properly handles lazy initialization and passes the required device meshes. Good use of getattr with default for model_state_dict_keys.

Comment on lines +151 to 161
         self.cpu_offload = self.cfg["dtensor_cfg"]["cpu_offload"]
         # torch distributed init. Envars for rank, world_size, and master_addr and master_port are set from the ray remote call
-        torch.distributed.init_process_group(backend="nccl")
+        backend = "nccl" if not self.cpu_offload else "cuda:nccl,cpu:gloo"
+        torch.distributed.init_process_group(backend=backend)
         self.rank = torch.distributed.get_rank()
         world_size = torch.distributed.get_world_size()
         model_name = self.cfg["model_name"]

+        self.checkpoint_manager: Optional[AutomodelCheckpointManager] = None
+
+        self.cpu_offload = self.cfg["dtensor_cfg"]["cpu_offload"]
⚠️ Potential issue | 🟡 Minor

Duplicate assignment of self.cpu_offload.

self.cpu_offload is assigned at line 151 and then again at line 161 with the same value. Remove the duplicate.

🔎 Proposed fix
         self.cpu_offload = self.cfg["dtensor_cfg"]["cpu_offload"]
         # torch distributed init. Envars for rank, world_size, and master_addr and master_port are set from the ray remote call
         backend = "nccl" if not self.cpu_offload else "cuda:nccl,cpu:gloo"
         torch.distributed.init_process_group(backend=backend)
         self.rank = torch.distributed.get_rank()
         world_size = torch.distributed.get_world_size()
         model_name = self.cfg["model_name"]

         self.checkpoint_manager: Optional[AutomodelCheckpointManager] = None

-        self.cpu_offload = self.cfg["dtensor_cfg"]["cpu_offload"]
         self.offload_optimizer_for_logprob = self.cfg["offload_optimizer_for_logprob"]
🤖 Prompt for AI Agents
In nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py around lines 151 to
161, there is a duplicate assignment of self.cpu_offload (first at line 151 and
again at line 161); remove the redundant assignment at line 161 so
self.cpu_offload is only set once from self.cfg["dtensor_cfg"]["cpu_offload"]
and no behavior changes occur.

Comment on lines +402 to +407
assert self.tp_size == 1, (
"Using custom implementation {self.model.__class__.__name__} for MoE model {model_name} which doesn't support tp_size > 1. Please use expert_parallel_size > 1 for custom implementation or set force_hf=True in your config at policy->dtensor_cfg->automodel_kwargs to use the HuggingFace implementation."
)
assert self.cp_size == 1, (
"Using custom implementation {self.model.__class__.__name__} for MoE model {model_name} which doesn't support cp_size > 1. Please set force_hf=True in your config at policy->dtensor_cfg->automodel_kwargs to use the HuggingFace implementation."
)
⚠️ Potential issue | 🟡 Minor

Assertion messages use wrong string formatting.

The assertion messages at lines 402-406 contain {self.model.__class__.__name__} but are not f-strings, so the variable won't be interpolated.

🔎 Proposed fix
-            assert self.tp_size == 1, (
-                "Using custom implementation {self.model.__class__.__name__} for MoE model {model_name} which doesn't support tp_size > 1. Please use expert_parallel_size > 1 for custom implementation or set force_hf=True in your config at policy->dtensor_cfg->automodel_kwargs to use the HuggingFace implementation."
-            )
-            assert self.cp_size == 1, (
-                "Using custom implementation {self.model.__class__.__name__} for MoE model {model_name} which doesn't support cp_size > 1. Please set force_hf=True in your config at policy->dtensor_cfg->automodel_kwargs to use the HuggingFace implementation."
-            )
+            assert self.tp_size == 1, (
+                f"Using custom implementation {self.model.__class__.__name__} for MoE model {model_name} which doesn't support tp_size > 1. Please use expert_parallel_size > 1 for custom implementation or set force_hf=True in your config at policy->dtensor_cfg->automodel_kwargs to use the HuggingFace implementation."
+            )
+            assert self.cp_size == 1, (
+                f"Using custom implementation {self.model.__class__.__name__} for MoE model {model_name} which doesn't support cp_size > 1. Please set force_hf=True in your config at policy->dtensor_cfg->automodel_kwargs to use the HuggingFace implementation."
+            )
🤖 Prompt for AI Agents
In nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py around lines 402 to
407, the assertion messages use brace placeholders like
{self.model.__class__.__name__} and {model_name} but are plain strings, so
variables are not interpolated; change each assertion message to use f-strings
(or .format()) so the actual values are substituted, e.g. prefix the string with
f and reference the variables inside the braces for both tp_size and cp_size
assertions.
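The underlying Python behavior can be demonstrated in isolation (generic example; the identifiers mirror but do not reproduce the worker code):

```python
cls_name = "GptOssForCausalLM"
model_name = "openai/gpt-oss-20b"

# A plain string keeps the braces literally; only the f-string interpolates.
plain = "Using custom implementation {cls_name} for MoE model {model_name}"
interp = f"Using custom implementation {cls_name} for MoE model {model_name}"

assert "{cls_name}" in plain
assert "GptOssForCausalLM" in interp
```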

Comment on lines +354 to +361
model_save_format, is_peft = detect_checkpoint_format(weights_path)

weights_dir = os.path.dirname(weights_path)
checkpoint_root = (
os.path.dirname(weights_dir)
if weights_dir.endswith("weights")
else weights_dir
)
🛠️ Refactor suggestion | 🟠 Major

Duplicate checkpoint_root inference logic - use _infer_checkpoint_root helper.

This logic duplicates what _infer_checkpoint_root does at lines 428-443. The function already exists and should be reused here.

🔎 Proposed fix
         model_save_format, is_peft = detect_checkpoint_format(weights_path)

-        weights_dir = os.path.dirname(weights_path)
-        checkpoint_root = (
-            os.path.dirname(weights_dir)
-            if weights_dir.endswith("weights")
-            else weights_dir
-        )
+        checkpoint_root = _infer_checkpoint_root(weights_path)

         # Update checkpointer configuration
🤖 Prompt for AI Agents
In nemo_rl/utils/automodel_checkpoint.py around lines 354 to 361, the code
re-implements checkpoint_root inference by taking dirname(dirname(weights_path))
when the weights dir ends with "weights" or else dirname(weights_path); replace
this duplicated logic by calling the existing _infer_checkpoint_root helper
(used at lines 428-443) instead: remove the manual dirname logic and invoke
_infer_checkpoint_root(weights_path) (or the helper's expected parameter) to
compute checkpoint_root, preserving behavior and any special cases handled by
the helper.
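The inference both sites implement can be sketched as a single helper (a hypothetical re-implementation based on the duplicated logic shown above; the real `_infer_checkpoint_root` may handle additional cases):

```python
import os


def infer_checkpoint_root(weights_path: str) -> str:
    # Strip a trailing "weights" directory so the checkpoint root
    # (containing weights/, optimizer/, etc.) is returned.
    weights_dir = os.path.dirname(weights_path)
    if weights_dir.endswith("weights"):
        return os.path.dirname(weights_dir)
    return weights_dir
```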

exit_if_max_steps_reached

# Run the experiment
cd $PROJECT_ROOT
⚠️ Potential issue | 🟡 Minor

Add error handling for directory change.

The cd command should handle failures to prevent the script from continuing in the wrong directory if $PROJECT_ROOT is invalid or inaccessible.

🔎 Proposed fix
-cd $PROJECT_ROOT
+cd "$PROJECT_ROOT" || exit 1

Based on static analysis hints.

🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

🤖 Prompt for AI Agents
In tests/test_suites/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.sh around line
16, the script runs cd $PROJECT_ROOT without handling failure; update the script
so the directory change is attempted with the variable quoted, then immediately
check the result and if it fails print a clear error to stderr and exit with a
non‑zero status (or enable safe script failure such as set -e at the top) to
prevent continuing in the wrong directory.

logger.tensorboard_enabled=True \
checkpointing.enabled=True \
checkpointing.checkpoint_dir=$CKPT_DIR \
$@ \
⚠️ Potential issue | 🟡 Minor

Quote array expansion to prevent argument splitting.

The unquoted $@ can cause issues with arguments containing spaces or special characters. Use "$@" to preserve argument boundaries correctly.

🔎 Proposed fix
     checkpointing.checkpoint_dir=$CKPT_DIR \
-    $@ \
+    "$@" \
     2>&1 | tee $RUN_LOG

Based on static analysis hints.

🧰 Tools
🪛 Shellcheck (0.11.0)

[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)

🤖 Prompt for AI Agents
In tests/test_suites/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.sh around line
28 the unquoted $@ may split arguments containing spaces or special characters;
replace it with "$@" so all passed arguments are preserved as distinct
parameters and not word-split by the shell.

Comment on lines +136 to +147
**(
{
"labels": torch.randint(0, vocab_size, (batch_size, seq_len)),
"sample_mask": torch.ones(batch_size).cuda(),
}
if mode == "train"
else {}
),
}
)
data = data.to("cpu")
return data
⚠️ Potential issue | 🟡 Minor

Device inconsistency: sample_mask created on CUDA before moving data to CPU.

The sample_mask tensor is created directly on CUDA (line 139), but then the entire BatchedDataDict is moved to CPU (line 146). This works because .to("cpu") will move all tensors including sample_mask, but creating it on CUDA first is unnecessary and inconsistent with how other tensors in the batch are created.

🔎 Proposed fix
             **(
                 {
                     "labels": torch.randint(0, vocab_size, (batch_size, seq_len)),
-                    "sample_mask": torch.ones(batch_size).cuda(),
+                    "sample_mask": torch.ones(batch_size),
                 }
                 if mode == "train"
                 else {}
             ),
🤖 Prompt for AI Agents
In tests/unit/models/policy/test_dtensor_worker_v2.py around lines 136 to 147,
the sample_mask is created on CUDA (torch.ones(batch_size).cuda()) but the
BatchedDataDict is subsequently moved to CPU; create sample_mask on the CPU
instead (e.g., torch.ones(batch_size) or torch.ones(batch_size, device="cpu") /
use the same device as other tensors) so it’s consistent before calling data =
data.to("cpu"); update the sample_mask creation to not call .cuda() or to use a
shared device variable.

@yuki-97 yuki-97 added the CI:L1 Run doctests, unit tests, and functional tests label Dec 23, 2025
@yuki-97 yuki-97 enabled auto-merge (squash) December 23, 2025 05:24
@yuki-97 yuki-97 merged commit cd3b423 into r0.5.0 Dec 23, 2025
79 of 85 checks passed
@yuki-97 yuki-97 deleted the cherry-pick-1470-r0.5.0 branch December 23, 2025 09:21

Labels

cherry-pick CI:L1 Run doctests, unit tests, and functional tests Run CICD

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants