Conversation
Signed-off-by: ashors1 <ashors@nvidia.com>
ℹ️ **File Consistency Check** (based on commit a828cd2, PR #1585)

✅ **DTensor Policy Worker Synchronization Check**: both DTensor policy worker files were modified in this PR. Please ensure that the changes are consistent between both files where applicable. This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
📝 **Walkthrough**: Policy worker files are reorganized into a `workers` subpackage.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: ❌ Failed checks (1 warning), ✅ Passed checks (3 passed)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py (2)
**572-813: Dummy microbatch handling in `train()` can corrupt metrics and losses**

Inside the microbatch loop, dummy batches created for sequence packing are not cleanly excluded from metric aggregation: `loss_metrics` and `num_valid_samples` are only set inside `if mb_idx < iterator_len:`. For dummy batches (`mb_idx >= iterator_len`), you do `loss *= 0` but still later run:

```python
if num_valid_samples > 0:
    mb_losses.append(loss.item())
    all_mb_metrics.append(loss_metrics)
```

which reuses the previous real batch's `loss_metrics` and `num_valid_samples`, effectively duplicating metrics and adding extra zero-loss entries. This skews `all_mb_metrics`, per-GB loss accumulation, and the final `global_loss`, especially when there are many padding/dummy batches.

Here's a minimal, localized fix that keeps gradients zeroed for dummy batches but cleanly skips their metrics:

```diff
@@
 for mb_idx, mb in enumerate(
     itertools.chain(mb_iterator, dummy_iterator)
 ):
@@
     with torch.autocast(device_type="cuda", dtype=self.dtype):
@@
-    # skip the update for dummy batches
-    if mb_idx < iterator_len:
-        ## scale by the number of global batches so we get the correct
-        ## value when summing metrics across all microbatches
-        for k in loss_metrics.keys():
-            loss_metrics[k] /= num_global_batches
-        num_valid_samples = loss_metrics["num_valid_samples"]
-        loss_metrics["lr"] = self.optimizer.param_groups[0]["lr"]
-        loss_metrics["global_valid_seqs"] = global_valid_seqs.item()
-        loss_metrics["global_valid_toks"] = global_valid_toks.item()
-    else:
-        loss *= 0
-
-    # Backward pass
+    # skip the update for dummy batches
+    is_dummy_mb = mb_idx >= iterator_len
+    if not is_dummy_mb:
+        # scale by the number of global batches so we get the correct
+        # value when summing metrics across all microbatches
+        for k in loss_metrics.keys():
+            loss_metrics[k] /= num_global_batches
+        num_valid_samples = loss_metrics["num_valid_samples"]
+        loss_metrics["lr"] = self.optimizer.param_groups[0]["lr"]
+        loss_metrics["global_valid_seqs"] = global_valid_seqs.item()
+        loss_metrics["global_valid_toks"] = global_valid_toks.item()
+    else:
+        # Ensure dummy microbatches contribute neither gradients nor metrics
+        num_valid_samples = 0
+        loss *= 0
+
+    # Backward pass
@@
-    if num_valid_samples > 0:
+    if (mb_idx < iterator_len) and (num_valid_samples > 0):
         mb_losses.append(loss.item())
         all_mb_metrics.append(loss_metrics)
```

This keeps the original gradient semantics (dummy batches still backpropagate zero) while ensuring metrics and loss aggregation only use real microbatches.
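The duplication mechanism, and the effect of the fix, can be demonstrated with a framework-free sketch. The function below is a hypothetical distillation of the loop, not the worker's actual code; it keeps only the control flow that matters:

```python
def aggregate_microbatch_metrics(real_losses, num_dummy):
    """Simulate the train() microbatch loop: real microbatches produce
    metrics; dummy microbatches (appended for sequence packing) must not."""
    iterator_len = len(real_losses)
    mb_losses, all_mb_metrics = [], []
    num_valid_samples = 0
    loss_metrics = {}
    for mb_idx, loss in enumerate(real_losses + [0.0] * num_dummy):
        if mb_idx < iterator_len:
            loss_metrics = {"loss": loss, "num_valid_samples": 1}
            num_valid_samples = loss_metrics["num_valid_samples"]
        else:
            # The fix: zero out num_valid_samples for dummy batches so the
            # stale metrics from the previous real batch are not reused.
            num_valid_samples = 0
            loss = 0.0
        if num_valid_samples > 0:
            mb_losses.append(loss)
            all_mb_metrics.append(loss_metrics)
    return mb_losses, all_mb_metrics

# With the fix, two dummy batches contribute no losses or metrics:
losses, metrics = aggregate_microbatch_metrics([0.5, 0.7], num_dummy=2)
```

Removing the `num_valid_samples = 0` line reproduces the bug: each dummy batch re-appends the previous real batch's metrics and a zero loss.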
**1276-1289: `score()` drops temperature scaling for non-DTensor logits**

In `score()` you correctly apply temperature scaling:

```python
if not hasattr(outputs, "logits"):
    logits = self.model.lm_head(outputs.last_hidden_state)
else:
    logits = outputs.logits

# Apply temperature scaling
logits = self._apply_temperature_scaling(logits)
```

But immediately after you do:

```python
if isinstance(logits, DTensor):
    logits = logits.to(torch.float32)
else:
    logits = outputs.logits.to(torch.float32)
```

For the non-DTensor case this re-derives `logits` from `outputs.logits`, discarding the temperature scaling that was just applied. It also relies on `outputs` outside the autocast block unnecessarily.

You can fix this by always casting from the already-scaled `logits` tensor (the `isinstance` branches then collapse into one line):

```diff
-if isinstance(logits, DTensor):
-    logits = logits.to(torch.float32)
-else:
-    logits = outputs.logits.to(torch.float32)
+# Always cast from the already temperature-scaled logits
+logits = logits.to(torch.float32)
```

Optionally `del outputs` right after extracting `logits` to free memory earlier.
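The ordering bug is easy to demonstrate without torch. In this framework-free sketch, `Outputs` and `apply_temperature_scaling` are hypothetical stand-ins for the worker's real objects; only the control flow mirrors `score()`:

```python
class Outputs:
    """Stand-in for a model output object carrying raw logits."""
    def __init__(self, logits):
        self.logits = logits

def apply_temperature_scaling(logits, temperature=2.0):
    """Divide each logit by the temperature (stand-in for the real method)."""
    return [x / temperature for x in logits]

def score_buggy(outputs):
    logits = apply_temperature_scaling(outputs.logits)
    # Bug: re-deriving from outputs.logits discards the scaling above.
    return list(outputs.logits)

def score_fixed(outputs):
    logits = apply_temperature_scaling(outputs.logits)
    # Fix: keep operating on the already-scaled values.
    return list(logits)

out = Outputs([4.0, 8.0])
```

With temperature 2.0, the buggy path returns the unscaled `[4.0, 8.0]` while the fixed path returns the scaled `[2.0, 4.0]`.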
🧹 Nitpick comments (3)
tests/unit/environments/test_reward_model_environment.py (1)
**74-81: DTensor worker path selection looks consistent with registry and Policy**

Switching `reward_model_py_executable_class` to the new workers subpackage and gating between V2/non-V2 on `dtensor_cfg["_v2"]` matches the updated registry and `Policy` worker path selection. Consider adding/keeping a test that exercises the `_v2=False` branch as well so both worker classes stay covered.

nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py (1)
**465-476: `init_collective` duplicates BasePolicyWorker logic**

`DTensorPolicyWorkerV2.init_collective()` re-implements exactly the same NCCL communicator setup that already exists in `BasePolicyWorker.init_collective`. To reduce duplication and keep behavior consistent if the base implementation evolves, consider either:

- Removing this override entirely and using the base method, or
- Having this method delegate directly to `super().init_collective(...)` if you want to keep the docstring locally.

nemo_rl/models/policy/workers/base_policy_worker.py (1)
**155-161: Consider adding a return type hint for the context manager.**

The docstring indicates this is a context manager, but the signature lacks a return type hint. Based on the subclass implementations in the relevant snippets, this should return `Generator[None, None, None]`:

```diff
+from typing import Generator
+
 @abstractmethod
-def use_reference_model(self):
+def use_reference_model(self) -> Generator[None, None, None]:
     """Context manager that temporarily swaps the reference model and active model.

     On entry: Moves model to CPU, moves reference_model to CUDA. Swaps the references
     On exit: Restores original references and re-flips cuda/cpu
     """
     ...
```
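For illustration, here is a minimal self-contained sketch of how a concrete subclass can satisfy that annotated signature with `contextlib.contextmanager`. The class names and the string "models" are hypothetical stand-ins for the real model objects:

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager
from typing import Generator

class BaseWorker(ABC):
    @abstractmethod
    def use_reference_model(self) -> Generator[None, None, None]:
        """Context manager that temporarily swaps reference and active models."""
        ...

class Worker(BaseWorker):
    def __init__(self):
        # Strings stand in for the actual policy/reference model objects.
        self.model, self.reference_model = "policy", "reference"

    @contextmanager
    def use_reference_model(self) -> Generator[None, None, None]:
        # On entry: swap references; on exit: restore them.
        self.model, self.reference_model = self.reference_model, self.model
        try:
            yield
        finally:
            self.model, self.reference_model = self.reference_model, self.model

w = Worker()
with w.use_reference_model():
    active = w.model  # the reference model is active inside the context
```

The `try/finally` around `yield` mirrors the "restore on exit" contract from the docstring, even if the body raises.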
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- .github/workflows/_automodel_integration_check.yml (1 hunks)
- docs/fp8.md (1 hunks)
- nemo_rl/distributed/ray_actor_environment_registry.py (1 hunks)
- nemo_rl/models/policy/lm_policy.py (2 hunks)
- nemo_rl/models/policy/workers/base_policy_worker.py (1 hunks)
- nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py (3 hunks)
- nemo_rl/models/policy/workers/megatron_policy_worker.py (2 hunks)
- tests/unit/_plugins/remote_select.py (2 hunks)
- tests/unit/environments/test_reward_model_environment.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Conform code to Python 3.12+
Indent code with 4 spaces. Do not use tabs
Use snake_case for file names
Use PascalCase for class names
Use snake_case for function and method names
Use snake_case for local variables
Prefix variable names that start with a number with 'k' (e.g., k_99th_percentile)
Use upper snake_case with 'G' prefix for global variables (e.g., G_MY_GLOBAL)
Use upper snake_case for constants
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
Prefer docstrings over comments for interfaces that may be used outside a file
Reserve comments for code within a function or interfaces that are local to a file
If a piece of code is commented out, include a comment describing its usage and why it's commented out. Remove debug comments before merging
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx
Avoid using reflection when functionality can be easily achieved without reflection
When using try-except blocks, limit the except clause to the smallest set of specific errors possible
When using try-except blocks for duck-typing, keep the body of the try as small as possible and use the else block for logic
YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values
For required configuration attributes, access config directly and expect presence (e.g., policy_cfg['precision']) without hidden defaults
Use typing.NotRequired to mark optional attributes in TypedDict for configuration
When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
Follow the Google Python Style Guide for Python code
Files:
- nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
- nemo_rl/distributed/ray_actor_environment_registry.py
- tests/unit/environments/test_reward_model_environment.py
- nemo_rl/models/policy/workers/base_policy_worker.py
- nemo_rl/models/policy/workers/megatron_policy_worker.py
- tests/unit/_plugins/remote_select.py
- nemo_rl/models/policy/lm_policy.py
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
For any source file under nemo_rl/*.py that defines a class or function decorated with @ray.remote, add a coverage pragma (# pragma: no cover) because these run in separate Ray processes
Files:
- nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
- nemo_rl/distributed/ray_actor_environment_registry.py
- nemo_rl/models/policy/workers/base_policy_worker.py
- nemo_rl/models/policy/workers/megatron_policy_worker.py
- nemo_rl/models/policy/lm_policy.py
!(**/tests/**|**/test_*.py|**/test_*.sh)
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year
Files:
- nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
- nemo_rl/distributed/ray_actor_environment_registry.py
- .github/workflows/_automodel_integration_check.yml
- tests/unit/environments/test_reward_model_environment.py
- docs/fp8.md
- nemo_rl/models/policy/workers/base_policy_worker.py
- nemo_rl/models/policy/workers/megatron_policy_worker.py
- tests/unit/_plugins/remote_select.py
- nemo_rl/models/policy/lm_policy.py
**/*.{py,sh}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)
Files:
- nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
- nemo_rl/distributed/ray_actor_environment_registry.py
- tests/unit/environments/test_reward_model_environment.py
- nemo_rl/models/policy/workers/base_policy_worker.py
- nemo_rl/models/policy/workers/megatron_policy_worker.py
- tests/unit/_plugins/remote_select.py
- nemo_rl/models/policy/lm_policy.py
docs/**/*.md
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Update docs/index.md when a new markdown doc is added under docs/**/*.md or a markdown file is renamed, ensuring the document appears in the most appropriate section
Files:
docs/fp8.md
🧬 Code graph analysis (3)
nemo_rl/distributed/ray_actor_environment_registry.py (1)

- nemo_rl/distributed/virtual_cluster.py (1)
  - `PY_EXECUTABLES` (43-59)

nemo_rl/models/policy/workers/base_policy_worker.py (5)

- nemo_rl/distributed/batched_data_dict.py (1)
  - `BatchedDataDict` (75-860)
- nemo_rl/models/policy/interfaces.py (2)
  - `LogprobOutputSpec` (25-28)
  - `ReferenceLogprobOutputSpec` (31-34)
- nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py (3)
  - `init_collective` (465-475)
  - `use_reference_model` (1581-1607)
  - `train` (489-865)
- nemo_rl/utils/nvml.py (1)
  - `get_device_uuid` (55-77)
- nemo_rl/models/policy/workers/megatron_policy_worker.py (2)
  - `use_reference_model` (1358-1406)
  - `train` (900-1150)

nemo_rl/models/policy/workers/megatron_policy_worker.py (1)

- nemo_rl/models/policy/workers/base_policy_worker.py (1)
  - `BasePolicyWorker` (15-218)
🪛 GitHub Actions: Copyright check
nemo_rl/models/policy/workers/base_policy_worker.py
[error] 1-1: Copyright check failed: Found files with missing copyright notices.
🪛 Ruff (0.14.7)
nemo_rl/models/policy/workers/base_policy_worker.py
19-19: Unused method argument: train_world_size
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (12)
docs/fp8.md (1)
**73-80: Updated error trace path matches new Megatron worker module**

The FP8 training error example now points at `nemo_rl.models.policy.workers.megatron_policy_worker.MegatronPolicyWorker`, which matches the new workers subpackage layout and class name. No further changes needed.

tests/unit/_plugins/remote_select.py (1)
**63-74: Remote-select example updated to new workers layout**

The example nodeid and mapped file path now reference `models/policy/workers/dtensor_policy_worker.py`, which aligns with the new directory structure. No behavioral impact.

.github/workflows/_automodel_integration_check.yml (1)
**133-139: Automodel consistency check paths correctly point to workers subpackage**

The DTensor worker file variables now reference `nemo_rl/models/policy/workers/dtensor_policy_worker*.py`, matching the code move. The synchronization logic and messages remain valid.

nemo_rl/distributed/ray_actor_environment_registry.py (1)
**27-46: Actor registry updates align with new worker locations and backends**

The registry now maps:

- `workers.dtensor_policy_worker.DTensorPolicyWorker` → `VLLM_EXECUTABLE` (as per existing vLLM coupling comment)
- `workers.dtensor_policy_worker_v2.DTensorPolicyWorkerV2` → `PY_EXECUTABLES.AUTOMODEL`
- `workers.megatron_policy_worker.MegatronPolicyWorker` → `MCORE_EXECUTABLE`

These FQNs match the new module structure and the corresponding backends. The reward-model environment test and `Policy` also use the same strings.

nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py (1)
**88-101: BasePolicyWorker integration for DTensorPolicyWorkerV2 looks structurally sound**

Switching `DTensorPolicyWorkerV2` to subclass `BasePolicyWorker` reuses shared logic for ZMQ setup, GPU info, free-memory queries, reference-policy logprobs, and profiling. The `@ray.remote(...)  # pragma: no cover` annotation also matches the coverage guideline for Ray actors under `nemo_rl/`.

nemo_rl/models/policy/lm_policy.py (1)
**82-125: Worker class paths and `_v2` switch match new workers layout**

The `Policy` constructor now selects:

- Megatron: `"nemo_rl.models.policy.workers.megatron_policy_worker.MegatronPolicyWorker"`
- DTensor v2: `"nemo_rl.models.policy.workers.dtensor_policy_worker_v2.DTensorPolicyWorkerV2"` when `dtensor_cfg["_v2"]` is true
- DTensor v1: `"nemo_rl.models.policy.workers.dtensor_policy_worker.DTensorPolicyWorker"` otherwise

These strings are consistent with the new module locations and the updated actor registry and tests.
If `_v2` and any new `dtensor_cfg` keys (e.g., those used in `DTensorPolicyWorkerV2` for cache clearing) were just introduced, ensure their TypedDict definitions and example YAML configs are updated to document purpose and valid values.

nemo_rl/models/policy/workers/megatron_policy_worker.py (1)
**120-127: MegatronPolicyWorker cleanly migrates to BasePolicyWorker**

Importing and subclassing `BasePolicyWorker` centralizes:

- collective initialization (`init_collective` and `model_update_group`),
- basic liveness/memory/profiling utilities,
- ZMQ socket setup and `get_free_memory_bytes`, and
- reference-policy logprobs via `get_reference_policy_logprobs` + `use_reference_model`.

Within this worker you already set `self.rank` before any potential `init_collective` use and rely on `maybe_init_zmq()`/`model_update_group` only in methods that are now implemented in the base, so the refactor looks consistent with the Policy- and registry-side changes.

nemo_rl/models/policy/workers/base_policy_worker.py (5)
**18-36: Unused `train_world_size` parameter.**

The `train_world_size` parameter is declared but never used in the method body. The docstring mentions it's "used in inference cluster" but that usage isn't implemented here.

If this parameter is reserved for future use or API consistency with subclasses, consider adding a comment or using `_ = train_world_size` to explicitly acknowledge it's intentionally unused. Otherwise, remove it.
**38-49: LGTM with a note on implicit attributes.**

These utility methods are straightforward. Note that `get_gpu_info` assumes `self.model` exists, which is an implicit contract subclasses must fulfill. This is acceptable for an ABC but could benefit from a class-level docstring documenting required attributes.
**67-79: LGTM - ZMQ initialization is reasonable.**

The lazy initialization pattern and timeout/linger settings are appropriate. Note that a REQ socket with `bind()` is an unusual pattern (typically REQ connects and REP binds), but this may be intentional for your IPC architecture.
**88-93: LGTM!**

The shutdown method correctly cleans up ZMQ resources with proper existence checks.
**129-218: LGTM - Abstract interface is well-defined.**

The abstract methods provide a clean contract for policy worker implementations. The signatures align with the existing subclass implementations shown in the relevant code snippets.
jgerh left a comment:
Completed tech pubs review and provided a few copyedits.
Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
What does this PR do?

Adds a `BasePolicyWorker` class to enforce common APIs between the DTensor and Megatron backends.

Issues
List issues that this PR closes (syntax):
Usage
```python
# Add a code snippet demonstrating how to use this
```

Before your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit
Release Notes