
refactor: Introduce BasePolicyWorker #1585

Merged
terrykong merged 14 commits into main from ashors/base-policy-worker on Dec 4, 2025

Conversation


@ashors1 ashors1 commented Dec 1, 2025

What does this PR do ?

Adds a BasePolicyWorker class to enforce a common API between the DTensor and Megatron backends.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this
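A hypothetical sketch of the pattern this PR introduces, backend workers subclassing a shared abstract base, using stand-in class and method names rather than the actual nemo_rl API:

```python
from abc import ABC, abstractmethod

class BasePolicyWorker(ABC):
    """Stand-in for the shared base class; real signatures live in nemo_rl."""

    def is_alive(self) -> bool:
        # Shared liveness utility implemented once in the base class.
        return True

    @abstractmethod
    def train(self, data):
        """Each backend (DTensor, Megatron) must implement training."""
        ...

class DTensorLikeWorker(BasePolicyWorker):
    """Illustrative backend worker; only the shape of the contract matters."""

    def train(self, data):
        # Backend-specific logic; the common API is enforced by the base.
        return {"loss": 0.0, "num_samples": len(data)}

worker = DTensorLikeWorker()
```

The actual base class additionally covers distributed setup, GPU info, ZMQ IPC, and profiling, per the walkthrough below.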

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

Release Notes

  • Chores
    • Reorganized policy worker modules into a dedicated subdirectory for improved code structure and maintainability.
    • Consolidated shared policy worker functionality into a common base implementation to reduce code duplication across worker implementations.
    • Updated internal module references and configuration files to reflect the new directory structure.


@ashors1 ashors1 requested review from a team as code owners December 1, 2025 23:12
@github-actions github-actions bot added documentation Improvements or additions to documentation CI Relating to CI labels Dec 1, 2025
@ashors1 ashors1 marked this pull request as draft December 1, 2025 23:12

github-actions bot commented Dec 1, 2025

ℹ️ File Consistency Check

Check based on commit: a828cd2 (PR #1585 from ashors/base-policy-worker)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

Contributor

coderabbitai bot commented Dec 1, 2025

📝 Walkthrough

Walkthrough

Policy worker files are reorganized into a workers subdirectory. A new BasePolicyWorker abstract base class consolidates shared utilities for distributed setup, GPU info, ZMQ IPC, and profiling. Existing workers are updated to inherit from this base and have redundant methods removed. All references throughout the codebase are updated accordingly.

Changes

Cohort / File(s) Summary
Base class abstraction
nemo_rl/models/policy/workers/base_policy_worker.py
Introduces new abstract BasePolicyWorker class with shared methods for distributed setup, GPU/memory utilities, ZMQ IPC initialization, GPU profiling, and abstract methods for training, logprob computation, weight streaming, and checkpoint management.
Worker inheritance refactoring
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py, nemo_rl/models/policy/workers/megatron_policy_worker.py
Updates workers to inherit from BasePolicyWorker. Removes duplicate methods now provided by base class (lifecycle, device utilities, memory management, profiling helpers).
Policy worker selection
nemo_rl/models/policy/lm_policy.py
Updates three internal worker class path strings to reflect new module structure under workers subpackage for MegatronPolicyWorker, DTensorPolicyWorkerV2, and DTensorPolicyWorker.
Runtime registry updates
nemo_rl/distributed/ray_actor_environment_registry.py
Updates actor-to-executable mapping with new class paths for DTensorPolicyWorker, DTensorPolicyWorkerV2, and MegatronPolicyWorker under workers subpackage. Other registry entries unchanged.
Configuration and workflow paths
.github/workflows/_automodel_integration_check.yml, docs/fp8.md, tests/unit/_plugins/remote_select.py, tests/unit/environments/test_reward_model_environment.py
Updates file path references and module paths in workflow checks, documentation traces, test examples, and runtime class resolution to reflect new workers subdirectory structure.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Areas requiring extra attention:

  • Verify all abstract methods in BasePolicyWorker are correctly implemented by inheriting worker classes
  • Confirm removed methods from worker classes are functionally provided by the base class
  • Validate consistency of all registry and path updates across configuration files, ensuring no stale references remain
  • Check that method signatures and return types in base class match usage patterns in inheriting classes

Possibly related PRs

Suggested reviewers

  • terrykong
  • parthchadha
  • chtruong814

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Test Results For Major Changes (⚠️ Warning): The PR introduces a major refactoring of the policy worker classes but lacks documented test results, performance metrics, or regression-testing evidence in the description. Resolution: add test results documenting that the refactored worker classes maintain backward compatibility and produce identical results, including unit and functional test outcomes.

✅ Passed checks (3 passed)

  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title 'refactor: Introduce BasePolicyWorker' accurately captures the main change: introducing a new base class for policy workers to enforce common APIs.
  • Docstring Coverage (✅ Passed): Docstring coverage is 89.80%, which meets the required threshold of 80.00%.

Signed-off-by: ashors1 <ashors@nvidia.com>
@ashors1 ashors1 marked this pull request as ready for review December 2, 2025 17:30


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py (2)

572-813: Dummy microbatch handling in train() can corrupt metrics and losses

Inside the microbatch loop, dummy batches created for sequence packing are not cleanly excluded from metric aggregation:

  • loss_metrics and num_valid_samples are only set inside if mb_idx < iterator_len:.

  • For dummy batches (mb_idx >= iterator_len), you do loss *= 0 but still later run:

    if num_valid_samples > 0:
        mb_losses.append(loss.item())
        all_mb_metrics.append(loss_metrics)

    which reuses the previous real batch’s loss_metrics and num_valid_samples, effectively duplicating metrics and adding extra zero‑loss entries.

This skews all_mb_metrics, per‑GB loss accumulation, and the final global_loss, especially when there are many padding/dummy batches.

Here’s a minimal, localized fix that keeps gradients zeroed for dummy batches but cleanly skips their metrics:

@@
-                for mb_idx, mb in enumerate(
-                    itertools.chain(mb_iterator, dummy_iterator)
-                ):
+                for mb_idx, mb in enumerate(
+                    itertools.chain(mb_iterator, dummy_iterator)
+                ):
@@
-                    with torch.autocast(device_type="cuda", dtype=self.dtype):
+                    with torch.autocast(device_type="cuda", dtype=self.dtype):
@@
-                        # skip the update for dummy batches
-                        if mb_idx < iterator_len:
-                            ## scale by the number of global batches so we get the correct
-                            ## value when summing metrics across all microbatches
-                            for k in loss_metrics.keys():
-                                loss_metrics[k] /= num_global_batches
-                            num_valid_samples = loss_metrics["num_valid_samples"]
-                            loss_metrics["lr"] = self.optimizer.param_groups[0]["lr"]
-                            loss_metrics["global_valid_seqs"] = global_valid_seqs.item()
-                            loss_metrics["global_valid_toks"] = global_valid_toks.item()
-                        else:
-                            loss *= 0
-
-                        # Backward pass
+                        # skip the update for dummy batches
+                        is_dummy_mb = mb_idx >= iterator_len
+                        if not is_dummy_mb:
+                            # scale by the number of global batches so we get the correct
+                            # value when summing metrics across all microbatches
+                            for k in loss_metrics.keys():
+                                loss_metrics[k] /= num_global_batches
+                            num_valid_samples = loss_metrics["num_valid_samples"]
+                            loss_metrics["lr"] = self.optimizer.param_groups[0]["lr"]
+                            loss_metrics["global_valid_seqs"] = global_valid_seqs.item()
+                            loss_metrics["global_valid_toks"] = global_valid_toks.item()
+                        else:
+                            # Ensure dummy microbatches contribute neither gradients nor metrics
+                            num_valid_samples = 0
+                            loss *= 0
+
+                        # Backward pass
@@
-                    if num_valid_samples > 0:
+                    if (mb_idx < iterator_len) and (num_valid_samples > 0):
                         mb_losses.append(loss.item())
                         all_mb_metrics.append(loss_metrics)

This keeps the original gradient semantics (dummy batches still backpropagate zero) while ensuring metrics and loss aggregation only use real microbatches.
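The guard in the suggested fix can be illustrated in isolation. This toy aggregation (hypothetical names, not the worker's actual loop) shows why dummy microbatches must be excluded from metric collection:

```python
import itertools

def aggregate_losses(real_losses, n_dummy):
    """Collect per-microbatch losses, skipping zero-loss dummy batches.

    Mirrors the review suggestion: dummy batches (index >= number of real
    batches) still flow through the loop for gradient symmetry, but they
    contribute no entries to the collected metrics.
    """
    iterator_len = len(real_losses)
    mb_losses = []
    for mb_idx, loss in enumerate(
        itertools.chain(real_losses, itertools.repeat(0.0, n_dummy))
    ):
        if mb_idx < iterator_len:  # real microbatch: record its loss
            mb_losses.append(loss)
        # dummy microbatch: zero loss still backpropagates, but no metrics
    return mb_losses
```

Without the `mb_idx < iterator_len` guard, each dummy batch would append a stale or zero entry, inflating the microbatch count exactly as described above.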


1276-1289: score() drops temperature scaling for non‑DTensor logits

In score() you correctly apply temperature scaling:

if not hasattr(outputs, "logits"):
    logits = self.model.lm_head(outputs.last_hidden_state)
else:
    logits = outputs.logits
# Apply temperature scaling
logits = self._apply_temperature_scaling(logits)

But immediately after you do:

if isinstance(logits, DTensor):
    logits = logits.to(torch.float32)
else:
    logits = outputs.logits.to(torch.float32)

For the non‑DTensor case this re‑derives logits from outputs.logits, discarding the temperature scaling that was just applied. It also relies on outputs outside the autocast block unnecessarily.

You can fix this by always casting from the already‑scaled logits tensor:

-                if isinstance(logits, DTensor):
-                    logits = logits.to(torch.float32)
-                else:
-                    logits = outputs.logits.to(torch.float32)
+                # Always cast from the already temperature‑scaled logits
+                if isinstance(logits, DTensor):
+                    logits = logits.to(torch.float32)
+                else:
+                    logits = logits.to(torch.float32)

Optionally del outputs right after extracting logits to free memory earlier.

🧹 Nitpick comments (3)
tests/unit/environments/test_reward_model_environment.py (1)

74-81: DTensor worker path selection looks consistent with registry and Policy

Switching reward_model_py_executable_class to the new workers subpackage and gating between V2/non‑V2 on dtensor_cfg["_v2"] matches the updated registry and Policy worker path selection. Consider adding/keeping a test that exercises the _v2=False branch as well so both worker classes stay covered.

nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py (1)

465-476: init_collective duplicates BasePolicyWorker logic

DTensorPolicyWorkerV2.init_collective() re‑implements exactly the same NCCL communicator setup that already exists in BasePolicyWorker.init_collective.

To reduce duplication and keep behavior consistent if the base implementation evolves, consider either:

  • Removing this override entirely and using the base method, or
  • Having this method delegate directly to super().init_collective(...) if you want to keep the docstring locally.
nemo_rl/models/policy/workers/base_policy_worker.py (1)

155-161: Consider adding return type hint for context manager.

The docstring indicates this is a context manager, but the signature lacks a return type hint. Based on the subclass implementations in the relevant snippets, this should return Generator[None, None, None].

+from typing import Generator
+
 @abstractmethod
-def use_reference_model(self):
+def use_reference_model(self) -> Generator[None, None, None]:
     """Context manager that temporarily swaps the reference model and active model.
     On entry: Moves model to CPU, moves reference_model to CUDA. Swaps the references
     On exit: Restores original references and re-flips cuda/cpu
     """
     ...
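For reference, a minimal swap-and-restore context manager with the suggested Generator-style annotation. This is illustrative only; the attribute names and string stand-ins for models are assumptions:

```python
from contextlib import contextmanager
from typing import Generator

class _Worker:
    """Toy worker holding an active model and a reference model."""

    def __init__(self):
        self.model = "policy"
        self.reference_model = "reference"

    @contextmanager
    def use_reference_model(self) -> Generator[None, None, None]:
        # On entry: swap the active model and the reference model.
        self.model, self.reference_model = self.reference_model, self.model
        try:
            yield
        finally:
            # On exit: restore the original references.
            self.model, self.reference_model = self.reference_model, self.model

w = _Worker()
with w.use_reference_model():
    active_inside = w.model  # the reference model is active here
```

The try/finally guarantees restoration even if the body raises, which matters for the real CPU/CUDA swap described in the docstring.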
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 66d80e6 and 547f360.

📒 Files selected for processing (9)
  • .github/workflows/_automodel_integration_check.yml (1 hunks)
  • docs/fp8.md (1 hunks)
  • nemo_rl/distributed/ray_actor_environment_registry.py (1 hunks)
  • nemo_rl/models/policy/lm_policy.py (2 hunks)
  • nemo_rl/models/policy/workers/base_policy_worker.py (1 hunks)
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py (3 hunks)
  • nemo_rl/models/policy/workers/megatron_policy_worker.py (2 hunks)
  • tests/unit/_plugins/remote_select.py (2 hunks)
  • tests/unit/environments/test_reward_model_environment.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Conform code to Python 3.12+
Indent code with 4 spaces. Do not use tabs
Use snake_case for file names
Use PascalCase for class names
Use snake_case for function and method names
Use snake_case for local variables
Prefix variable names that start with a number with 'k' (e.g., k_99th_percentile)
Use upper snake_case with 'G' prefix for global variables (e.g., G_MY_GLOBAL)
Use upper snake_case for constants
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
Prefer docstrings over comments for interfaces that may be used outside a file
Reserve comments for code within a function or interfaces that are local to a file
If a piece of code is commented out, include a comment describing its usage and why it's commented out. Remove debug comments before merging
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx
Avoid using reflection when functionality can be easily achieved without reflection
When using try-except blocks, limit the except clause to the smallest set of specific errors possible
When using try-except blocks for duck-typing, keep the body of the try as small as possible and use the else block for logic
YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values
For required configuration attributes, access config directly and expect presence (e.g., policy_cfg['precision']) without hidden defaults
Use typing.NotRequired to mark optional attributes in TypedDict for configuration
When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
Follow the Google Python Style Guide for Python code

Files:

  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • nemo_rl/distributed/ray_actor_environment_registry.py
  • tests/unit/environments/test_reward_model_environment.py
  • nemo_rl/models/policy/workers/base_policy_worker.py
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • tests/unit/_plugins/remote_select.py
  • nemo_rl/models/policy/lm_policy.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

For any source file under nemo_rl/*.py that defines a class or function decorated with @ray.remote, add a coverage pragma (# pragma: no cover) because these run in separate Ray processes

Files:

  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • nemo_rl/distributed/ray_actor_environment_registry.py
  • nemo_rl/models/policy/workers/base_policy_worker.py
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • nemo_rl/models/policy/lm_policy.py
!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • nemo_rl/distributed/ray_actor_environment_registry.py
  • .github/workflows/_automodel_integration_check.yml
  • tests/unit/environments/test_reward_model_environment.py
  • docs/fp8.md
  • nemo_rl/models/policy/workers/base_policy_worker.py
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • tests/unit/_plugins/remote_select.py
  • nemo_rl/models/policy/lm_policy.py
**/*.{py,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)

Files:

  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • nemo_rl/distributed/ray_actor_environment_registry.py
  • tests/unit/environments/test_reward_model_environment.py
  • nemo_rl/models/policy/workers/base_policy_worker.py
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • tests/unit/_plugins/remote_select.py
  • nemo_rl/models/policy/lm_policy.py
docs/**/*.md

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Update docs/index.md when a new markdown doc is added under docs/**/*.md or a markdown file is renamed, ensuring the document appears in the most appropriate section

Files:

  • docs/fp8.md
🧬 Code graph analysis (3)
nemo_rl/distributed/ray_actor_environment_registry.py (1)
nemo_rl/distributed/virtual_cluster.py (1)
  • PY_EXECUTABLES (43-59)
nemo_rl/models/policy/workers/base_policy_worker.py (5)
nemo_rl/distributed/batched_data_dict.py (1)
  • BatchedDataDict (75-860)
nemo_rl/models/policy/interfaces.py (2)
  • LogprobOutputSpec (25-28)
  • ReferenceLogprobOutputSpec (31-34)
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py (3)
  • init_collective (465-475)
  • use_reference_model (1581-1607)
  • train (489-865)
nemo_rl/utils/nvml.py (1)
  • get_device_uuid (55-77)
nemo_rl/models/policy/workers/megatron_policy_worker.py (2)
  • use_reference_model (1358-1406)
  • train (900-1150)
nemo_rl/models/policy/workers/megatron_policy_worker.py (1)
nemo_rl/models/policy/workers/base_policy_worker.py (1)
  • BasePolicyWorker (15-218)
🪛 GitHub Actions: Copyright check
nemo_rl/models/policy/workers/base_policy_worker.py

[error] 1-1: Copyright check failed: Found files with missing copyright notices.

🪛 Ruff (0.14.7)
nemo_rl/models/policy/workers/base_policy_worker.py

19-19: Unused method argument: train_world_size

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Lint check
  • GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (12)
docs/fp8.md (1)

73-80: Updated error trace path matches new Megatron worker module

The FP8 training error example now points at nemo_rl.models.policy.workers.megatron_policy_worker.MegatronPolicyWorker, which matches the new workers subpackage layout and class name. No further changes needed.

tests/unit/_plugins/remote_select.py (1)

63-74: Remote‑select example updated to new workers layout

The example nodeid and mapped file path now reference models/policy/workers/dtensor_policy_worker.py, which aligns with the new directory structure. No behavioral impact.

.github/workflows/_automodel_integration_check.yml (1)

133-139: Automodel consistency check paths correctly point to workers subpackage

The DTensor worker file variables now reference nemo_rl/models/policy/workers/dtensor_policy_worker*.py, matching the code move. The synchronization logic and messages remain valid.

nemo_rl/distributed/ray_actor_environment_registry.py (1)

27-46: Actor registry updates align with new worker locations and backends

The registry now maps:

  • workers.dtensor_policy_worker.DTensorPolicyWorker → VLLM_EXECUTABLE (per the existing vLLM coupling comment)
  • workers.dtensor_policy_worker_v2.DTensorPolicyWorkerV2 → PY_EXECUTABLES.AUTOMODEL
  • workers.megatron_policy_worker.MegatronPolicyWorker → MCORE_EXECUTABLE

These FQNs match the new module structure and the corresponding backends. The reward‑model environment test and Policy also use the same strings.
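The mapping pattern can be sketched as a plain dict keyed by fully qualified class name. The worker paths come from the PR; the executable values and the lookup helper are hypothetical placeholders:

```python
# Sketch of an actor-to-executable registry keyed by fully qualified class name.
# The executable strings here are placeholders, not the real PY_EXECUTABLES values.
ACTOR_ENVIRONMENT_REGISTRY = {
    "nemo_rl.models.policy.workers.dtensor_policy_worker.DTensorPolicyWorker": "vllm",
    "nemo_rl.models.policy.workers.dtensor_policy_worker_v2.DTensorPolicyWorkerV2": "automodel",
    "nemo_rl.models.policy.workers.megatron_policy_worker.MegatronPolicyWorker": "mcore",
}

def get_actor_python_env(actor_fqn: str) -> str:
    """Resolve the Python environment for a given actor class path."""
    return ACTOR_ENVIRONMENT_REGISTRY[actor_fqn]
```

Any stale key that still points at the pre-refactor module path would raise a KeyError at resolution time, which is why the review stresses checking for leftover references.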

nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py (1)

88-101: BasePolicyWorker integration for DTensorPolicyWorkerV2 looks structurally sound

Switching DTensorPolicyWorkerV2 to subclass BasePolicyWorker reuses shared logic for ZMQ setup, GPU info, free‑memory queries, reference‑policy logprobs, and profiling. The @ray.remote(... ) # pragma: no cover annotation also matches the coverage guideline for Ray actors under nemo_rl/.

nemo_rl/models/policy/lm_policy.py (1)

82-125: Worker class paths and _v2 switch match new workers layout

The Policy constructor now selects:

  • Megatron: "nemo_rl.models.policy.workers.megatron_policy_worker.MegatronPolicyWorker"
  • DTensor v2: "nemo_rl.models.policy.workers.dtensor_policy_worker_v2.DTensorPolicyWorkerV2" when dtensor_cfg["_v2"] is true
  • DTensor v1: "nemo_rl.models.policy.workers.dtensor_policy_worker.DTensorPolicyWorker" otherwise

These strings are consistent with the new module locations and the updated actor registry and tests.

If _v2 and any new dtensor_cfg keys (e.g., those used in DTensorPolicyWorkerV2 for cache clearing) were just introduced, ensure their TypedDict definitions and example YAML configs are updated to document purpose and valid values.
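The three-way selection described above can be sketched as follows. The `megatron_enabled` flag is a hypothetical stand-in for however the Policy constructor detects the Megatron backend; the `dtensor_cfg["_v2"]` key is taken from the review:

```python
def select_worker_class(megatron_enabled: bool, dtensor_cfg: dict) -> str:
    """Pick the worker class path, mirroring the switch in lm_policy."""
    if megatron_enabled:
        return "nemo_rl.models.policy.workers.megatron_policy_worker.MegatronPolicyWorker"
    if dtensor_cfg["_v2"]:
        return "nemo_rl.models.policy.workers.dtensor_policy_worker_v2.DTensorPolicyWorkerV2"
    return "nemo_rl.models.policy.workers.dtensor_policy_worker.DTensorPolicyWorker"
```

Returning the class path as a string (rather than the class itself) matches how the registry resolves workers to Ray actor environments.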

nemo_rl/models/policy/workers/megatron_policy_worker.py (1)

120-127: MegatronPolicyWorker cleanly migrates to BasePolicyWorker

Importing and subclassing BasePolicyWorker centralizes:

  • collective initialization (init_collective and model_update_group),
  • basic liveness/memory/profiling utilities,
  • ZMQ socket setup and get_free_memory_bytes, and
  • reference‑policy logprobs via get_reference_policy_logprobs + use_reference_model.

Within this worker you already set self.rank before any potential init_collective use and rely on maybe_init_zmq()/model_update_group only in methods that are now implemented in the base, so the refactor looks consistent with the Policy- and registry‑side changes.

nemo_rl/models/policy/workers/base_policy_worker.py (5)

18-36: Unused train_world_size parameter.

The train_world_size parameter is declared but never used in the method body. The docstring mentions it's "used in inference cluster" but that usage isn't implemented here.

If this parameter is reserved for future use or API consistency with subclasses, consider adding a comment or using _ = train_world_size to explicitly acknowledge it's intentionally unused. Otherwise, remove it.


38-49: LGTM with a note on implicit attributes.

These utility methods are straightforward. Note that get_gpu_info assumes self.model exists, which is an implicit contract subclasses must fulfill. This is acceptable for an ABC but could benefit from a class-level docstring documenting required attributes.


67-79: LGTM - ZMQ initialization is reasonable.

The lazy initialization pattern and timeout/linger settings are appropriate. Note that REQ sockets with bind() is an unusual pattern (typically REQ connects and REP binds), but this may be intentional for your IPC architecture.


88-93: LGTM!

The shutdown method correctly cleans up ZMQ resources with proper existence checks.


129-218: LGTM - Abstract interface is well-defined.

The abstract methods provide a clean contract for policy worker implementations. The signatures align with the existing subclass implementations shown in the relevant code snippets.

Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>

@jgerh jgerh left a comment


Completed tech pubs review and provided a few copyedits.

Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>

Signed-off-by: ashors1 <ashors@nvidia.com>

Signed-off-by: ashors1 <ashors@nvidia.com>

Signed-off-by: ashors1 <ashors@nvidia.com>

Signed-off-by: ashors1 <ashors@nvidia.com>

terrykong previously approved these changes on Dec 4, 2025
@terrykong terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Dec 4, 2025
@terrykong terrykong enabled auto-merge (squash) December 4, 2025 06:53
@terrykong terrykong linked an issue on Dec 4, 2025 that may be closed by this pull request
Signed-off-by: ashors1 <ashors@nvidia.com>

@terrykong terrykong added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Dec 4, 2025
@terrykong terrykong merged commit a99bc26 into main Dec 4, 2025
40 of 41 checks passed
@terrykong terrykong deleted the ashors/base-policy-worker branch December 4, 2025 21:40
DeL-TaiseiOzaki pushed a commit to DeL-TaiseiOzaki/RL that referenced this pull request Jan 8, 2026
Signed-off-by: ashors1 <ashors@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 12, 2026
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: ashors1 <ashors@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: ashors1 <ashors@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 9, 2026
Signed-off-by: ashors1 <ashors@nvidia.com>

Labels

  • CI:L1: Run doctests, unit tests, and functional tests
  • CI: Relating to CI
  • documentation: Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MegatronPolicyWorker refactor

3 participants