feat: refit refactoring with zmq and overlapping #1267

Merged
terrykong merged 20 commits into main from zhiyul/refit_zmq_refactor
Oct 22, 2025
Conversation

ZhiyuLi-Nvidia (Contributor) commented Oct 3, 2025

What does this PR do ?

Summary

  • As titled, we now use Ray solely for lightweight coordination and notifications, while leveraging ZeroMQ (ZMQ) for high-throughput, peer-to-peer (P2P) inter-process communication (IPC). This design avoids costly serialization and enables superior overlap between computation and data transfer.

  • Asynchronous Overlap with Ping-Pong Buffers: The weight update process (refit) now operates within an inner generator loop, triggered by notifications from the Ray coordinator. We achieve significant overlap between the weight gathering generator and the actual weight updates by using asynchronous ZMQ send/recv operations combined with a ping-pong buffer. This allows one buffer to be filled with incoming weights while the other is being used for computation, maximizing hardware utilization.

  • Unified Communication Protocol
    The ZMQ-based P2P communication logic has been standardized and applied across multiple components. This includes the async and sync LLM engines and all supported backends: DTensor, DTensor v2, and Mcore.

  • Significant Code Simplification
    By refactoring and aligning all components to use this single, efficient communication protocol, we have greatly simplified the codebase. This effort resulted in a net removal of approximately 170 lines of redundant or complex code.
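The ping-pong buffer pattern described above can be sketched in plain Python. Note this is an illustrative sketch, not the PR's actual implementation: the function name `pack_with_ping_pong`, the `(buffer_index, payload, metadata)` yield shape, and the assumption that each tensor fits in one buffer are all simplifications.

```python
def pack_with_ping_pong(named_tensors, buffer_size_bytes):
    """Pack (name, bytes) pairs into two alternating staging buffers.

    Yields (buffer_index, payload, metadata) each time a buffer fills,
    so a receiver can consume one buffer while the sender fills the
    other. Metadata records (name, offset, length) for unpacking.
    Assumes each individual tensor fits within one buffer.
    """
    buffers = [bytearray(buffer_size_bytes), bytearray(buffer_size_bytes)]
    active, offset, meta = 0, 0, []
    for name, data in named_tensors:
        n = len(data)
        if offset + n > buffer_size_bytes:
            # Flush the active buffer, then switch to the other one.
            yield active, bytes(buffers[active][:offset]), meta
            active, offset, meta = 1 - active, 0, []
        buffers[active][offset:offset + n] = data
        meta.append((name, offset, n))
        offset += n
    if meta:  # flush any remaining partial buffer
        yield active, bytes(buffers[active][:offset]), meta
```

In the real flow the "consume" side is a ZMQ receiver loading weights, so flushing buffer 0 can overlap with filling buffer 1.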

Performance

  • dsv3 in mcore:
    • prepare_for_generation/total: 16% gain from 61s to 51s
    • prepare_for_generation/transfer_and_update_weights: 22% gain from 45s to 35s
  • llama-70b in mcore:
    • prepare_for_generation/total: 7% gain, from 8.5s to 7.9s
    • prepare_for_generation/transfer_and_update_weights: 12% gain, from 2.5s to 2.2s
  • llama-8b in dtensor v2:
    • prepare_for_generation/total: 25% improvement from 4s to 3s
    • prepare_for_generation/transfer_and_update_weights: regression from 0.2s to 0.5s
      • This is acceptable: it results from the changed position of the garbage-collection/cache-clear step between offload and refit, and total time still improves
  • gemma-3-27b-it in dtensor v2:
    • prepare_for_generation/total: 20% gain, from 7.7s to 6.4s
    • prepare_for_generation/transfer_and_update_weights: 6% gain, from 3.0s to 2.8s
  • dsv3 in mcore asyncLLM:
    • prepare_for_generation/total: 24% gain, from 64.5s to 48.8s
    • prepare_for_generation/transfer_and_update_weights: 27.6% gain, from 50.3s to 36.5s
  • llama-70b in mcore asyncLLM:
    • prepare_for_generation/total: 5% gain, from 8.4s to 8.0s
    • prepare_for_generation/transfer_and_update_weights: 4% gain, from 2.45s to 2.35s

Issues

List issues that this PR closes:

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this
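A driver-side refit call might look like the following sketch, based on the method names in this PR's walkthrough. The buffer-sizing fraction, the injectable `wait` hook, and the error handling are illustrative assumptions, not the final API.

```python
def refit_via_zmq(policy, generation, buffer_fraction=0.25, wait=None):
    """Hypothetical driver-side refit flow; exact signatures may differ.

    `wait` defaults to ray.get and is injectable here only for testing.
    """
    if wait is None:
        import ray  # assumed available in the NeMo-RL environment
        wait = ray.get
    # Size the staging buffer from free GPU memory (fraction is illustrative).
    buffer_size_bytes = int(policy.get_free_memory_bytes() * buffer_fraction)
    # Kick off both sides: policy streams weights (REQ side),
    # generation receives and applies them (REP side).
    futures = policy.stream_weights_via_ipc_zmq(buffer_size_bytes)
    futures = futures + generation.update_weights_via_ipc_zmq()
    results = wait(futures)
    if not all(results):
        raise RuntimeError("refit weight update failed on at least one worker")
    return results
```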

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features
    • Introduced streaming-based weight updates over ZMQ for refit, enabling direct model weight transfer between training and inference processes.
  • Performance Improvements
    • Reduced memory pressure via buffer-based streaming and alignment, improving refit throughput and stability on multi-GPU setups.
  • Reliability
    • Unified update flow with clearer success aggregation and synchronization, with better handling of timeouts and partial failures.
  • Chores
    • Added ZMQ as a new runtime dependency.

ZhiyuLi-Nvidia requested review from a team as code owners October 3, 2025 10:56
github-actions bot commented Oct 3, 2025

⚠️ File Consistency Check

Check based on commit: 8229960 (PR #1267 from zhiyul/refit_zmq_refactor)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/dtensor_policy_worker.py was modified in this PR, but nemo_rl/models/policy/dtensor_policy_worker_v2.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/dtensor_policy_worker.py should also be applied to nemo_rl/models/policy/dtensor_policy_worker_v2.py
  • Update nemo_rl/models/policy/dtensor_policy_worker_v2.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/dtensor_policy_worker.py
  • Not modified: nemo_rl/models/policy/dtensor_policy_worker_v2.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

coderabbitai bot (Contributor) commented Oct 3, 2025

📝 Walkthrough

Walkthrough

Switches colocated refit from grouped CUDA-IPC handles to ZMQ-based streaming. Introduces ZMQ sockets on policy (REQ) and generation workers (REP), streams packed weights in aligned buffers, and awaits update futures. Refactors interfaces and vLLM paths accordingly; retains NCCL broadcast for non-colocated mode. Adds memory querying utilities and zmq dependency.

Changes

Cohort / File(s) Summary of Changes
Algorithm: GRPO refit path
nemo_rl/algorithms/grpo.py
Replace grouped IPC-transfer with streaming via ZMQ: compute buffer_size from config or free memory; call policy.stream_weights_via_ipc_zmq and generation.update_weights_via_ipc_zmq; await Ray futures; keep NCCL broadcast for non-colocated.
Generation interfaces & API
nemo_rl/models/generation/interfaces.py
Rename/retire update_weights_from_ipc_handles → update_weights_via_ipc_zmq; change return type to list[ray.ObjectRef]; no args.
vLLM backend internals
nemo_rl/models/generation/vllm/vllm_backend.py
Add ZMQ REP setup (get_zmq_address, maybe_init_zmq). Implement update_weights_via_ipc_zmq loop to receive packed tensors, rebuild CUDA tensors with alignment, load into model, send acks, and clean up. Remove old IPC-handle methods and packing structures. Adjust prepare_refit_info signature.
vLLM generation driver
nemo_rl/models/generation/vllm/vllm_generation.py
Replace update_weights_from_ipc_handles with update_weights_via_ipc_zmq returning futures; use async/sync worker calls without ipc_handles; raise if workers uninitialized.
vLLM workers (sync/async)
nemo_rl/models/generation/vllm/vllm_worker.py, nemo_rl/models/generation/vllm/vllm_worker_async.py
Replace IPC-handle based RPCs with unified "update_weights_via_ipc_zmq" collective RPC; remove ipc_handles arg; keep bool success semantics internally; simplify control flow.
Policy interfaces
nemo_rl/models/policy/interfaces.py
Remove prepare_weights_for_ipc and get_weights_ipc_handles. Add stream_weights_via_ipc_zmq method signature returning list[ray.ObjectRef].
LM policy
nemo_rl/models/policy/lm_policy.py
Remove IPC-handle preparation and retrieval; add get_free_memory_bytes; implement stream_weights_via_ipc_zmq that invokes worker-group streaming and returns futures.
DTensor policy worker
nemo_rl/models/policy/dtensor_policy_worker.py
Add ZMQ REQ setup (get_zmq_address, maybe_init_zmq); implement stream_weights_via_ipc_zmq using shared impl with DTensor-aware param generator; add get_free_memory_bytes.
Megatron policy worker
nemo_rl/models/policy/megatron_policy_worker.py
Add ZMQ REQ setup (get_zmq_address, maybe_init_zmq), get_free_memory_bytes; replace get_weights_ipc_handles with stream_weights_via_ipc_zmq using shared impl and HF export; unify metadata handling in prepare_refit_info.
Policy utils
nemo_rl/models/policy/utils.py
Add calculate_aligned_size; add stream_weights_via_ipc_zmq_impl with double buffering, aligned packing, CUDA sync, ZMQ send/ack loop, cleanup.
Dependencies
pyproject.toml
Add "zmq" dependency.
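The aligned packing mentioned for nemo_rl/models/policy/utils.py can be illustrated with a rounding helper like the one below. The 256-byte constant is an assumption (CUDA allocations are commonly 256-byte aligned); the PR's actual `calculate_aligned_size` may use a different value.

```python
ALIGNMENT_BYTES = 256  # assumed; typical CUDA allocation alignment

def calculate_aligned_size(num_bytes: int, alignment: int = ALIGNMENT_BYTES) -> int:
    """Round num_bytes up to the next multiple of `alignment`.

    Placing each packed tensor at an aligned offset lets the receiver
    rebuild device tensors as views into the shared buffer without
    violating alignment requirements.
    """
    return (num_bytes + alignment - 1) // alignment * alignment
```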

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Trainer as Trainer (GRPO)
  participant Policy as Policy Worker(s) [REQ]
  participant Gen as Generation Worker(s) [REP]
  note over Trainer,Gen: Colocated refit via ZMQ streaming

  Trainer->>Policy: stream_weights_via_ipc_zmq(buffer_size_bytes)
  Trainer->>Gen: update_weights_via_ipc_zmq()
  par Stream batches
    loop For each packed buffer
      Policy->>Gen: ZMQ SEND(buffer bytes, metadata)
      Gen-->>Policy: ZMQ RECV(ack 0)
      Gen->>Gen: Rebuild tensors, load weights, sync stream
    end
    Policy->>Gen: ZMQ SEND("complete")
    Gen-->>Policy: ZMQ RECV(ack "done")
  end
  Gen-->>Trainer: futures resolved (success flags)
  Policy-->>Trainer: futures resolved
  Trainer->>Trainer: Validate all successes
  alt Any failure
    Trainer-->>Trainer: Raise refit update error
  end
sequenceDiagram
  participant Trainer as Trainer (GRPO)
  participant Policy as Policy
  participant Gen as Generation
  note over Trainer,Gen: Non-colocated path unchanged
  Trainer->>Policy: broadcast_weights_for_collective()
  Policy-->>Gen: NCCL broadcast
  Gen-->>Trainer: success
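The REQ/REP handshake in the first diagram reduces to a small pyzmq sketch. The ack values b"0"/b"done" mirror the diagram; socket setup is simplified and tensor payloads are replaced by raw bytes, so treat this as a protocol illustration rather than the PR's code.

```python
import zmq  # pyzmq; this PR adds zmq as a runtime dependency

def rep_receive_loop(sock):
    """Generation-side REP loop: ack each packed buffer, then ack completion."""
    received = []
    while True:
        msg = sock.recv()
        if msg == b"complete":
            sock.send(b"done")
            return received
        received.append(msg)  # real flow: rebuild tensors, load weights
        sock.send(b"0")       # ack so the sender can refill that buffer

def req_stream_loop(sock, payloads):
    """Policy-side REQ loop: send each packed buffer and wait for its ack."""
    acks = []
    for payload in payloads:
        sock.send(payload)
        acks.append(sock.recv())
    sock.send(b"complete")
    acks.append(sock.recv())
    return acks
```

The strict send/recv alternation of REQ/REP is what gives the sender a natural point to swap ping-pong buffers: once the ack for buffer A arrives, A is free to refill while B is in flight.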

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

Possibly related PRs

Suggested labels

CI:L1, r0.4.0

Suggested reviewers

  • terrykong
  • yfw
  • yaoyu-33

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 78.72%, below the required 80.00% threshold. Resolution: run @coderabbitai generate docstrings to improve coverage.
  • Test Results For Major Changes — ⚠️ Warning: this PR is a major refactor that introduces a new ZMQ-based IPC streaming path, changes multiple public interfaces and return types, and adds a new dependency, all of which can affect correctness, convergence, and performance, yet the description does not include test results, numerics/convergence validation, or before/after performance data with configuration context. Resolution: update the PR description with (1) unit/integration test results covering the new streaming path across DTensor/Megatron/vLLM workers in colocated and non-colocated modes, (2) numerics/convergence checks showing parity before/after refit on representative models and dtypes, (3) performance benchmarks (throughput/latency and memory usage) before vs. after with exact hardware, configs, and batch sizes, and (4) a brief note on breaking API changes with a migration plan or compatibility shims. Link CI logs or reproducible scripts/commands for these measurements, after which this check can be re-evaluated.
✅ Passed checks (2 passed)
  • Title Check — ✅ Passed: the title succinctly captures the core refactoring of the refit process to use ZMQ and buffer overlapping, matching the primary changes without extraneous detail.
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch zhiyul/refit_zmq_refactor

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 5

🧹 Nitpick comments (8)
nemo_rl/models/generation/interfaces.py (1)

236-238: Keep the IPC update contract abstract

GenerationInterface previously forced subclasses to implement the IPC update hook. Dropping @abstractmethod allows concrete classes to instantiate without overriding, only to fail later when the method is invoked. Please mark update_weights_via_ipc_zmq as abstract (as before) to enforce the contract at construction time.

-    def update_weights_via_ipc_zmq(self) -> list[ray.ObjectRef]:
+    @abstractmethod
+    def update_weights_via_ipc_zmq(self) -> list[ray.ObjectRef]:
nemo_rl/models/generation/vllm/vllm_worker_async.py (1)

753-763: Refresh the docstring to match the new signature

The method no longer accepts ipc_handles, but the docstring still documents that parameter. Please align the docstring with the current API to avoid confusing downstream readers.

-    async def update_weights_via_ipc_zmq_async(
-        self,
-    ) -> bool:
-        """Async version of update_weights_from_ipc_handles.
-
-        Args:
-            ipc_handles (dict): Dictionary mapping device UUIDs (str) to parameter IPC handles.
+    async def update_weights_via_ipc_zmq_async(self) -> bool:
+        """Async counterpart of ``update_weights_via_ipc_zmq`` on the sync worker.
 
         Returns:
             bool: True if weights were successfully updated, False otherwise.
         """
nemo_rl/models/policy/dtensor_policy_worker.py (2)

1718-1724: Return type mismatch: cast NVML free memory to int.

nvml.get_free_memory_bytes returns float; method is annotated to return int.

-        return get_free_memory_bytes(device_idx)
+        return int(get_free_memory_bytes(device_idx))

1726-1751: Make streamed tensors contiguous and validate buffer size.

  • Add an assert to catch unintended zero-sized buffers.
  • Ensure tensors are contiguous before packing to avoid view/copy pitfalls.
 def stream_weights_via_ipc_zmq(self, buffer_size_bytes: int = 0) -> None:
     self.maybe_init_zmq()
@@
-        def dtensor_params_generator():
+        def dtensor_params_generator():
             """Generator that yields (name, tensor) pairs, converting DTensors to local tensors."""
             for name, tensor in self.model.state_dict().items():
                 if isinstance(tensor, DTensor):
                     # Convert DTensor to full tensor for streaming
-                    full_tensor = tensor.full_tensor()
-                    # Convert to target dtype
-                    yield name, full_tensor.to(self.dtype, non_blocking=True)
+                    full_tensor = tensor.full_tensor()
+                    # Convert to target dtype and ensure contiguity
+                    yield name, full_tensor.to(self.dtype, non_blocking=True).contiguous()
                 else:
-                    # Convert to target dtype
-                    yield name, tensor.to(self.dtype, non_blocking=True)
+                    # Convert to target dtype and ensure contiguity
+                    yield name, tensor.to(self.dtype, non_blocking=True).contiguous()
@@
-        stream_weights_via_ipc_zmq_impl(
+        assert buffer_size_bytes > 0, "buffer_size_bytes must be > 0"
+        stream_weights_via_ipc_zmq_impl(
             params_generator=dtensor_params_generator(),
             buffer_size_bytes=buffer_size_bytes,
             zmq_socket=self.zmq_socket,
             rank=self.rank,
             worker_name=str(self),
         )
nemo_rl/models/policy/lm_policy.py (1)

659-665: Cast free memory to int for annotated return type.

-        free_memory_bytes = min(ray.get(future) for future in futures)
-        return free_memory_bytes
+        free_memory_bytes = min(ray.get(future) for future in futures)
+        return int(free_memory_bytes)
nemo_rl/models/generation/vllm/vllm_generation.py (1)

751-779: Docstring and return type drift; clarify no-arg API and futures return.

Update to reflect ZMQ streaming and that this returns futures, not bool.

-    def update_weights_via_ipc_zmq(self) -> list[ray.ObjectRef]:
-        """Update weights of the policy using IPC handles, considering tensor parallelism.
-
-        For tp > 1, only the leader in each tensor parallel tied worker group will update weights.
-
-        Args:
-            ipc_handles (dict): Dictionary mapping device UUIDs (str) to parameter IPC handles.
-
-        Returns:
-            bool: True if weights were successfully updated, False otherwise.
-        """
+    def update_weights_via_ipc_zmq(self) -> list[ray.ObjectRef]:
+        """Kick off ZMQ-based IPC weight updates on workers.
+        Returns ray futures; caller should await them."""
@@
-        # Use run_all_workers_single_data since no data needs to be passed
+        # Use run_all_workers_single_data since no data needs to be passed
         futures = self.worker_group.run_all_workers_single_data(
             method_name,
             run_rank_0_only_axes=["tensor_parallel", "pipeline_parallel"],
         )
 
-        # this function should co-work with lm_policy, so we should wait for all futures to complete outside
+        # Wait for completion outside to coordinate with policy side
         return futures
nemo_rl/models/generation/vllm/vllm_worker.py (2)

701-739: Align NVTX tag and error message with new API name; keep collective_rpc call.

-    @wrap_with_nvtx_name("vllm_genertion_worker/update_weights_from_ipc_handles")
+    @wrap_with_nvtx_name("vllm_generation_worker/update_weights_via_ipc_zmq")
     def update_weights_via_ipc_zmq(self) -> bool:
@@
-            if self.cfg["vllm_cfg"]["async_engine"]:
-                raise RuntimeError(
-                    "update_weights_from_ipc_handles cannot be used with async_engine=True. Use update_weights_from_ipc_handles_async instead."
-                )
+            if self.cfg["vllm_cfg"]["async_engine"]:
+                raise RuntimeError(
+                    "update_weights_via_ipc_zmq cannot be used with async_engine=True. Use update_weights_via_ipc_zmq_async instead."
+                )
@@
-            result_or_coro = self.llm.collective_rpc(
-                "update_weights_via_ipc_zmq",
-                args=tuple(),
-            )
+            result_or_coro = self.llm.collective_rpc("update_weights_via_ipc_zmq", args=tuple())

727-733: Propagate worker failure context or raise.

Current print+False hides errors. Consider raising RuntimeError to fail fast in orchestration paths.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 557b7ec and 8229960.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • nemo_rl/algorithms/grpo.py (1 hunks)
  • nemo_rl/models/generation/interfaces.py (1 hunks)
  • nemo_rl/models/generation/vllm/vllm_backend.py (2 hunks)
  • nemo_rl/models/generation/vllm/vllm_generation.py (2 hunks)
  • nemo_rl/models/generation/vllm/vllm_worker.py (2 hunks)
  • nemo_rl/models/generation/vllm/vllm_worker_async.py (2 hunks)
  • nemo_rl/models/policy/dtensor_policy_worker.py (2 hunks)
  • nemo_rl/models/policy/interfaces.py (1 hunks)
  • nemo_rl/models/policy/lm_policy.py (1 hunks)
  • nemo_rl/models/policy/megatron_policy_worker.py (5 hunks)
  • nemo_rl/models/policy/utils.py (2 hunks)
  • pyproject.toml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

  • nemo_rl/models/generation/interfaces.py
  • nemo_rl/models/generation/vllm/vllm_worker_async.py
  • nemo_rl/models/generation/vllm/vllm_generation.py
  • nemo_rl/algorithms/grpo.py
  • nemo_rl/models/generation/vllm/vllm_worker.py
  • nemo_rl/models/policy/lm_policy.py
  • nemo_rl/models/generation/vllm/vllm_backend.py
  • nemo_rl/models/policy/utils.py
  • nemo_rl/models/policy/dtensor_policy_worker.py
  • nemo_rl/models/policy/megatron_policy_worker.py
  • nemo_rl/models/policy/interfaces.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

  • nemo_rl/models/generation/interfaces.py
  • nemo_rl/models/generation/vllm/vllm_worker_async.py
  • nemo_rl/models/generation/vllm/vllm_generation.py
  • nemo_rl/algorithms/grpo.py
  • nemo_rl/models/generation/vllm/vllm_worker.py
  • nemo_rl/models/policy/lm_policy.py
  • nemo_rl/models/generation/vllm/vllm_backend.py
  • nemo_rl/models/policy/utils.py
  • nemo_rl/models/policy/dtensor_policy_worker.py
  • nemo_rl/models/policy/megatron_policy_worker.py
  • nemo_rl/models/policy/interfaces.py
🧬 Code graph analysis (9)
nemo_rl/models/generation/interfaces.py (3)
nemo_rl/models/generation/vllm/vllm_backend.py (1)
  • update_weights_via_ipc_zmq (84-161)
nemo_rl/models/generation/vllm/vllm_generation.py (1)
  • update_weights_via_ipc_zmq (751-779)
nemo_rl/models/generation/vllm/vllm_worker.py (1)
  • update_weights_via_ipc_zmq (702-738)
nemo_rl/models/generation/vllm/vllm_generation.py (4)
nemo_rl/models/generation/interfaces.py (1)
  • update_weights_via_ipc_zmq (236-238)
nemo_rl/models/generation/vllm/vllm_backend.py (1)
  • update_weights_via_ipc_zmq (84-161)
nemo_rl/models/generation/vllm/vllm_worker.py (1)
  • update_weights_via_ipc_zmq (702-738)
nemo_rl/distributed/worker_groups.py (1)
  • run_all_workers_single_data (728-772)
nemo_rl/algorithms/grpo.py (9)
nemo_rl/models/policy/dtensor_policy_worker.py (2)
  • get_free_memory_bytes (1718-1723)
  • stream_weights_via_ipc_zmq (1727-1751)
nemo_rl/models/policy/lm_policy.py (2)
  • get_free_memory_bytes (659-664)
  • stream_weights_via_ipc_zmq (666-671)
nemo_rl/models/policy/megatron_policy_worker.py (2)
  • get_free_memory_bytes (1625-1630)
  • stream_weights_via_ipc_zmq (1634-1653)
nemo_rl/utils/nvml.py (1)
  • get_free_memory_bytes (80-90)
nemo_rl/models/policy/interfaces.py (1)
  • stream_weights_via_ipc_zmq (161-164)
nemo_rl/models/generation/interfaces.py (1)
  • update_weights_via_ipc_zmq (236-238)
nemo_rl/models/generation/vllm/vllm_backend.py (1)
  • update_weights_via_ipc_zmq (84-161)
nemo_rl/models/generation/vllm/vllm_generation.py (1)
  • update_weights_via_ipc_zmq (751-779)
nemo_rl/models/generation/vllm/vllm_worker.py (1)
  • update_weights_via_ipc_zmq (702-738)
nemo_rl/models/generation/vllm/vllm_worker.py (3)
nemo_rl/models/generation/interfaces.py (1)
  • update_weights_via_ipc_zmq (236-238)
nemo_rl/models/generation/vllm/vllm_backend.py (1)
  • update_weights_via_ipc_zmq (84-161)
nemo_rl/models/generation/vllm/vllm_generation.py (1)
  • update_weights_via_ipc_zmq (751-779)
nemo_rl/models/policy/lm_policy.py (4)
nemo_rl/models/policy/dtensor_policy_worker.py (2)
  • get_free_memory_bytes (1718-1723)
  • stream_weights_via_ipc_zmq (1727-1751)
nemo_rl/models/policy/megatron_policy_worker.py (2)
  • get_free_memory_bytes (1625-1630)
  • stream_weights_via_ipc_zmq (1634-1653)
nemo_rl/distributed/worker_groups.py (1)
  • run_all_workers_single_data (728-772)
nemo_rl/models/policy/interfaces.py (1)
  • stream_weights_via_ipc_zmq (161-164)
nemo_rl/models/generation/vllm/vllm_backend.py (8)
nemo_rl/models/policy/dtensor_policy_worker.py (4)
  • get_zmq_address (1699-1701)
  • report_device_id (1686-1697)
  • maybe_init_zmq (1703-1708)
  • prepare_refit_info (1711-1716)
nemo_rl/models/policy/megatron_policy_worker.py (4)
  • get_zmq_address (1545-1547)
  • report_device_id (1532-1543)
  • maybe_init_zmq (1549-1554)
  • prepare_refit_info (1558-1572)
nemo_rl/models/generation/vllm/vllm_worker.py (3)
  • report_device_id (681-695)
  • prepare_refit_info (697-699)
  • update_weights_via_ipc_zmq (702-738)
nemo_rl/models/generation/interfaces.py (2)
  • prepare_refit_info (232-234)
  • update_weights_via_ipc_zmq (236-238)
nemo_rl/models/generation/vllm/vllm_generation.py (2)
  • prepare_refit_info (732-749)
  • update_weights_via_ipc_zmq (751-779)
nemo_rl/utils/nsys.py (1)
  • wrap_with_nvtx_name (82-94)
nemo_rl/models/policy/utils.py (1)
  • calculate_aligned_size (256-266)
nemo_rl/models/generation/fp8.py (2)
  • is_fp8_model (202-213)
  • load_weights (289-316)
nemo_rl/models/policy/dtensor_policy_worker.py (7)
nemo_rl/models/generation/vllm/vllm_backend.py (4)
  • get_zmq_address (57-59)
  • report_device_id (52-55)
  • maybe_init_zmq (61-72)
  • prepare_refit_info (74-81)
nemo_rl/models/policy/megatron_policy_worker.py (6)
  • get_zmq_address (1545-1547)
  • report_device_id (1532-1543)
  • maybe_init_zmq (1549-1554)
  • prepare_refit_info (1558-1572)
  • get_free_memory_bytes (1625-1630)
  • stream_weights_via_ipc_zmq (1634-1653)
nemo_rl/models/policy/interfaces.py (2)
  • prepare_refit_info (157-158)
  • stream_weights_via_ipc_zmq (161-164)
nemo_rl/models/policy/lm_policy.py (3)
  • prepare_refit_info (648-657)
  • get_free_memory_bytes (659-664)
  • stream_weights_via_ipc_zmq (666-671)
nemo_rl/utils/nvml.py (1)
  • get_free_memory_bytes (80-90)
nemo_rl/utils/nsys.py (1)
  • wrap_with_nvtx_name (82-94)
nemo_rl/models/policy/utils.py (1)
  • stream_weights_via_ipc_zmq_impl (269-384)
nemo_rl/models/policy/megatron_policy_worker.py (3)
nemo_rl/models/policy/dtensor_policy_worker.py (5)
  • get_zmq_address (1699-1701)
  • report_device_id (1686-1697)
  • maybe_init_zmq (1703-1708)
  • get_free_memory_bytes (1718-1723)
  • stream_weights_via_ipc_zmq (1727-1751)
nemo_rl/models/policy/lm_policy.py (2)
  • get_free_memory_bytes (659-664)
  • stream_weights_via_ipc_zmq (666-671)
nemo_rl/models/policy/utils.py (1)
  • stream_weights_via_ipc_zmq_impl (269-384)
nemo_rl/models/policy/interfaces.py (3)
nemo_rl/models/policy/dtensor_policy_worker.py (1)
  • stream_weights_via_ipc_zmq (1727-1751)
nemo_rl/models/policy/lm_policy.py (1)
  • stream_weights_via_ipc_zmq (666-671)
nemo_rl/models/policy/megatron_policy_worker.py (1)
  • stream_weights_via_ipc_zmq (1634-1653)
🪛 Ruff (0.13.2)
nemo_rl/models/generation/vllm/vllm_generation.py

763-763: Avoid specifying long messages outside the exception class

(TRY003)

nemo_rl/models/generation/vllm/vllm_backend.py

155-155: Consider moving this statement to an else block

(TRY300)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Lint check
  • GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (2)
nemo_rl/models/policy/dtensor_policy_worker.py (1)

1711-1716: Ensure dtype metadata matches streamed dtype across workers.

You record (shape, self.dtype). That’s fine only if the streaming generator also casts to self.dtype for every tensor, and the receiver reconstructs using the same dtype. Please confirm MegatronPolicyWorker.prepare_refit_info also uses the same dtype to avoid size mismatches.

nemo_rl/models/policy/lm_policy.py (1)

666-672: LGTM: streaming API returns futures as intended.

github-actions bot commented Oct 3, 2025

ℹ️ File Consistency Check

Check based on commit: 0d395af (PR #1267 from zhiyul/refit_zmq_refactor)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/dtensor_policy_worker.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.



ZhiyuLi-Nvidia force-pushed the zhiyul/refit_zmq_refactor branch from 0d395af to 3764bbd on October 3, 2025 17:53
github-actions bot commented Oct 3, 2025

ℹ️ File Consistency Check

Check based on commit: 3764bbd (PR #1267 from zhiyul/refit_zmq_refactor)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/dtensor_policy_worker.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.



github-actions bot commented Oct 3, 2025

ℹ️ File Consistency Check

Check based on commit: 73cfc13 (PR #1267 from zhiyul/refit_zmq_refactor)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/dtensor_policy_worker.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.



ZhiyuLi-Nvidia (Contributor, Author) commented:

@coderabbitai generate docstrings

coderabbitai bot (Contributor) commented Oct 3, 2025

✅ Actions performed

Docstrings generation started.

coderabbitai bot added a commit that referenced this pull request Oct 3, 2025
Docstrings generation was requested by @ZhiyuLi-Nvidia.

* #1267 (comment)

The following files were modified:

* `nemo_rl/algorithms/grpo.py`
* `nemo_rl/models/generation/interfaces.py`
* `nemo_rl/models/generation/vllm/vllm_backend.py`
* `nemo_rl/models/generation/vllm/vllm_generation.py`
* `nemo_rl/models/generation/vllm/vllm_worker.py`
* `nemo_rl/models/generation/vllm/vllm_worker_async.py`
* `nemo_rl/models/policy/dtensor_policy_worker.py`
* `nemo_rl/models/policy/dtensor_policy_worker_v2.py`
* `nemo_rl/models/policy/interfaces.py`
* `nemo_rl/models/policy/lm_policy.py`
* `nemo_rl/models/policy/megatron_policy_worker.py`
* `nemo_rl/models/policy/utils.py`
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 3, 2025

Note

Generated docstrings for this pull request at #1270

@ZhiyuLi-Nvidia ZhiyuLi-Nvidia force-pushed the zhiyul/refit_zmq_refactor branch from 73cfc13 to b9f427b Compare October 4, 2025 00:34
@guyueh1 guyueh1 linked an issue Oct 6, 2025 that may be closed by this pull request
@ZhiyuLi-Nvidia ZhiyuLi-Nvidia requested a review from guyueh1 October 6, 2025 22:54
@ZhiyuLi-Nvidia
Copy link
Contributor Author

ZhiyuLi-Nvidia commented Oct 6, 2025

@guyueh1 @terrykong @parthchadha @yuki-97 @yfw could you take a look?

@ZhiyuLi-Nvidia ZhiyuLi-Nvidia force-pushed the zhiyul/refit_zmq_refactor branch from 73b0b0b to bca5826 Compare October 8, 2025 05:41
@ZhiyuLi-Nvidia ZhiyuLi-Nvidia added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Oct 21, 2025
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
@ZhiyuLi-Nvidia ZhiyuLi-Nvidia added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Oct 22, 2025
@ZhiyuLi-Nvidia ZhiyuLi-Nvidia force-pushed the zhiyul/refit_zmq_refactor branch from f319c1c to 364b492 Compare October 22, 2025 06:14
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
@ZhiyuLi-Nvidia ZhiyuLi-Nvidia force-pushed the zhiyul/refit_zmq_refactor branch from 364b492 to 62acf7a Compare October 22, 2025 06:20
@ZhiyuLi-Nvidia ZhiyuLi-Nvidia added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Oct 22, 2025
@terrykong terrykong merged commit 3a69c21 into main Oct 22, 2025
40 of 42 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Dec 2, 2025
4 tasks
Labels

CI:L1 Run doctests, unit tests, and functional tests r0.4.0

Development

Successfully merging this pull request may close these issues.

Refit Performance for Large MoE Models

5 participants