feat: Port vllm port allocator to Rust in bindings #3125

grahamking · 2025-09-18T21:20:03Z

- Because port allocation code is duplicated in several backend components.
- Because the bindings now hide etcd, making it easier to replace.

I tried to stay faithful to the original vllm port allocation and reservation code.

Summary by CodeRabbit

New Features
- Runtime-based port configuration replaces the previous etcd path for vLLM workers.
- Added API to allocate contiguous port blocks: DistributedRuntime.allocate_port_block(namespace, port_min, port_max, block_size, context=None).
Refactor
- Migrated port allocation logic across the vLLM backend to use the distributed runtime, improving reliability and reducing contention.
Documentation
- Updated comments to reference shared storage instead of etcd.
Chores
- Added dependencies to Python bindings (local-ip-address, rand, socket2); removed an obsolete license header.

coderabbitai · 2025-09-18T21:31:42Z

Walkthrough

Port allocation is migrated from an ETCD-based mechanism to a DistributedRuntime-based API across vLLM components. Python call sites now pass runtime and namespace. The Rust/PyO3 bindings add DistributedRuntime.allocate_port_block with input validation and atomic reservation. Types and signatures in ports/args/main are updated accordingly; publisher comments are adjusted.

Changes

Cohort / File(s)	Summary
Runtime-based port allocation migration `components/backends/vllm/src/dynamo/vllm/args.py`, `components/backends/vllm/src/dynamo/vllm/main.py`, `components/backends/vllm/src/dynamo/vllm/ports.py`	Replace ETCD-centric APIs with DistributedRuntime-driven allocation. Update function/class signatures (remove EtcdContext, add runtime/namespace), rename configure_ports_with_etcd → configure_ports, rework allocation calls to runtime.allocate_port_block.
Bindings and runtime API extension `lib/bindings/python/rust/lib.rs`, `lib/bindings/python/src/dynamo/_core.pyi`, `lib/bindings/python/Cargo.toml`	Add DistributedRuntime.allocate_port_block (PyO3), input checks, randomized selection, socket binding, cleanup/retries. Update EtcdKvCache new behavior. Add deps: local-ip-address, rand, socket2. Expose new method in .pyi.
Comment-only updates `components/backends/vllm/src/dynamo/vllm/publisher.py`	Replace “etcd” references with “shared storage” in comments; no functional changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor W as Worker
  participant A as vLLM args.py
  participant P as vLLM ports.py
  participant R as DistributedRuntime (bindings)

  W->>A: await configure_ports(runtime, config)
  A->>P: allocate_and_reserve_port(runtime, namespace, metadata, range)
  P->>R: allocate_port_block(namespace, min, max, block_size, context)
  Note right of R: Validates inputs<br/>randomizes candidates<br/>binds sockets<br/>atomic reservation
  R-->>P: [ports]
  P-->>A: port(s)
  A-->>W: configured ports
  Note over R,P: Errors: release sockets, cleanup, retry (bounded)
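
The flow in the diagram (validate inputs, randomize candidates, bind sockets, retry a bounded number of times) can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the Rust code from this PR: the names `bind_block` and `pick_port_block`, the `max_attempts` default, and the loopback host are all hypothetical, and the shared-storage reservation step is left as a comment.

```python
import random
import socket
from typing import List, Optional


def bind_block(host: str, start: int, block_size: int) -> Optional[List[socket.socket]]:
    """Try to bind every port in [start, start + block_size).

    Returns the bound sockets on success, or None if any port is busy.
    """
    held: List[socket.socket] = []
    for port in range(start, start + block_size):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            # One port is taken: release everything bound so far.
            sock.close()
            for s in held:
                s.close()
            return None
        held.append(sock)
    return held


def pick_port_block(
    port_min: int,
    port_max: int,
    block_size: int,
    max_attempts: int = 100,  # assumed cap, not the value used in the PR
    host: str = "127.0.0.1",
) -> List[int]:
    """Pick a contiguous block of currently bindable ports (hypothetical helper)."""
    if block_size < 1:
        raise ValueError("block_size must be >= 1")
    if port_max - port_min + 1 < block_size:
        raise ValueError("port range is smaller than block_size")

    # Valid starting ports must leave room for the whole block.
    candidates = list(range(port_min, port_max - block_size + 2))
    random.shuffle(candidates)  # randomize to reduce contention between workers

    for start in candidates[:max_attempts]:
        held = bind_block(host, start, block_size)
        if held is None:
            continue
        # A real allocator would record the reservation in shared storage
        # here, while the sockets are still held, then release them.
        for s in held:
            s.close()
        return list(range(start, start + block_size))

    raise RuntimeError("no free contiguous port block found")
```

Holding the sockets until the reservation is written is what keeps another process on the same host from grabbing a port between the availability check and the reservation.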

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

fix: add better port logic (#2175) #2192 — Prior ETCD-backed port allocation; this PR refactors the same APIs to use DistributedRuntime.
fix: add better port logic #2175 — Introduced ETCD allocation in vLLM (ports.py, args.py, main.py); current changes replace those paths.
feat: kvbm + connector #2258 — Adjusts DistributedRuntime bindings; directly connected to adding allocate_port_block here.

Poem

A rabbit taps ports in a tidy row,
From runtime streams the numbers flow.
No more etcd burrows to keep,
Blocks reserved while sockets sleep.
Hop, hop! The workers cheer—
Fresh paths, clear lanes, all ports appear. 🐇⚓️

Pre-merge checks

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The PR description does not follow the repository's required template and is missing the Overview, Details, "Where should the reviewer start?" section, and Related Issues; it only provides brief motivation and a fidelity note. There are no file-level pointers, testing instructions, or explicit mentions of breaking API changes despite several public signature modifications in the patch. Because the required template fields are absent, the description is insufficient for effective reviewer onboarding.	Please update the PR description to match the repository template by adding an Overview summarizing intent, a Details section listing the key files and changes, and a "Where should the reviewer start?" section that points to the most important files/lines. Include Related Issues (e.g., "closes #xxx"), testing or reproduction steps, and call out any breaking API or public-surface changes and migration notes so reviewers can assess impact and test properly.
Docstring Coverage	⚠️ Warning	Docstring coverage is 48.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title succinctly identifies the primary change — migrating the vllm port allocator into Rust bindings — and aligns with the main file changes (Rust binding additions and port-allocation refactor). It is concise and directly related to the changeset, so a reviewer scanning history will understand the main intent. The phrasing is slightly awkward (duplicate "Port") but remains understandable.


coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

components/backends/vllm/src/dynamo/vllm/publisher.py (1)
163-171: DEPRECATE factory setters: add deprecation note + type hints; keep behavior for one release.

Add DEPRECATED comments and type annotations on the two setters so callers see the deprecation now while retaining behavior for one release; update/flush call sites listed below.

Call sites to update:

components/backends/vllm/src/dynamo/vllm/publisher.py (setters definitions).

components/backends/vllm/src/dynamo/vllm/main.py: lines ~205-206 (factory.set_num_gpu_blocks_all / set_request_total_slots_all).

examples/multimodal/components/publisher.py: lines ~175-181 (publisher setters).

examples/multimodal/components/worker.py: lines ~150-155 (calls to stats_logger.set_num_gpu_blocks_all / set_request_total_slots_all).

Suggested diff (apply to the publisher methods):
-    # TODO Remove once we publish metadata to shared storage
-    def set_num_gpu_blocks_all(self, num_blocks):
+    # DEPRECATED: prefer shared metadata; slated for removal.
+    def set_num_gpu_blocks_all(self, num_blocks: int) -> None:
         if self.created_logger:
             self.created_logger.set_num_gpu_block(num_blocks)
@@
-    def set_request_total_slots_all(self, request_total_slots):
+    # DEPRECATED: prefer shared metadata; slated for removal.
+    def set_request_total_slots_all(self, request_total_slots: int) -> None:
         if self.created_logger:
             self.created_logger.set_num_request_total_slots(request_total_slots)

🧹 Nitpick comments (14)

components/backends/vllm/src/dynamo/vllm/publisher.py (5)

53-56: Replace ad‑hoc setter with typed, deprecated API and plan metadata move.

These fields should come from shared storage/runtime metadata. Until then, add type hints and mark as deprecated to ease removal.

-    # TODO: Remove this and pass as metadata through shared storage
-    def set_num_gpu_block(self, num_blocks):
-        self.num_gpu_block = num_blocks
+    # DEPRECATED: prefer metadata from shared storage; slated for removal.
+    def set_num_gpu_block(self, num_blocks: int) -> None:
+        self.num_gpu_block = int(num_blocks)

57-60: Same here: add typing and deprecate.

Mirror the approach for total slots.

-    # TODO: Remove this and pass as metadata through shared storage
-    def set_num_request_total_slots(self, request_total_slots):
-        self.request_total_slots = request_total_slots
+    # DEPRECATED: prefer metadata from shared storage; slated for removal.
+    def set_num_request_total_slots(self, request_total_slots: int) -> None:
+        self.request_total_slots = int(request_total_slots)

69-89: Don’t republish static metadata on every record; clamp/round KV blocks and hit‑rate.

request_total_slots and kv_total_blocks are static per model/GPU; publish once (e.g., init) and omit from hot‑path updates.
kv_active_blocks uses truncation; prefer clamped round to avoid under‑count and float drift.
Bound hit_rate to [0, 1].

-        # they should be part of some runtime metadata tied to MDC or put in shared storage ?
+        # they should be part of runtime metadata (MDC/shared storage), not republished each tick.
@@
-        hit_rate = 0
-        if scheduler_stats.prefix_cache_stats.queries > 0:
-            hit_rate = (
-                scheduler_stats.prefix_cache_stats.hits
-                / scheduler_stats.prefix_cache_stats.queries
-            )
+        hit_rate = 0.0
+        if scheduler_stats.prefix_cache_stats.queries > 0:
+            hit_rate = scheduler_stats.prefix_cache_stats.hits / scheduler_stats.prefix_cache_stats.queries
+            hit_rate = max(0.0, min(1.0, float(hit_rate)))
@@
-        kv_stats = KvStats(
-            kv_active_blocks=int(self.num_gpu_block * scheduler_stats.kv_cache_usage),
+        usage = max(0.0, min(1.0, float(scheduler_stats.kv_cache_usage)))
+        active_blocks = int(round(self.num_gpu_block * usage))
+        active_blocks = max(0, min(self.num_gpu_block, active_blocks))
+        kv_stats = KvStats(
+            kv_active_blocks=active_blocks,
             kv_total_blocks=self.num_gpu_block,
-            gpu_cache_usage_perc=scheduler_stats.kv_cache_usage,
+            gpu_cache_usage_perc=usage,
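
To illustrate why the suggestion prefers a clamped round over plain truncation, here is a small standalone sketch. The helper name `active_blocks` is hypothetical; only the arithmetic mirrors the suggested diff.

```python
def active_blocks(num_gpu_block: int, usage: float) -> int:
    """Convert a usage fraction into a block count, clamped to [0, num_gpu_block]."""
    usage = max(0.0, min(1.0, float(usage)))        # bound usage to [0, 1]
    blocks = int(round(num_gpu_block * usage))      # round instead of truncating
    return max(0, min(num_gpu_block, blocks))       # never exceed the total


# Float drift: 100 * 0.57 evaluates to slightly less than 57,
# so truncation under-counts by one block while rounding does not.
truncated = int(100 * 0.57)
rounded = active_blocks(100, 0.57)
```

Out-of-range inputs such as a usage of 1.2 or -0.1 are clamped to the full block count or zero rather than producing impossible values.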

150-158: Auto‑initialize metrics endpoint once created.

Call init_publish here so first snapshot has static metadata set and zeros elsewhere.

         logger = DynamoStatLoggerPublisher(
             self.component, dp_rank, metrics_labels=self.metrics_labels
         )
         self.created_logger = logger
-
-        return logger
+        logger.init_publish()
+        return logger

61-66: iteration_stats and engine_idx are unused.

If vLLM doesn’t require them, prefix with underscores to signal intent; otherwise, integrate into metrics.

-    def record(
-        self,
-        scheduler_stats: SchedulerStats,
-        iteration_stats: Optional[IterationStats],
-        engine_idx: int = 0,
-    ):
+    def record(
+        self,
+        scheduler_stats: SchedulerStats,
+        _iteration_stats: Optional[IterationStats],
+        _engine_idx: int = 0,
+    ) -> None:

components/backends/vllm/src/dynamo/vllm/main.py (1)

204-204: Consider improving the TODO comment.

The comment could be more specific about what needs to be done.
-    # TODO Hack to get data, move this to registering in shared storage somewhere
+    # TODO: Move GPU blocks and slot configuration to a centralized registry service

components/backends/vllm/src/dynamo/vllm/args.py (1)

276-276: Improve error message clarity.

The error message could be more helpful for users.

-            "config.kv_port is not set; call configure_ports(...) before overwrite_args "
+            "config.kv_port is not set; ensure configure_ports(...) is called before overwrite_args "

lib/bindings/python/rust/lib.rs (2)

556-566: Consider the user experience for EtcdKvCache instantiation.

While redirecting users to the factory method is good, the error message could be clearer.
-        Err(PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(
-            "EtcdKvCache must be created using the 'new' class method",
-        ))
+        Err(PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(
+            "EtcdKvCache cannot be instantiated directly. Use EtcdKvCache.create(...) instead",
+        ))
528-538: Consider IPv6 support in socket binding.

Function forces IPv4 (socket2::Domain::IPV4). Repo already references IPv6 fallbacks (lib/bindings/python/rust/lib.rs:547–548, lib/runtime/src/pipeline/network/tcp/server.rs:130) and metrics accepts IPv4/IPv6 (components/metrics/src/lib.rs:194). Make the domain configurable, attempt IPv6 if IPv4 bind fails, or create an IPv6 dual‑stack socket (clear IPV6_V6ONLY) to cover both.
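
The dual-stack option mentioned above (an IPv6 socket with IPV6_V6ONLY cleared, falling back to IPv4) looks roughly like this when expressed with Python's standard `socket` module. It is a sketch of the idea only; the helper name `bind_dual_stack` is hypothetical, and the Rust bindings would do the equivalent through socket2.

```python
import socket


def bind_dual_stack(port: int = 0) -> socket.socket:
    """Bind a TCP socket on IPv6 with dual-stack enabled, else fall back to IPv4."""
    try:
        sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    except OSError:
        sock = None  # IPv6 is not available on this host

    if sock is not None:
        try:
            # Clearing IPV6_V6ONLY lets the IPv6 socket also accept IPv4 traffic.
            sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
            sock.bind(("::", port))
            return sock
        except OSError:
            sock.close()

    # Fallback: plain IPv4 socket.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("0.0.0.0", port))
    except OSError:
        sock.close()
        raise
    return sock
```

Passing port 0 asks the operating system for any free port, which is convenient for trying the helper without picking a number by hand.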

components/backends/vllm/src/dynamo/vllm/ports.py (5)

13-13: Use type-only import to avoid import-time dependency/cycles.

Avoid importing DistributedRuntime at module import; gate it under TYPE_CHECKING and quote annotations in signatures.

-from dynamo.runtime import DistributedRuntime
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from dynamo.runtime import DistributedRuntime

-async def allocate_and_reserve_port_block(
-    runtime: DistributedRuntime, namespace: str, request: PortAllocationRequest
-) -> list[int]:
+async def allocate_and_reserve_port_block(
+    runtime: "DistributedRuntime", namespace: str, request: PortAllocationRequest
+) -> list[int]:

-async def allocate_and_reserve_port(
-    runtime: DistributedRuntime,
-    namespace: str,
+async def allocate_and_reserve_port(
+    runtime: "DistributedRuntime",
+    namespace: str,

Also applies to: 67-69, 100-101

71-79: Validate inputs early and fix docstring; remove stale comment.

Add explicit guards for block_size vs range length; document runtime and namespace; drop the obsolete comment.

-    """
-    Allocate a contiguous block of ports from the specified range and atomically reserve them.
-    Returns a list of all allocated ports in order.
-
-    Args:
-        request: PortAllocationRequest containing all allocation parameters
-
-    Returns:
-        list[int]: List of all allocated ports in ascending order
-    """
-    # Create a list of valid starting ports (must have room for the entire block)
+    """
+    Allocate a contiguous block of ports from the specified range and atomically reserve them.
+    Returns a list of all allocated ports in order.
+
+    Args:
+        runtime: Distributed runtime used to perform the atomic reservation.
+        namespace: Logical namespace for the reservation keys.
+        request: PortAllocationRequest containing all allocation parameters.
+
+    Returns:
+        list[int]: List of all allocated ports in ascending order.
+    """
+    # Validate request before crossing the FFI boundary
+    range_len = request.port_range.max - request.port_range.min + 1
+    if request.block_size < 1:
+        raise ValueError("block_size must be >= 1")
+    if request.block_size > range_len:
+        raise ValueError(
+            f"block_size {request.block_size} exceeds range length {range_len} "
+            f"({request.port_range.min}-{request.port_range.max})"
+        )

Also applies to: 80-81, 90-96

48-55: Add dataclass-level validation for block_size.

Backstop invalid block_size at construction time; catches bugs closer to the source.

 @dataclass
 class PortAllocationRequest:
     """Parameters for port allocation"""
 
     metadata: PortMetadata
     port_range: DynamoPortRange
     block_size: int = 1
+
+    def __post_init__(self):
+        if self.block_size < 1:
+            raise ValueError("block_size must be >= 1")
+        range_len = self.port_range.max - self.port_range.min + 1
+        if self.block_size > range_len:
+            raise ValueError(
+                f"block_size {self.block_size} exceeds range length {range_len} "
+                f"({self.port_range.min}-{self.port_range.max})"
+            )

106-115: Docstring args incomplete; add defensive check for empty result.

Document runtime and namespace; guard against an unexpected empty return.

-    """
-    Allocate a port from the specified range and atomically reserve it.
-    This is a convenience wrapper around allocate_and_reserve_port_block with block_size=1.
-
-    Args:
-        metadata: Port metadata / context
-        port_range: DynamoPortRange object specifying min and max ports to try
-
-    Returns:
-        int: The allocated port number
-    """
+    """
+    Allocate a port from the specified range and atomically reserve it.
+    Convenience wrapper around allocate_and_reserve_port_block with block_size=1.
+
+    Args:
+        runtime: Distributed runtime used to perform the atomic reservation.
+        namespace: Logical namespace for the reservation keys.
+        metadata: Port metadata / context.
+        port_range: Port range to search (inclusive).
+
+    Returns:
+        int: The allocated port number.
+    """
@@
-    allocated_ports = await allocate_and_reserve_port_block(runtime, namespace, request)
-    return allocated_ports[0]  # Return the single allocated port
+    allocated_ports = await allocate_and_reserve_port_block(runtime, namespace, request)
+    if not allocated_ports:
+        raise RuntimeError("allocate_port_block returned no ports")
+    return allocated_ports[0]

Also applies to: 121-122

41-46: Update ETCD wording in metadata docstring.

This module no longer exposes ETCD; make the comment runtime-agnostic.

-    """Metadata to store with port reservations in ETCD"""
+    """Metadata attached to port reservations in the distributed runtime"""

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef6734d and 7eb75ef.

⛔ Files ignored due to path filters (1)

lib/bindings/python/Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (7)

components/backends/vllm/src/dynamo/vllm/args.py (5 hunks)
components/backends/vllm/src/dynamo/vllm/main.py (3 hunks)
components/backends/vllm/src/dynamo/vllm/ports.py (2 hunks)
components/backends/vllm/src/dynamo/vllm/publisher.py (3 hunks)
lib/bindings/python/Cargo.toml (1 hunks)
lib/bindings/python/rust/lib.rs (3 hunks)
lib/bindings/python/src/dynamo/_core.pyi (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-07-01T13:55:03.940Z

Learnt from: nnshah1
PR: ai-dynamo/dynamo#1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The `dynamo_worker()` decorator in the dynamo codebase returns a wrapper that automatically injects the `runtime` parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature `async def get_metrics(runtime, log_dir)` decorated with `dynamo_worker()` can be called as `get_metrics(log_dir)` because the decorator wrapper injects the runtime parameter.

Applied to files:

components/backends/vllm/src/dynamo/vllm/main.py
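
The injection behavior described in this learning can be pictured with a minimal decorator sketch. This is not the actual `dynamo_worker()` implementation: `inject_runtime` and `FakeRuntime` are hypothetical stand-ins used only to show how a wrapper can supply the first argument.

```python
import asyncio
import functools


class FakeRuntime:
    """Hypothetical stand-in for the real runtime object."""


def inject_runtime(make_runtime):
    """Decorator factory: the wrapper creates a runtime and passes it first."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            runtime = make_runtime()
            # Callers supply only the non-runtime arguments.
            return await func(runtime, *args, **kwargs)

        return wrapper

    return decorator


@inject_runtime(FakeRuntime)
async def get_metrics(runtime, log_dir):
    return type(runtime).__name__, log_dir


# The caller omits `runtime`; the wrapper fills it in.
result = asyncio.run(get_metrics("/tmp/logs"))
```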

🧬 Code graph analysis (5)

lib/bindings/python/src/dynamo/_core.pyi (1)

lib/bindings/python/rust/lib.rs (2)

allocate_port_block (383-509)

namespace (373-378)

components/backends/vllm/src/dynamo/vllm/main.py (1)

components/backends/vllm/src/dynamo/vllm/args.py (4)

Config (39-64)

configure_ports (198-258)

overwrite_args (336-363)

parse_args (67-195)

components/backends/vllm/src/dynamo/vllm/ports.py (2)

lib/bindings/python/src/dynamo/_core.pyi (5)

DistributedRuntime (31-62)

namespace (38-42)

block_size (625-629)

block_size (648-652)

allocate_port_block (51-56)

lib/bindings/python/rust/lib.rs (2)

namespace (373-378)

allocate_port_block (383-509)

lib/bindings/python/rust/lib.rs (4)

lib/runtime/src/pipeline/network/egress/push_router.rs (2)

rand (169-169)

err (245-245)

lib/bindings/python/src/dynamo/_core.pyi (5)

namespace (38-42)

block_size (625-629)

block_size (648-652)

allocate_port_block (51-56)

new (117-135)

lib/runtime/src/distributed.rs (2)

namespace (216-218)

etcd_client (269-271)

lib/bindings/python/rust/llm/entrypoint.rs (1)

to_pyerr (285-290)

components/backends/vllm/src/dynamo/vllm/args.py (2)

lib/bindings/python/src/dynamo/_core.pyi (2)

DistributedRuntime (31-62)

namespace (38-42)

components/backends/vllm/src/dynamo/vllm/ports.py (1)

allocate_and_reserve_port_block (67-96)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: Build and Test - vllm
GitHub Check: Build and Test - sglang
GitHub Check: Build and Test - dynamo
GitHub Check: pre-merge-rust (lib/bindings/python)
GitHub Check: pre-merge-rust (.)
GitHub Check: pre-merge-rust (lib/runtime/examples)

🔇 Additional comments (15)

components/backends/vllm/src/dynamo/vllm/publisher.py (1)

101-108: Consider skipping publish until metadata is initialized.

StatLoggerFactory.init_publish() is invoked in components/backends/vllm/src/dynamo/vllm/main.py (≈ lines 205–207) but only after AsyncLLM.from_vllm_config(...) creates engine_client — a DynamoStatLoggerPublisher may be constructed and emit metrics with default num_gpu_block/request_total_slots=1 before metadata is set. Move init_publish() to before engine_client creation or gate publishing inside DynamoStatLoggerPublisher until the setters run.
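
One way to picture the second option (gating inside the publisher until the setters run) is the sketch below. `GatedPublisher` is a hypothetical class, not `DynamoStatLoggerPublisher`; it only borrows the setter names from this review to show the gating pattern.

```python
from typing import Dict, List, Optional


class GatedPublisher:
    """Hypothetical publisher that drops snapshots until static metadata is set."""

    def __init__(self) -> None:
        self.num_gpu_block: Optional[int] = None
        self.request_total_slots: Optional[int] = None
        self.published: List[Dict[str, float]] = []

    def set_num_gpu_block(self, num_blocks: int) -> None:
        self.num_gpu_block = int(num_blocks)

    def set_num_request_total_slots(self, request_total_slots: int) -> None:
        self.request_total_slots = int(request_total_slots)

    @property
    def ready(self) -> bool:
        # Publishing is allowed only once both setters have been called.
        return self.num_gpu_block is not None and self.request_total_slots is not None

    def record(self, kv_cache_usage: float) -> bool:
        """Publish a snapshot; return False if it was skipped."""
        if not self.ready:
            return False
        self.published.append(
            {
                "kv_total_blocks": float(self.num_gpu_block),
                "request_total_slots": float(self.request_total_slots),
                "gpu_cache_usage_perc": float(kv_cache_usage),
            }
        )
        return True
```

Using None as the "not yet set" marker, rather than a default of 1, makes an uninitialized publisher distinguishable from one that genuinely has a single block or slot.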

lib/bindings/python/Cargo.toml (1)

40-43: LGTM! Dependencies align with port allocation requirements.

The added dependencies are appropriate for the port allocation functionality:

local-ip-address for detecting the local IP

rand for randomizing port selection to reduce contention

socket2 for low-level socket binding to verify port availability

components/backends/vllm/src/dynamo/vllm/main.py (2)

25-25: LGTM! Clean migration from ETCD-based to runtime-based port configuration.

The import change properly reflects the new runtime-based approach.

66-66: LGTM! Simplified port configuration call.

The change correctly passes the runtime object to the new configure_ports function, removing the dependency on ETCD client.

components/backends/vllm/src/dynamo/vllm/args.py (4)

15-15: LGTM! Correct import for runtime-based approach.

The import of DistributedRuntime aligns with the new port allocation strategy.

198-199: LGTM! Clean function signature update.

The function signature properly accepts the runtime object and maintains backward compatibility through the Config parameter.

208-212: LGTM! Correct usage of the new runtime-based port allocation.

The code properly uses the runtime object for port allocation with appropriate parameters.

236-238: LGTM! Proper usage of block allocation for NIXL ports.

The block allocation correctly reserves consecutive ports needed for NIXL side channel communication.

lib/bindings/python/rust/lib.rs (7)

11-11: LGTM! Appropriate use of rand for port randomization.

The import is used correctly to reduce contention during port allocation.

382-509: Well-implemented port allocation with proper error handling and resource cleanup.

The implementation shows excellent practices:

Input validation for block size

Randomized candidate selection to reduce contention

Proper socket binding to ensure ports are available

Atomic ETCD reservation with rollback on failure

Clean resource management with RAII for sockets

393-397: Consider a more specific error type for validation errors.

Using PyValueError for input validation is appropriate.

417-418: Good optimization - limiting candidates to avoid excessive attempts.

The code properly caps the number of candidates to MAX_ALLOCATE_ATTEMPTS to prevent excessive iterations.

487-493: Excellent cleanup logic for partial reservations.

The code properly cleans up any partially reserved ports in ETCD when a reservation fails midway. The warning log for cleanup failures is appropriate since these are best-effort operations.
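
The rollback of partial reservations can be illustrated with an in-memory stand-in for the shared store. Everything here is an assumption made for the sketch: `InMemoryStore`, `reserve_block`, and the key layout `<namespace>/ports/<port>` are hypothetical and are not taken from the Rust implementation.

```python
from typing import Dict, Iterable, List, Set


class InMemoryStore:
    """Hypothetical stand-in for shared storage with create-if-absent semantics."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def create(self, key: str, value: str) -> bool:
        """Insert the key only if it does not exist yet; report success."""
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

    def keys(self) -> Set[str]:
        return set(self._data)


def reserve_block(store: InMemoryStore, namespace: str, ports: Iterable[int], context: str) -> bool:
    """Reserve every port in the block, or none of them."""
    reserved: List[str] = []
    for port in ports:
        key = f"{namespace}/ports/{port}"  # assumed key layout
        if store.create(key, context):
            reserved.append(key)
            continue
        # Another worker holds this port: undo the partial reservation.
        for done in reserved:
            store.delete(done)
        return False
    return True
```

If a block fails partway through, the caller sees no leftover keys and can simply move on to the next candidate block.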

544-552: Good IPv6 fallback implementation.

The function properly attempts IPv4 first and falls back to IPv6 if needed, which is a robust approach.

445-456: Verify socket binding behavior across platforms.

socket2 is used (lib/bindings/python/rust/lib.rs — bind_tcp_port; lib/runtime/src/pipeline/network/tcp/server.rs); I didn't find cfg(target_os) branches in the socket code. Verify:

SO_REUSEADDR vs SO_REUSEPORT semantics when binding multiple sockets.

IPV6_V6ONLY / IPv4-vs-IPv6 (dual-stack) behavior across Linux/macOS/Windows.

Windows-specific differences (from_raw_fd vs from_raw_socket, FD semantics) and any assumptions about Unix FDs.

Add platform-gated handling or CI tests if behavior differs.

lib/bindings/python/src/dynamo/_core.pyi

- Because port allocation code is duplicated in several backend components.
- Because the bindings now hide etcd, making it easier to replace.

I tried to stay faithful to the original vllm port allocation and reservation code.

Signed-off-by: Graham King <[email protected]>

Signed-off-by: Graham King <[email protected]>

whoisj

Pretty happy with this. Have a couple of questions, and a couple of nits which can be ignored.

components/backends/vllm/src/dynamo/vllm/main.py

components/backends/vllm/src/dynamo/vllm/ports.py

examples/multimodal/utils/args.py

lib/bindings/python/rust/lib.rs

Signed-off-by: Graham King <[email protected]>

whoisj

approved based on @grahamking's "fixing it" comment.

grahamking requested review from a team as code owners September 18, 2025 21:20

pull-request-size bot added the size/L label Sep 18, 2025

github-actions bot added the feat label Sep 18, 2025

coderabbitai bot reviewed Sep 18, 2025

lib/bindings/python/src/dynamo/_core.pyi (outdated, resolved)

copy-pr-bot bot temporarily deployed to GITLAB September 18, 2025 22:01 Inactive

copy-pr-bot bot temporarily deployed to GITLAB September 18, 2025 22:02 Inactive

alec-flowers approved these changes Sep 19, 2025

grahamking added 3 commits September 19, 2025 10:13

Fix type hints

c1752fc

Signed-off-by: Graham King <[email protected]>

chore(multimodal): Port allocation using DistributedRuntime

0828708

Signed-off-by: Graham King <[email protected]>

grahamking force-pushed the gk-reserve-ports branch from a17b300 to 0828708 Compare September 19, 2025 14:13

grahamking requested review from biswapanda, hhzhang16, indrajit96, krishung5 and whoisj as code owners September 19, 2025 14:13

pull-request-size bot added size/XL and removed size/L labels Sep 19, 2025

copy-pr-bot bot temporarily deployed to GITLAB September 19, 2025 14:13 Inactive

copy-pr-bot bot temporarily deployed to GITLAB September 19, 2025 14:14 Inactive

whoisj reviewed Sep 19, 2025

components/backends/vllm/src/dynamo/vllm/main.py (outdated, resolved)

components/backends/vllm/src/dynamo/vllm/ports.py (outdated, resolved)

examples/multimodal/utils/args.py (resolved)

lib/bindings/python/rust/lib.rs (resolved)

Address comments

e3cd070

Signed-off-by: Graham King <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB September 19, 2025 16:20 Inactive

grahamking requested a review from whoisj September 19, 2025 16:22

copy-pr-bot bot temporarily deployed to GITLAB September 19, 2025 16:23 Inactive

whoisj approved these changes Sep 19, 2025

grahamking enabled auto-merge (squash) September 19, 2025 16:56

grahamking merged commit 3865a94 into main Sep 19, 2025
17 of 20 checks passed

grahamking deleted the gk-reserve-ports branch September 19, 2025 18:34
