[Feat][NIXL] Add KV lease refresh mechanism for disaggregated prefill by robertgshaw2-redhat · Pull Request #35764 · vllm-project/vllm

robertgshaw2-redhat · 2026-03-02T16:11:36Z

Summary

Problem: The static VLLM_NIXL_ABORT_REQUEST_TIMEOUT on the P side causes premature KV block expiry when the D queue is bursty, and wastes memory when set large enough to be safe.
Solution: Replace it with an active lease mechanism — D workers periodically refresh the lease while requests are queued, so P only frees blocks if D truly goes silent (crashed or failed).

Key changes

D scheduler (NixlConnectorScheduler): tracks all do_remote_prefill=True requests in _requires_lease_dict (req_id → P host/http_port) until they are scheduled. A daemon thread (nixl-d-lease-refresh) POSTs to P's /internal/nixl/lease_refresh every timeout // 3 seconds.
P API server: new route POST /internal/nixl/lease_refresh (registered unconditionally; no-op on non-NIXL instances) calls engine_client.call_utility_async("nixl_lease_refresh", request_ids).
P EngineCore: new nixl_lease_refresh() utility method delegates to connector.refresh_lease(), which runs in the engine loop thread so no locking is needed.
P scheduler (NixlConnectorScheduler.refresh_lease): updates _lease_refreshes[req_id] = now + timeout; passed to the worker via NixlConnectorMetadata.reqs_to_refresh each step.
P worker (NixlConnectorWorker): applies refreshes in start_load_kv; get_finished now does a full scan instead of early-break (expiry order is no longer guaranteed monotone after refreshes).
New env var VLLM_NIXL_HTTP_PORT (default 8000): tells D where P's HTTP server is; P includes remote_http_port in kv_transfer_params.
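
The D-side refresh loop listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `group_by_prefill_host`, `http_post`, `refresh_once`, and `start_refresh_thread` are hypothetical names, and the JSON body shape `{"request_ids": [...]}` is an assumption, since the description only says that request IDs are forwarded to the route.

```python
import json
import threading
from collections import defaultdict
from urllib import request as urlreq

# Hypothetical alias: req_id -> (P host, P HTTP port), mirroring _requires_lease_dict.
LeaseDict = dict[str, tuple[str, int]]


def group_by_prefill_host(pending: LeaseDict) -> dict[tuple[str, int], list[str]]:
    """Batch request IDs per P instance so each cycle sends one POST per host."""
    grouped: dict[tuple[str, int], list[str]] = defaultdict(list)
    for req_id, endpoint in pending.items():
        grouped[endpoint].append(req_id)
    return dict(grouped)


def http_post(url: str, body: bytes) -> None:
    """Default sender: POST JSON and raise with the URL on a non-200 status."""
    req = urlreq.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urlreq.urlopen(req, timeout=5) as resp:
        if resp.status != 200:
            raise RuntimeError(f"HTTP {resp.status} at {url}")


def refresh_once(pending: LeaseDict, post=http_post) -> int:
    """Send one refresh per P instance; return how many POSTs succeeded."""
    sent = 0
    for (host, port), req_ids in group_by_prefill_host(pending).items():
        url = f"http://{host}:{port}/internal/nixl/lease_refresh"
        # Assumed payload shape; the real route may expect something different.
        body = json.dumps({"request_ids": sorted(req_ids)}).encode()
        try:
            post(url, body)
            sent += 1
        except Exception as exc:
            # A failed refresh is not fatal: P simply expires the lease later.
            print(f"lease refresh to {url} failed: {exc}")
    return sent


def start_refresh_thread(get_pending, timeout_s: int, stop: threading.Event):
    """Run refresh_once every timeout_s // 3 seconds until `stop` is set."""
    interval = max(1, timeout_s // 3)

    def loop() -> None:
        while not stop.wait(interval):
            refresh_once(get_pending())

    thread = threading.Thread(target=loop, name="nixl-d-lease-refresh", daemon=True)
    thread.start()
    return thread
```

Refreshing at a third of the timeout leaves room for two consecutive failed POSTs before P's lease lapses. With a 480 s timeout, for example, the interval would be 160 s.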

Thread safety

| State | Writers | Readers | Lock |
|---|---|---|---|
| `_requires_lease_dict` | scheduler thread | lease-refresh thread | `_requires_lease_lock` |
| `_lease_refreshes` | engine loop (`refresh_lease`) | engine loop (`build_connector_meta`) | none (single thread) |
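
The first row of the table can be illustrated with a small sketch. The class name `PendingLeases` and its methods are hypothetical; only `_requires_lease_dict` and `_requires_lease_lock` come from the description. The idea shown is that the scheduler thread mutates the dict under the lock, while the refresh thread copies it under the same lock and does its network I/O on the copy, so the lock is never held across an HTTP call.

```python
import threading


class PendingLeases:
    """Hypothetical stand-in for the D scheduler's lease bookkeeping."""

    def __init__(self) -> None:
        self._requires_lease_lock = threading.Lock()
        # req_id -> (P host, P HTTP port)
        self._requires_lease_dict: dict[str, tuple[str, int]] = {}

    def add(self, req_id: str, host: str, port: int) -> None:
        """Scheduler thread: track a queued remote-prefill request."""
        if not host:
            return  # no remote host means nobody to refresh against
        with self._requires_lease_lock:
            self._requires_lease_dict.setdefault(req_id, (host, port))

    def remove(self, req_id: str) -> None:
        """Scheduler thread: stop refreshing once the request is scheduled."""
        with self._requires_lease_lock:
            self._requires_lease_dict.pop(req_id, None)

    def snapshot(self) -> dict[str, tuple[str, int]]:
        """Refresh thread: copy under the lock, then work on the copy."""
        with self._requires_lease_lock:
            return dict(self._requires_lease_dict)
```

The second row needs no equivalent because, per the description, both the writer and the reader of `_lease_refreshes` run on the engine loop thread.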

Test plan

test_abort_timeout_on_prefiller — unchanged behaviour: D never adds the request to _requires_lease_dict (no remote_host), so P expires after VLLM_NIXL_ABORT_REQUEST_TIMEOUT as before
test_disagg_accuracy.py — D refreshes leases while requests queue; transfer completes normally
Manual: kill D mid-queue, verify P frees blocks after timeout

🤖 Generated with Claude Code

Replace the static P-side KV block timeout with an active lease mechanism. D workers periodically POST /internal/nixl/lease_refresh to extend the hold window while requests sit in the D queue, preventing premature block expiry on bursty workloads without requiring a large static timeout.

- D scheduler tracks pending remote-prefill requests in `_requires_lease_dict`; a background thread POSTs refreshes every `timeout // 3` seconds
- P scheduler receives refreshes via new EngineCore utility method `nixl_lease_refresh`, stores updated expiry in `_lease_refreshes`, and passes them to the P worker through `NixlConnectorMetadata`
- P worker applies refreshes in `start_load_kv` and does a full scan (not early-break) in `get_finished` since expiry order may change
- New `VLLM_NIXL_HTTP_PORT` env var (default 8000) lets D locate P's HTTP server; P includes it in `kv_transfer_params`
- New FastAPI route registered unconditionally in `build_app`; no-op on non-NIXL instances

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
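
To make the P-side behaviour concrete, here is a simplified sketch of lease expiry with refreshes. It collapses the scheduler and worker into one hypothetical class (`LeaseTable`) and skips the `NixlConnectorMetadata` hand-off, so it shows only the expiry logic and does not reproduce the PR's structure. The injectable `clock` parameter is an assumption added to make the example deterministic.

```python
import time
from typing import Callable, Iterable


class LeaseTable:
    """Hypothetical single-threaded model of P's per-request expiry times."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.perf_counter):
        self._timeout_s = timeout_s
        self._clock = clock
        # req_id -> absolute expiry time, in insertion order
        self._reqs_to_send: dict[str, float] = {}

    def register(self, req_id: str) -> None:
        """Start holding blocks for a finished prefill."""
        self._reqs_to_send[req_id] = self._clock() + self._timeout_s

    def refresh_lease(self, req_ids: Iterable[str]) -> None:
        """Extend the hold window for requests D says are still queued."""
        now = self._clock()
        for req_id in req_ids:
            if req_id in self._reqs_to_send:  # ignore unknown or already-freed IDs
                self._reqs_to_send[req_id] = now + self._timeout_s

    def pop_expired(self) -> list[str]:
        """Full scan: after refreshes, insertion order no longer implies expiry order."""
        now = self._clock()
        expired = [r for r, expires in self._reqs_to_send.items() if now >= expires]
        for req_id in expired:
            del self._reqs_to_send[req_id]
        return expired
```

An early-break scan would stop at the first unexpired entry. In this model, a refreshed request that was inserted first would then hide a later, unrefreshed request that has already expired, which is why the scan has to cover every entry.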

gemini-code-assist

Code Review

This pull request introduces a KV lease refresh mechanism for disaggregated prefill in vLLM, addressing the issue of premature KV block expiry due to static timeout values. The changes include modifications to the D scheduler, P API server, P EngineCore, P scheduler, and P worker, along with a new environment variable for configuring the P's HTTP port. The implementation involves tracking requests requiring lease refresh, periodically refreshing leases from D workers, and updating lease expiry on the P side. The changes also include a full scan in get_finished to handle lease refreshes.

gemini-code-assist · 2026-03-02T16:15:00Z

+                with self._requires_lease_lock:
+                    if request.request_id not in self._requires_lease_dict:
+                        remote_host = params.get("remote_host", "")
+                        remote_http_port = params.get(
+                            "remote_http_port", envs.VLLM_NIXL_HTTP_PORT
+                        )
+                        if remote_host:
+                            self._requires_lease_dict[request.request_id] = (
+                                remote_host,
+                                remote_http_port,
+                            )


The code adds the remote_host and remote_http_port to _requires_lease_dict if remote_host exists. However, it does not handle the case where remote_host is empty. This could lead to issues if a request is added to _requires_lease_dict without a valid remote_host, potentially causing the lease refresh mechanism to fail. It's critical to ensure that only requests with valid remote_host values are added to _requires_lease_dict to prevent unexpected behavior.

if remote_host:
    self._requires_lease_dict[request.request_id] = (
        remote_host,
        remote_http_port,
    )

gemini-code-assist · 2026-03-02T16:15:00Z

+                    with urlreq.urlopen(req, timeout=5) as resp:
+                        if resp.status != 200:
+                            raise RuntimeError(f"HTTP {resp.status}")


The code checks the HTTP status code but raises a generic RuntimeError without providing specific details about the error. This makes it difficult to debug issues related to lease refresh failures. It's critical to include the URL in the error message to provide more context for debugging.

if resp.status != 200:
    raise RuntimeError(f"HTTP {resp.status} at {url}")

gemini-code-assist · 2026-03-02T16:15:00Z

+        expired = [
+            req_id for req_id, expires in self._reqs_to_send.items() if now >= expires
+        ]


The code iterates through self._reqs_to_send to identify expired requests. However, it doesn't handle potential exceptions that might occur during the iteration or within the list comprehension. This could lead to the loop terminating prematurely and not releasing all expired KV blocks. It's critical to add error handling to ensure that all expired requests are processed and their KV blocks are released, even if some requests encounter issues.

now = time.perf_counter()
expired = []
for req_id, expires in self._reqs_to_send.items():
    try:
        if now >= expires:
            expired.append(req_id)
    except Exception as e:
        logger.warning(f"Error checking expiry for request {req_id}: {e}")

Signed-off-by: Robert Shaw <robshaw@redhat.com>

mergify · 2026-03-06T07:57:05Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @robertgshaw2-redhat.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

markmc · 2026-03-25T11:34:40Z

xref #38027

markmc · 2026-04-16T13:46:47Z

Relevant: https://github.com/llm-d/llm-d/blob/84ba2f7004abf3ba0e323fd8ffe3f8ce3c94656f/docs/wip-docs-new/architecture/advanced/disaggregation/operations-vllm.md

KV blocks are stranded on the P instance until the timeout VLLM_NIXL_ABORT_REQUEST_TIMEOUT, which defaults to 480s. We are currently working on a lease-extension strategy that will dramatically shorten the timeout window.

mergify · 2026-04-23T06:06:12Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @robertgshaw2-redhat.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify Bot added frontend v1 kv-connector labels Mar 2, 2026

gemini-code-assist Bot reviewed Mar 2, 2026

Robert Shaw added 4 commits March 2, 2026 21:39

humans are still needed to write code

934224a

Signed-off-by: Robert Shaw <robshaw@redhat.com>

update from nixl to internal

4b554d1

Signed-off-by: Robert Shaw <robshaw@redhat.com>

refactor a bit

a250ae3

Signed-off-by: Robert Shaw <robshaw@redhat.com>

revert spurious change

275da3c

Signed-off-by: Robert Shaw <robshaw@redhat.com>

mergify Bot added the needs-rebase label Mar 6, 2026

mergify Bot removed the needs-rebase label Apr 23, 2026

mergify Bot added the needs-rebase label Apr 23, 2026

Dao007forever mentioned this pull request Apr 29, 2026

[Bugfix][KV Transfer][NIXL] Notify P node on pre-admission rejection to free stranded KV blocks #41269

Open

4 tasks

robertgshaw2-redhat wants to merge 5 commits into main from lease-refresh
