[Nixl][PD] Lease renewal TTL KV blocks on P #41383
NickLucche wants to merge 15 commits into vllm-project:main
Conversation
Hi @NickLucche, the pre-commit checks have failed. Please run:
uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Code Review
This pull request introduces a heartbeat mechanism for KV transfer to maintain the availability of KV blocks on the producer side while requests are queued in the consumer's scheduler. It implements an ObservableRequestQueue wrapper in the V1 scheduler, allowing connectors to receive callbacks on queue additions and removals. The NixlConnector leverages these callbacks to track requests and periodically send heartbeat notifications to remote engines to extend block leases. Feedback for this PR focuses on the robustness of the lease timeout mechanism and potential race conditions when updating request expiration times in the worker.
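The summary above describes a queue wrapper that fires callbacks when requests are added to or removed from the scheduler's waiting queue. A minimal sketch of that idea, assuming a plain deque and illustrative names (the PR's actual ObservableRequestQueue wraps vLLM's RequestQueue and has its own API):

```python
from collections import deque
from typing import Any, Callable

class ObservableQueue:
    """Sketch of an observable FIFO: notifies subscribers on add/remove.
    Hypothetical names; not the PR's actual implementation."""

    def __init__(self) -> None:
        self._queue: deque = deque()
        self._on_add: list[Callable[[Any], None]] = []
        self._on_remove: list[Callable[[Any], None]] = []

    def subscribe(self, on_add=None, on_remove=None) -> None:
        # A connector registers callbacks to track waiting requests.
        if on_add is not None:
            self._on_add.append(on_add)
        if on_remove is not None:
            self._on_remove.append(on_remove)

    def append(self, item) -> None:
        self._queue.append(item)
        for cb in self._on_add:
            cb(item)

    def popleft(self):
        item = self._queue.popleft()
        for cb in self._on_remove:
            cb(item)
        return item
```

A connector could subscribe to maintain its own set of waiting request ids and periodically heartbeat them to the producer.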
new_expiry = time.perf_counter() + self._lease_extension
for req_id in payload.split(","):
    if req_id in self._reqs_to_send:
        old = self._reqs_to_send[req_id]
        self._reqs_to_send[req_id] = max(old, new_expiry)
The heartbeat extension logic self._reqs_to_send[req_id] = max(old, new_expiry) is susceptible to race conditions if multiple heartbeats are processed concurrently or if the scheduler updates the expiration time simultaneously. Consider using a thread-safe update mechanism or ensuring that the heartbeat processing is serialized with other operations that modify _reqs_to_send.
Hard to imagine we'll end up overwriting a later expiry with a newer expiry? Or certainly not in a way that would be significant?
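If serialization ever does turn out to be needed, one option is to guard the expiry map with a lock. A sketch under that assumption — the class is hypothetical, though `_reqs_to_send` and `_lease_extension` mirror the names in the diff above:

```python
import threading
import time

class LeaseTable:
    """Hypothetical thread-safe wrapper around the expiry map."""

    def __init__(self, lease_extension: float = 30.0) -> None:
        self._lock = threading.Lock()
        self._reqs_to_send: dict[str, float] = {}
        self._lease_extension = lease_extension

    def extend(self, payload: str) -> None:
        """Extend leases for the comma-separated request ids in payload."""
        new_expiry = time.perf_counter() + self._lease_extension
        with self._lock:
            for req_id in payload.split(","):
                if req_id in self._reqs_to_send:
                    old = self._reqs_to_send[req_id]
                    # never shorten an existing lease
                    self._reqs_to_send[req_id] = max(old, new_expiry)
```

Since each update takes `max(old, new_expiry)`, interleaved heartbeats can only ever lengthen a lease, which is why the race may well be benign as the reply above suggests.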
markmc left a comment
First pass, sorry if it's a bit scattered!
# request-fails=>do some policy retry, compute locally
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# [[0, 1, 2, _, 4, 5, 6, 7, 8, 9]]
# SW[ __________[11, 12, 13]]
I think this was me showing how request level recovery works in code to Summer lol
I think we should try simplifying the base.py/scheduler.py changes. First, isn't it enough to just notify the connector whenever a new request is added to the waiting queue? We can in the future consolidate
@markmc thanks for the review man! @orozery thanks for checking; let's chat about the interface on the design doc, it might be easier.
Yes, but I've taken the chance to try and tackle request-queue observability in a more generalized way with this PR.
yield heapq.heappop(heap_copy)

class ObservableRequestQueue(RequestQueue):
Shall we put this inside the connector base or even in nixl connector? Curious about the consideration here.
We could potentially get rid of callbacks too if putting this ObservableRequestQueue inside the nixl connector
@ivanium could you elaborate your proposal a bit? Are you saying we should move the queue within the connector..?
Yes. On second thought, I think this aligns well with @orozery 's proposal
Documentation preview: https://vllm--41383.org.readthedocs.build/en/41383/
@markmc Addressed your review, thanks again!
FTR ... @NickLucche and I chatted briefly about this offline, and my view was
markmc left a comment
Looking good, only pretty minor stuff in comments
logger.warning(
    "Releasing expired KV blocks for request %s which were "
-   "retrieved by %d decode worker(s) within %d seconds.",
+   "retrieved by %d decode worker(s) before lease expired.",
Hmm, is this what we'll see on D in the bidirectional case?
changing to say 'remote'
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: NickLucche <nlucches@redhat.com>
…ucture Signed-off-by: NickLucche <nlucches@redhat.com>
def _ensure_handshake(
    self,
    engine_id: EngineId,
minor refactor to avoid duplicating handshake code (which is nasty with locks)
Thanks @markmc, addressed comments and rebased!
Not a blocker, fine with you merging as-is, but re the warning log on freeing blocks in the bidirectional case:
This PR implements a KV cache lease renewal mechanism to minimize how long remote blocks are retained.
The effort is described in more detail here: https://docs.google.com/document/d/1i-O6kqY7WfF1lPyyftRpCQt5fwnFYIEDZKCxyB51Sjg/edit?usp=sharing.
TL;DR:
The single VLLM_NIXL_ABORT_REQUEST_TIMEOUT timeout is too simple and leads to P holding blocks for too long when D crashes. We need a more dynamic TTL "lease renewal" system that minimizes the time blocks are stranded on P.
At the same time, we also need a way for D to extend the TTL of request blocks on P while those requests are still in D's waiting queue.
This ensures traffic surges on D do not lead to blocks being freed early due to congestion.
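The TL;DR above can be sketched as two halves: the consumer batches the request ids still waiting into a heartbeat, and the producer pushes those requests' expiries forward, freeing only blocks whose lease has lapsed. All function names here are illustrative, not the PR's actual API:

```python
import time

def make_heartbeat_payload(waiting_req_ids: set[str]) -> str:
    """Consumer (D) side: batch waiting request ids into one message.
    Hypothetical helper."""
    return ",".join(sorted(waiting_req_ids))

def renew_leases(reqs_to_send: dict[str, float],
                 payload: str,
                 lease_extension: float) -> None:
    """Producer (P) side: push expiries forward for renewed requests."""
    new_expiry = time.perf_counter() + lease_extension
    for req_id in payload.split(","):
        if req_id in reqs_to_send:
            # only extend, never shorten, an existing lease
            reqs_to_send[req_id] = max(reqs_to_send[req_id], new_expiry)

def expired(reqs_to_send: dict[str, float]) -> list[str]:
    """Producer (P) side: requests whose lease lapsed; their KV blocks
    can be freed even if D crashed and never fetched them."""
    now = time.perf_counter()
    return [r for r, t in reqs_to_send.items() if t < now]
```

If D crashes, heartbeats stop and the leases lapse after one extension window, so P reclaims the blocks quickly instead of waiting out a single long global timeout.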
Test with