Skip to content

[Nixl][PD] Lease renewal TTL KV blocks on P#41383

Open
NickLucche wants to merge 15 commits intovllm-project:mainfrom
NickLucche:nixl-heartbeat
Open

[Nixl][PD] Lease renewal TTL KV blocks on P#41383
NickLucche wants to merge 15 commits intovllm-project:mainfrom
NickLucche:nixl-heartbeat

Conversation

@NickLucche
Copy link
Copy Markdown
Collaborator

This PR implements a KV Cache lease renewal mechanism for optimizing the time of remote blocks retention.
The effort is described in more details here https://docs.google.com/document/d/1i-O6kqY7WfF1lPyyftRpCQt5fwnFYIEDZKCxyB51Sjg/edit?usp=sharing.

TL;DR: VLLM_NIXL_ABORT_REQUEST_TIMEOUT single timeout is too simple and leads to P holding requests for too long when D crashes.
We need a more dynamic TTL “lease renewal” system that minimizes the time blocks are stranded on P.
At the same time, we also need a way for D to extend TTL of requests blocks in P that are currently in the waiting queue.
This ensures traffic surges on D do not lead to blocks “early-free” due to congestion

Test with

 --kv-transfer-config '{
    "kv_connector": "NixlConnector",
    "kv_role": "kv_both",
    "kv_connector_extra_config": {
      "initial_kv_lease": 20,
      "heartbeat_interval": 3,
      "heartbeat_lease_extension": 10
    }
  }'
pytest -v tests/v1/kv_connector/unit/test_nixl_heartbeat.py

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Apr 30, 2026

Hi @NickLucche, the pre-commit checks have failed. Please run:

uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a heartbeat mechanism for KV transfer to maintain the availability of KV blocks on the producer side while requests are queued in the consumer's scheduler. It implements an ObservableRequestQueue wrapper in the V1 scheduler, allowing connectors to receive callbacks on queue additions and removals. The NixlConnector leverages these callbacks to track requests and periodically send heartbeat notifications to remote engines to extend block leases. Feedback for this PR focuses on the robustness of the lease timeout mechanism and potential race conditions when updating request expiration times in the worker.

Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl/scheduler.py
Comment on lines +1831 to +1835
new_expiry = time.perf_counter() + self._lease_extension
for req_id in payload.split(","):
if req_id in self._reqs_to_send:
old = self._reqs_to_send[req_id]
self._reqs_to_send[req_id] = max(old, new_expiry)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The heartbeat extension logic self._reqs_to_send[req_id] = max(old, new_expiry) is susceptible to race conditions if multiple heartbeats are processed concurrently or if the scheduler updates the expiration time simultaneously. Consider using a thread-safe update mechanism or ensuring that the heartbeat processing is serialized with other operations that modify _reqs_to_send.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard to imagine we'll end up overwriting a later expiry with a newer expiry? Or certainly not in a way that would be significant?

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Apr 30, 2026

Hi @NickLucche, the pre-commit checks have failed. Please run:

uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Copy link
Copy Markdown
Member

@markmc markmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass, sorry if it's a bit scattered!

Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl/scheduler.py Outdated
Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl/scheduler.py Outdated
Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl/scheduler.py Outdated
Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl/scheduler.py
Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl/scheduler.py Outdated
Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl/metadata.py
Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl/worker.py Outdated
Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl/worker.py
Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl/worker.py Outdated
Comment thread vllm/v1/core/sched/scheduler.py Outdated
# request-fails=>do some policy retry, compute locally
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# [[0, 1, 2, _, 4, 5, 6, 7, 8, 9]]
# SW[ __________[11, 12, 13]]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did this come from?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was me showing how request level recovery works in code to Summer lol

@orozery
Copy link
Copy Markdown
Collaborator

orozery commented May 4, 2026

I think we should try simplifying the base.py/scheduler.py changes.

First, isn't it enough to just notify the connector whenever a new request is added to the waiting queue?
So simply add to Scheduler.add_request, after self.requests[request.request_id] = request:

if self.connector is not None:
    self.connector.on_new_request(request)

We can in the future consolidate request_finished and new_request to a some single request_event function.

@NickLucche
Copy link
Copy Markdown
Collaborator Author

@markmc thanks for the review man!

@orozery thanks for checking, let's chat about interface on the design doc, might be easier
https://docs.google.com/document/d/1i-O6kqY7WfF1lPyyftRpCQt5fwnFYIEDZKCxyB51Sjg/edit?usp=sharing

First, isn't it enough to just notify the connector whenever a new request is added to the waiting queue?

yes, but I've taken the chance to try and tackle request queue observability in a more generalized way with this PR.
The rationale is the same: avoid having to expand the number of callbacks with spot-on patches.
Happy to iterate on the design

Comment thread vllm/v1/core/sched/request_queue.py Outdated
yield heapq.heappop(heap_copy)


class ObservableRequestQueue(RequestQueue):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we put this inside the connector base or even in nixl connector? Curious about the consideration here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could potentially get rid of callbacks too if putting this ObservableRequestQueue inside the nixl connector

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivanium could you elaborate your proposal a bit? Are you saying we should move the queue within the connector..?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. On second thought, I think this aligns well with @orozery 's proposal

@NickLucche
Copy link
Copy Markdown
Collaborator Author

@ivanium in light of the v2 interface push, I am going to implement @orozery suggestion and keep things easy here wrt API.
Happy to re-purpose this ObservableQ in the future if needs be.

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented May 5, 2026

Hi @NickLucche, the pre-commit checks have failed. Please run:

uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented May 5, 2026

Documentation preview: https://vllm--41383.org.readthedocs.build/en/41383/

@mergify mergify Bot added the documentation Improvements or additions to documentation label May 5, 2026
@NickLucche
Copy link
Copy Markdown
Collaborator Author

@markmc Addressed your review, thanks again!
Done the following

  • separate TTL for blocks on D
  • single kv_lease_duration arg (others are derived/hardcoded)
  • docs
  • guard remote_agents access with a lock

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented May 5, 2026

Hi @NickLucche, the pre-commit checks have failed. Please run:

uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

@markmc
Copy link
Copy Markdown
Member

markmc commented May 6, 2026

First, isn't it enough to just notify the connector whenever a new request is added to the waiting queue?

yes, but I've taken the chance to try and tackle request queue observability in a more generalized way with this PR. The rationale is the same: avoid having to expand the number of callbacks with spot-on patches. Happy to iterate on the design

@ivanium in light of the v2 interface push, I am going to implement @orozery suggestion and keep things easy here wrt API.
Happy to re-purpose this ObservableQ in the future if needs be.

FTR ... @NickLucche and I chatted briefly about this offline, and my view was

  • For good or bad, it's a public interface with external users, we have a general sense that maybe it has evolved a little too organically to date, and it's really hard to change it now
  • And so it's natural to want to design a new hook like this in a way that is a least plausibly more general than the specific use case we're adding it for
  • I think ObservableQ is a pretty nice abstraction, but I do wonder whether we might find a need to go beyond observability, and e.g. allow connectors to reject new requests for whatever reason
  • That said, it also feels significantly over-engineered for this specific use case, and we might well find there are no other use cases. I like the simplicity of the latest on_new_request() hook
  • In general, we're going to struggle to get this interface into a nice spot without feedback from connector authors about their use cases, and with external connectors, we're going to struggle to get that feedback and evolve the interface
  • If/when we do v2, if we were able to make the public interface for external connectors strictly limited to the most common integration use cases, and have a private, more expansive, more experimental, rapidly evolving interface for in-tree connectors with more boundary-pushing use cases ... that would be ideal

Copy link
Copy Markdown
Member

@markmc markmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, only pretty minor stuff in comments

Comment thread vllm/v1/core/sched/scheduler.py Outdated
Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl/scheduler.py
Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl/scheduler.py Outdated
Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl/scheduler.py Outdated
Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl/scheduler.py
logger.warning(
"Releasing expired KV blocks for request %s which were "
"retrieved by %d decode worker(s) within %d seconds.",
"retrieved by %d decode worker(s) before lease expired.",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, is this what we'll see on D in the bidrectional case?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing to say 'remote'

Comment on lines +1831 to +1835
new_expiry = time.perf_counter() + self._lease_extension
for req_id in payload.split(","):
if req_id in self._reqs_to_send:
old = self._reqs_to_send[req_id]
self._reqs_to_send[req_id] = max(old, new_expiry)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard to imagine we'll end up overwriting a later expiry with a newer expiry? Or certainly not in a way that would be significant?

@markmc markmc added the ready ONLY add when PR is ready to merge/full CI is needed label May 6, 2026
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented May 6, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @NickLucche.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label May 6, 2026
NickLucche added 14 commits May 6, 2026 14:19
Signed-off-by: NickLucche <nlucches@redhat.com>
…ucture

Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Comment on lines +586 to +588
def _ensure_handshake(
self,
engine_id: EngineId,
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor refactor to avoid duplicating handshake code (which is nasty with locks)

logger.warning(
"Releasing expired KV blocks for request %s which were "
"retrieved by %d decode worker(s) within %d seconds.",
"retrieved by %d decode worker(s) before lease expired.",
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing to say 'remote'

@NickLucche
Copy link
Copy Markdown
Collaborator Author

thanks @markmc , addressed comments and rebased!

@markmc
Copy link
Copy Markdown
Member

markmc commented May 6, 2026

Not a blocker, fine with you merging as-is, but re the warning log on freeing blocks in the bidrectional case:

  1. This shouldn't be a warning - it's standard operation
  2. It's not an expired "lease"

Signed-off-by: NickLucche <nlucches@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation kv-connector ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants