[Nixl][PD] Lease renewal TTL KV blocks on P #41383
NickLucche wants to merge 15 commits into vllm-project:main
Conversation
Hi @NickLucche, the pre-commit checks have failed. Please run:
uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Code Review
This pull request introduces a heartbeat mechanism for KV transfer to maintain the availability of KV blocks on the producer side while requests are queued in the consumer's scheduler. It implements an ObservableRequestQueue wrapper in the V1 scheduler, allowing connectors to receive callbacks on queue additions and removals. The NixlConnector leverages these callbacks to track requests and periodically send heartbeat notifications to remote engines to extend block leases. Feedback for this PR focuses on the robustness of the lease timeout mechanism and potential race conditions when updating request expiration times in the worker.
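The summary above describes a queue wrapper that fires callbacks when requests are added to or removed from the scheduler's waiting queue. A minimal sketch of that idea, assuming a plain deque and illustrative names (the PR's actual ObservableRequestQueue wraps vLLM's RequestQueue and has its own API):

```python
from collections import deque
from typing import Any, Callable

class ObservableQueue:
    """Sketch of an observable FIFO: notifies subscribers on add/remove.
    Hypothetical names; not the PR's actual implementation."""

    def __init__(self) -> None:
        self._queue: deque = deque()
        self._on_add: list[Callable[[Any], None]] = []
        self._on_remove: list[Callable[[Any], None]] = []

    def subscribe(self, on_add=None, on_remove=None) -> None:
        # A connector registers callbacks to track waiting requests.
        if on_add is not None:
            self._on_add.append(on_add)
        if on_remove is not None:
            self._on_remove.append(on_remove)

    def append(self, item) -> None:
        self._queue.append(item)
        for cb in self._on_add:
            cb(item)

    def popleft(self):
        item = self._queue.popleft()
        for cb in self._on_remove:
            cb(item)
        return item
```

A connector could subscribe to maintain its own set of waiting request ids and periodically heartbeat them to the producer.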
new_expiry = time.perf_counter() + self._lease_extension
for req_id in payload.split(","):
    if req_id in self._reqs_to_send:
        old = self._reqs_to_send[req_id]
        self._reqs_to_send[req_id] = max(old, new_expiry)
The heartbeat extension logic self._reqs_to_send[req_id] = max(old, new_expiry) is susceptible to race conditions if multiple heartbeats are processed concurrently or if the scheduler updates the expiration time simultaneously. Consider using a thread-safe update mechanism or ensuring that the heartbeat processing is serialized with other operations that modify _reqs_to_send.
Hard to imagine we'll end up overwriting a later expiry with a newer expiry? Or certainly not in a way that would be significant?
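If serialization ever does turn out to be needed, one option is to guard the expiry map with a lock. A sketch under that assumption — the class is hypothetical, though `_reqs_to_send` and `_lease_extension` mirror the names in the diff above:

```python
import threading
import time

class LeaseTable:
    """Hypothetical thread-safe wrapper around the expiry map."""

    def __init__(self, lease_extension: float = 30.0) -> None:
        self._lock = threading.Lock()
        self._reqs_to_send: dict[str, float] = {}
        self._lease_extension = lease_extension

    def extend(self, payload: str) -> None:
        """Extend leases for the comma-separated request ids in payload."""
        new_expiry = time.perf_counter() + self._lease_extension
        with self._lock:
            for req_id in payload.split(","):
                if req_id in self._reqs_to_send:
                    old = self._reqs_to_send[req_id]
                    # never shorten an existing lease
                    self._reqs_to_send[req_id] = max(old, new_expiry)
```

Since each update takes `max(old, new_expiry)`, interleaved heartbeats can only ever lengthen a lease, which is why the race may well be benign as the reply above suggests.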
markmc left a comment
First pass, sorry if it's a bit scattered!
# request-fails=>do some policy retry, compute locally
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# [[0, 1, 2, _, 4, 5, 6, 7, 8, 9]]
# SW[ __________[11, 12, 13]]
I think this was me showing how request level recovery works in code to Summer lol
I think we should try simplifying the base.py/scheduler.py changes. First, isn't it enough to just notify the connector whenever a new request is added to the waiting queue? We can in the future consolidate
@markmc thanks for the review man! @orozery thanks for checking; let's chat about the interface on the design doc, it might be easier.
Yes, but I've taken the chance to try and tackle request-queue observability in a more generalized way with this PR.
yield heapq.heappop(heap_copy)

class ObservableRequestQueue(RequestQueue):
Shall we put this inside the connector base or even in nixl connector? Curious about the consideration here.
We could potentially get rid of callbacks too if putting this ObservableRequestQueue inside the nixl connector
@ivanium could you elaborate your proposal a bit? Are you saying we should move the queue within the connector..?
Yes. On second thought, I think this aligns well with @orozery 's proposal
Documentation preview: https://vllm--41383.org.readthedocs.build/en/41383/
@markmc Addressed your review, thanks again!
FTR ... @NickLucche and I chatted briefly about this offline, and my view was
markmc left a comment
Looking good, only pretty minor stuff in comments
logger.warning(
    "Releasing expired KV blocks for request %s which were "
-   "retrieved by %d decode worker(s) within %d seconds.",
+   "retrieved by %d decode worker(s) before lease expired.",
Hmm, is this what we'll see on D in the bidirectional case?
changing to say 'remote'
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: NickLucche <nlucches@redhat.com>
…ucture Signed-off-by: NickLucche <nlucches@redhat.com>
def _ensure_handshake(
    self,
    engine_id: EngineId,
minor refactor to avoid duplicating handshake code (which is nasty with locks)
Thanks @markmc, addressed comments and rebased!
Not a blocker, fine with you merging as-is, but re the warning log on freeing blocks in the bidirectional case:
This PR implements a KV cache lease renewal mechanism to minimize how long remote blocks are retained.
The effort is described in more detail here: https://docs.google.com/document/d/1i-O6kqY7WfF1lPyyftRpCQt5fwnFYIEDZKCxyB51Sjg/edit?usp=sharing.
TL;DR:
The single VLLM_NIXL_ABORT_REQUEST_TIMEOUT timeout is too simple and leads to P holding blocks for too long when D crashes. We need a more dynamic TTL "lease renewal" system that minimizes the time blocks are stranded on P.
At the same time, we also need a way for D to extend the TTL of request blocks on P while those requests are still in D's waiting queue.
This ensures traffic surges on D do not lead to blocks being freed early due to congestion.
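The TL;DR above can be sketched as two halves: the consumer batches the request ids still waiting into a heartbeat, and the producer pushes those requests' expiries forward, freeing only blocks whose lease has lapsed. All function names here are illustrative, not the PR's actual API:

```python
import time

def make_heartbeat_payload(waiting_req_ids: set[str]) -> str:
    """Consumer (D) side: batch waiting request ids into one message.
    Hypothetical helper."""
    return ",".join(sorted(waiting_req_ids))

def renew_leases(reqs_to_send: dict[str, float],
                 payload: str,
                 lease_extension: float) -> None:
    """Producer (P) side: push expiries forward for renewed requests."""
    new_expiry = time.perf_counter() + lease_extension
    for req_id in payload.split(","):
        if req_id in reqs_to_send:
            # only extend, never shorten, an existing lease
            reqs_to_send[req_id] = max(reqs_to_send[req_id], new_expiry)

def expired(reqs_to_send: dict[str, float]) -> list[str]:
    """Producer (P) side: requests whose lease lapsed; their KV blocks
    can be freed even if D crashed and never fetched them."""
    now = time.perf_counter()
    return [r for r, t in reqs_to_send.items() if t < now]
```

If D crashes, heartbeats stop and the leases lapse after one extension window, so P reclaims the blocks quickly instead of waiting out a single long global timeout.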
Test with