
[Bugfix][KV Transfer][NIXL] Notify P node on pre-admission rejection to free stranded KV blocks#41269

Open
Dao007forever wants to merge 13 commits into vllm-project:main from Dao007forever:bug/notif-rejected

Conversation

@Dao007forever
Contributor

Summary

Fixes a bug where prefill KV blocks on the P (prefill) node are stranded for up to VLLM_NIXL_ABORT_REQUEST_TIMEOUT (default 480s) when the D (decode) node rejects an incoming request before it is admitted to the engine scheduler.

When a request that carries KV-transfer params (do_remote_prefill=True, remote_block_ids, remote_engine_id, etc.) is rejected on D for reasons like:

  • Render / chat-template error
  • Model existence check failure (_check_model)
  • Input validation error
  • Engine errored
  • Beam-search-with-stream rejection
  • previous_response_id not found (responses API)

…D never opens a NIXL transfer for it, so P never receives the implicit "transfer complete → free blocks" signal. The blocks linger until the abort timeout fires.

This PR adds an explicit early-rejection notification:

  1. The OpenAI-compatible serving layer (chat_completion, completion, responses) calls engine_client.notify_kv_transfer_request_rejected(...) on every pre-admission rejection path that has kv_transfer_params.do_remote_prefill=True (see the sketch after this list).
  2. AsyncLLMEngineCoreClient (in-process / sync MP / async MP / DPLB-async MP) → EngineCoreSchedulerKVConnectorBase_V1.request_rejected_before_admission(...).
  3. NixlConnectorScheduler recognizes the params and enqueues an empty _reqs_need_recv entry. On the next scheduler tick the worker side issues the notification that releases the remote blocks. To make sure that tick actually happens when D has no other in-flight work, Scheduler.has_requests() now also reports pending connector metadata.
  4. MultiConnector fans out to its child connectors and accepts the first one that recognizes the params.
  5. DPLBAsyncMPClient broadcasts the notification to all local DP engines when no data_parallel_rank header is present (the rejection happens before admission, so the request isn't yet tracked in reqs_in_flight).
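
A minimal sketch of the step-1 guard, assuming a simplified serving layer: `notify_kv_transfer_request_rejected` and the `do_remote_prefill` flag come from this PR, while the helper name, its signature, and the surrounding structure are illustrative only.

```python
# Hedged sketch, not the actual vLLM serving code.
from typing import Any


async def reject_before_admission(
    engine_client: Any,  # illustrative stand-in for the engine client
    request_id: str,
    kv_transfer_params: dict[str, Any] | None,
) -> None:
    """Call on any serving-layer rejection path before engine admission."""
    # Only requests carrying remote-prefill KV-transfer params can strand
    # blocks on the P node; everything else is a no-op.
    if not kv_transfer_params or not kv_transfer_params.get("do_remote_prefill"):
        return
    # Fan the rejection out to the KV connector so the P node frees its blocks
    # immediately instead of waiting for VLLM_NIXL_ABORT_REQUEST_TIMEOUT.
    await engine_client.notify_kv_transfer_request_rejected(
        request_id, kv_transfer_params)
```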

The _reqs_need_recv value type changed from (Request, BlockIds) to (dict[str, Any], BlockIds) because pre-admission rejections do not have a Request object — only the raw kv_transfer_params dict — and that's all the existing code on the _build_kv_connector_meta side actually consumed (req.kv_transfer_params).

If do_remote_prefill is set but the required remote_* metadata is incomplete, the connector logs a warning and returns False (no-op) rather than guessing.
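
To make the two paragraphs above concrete, here is a minimal sketch of the scheduler-side handling, under stated assumptions: the class name `RejectedPrefillTracker`, the exact method signature, and the `REQUIRED_REMOTE_FIELDS` tuple are illustrative stand-ins; the `do_remote_prefill` / `remote_block_ids` / `remote_engine_id` keys, the `(dict, block_ids)` value type, the warn-and-return-False behavior, and the empty-recv entry come from the PR description.

```python
# Hedged sketch, not the actual NixlConnectorScheduler implementation.
import logging
from typing import Any

logger = logging.getLogger(__name__)

# Minimum remote metadata needed to notify the P node (illustrative subset).
REQUIRED_REMOTE_FIELDS = ("remote_block_ids", "remote_engine_id")


class RejectedPrefillTracker:
    """Stand-in for the scheduler-side connector state."""

    def __init__(self) -> None:
        # request_id -> (raw kv_transfer_params, local block ids to receive into)
        self._reqs_need_recv: dict[str, tuple[dict[str, Any], list[int]]] = {}

    def request_rejected_before_admission(
        self, request_id: str, params: dict[str, Any]
    ) -> bool:
        if not params.get("do_remote_prefill"):
            return False  # not a remote-prefill request; nothing to clean up
        if any(params.get(field) is None for field in REQUIRED_REMOTE_FIELDS):
            # Incomplete metadata: warn and report "not handled" rather than guess.
            logger.warning(
                "Rejected request %s has incomplete remote KV-transfer "
                "metadata; skipping early-release notification.", request_id)
            return False
        # Empty local block list: there is nothing to receive, but building the
        # connector metadata on the next scheduler tick emits the notification
        # that lets the P node free its blocks immediately.
        self._reqs_need_recv[request_id] = (params, [])
        return True
```

On the next tick this entry is flushed into the connector metadata exactly like a normal (empty) receive, which is why Scheduler.has_requests() must also report pending connector metadata so that tick actually happens on an otherwise idle D node.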

Why this is not duplicating an existing PR

I checked open PRs and issues with searches like stranded prefill, stranded KV blocks rejected NIXL, request_rejected_before_admission, kv_transfer_params reject, and decode reject prefill. Among the closest matches, no existing PR covers the rejected-before-admission notification path.

Test plan

  • New unit tests added:
    • tests/v1/kv_connector/unit/test_nixl_connector.py::test_rejected_remote_prefill_request_enqueues_empty_recv — verifies the connector enqueues an empty recv with the original remote_* params and Scheduler.has_requests() flips True until the tick flushes it.
    • tests/v1/kv_connector/unit/test_nixl_connector.py::test_rejected_remote_prefill_request_missing_metadata_is_ignored — verifies the connector no-ops (and does not mutate do_remote_prefill) when required remote_* fields are missing.
    • tests/v1/kv_connector/unit/test_multi_connector.py::test_request_rejected_before_admission_uses_first_accepting_connector — verifies short-circuit fan-out behavior in MultiConnector.
  • Local: .venv/bin/python -m pytest tests/v1/kv_connector/unit/test_multi_connector.py tests/v1/kv_connector/unit/test_nixl_connector.py -v — 76 passed, 2 skipped, 1 pre-existing failure (test_multi_example_connector_consistency, which fails with OSError on the gated repo meta-llama/Llama-3.2-1B-Instruct and is unrelated to this change).
  • Pre-commit hooks are staged and will run on push.
  • Reviewer should confirm behavior end-to-end on a P/D NIXL deployment by sending an oversized prompt (or one whose chat template fails) carrying kv_transfer_params.do_remote_prefill=True and observing that the P-side block usage drops immediately rather than after VLLM_NIXL_ABORT_REQUEST_TIMEOUT. See the example request below.
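
A hedged sketch of that manual check: the endpoint URL, model name, and remote_* values below are placeholders (in a real P/D deployment a proxy normally injects kv_transfer_params), so treat this as the shape of the request rather than a working configuration.

```python
# Hypothetical manual check against the D node; all concrete values are placeholders.
import requests

resp = requests.post(
    "http://decode-node:8000/v1/completions",  # D-node URL (placeholder)
    json={
        "model": "my-model",                   # placeholder model name
        "prompt": "x" * 2_000_000,             # oversized on purpose, to force rejection
        "kv_transfer_params": {
            "do_remote_prefill": True,
            "remote_engine_id": "prefill-0",   # placeholder
            "remote_block_ids": [0, 1, 2],     # placeholder
        },
    },
    timeout=30,
)
print(resp.status_code, resp.text)
# Expected: the request is rejected, and P-side block usage drops immediately
# instead of after VLLM_NIXL_ABORT_REQUEST_TIMEOUT.
```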

AI assistance

This change was prepared with assistance from Claude (Anthropic). I (the human submitter) reviewed every changed line, ran the tests above, and can defend the design end-to-end.

🤖 Generated with Claude Code


@claude (bot) left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify (bot) added the frontend, v1, bug (Something isn't working), and kv-connector labels on Apr 29, 2026

@gemini-code-assist (bot) left a comment


Code Review

This pull request implements a mechanism to notify KV connectors when a request is rejected before engine admission, allowing the NIXL connector to release remote KV blocks. It adds a new interface method and updates the OpenAI serving entrypoints to trigger cleanup during early failures. The review feedback identifies that the current error handling in the serving layer should be expanded to include adapter resolution and model identification, ensuring that all pre-admission failures are correctly handled.

Comment thread vllm/entrypoints/openai/chat_completion/serving.py Outdated
Comment thread vllm/entrypoints/openai/completion/serving.py Outdated
Comment thread vllm/entrypoints/openai/responses/serving.py Outdated
Dao007forever and others added 3 commits April 30, 2026 17:51
…to free stranded KV blocks

When a request carrying KV-transfer params is rejected on the D node before
it is admitted to the engine scheduler (e.g. validation failure, render
error, model check failure), the P node has no way to learn about the
rejection and the prefill KV blocks remain pinned until
VLLM_NIXL_ABORT_REQUEST_TIMEOUT (default 480s).

This change plumbs a `notify_kv_transfer_request_rejected` path from the
OpenAI-compatible serving layer down through the engine client, EngineCore,
scheduler, and KV connector. For NIXL, the connector schedules an empty
recv with the original `remote_*` params so the worker side issues a
notification that frees the prefill blocks immediately. The scheduler also
exposes `has_requests()` so the engine loop wakes up to flush the cleanup
even when no admitted requests are running.

Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
@njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Apr 30, 2026
@liuzijing2014
Collaborator

Q: what's the benefit vs setting VLLM_NIXL_ABORT_REQUEST_TIMEOUT=60s and letting prefill naturally time out? 60s feels fine for edge-case handling (there are many cases where decode could fail and never fetch from prefill, e.g. an engine-level KV allocation failure).

@Dao007forever
Contributor Author

60s is still very large in our tests; it reduced throughput significantly because concurrency was only ~4-5, so one hanging request affects ~25% of the concurrency.

Member

@njhill left a comment


LGTM now, thanks @Dao007forever. Just one more minor comment from a final review pass.

Comment thread vllm/v1/engine/__init__.py Outdated

Labels

bug (Something isn't working), frontend, kv-connector, ready (ONLY add when PR is ready to merge/full CI is needed), v1

3 participants