
[V1][Scheduler] Reject impossible waiting requests that exceed KV capacity#39828

Closed
Bortlesboat wants to merge 5 commits into vllm-project:main from Bortlesboat:codex/vllm-scheduler-kv-capacity-39734

Conversation

@Bortlesboat
Contributor

Summary

  • reject waiting requests that cannot fit even in an empty KV cache
  • continue scheduling later waiting requests instead of leaving the waiting queue stuck behind an impossible request
  • emit an explicit scheduler error output for the rejected request
  • add regression coverage for both scheduler_reserve_full_isl=True and scheduler_reserve_full_isl=False

Root cause

A waiting request whose prompt could never fit inside the engine's full KV cache stayed at the head of the waiting queue. The scheduler broke out of the waiting loop without rejecting that request, so later schedulable requests never got a chance to run.

Fixes #39734.

Why this is not duplicate work

  • checked the current discussion on issue #39734 on April 14, 2026
  • searched open PRs for 39734 and for KV cache capacity scheduler deadlock
  • did not find an existing open fix covering this waiting-queue rejection path

Testing

  • PYTHONPATH=$PWD .venv/bin/python -m pytest tests/v1/core/test_scheduler.py -k "test_schedule_rejects_waiting_request_exceeding_kv_capacity or test_schedule or test_stop_via_update_from_output" -v
  • git diff --check

AI assistance

This change was prepared with AI assistance, then reviewed and validated locally before submission.

Signed-off-by: Andrew <andre@Andrews.localdomain>
@mergify mergify Bot added the v1 label Apr 14, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces logic to identify and reject requests that exceed the total KV cache capacity of the model. It adds a check in the scheduler to determine if a request can fit in an empty cache; if not, the request is finished with an error status and a descriptive stop reason. The changes include refactoring the KVCacheManager to support this check, updating the Scheduler to handle rejected requests via a new pending_outputs buffer, and adding a comprehensive test case to verify this behavior. I have no feedback to provide.

@Bortlesboat Bortlesboat changed the title [codex] Reject impossible waiting requests that exceed KV capacity [V1][Scheduler] Reject impossible waiting requests that exceed KV capacity Apr 20, 2026
@Bortlesboat Bortlesboat marked this pull request as ready for review April 20, 2026 04:45

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Contributor

@ivanium ivanium left a comment


LGTM in general. Left a minor comment on nits.

One thing to note: the underlying root cause remains that the engine should auto-fit max_model_len within the KV cache space, but our current KV cache usage estimation is inaccurate for SWA and Mamba state.

Comment thread on vllm/v1/core/sched/scheduler.py (Outdated)
@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 26, 2026
Signed-off-by: Bortlesboat <bortstheboat@gmail.com>
@njhill
Member

njhill commented Apr 30, 2026

Thanks @Bortlesboat, but I think we might not want this change. The check that is done is known up-front and so should be used to reject the request before it reaches the scheduler (making sure the effective max_model_len is correct and will always fit in an empty kvcache).

And that should be the case now that #40946 and #41069 are merged.

So it doesn't make sense to add complexity to the scheduler for this imo.
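The alternative outlined here, validating at the engine front door rather than in the scheduler, might look like this minimal sketch; `validate_request` and its parameter names are hypothetical, not vLLM's API:

```python
def validate_request(num_prompt_tokens: int,
                     max_model_len: int,
                     kv_capacity_tokens: int) -> None:
    """Reject an impossible request before it reaches the scheduler.

    If the effective max_model_len is clamped to what an empty KV cache
    can hold, this single up-front check makes a scheduler-side guard
    unnecessary.
    """
    effective_max_len = min(max_model_len, kv_capacity_tokens)
    if num_prompt_tokens > effective_max_len:
        raise ValueError(
            f"prompt length {num_prompt_tokens} exceeds effective "
            f"max_model_len {effective_max_len}"
        )
```

The design point is that the condition depends only on static quantities (prompt length and total capacity), so it never needs to be re-evaluated inside the scheduling loop.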

@njhill
Member

njhill commented Apr 30, 2026

@Bortlesboat of course please let us know if you can reproduce the issue on the latest main branch.

@Bortlesboat
Contributor Author

Closing — agreed. Read through #40946 and #41069: the SWA admission-gate fix and num_gpu_blocks_override accounting handle this at the right layer (engine validation, not scheduler). The scheduler-side guard here would have masked the real divergence between startup pool sizing and runtime admission that #40946 fixes. Thanks for the pointer.


Labels

ready (ONLY add when PR is ready to merge/full CI is needed), v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Scheduler deadlocks when request exceeds KV cache capacity but is within max_model_len

4 participants