scheduler: Cache also the last block after KV recving#32168

Closed
orozery wants to merge 1 commit intovllm-project:mainfrom
orozery:sched-cache-last-async-loaded-block

Conversation

Collaborator

@orozery orozery commented Jan 12, 2026

This PR fixes the scheduler to commit the last full block of KV data that was received asynchronously.

@robertgshaw2-redhat this is modifying code you introduced in #17751.
I think it's safe to cache that last block as well, but I'm not sure.
cc @njhill

BTW, do we really have to re-compute the last token, or can we somehow re-use the KV data that we saved for it?


Note

Ensures KV blocks are fully cached after async KV receive while preserving correct sampling behavior.

  • In Scheduler._update_waiting_for_remote_kv, after caching the received blocks, sets num_computed_tokens to request.num_tokens - 1 when the two are equal, so the last token is recomputed in the next step
  • Previously the decrement happened before caching; now the last full block is cached as well, improving cache-commit behavior for completed blocks
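The ordering change above can be sketched with some simple arithmetic. This is a hypothetical illustration: BLOCK_SIZE, full_blocks, and the token counts are made-up names and numbers, not vLLM's actual code, which lives in Scheduler._update_waiting_for_remote_kv.

```python
# Hypothetical, simplified sketch of the ordering change; BLOCK_SIZE,
# full_blocks, and the token counts are illustrative, not vLLM's code.

BLOCK_SIZE = 16  # assumed KV-cache block size


def full_blocks(num_tokens: int) -> int:
    """Number of complete KV blocks covered by num_tokens tokens."""
    return num_tokens // BLOCK_SIZE


num_tokens = 32  # request whose async KV receive just finished

# Old order: decrement first, then cache -> 31 tokens -> 1 full block committed.
old_committed = full_blocks(num_tokens - 1)

# New order: cache at 32 tokens -> 2 full blocks committed, then set
# num_computed_tokens = num_tokens - 1 so the final token is recomputed
# next step to produce logits for sampling.
new_committed = full_blocks(num_tokens)
num_computed_tokens = num_tokens - 1

print(old_committed, new_committed, num_computed_tokens)  # 1 2 31
```

The decrement itself is unchanged; only its position relative to the caching call moves, so sampling behavior stays the same while one more full block lands in the prefix cache.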

Written by Cursor Bugbot for commit 0076065. This will update automatically on new commits. Configure here.

This commit fixes the scheduler to commit the last full block of KV data
that was async received.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly fixes an issue in the scheduler where the last block of asynchronously received KV data was not being cached. By moving the decrement of num_computed_tokens to after the call to cache_blocks, you ensure that the complete KV cache for all received tokens is stored in the prefix cache. The subsequent decrement is still necessary to trigger the recomputation of the last token, which is required to generate logits for sampling the next token. This change is safe and improves caching behavior as intended.

Collaborator

@heheda12345 heheda12345 left a comment


For the general case without a KV connector, we need to recompute the last token to generate the logprobs to sample the first output token.

Any strong reason to cache the last block?

@orozery
Collaborator Author

orozery commented Jan 14, 2026

Any strong reason to cache the last block?

If you don't cache the last block, you will have to recompute the entire last block, not just the last token.
I think the question should be the opposite:
is there any reason not to cache the last block?
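To put rough numbers on that point: with a hypothetical block size of 16 and a 32-token prompt, leaving the last full block uncommitted means a whole block of KV must be recomputed instead of just the one token deliberately recomputed for sampling. This is illustrative arithmetic under assumed values, not vLLM code.

```python
# Illustrative arithmetic (hypothetical block size and prompt length,
# not vLLM code): KV can only be reused from committed full blocks, so
# every token past the last committed block must be recomputed.

BLOCK_SIZE = 16
prompt_tokens = 32


def tokens_to_recompute(committed_blocks: int) -> int:
    """Tokens whose KV must be recomputed given committed full blocks."""
    return prompt_tokens - committed_blocks * BLOCK_SIZE


# Last block not committed: the whole 16-token block is recomputed.
print(tokens_to_recompute(1))  # 16
# Last block committed: nothing beyond the single token that is
# intentionally recomputed for sampling.
print(tokens_to_recompute(2))  # 0
```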

@orozery
Collaborator Author

orozery commented Mar 6, 2026

Superseded by #34616.

@orozery orozery closed this Mar 6, 2026
