Support prefix cache hits in unified prefill path#185
Closed
Kingwl wants to merge 1 commit intovllm-project:mainfrom
Conversation
Signed-off-by: kingwl <kingwenlu@gmail.com>
Collaborator
Thanks for the quick catch!
I think we should explicitly disable prefix caching for now, because we have to, and will, support prefix caching on the paged path. But prefix caching is not a one-day job. Please join the discussion in #182. Will take a closer look at your other fixes soon!
This PR fixes a crash in the Metal unified paged prefill path when vLLM core prefix caching schedules a new request with `num_computed_tokens > 0`.

The root cause was that the unified path only tracked the current prefill chunk, not the full request tokens, and it also assumed `start_pos == 0` for complete prefills on new requests. This PR updates the model runner to carry both chunk tokens and full request tokens through the unified prefill flow, correctly updates paged sequence lengths for a non-zero `start_pos`, and removes the incorrect assertion for new complete prefills with cached prefixes.

With this change, the long-context sonnet serve benchmark that previously failed with `AssertionError: new complete prefill with start_pos > 0 not supported` now completes successfully with prefix caching enabled.

Fixed #184
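The bookkeeping described above can be sketched in a few lines. This is a minimal illustration, not the actual vLLM model-runner code: the names `PrefillChunk` and `plan_prefill_chunk` are hypothetical. The key point it demonstrates is that with a prefix-cache hit, the chunk starts at `num_computed_tokens` rather than 0, and the paged sequence length must include the cached prefix.

```python
# Hypothetical sketch of prefill planning with a prefix-cache hit.
# PrefillChunk and plan_prefill_chunk are illustrative names, not vLLM APIs.
from dataclasses import dataclass
from typing import List


@dataclass
class PrefillChunk:
    chunk_tokens: List[int]   # tokens to compute in this scheduling step
    full_tokens: List[int]    # all prompt tokens for the request
    start_pos: int            # offset of the chunk within the full prompt
    seq_len: int              # paged sequence length after this chunk


def plan_prefill_chunk(full_tokens: List[int],
                       num_computed_tokens: int,
                       max_chunk: int = 512) -> PrefillChunk:
    # Tokens before num_computed_tokens were already served by the
    # prefix cache, so the chunk starts at that offset (start_pos may
    # be non-zero even for a brand-new request).
    remaining = full_tokens[num_computed_tokens:]
    chunk = remaining[:max_chunk]
    return PrefillChunk(
        chunk_tokens=chunk,
        full_tokens=full_tokens,
        start_pos=num_computed_tokens,
        # The paged sequence length must count the cached prefix too,
        # not just the tokens computed in this chunk.
        seq_len=num_computed_tokens + len(chunk),
    )


prompt = list(range(1000))
plan = plan_prefill_chunk(prompt, num_computed_tokens=256)
print(plan.start_pos, len(plan.chunk_tokens), plan.seq_len)  # → 256 512 768
```

The crash fixed here corresponds to asserting `start_pos == 0` for new requests, which this planning step violates whenever the cache supplies a non-empty prefix.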