Temporarily disable prefix caching on Metal#187

Merged
WindChimeRan merged 1 commit into vllm-project:main from Kingwl:fix/prefix-cache-engine-crash-disable-prefix-cache
Mar 21, 2026
Contributor

@Kingwl Kingwl commented Mar 21, 2026

This PR temporarily disables vLLM core prefix caching on Metal when paged attention is enabled.

Per the comments on #185.
Fixes #184.

Signed-off-by: kingwl <kingwenlu@gmail.com>
Collaborator

@WindChimeRan WindChimeRan left a comment

Two non-blocking questions:

  1. For dev purposes, should we add an env var override (e.g. VLLM_METAL_FORCE_PREFIX_CACHING=1) so we can test the prefix caching path while it's disabled by default? We'll need to develop this soon. (This is an open question; maybe you have a better idea.)

  2. The closed #185 had some bookkeeping fixes that look independent of the prefix caching issue. Would you consider splitting those out into a separate PR? I haven't verified them in detail yet, but they seem like useful prep work.
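The override raised in question 1 could look roughly like the sketch below. It is only an illustration of the idea: `VLLM_METAL_FORCE_PREFIX_CACHING` is the name proposed in the review comment and is not an existing setting, and `resolve_prefix_caching` and its parameters are hypothetical names that do not come from this PR's diff.

```python
import os

# Name proposed in the review comment; assumed here, not an existing setting.
FORCE_ENV_VAR = "VLLM_METAL_FORCE_PREFIX_CACHING"


def resolve_prefix_caching(requested, paged_attention, env=None):
    """Decide whether prefix caching should stay on (hypothetical helper).

    requested:       whether the user asked for prefix caching
    paged_attention: whether the Metal paged attention path is enabled
    env:             mapping to read the override from (defaults to os.environ)
    """
    env = os.environ if env is None else env
    if not requested:
        # Nothing to disable; the override never turns caching on by itself.
        return False
    if not paged_attention:
        # The PR only disables prefix caching when paged attention is enabled.
        return True
    # Disabled by default on the paged path unless explicitly forced for dev use.
    return env.get(FORCE_ENV_VAR, "0") == "1"
```

With this shape, a developer would export `VLLM_METAL_FORCE_PREFIX_CACHING=1` to exercise the prefix caching path, while every other run keeps the safe default.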

@WindChimeRan WindChimeRan merged commit c8d2715 into vllm-project:main Mar 21, 2026
5 checks passed
Contributor Author

Kingwl commented Mar 21, 2026

Sure.

Development

Successfully merging this pull request may close these issues.

Engine crashes when prefix caching hits new complete prefill in Metal paged unified path