[fix][cpu] fix prefill attention in CPU attention backend #27035

fadara01 · 2025-10-16T16:04:31Z

[fix][cpu] fix prefill attention in CPU attention backend

Disables prefix caching because prefill attention can't handle paged KV cache
Fixes Q/K/V used during prefill on mixed prefill/decode requests
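The second fix concerns which slice of the flattened Q/K/V token dimension the prefill path reads. A minimal sketch of that indexing follows. It assumes a hypothetical layout in which prefill requests come first in the batch, and it uses plain lists as stand-ins for tensors. `query_lens`, `num_prefill_reqs`, and `prefill_token_range` are illustrative names, not the backend's actual code.

```python
from itertools import accumulate


def prefill_token_range(query_lens, num_prefill_reqs):
    """Return (start, end) token offsets covering the prefill requests.

    Assumption (hypothetical layout): the first `num_prefill_reqs`
    entries of `query_lens` are prefill requests, the rest are decodes.
    """
    # Cumulative start offsets: start_loc[i] is where request i begins.
    start_loc = [0, *accumulate(query_lens)]
    return start_loc[0], start_loc[num_prefill_reqs]


# Mixed batch: two prefill requests (5 and 3 new tokens),
# followed by two decode requests (1 new token each).
query_lens = [5, 3, 1, 1]
tokens = list(range(sum(query_lens)))  # stand-in for the flattened Q/K/V rows

start, end = prefill_token_range(query_lens, num_prefill_reqs=2)
prefill_tokens = tokens[start:end]  # rows the prefill path should read
decode_tokens = tokens[end:]        # rows left for the decode path
```

If decode requests were placed first instead, the same cumulative offsets would apply, with the prefill range starting where the decode tokens end.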

Purpose

Fixes #27034

Test Plan

test script attached to #27034

Test Result

The output of the test script attached to #27034 is the same when prompts are batched and when prompts are run one at a time.
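The actual script lives in #27034 and is not reproduced here. As a rough sketch of the kind of check described, the helper below compares batched output against one-at-a-time output for any `generate` callable. `find_mismatches` and the two stand-in generators are hypothetical, and the comparison only makes sense when decoding is deterministic (for example, greedy sampling).

```python
from typing import Callable, List, Sequence


def find_mismatches(
    generate: Callable[[Sequence[str]], List[str]],
    prompts: Sequence[str],
) -> List[int]:
    """Return indices of prompts whose batched output differs from
    the output produced when the prompt is run on its own."""
    batched = generate(prompts)
    one_by_one = [generate([p])[0] for p in prompts]
    return [i for i, (b, s) in enumerate(zip(batched, one_by_one)) if b != s]


def stable_generate(prompts):
    # Stand-in for a correct backend: output depends only on the prompt.
    return [p.upper() for p in prompts]


def position_dependent_generate(prompts):
    # Stand-in for a buggy backend: output depends on the batch position.
    return [f"{p}-{i}" for i, p in enumerate(prompts)]


prompts = ["alpha", "beta", "gamma"]
ok = find_mismatches(stable_generate, prompts)
bad = find_mismatches(position_dependent_generate, prompts)
```

With a real engine, `generate` would wrap the model call and return one output string per prompt.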

Essential Elements of an Effective PR Description Checklist

[Y] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
[Y] The test plan, such as providing test command.
[Y] The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

fadara01 · 2025-10-16T16:05:11Z

Hi @bigPYJ1151 - would you be able to review this please?

- Disables prefix caching because prefill attention can't handle paged KV cache
- Fixes Q/K/V used during prefill on mixed prefill/decode requests

Signed-off-by: Fadi Arafeh <[email protected]>

gemini-code-assist

Code Review

This pull request introduces fixes for prefill attention in the CPU attention backend. The changes correctly handle mixed prefill/decode requests by adjusting the starting indices for Q/K/V tensors and correctly slicing sequence lengths for prefill requests. Additionally, it disables prefix caching on certain CPU architectures where it is not supported. The changes are logical, well-implemented, and address the described issues effectively. I have no major concerns.
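One way to see why prefix caching conflicts with a prefill path that cannot read a paged KV cache: on a prefix-cache hit, part of a prompt's keys and values exist only in cache blocks, so the request's total sequence length exceeds the number of new tokens it brings. The sketch below illustrates that relationship under this assumption. `seq_lens`, `query_lens`, and `cached_context_lens` are hypothetical names, not taken from the PR diff.

```python
def cached_context_lens(seq_lens, query_lens):
    """Per-request count of tokens whose K/V would have to come from the cache.

    seq_lens:   total tokens attended over per prefill request
    query_lens: new tokens supplied by each prefill request in this step
    """
    if len(seq_lens) != len(query_lens):
        raise ValueError("seq_lens and query_lens must have the same length")
    return [s - q for s, q in zip(seq_lens, query_lens)]


# No prefix-cache hit: every token of each prompt is present in this step.
no_hit = cached_context_lens(seq_lens=[5, 3], query_lens=[5, 3])

# Prefix-cache hit on the first request: 16 tokens live only in cache blocks.
with_hit = cached_context_lens(seq_lens=[21, 3], query_lens=[5, 3])

# A prefill path that only sees the current step's K/V would miss them.
requires_paged_read = any(c > 0 for c in with_hit)
```

Disabling prefix caching keeps every prefill request in the first situation, where the context length is zero.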

LucasWilkinson · 2025-10-17T15:40:46Z

@bigPYJ1151 do you think you can help look at this? I'm not that well versed in the CPU backend.

bigPYJ1151 · 2025-10-18T09:15:43Z

After some tests, I found that even with enable_chunked_prefill=False, the attention backend still gets mixed batches. This is different from V0.
This PR looks reasonable and fixes the bug. Please fix the failing pre-commit check, thanks :)
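For illustration, a batch can be classified as mixed by looking at per-request query lengths. The sketch below assumes, as a simplification, that a decode request contributes exactly one new token and a prefill request contributes more than one. `is_mixed_batch` is a hypothetical helper, not a vLLM function.

```python
def is_mixed_batch(query_lens):
    """Return True if the batch holds both decode and prefill requests.

    Simplifying assumption: query_len == 1 means decode and
    query_len > 1 means prefill. A single-token prompt would be
    misclassified as a decode under this heuristic.
    """
    has_decode = any(q == 1 for q in query_lens)
    has_prefill = any(q > 1 for q in query_lens)
    return has_decode and has_prefill


mixed = is_mixed_batch([5, 1, 1])    # one prefill plus two decodes
decode_only = is_mixed_batch([1, 1])
prefill_only = is_mixed_batch([4, 7])
```

A check like this, placed where the backend receives its batch, is one way to confirm the observation above without relying on the scheduler settings.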


fadara01 · 2025-10-18T09:41:22Z

Thanks for your review @bigPYJ1151
pre-commit is passing now.

bigPYJ1151

tests passed https://buildkite.com/vllm/fastcheck/builds/44835#0199f6d7-07d7-464e-bc52-158fb48ecf83


fadara01 requested a review from LucasWilkinson as a code owner October 16, 2025 16:04

mergify bot added the v1 label Oct 16, 2025

[fix][cpu] fix prefill attention in CPU attention backend

86151e9


fadara01 force-pushed the fix_attention branch from 78e0cf1 to 86151e9 Compare October 16, 2025 16:13

gemini-code-assist bot reviewed Oct 16, 2025


Merge branch 'main' into fix_attention

a44d191

LucasWilkinson requested a review from bigPYJ1151 October 17, 2025 15:41

fadara01 and others added 2 commits October 18, 2025 10:21

Merge branch 'main' into fix_attention

d3781d2

fix formatting

bf57438

Signed-off-by: Fadi Arafeh <[email protected]>

fadara01 force-pushed the fix_attention branch from 35dfc81 to bf57438 Compare October 18, 2025 09:35

bigPYJ1151 approved these changes Oct 18, 2025


bigPYJ1151 enabled auto-merge (squash) October 18, 2025 11:29

github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 18, 2025

Merge branch 'main' into fix_attention

243b9aa

bigPYJ1151 merged commit ab4be40 into vllm-project:main Oct 18, 2025
50 checks passed

