[CI/Build] Refactor Attention backend for test_prefix_prefill from xformers to SDPA #28424
Merged
DarkLight1337 merged 3 commits into vllm-project:main on Nov 11, 2025
Conversation
Signed-off-by: zhewenli <zhewenli@meta.com>
Force-pushed from 7553a5f to 5fa92e6
Member:
cc @ywang96
Member:
/gemini review
Contributor
Code Review
This pull request refactors the attention backend for test_prefix_prefill from xformers to PyTorch's Scaled Dot Product Attention (SDPA), which is a great move for standardization and potentially performance. The changes also include important compatibility fixes for ROCm, such as using int32 for certain tensors and skipping unsupported fp8_e5m2 configurations. The implementation of the SDPA reference is well-structured. I've included one suggestion to improve the performance of the newly added test utility function for creating attention masks, which should help reduce the overall test execution time.
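The SDPA reference and the mask-building utility the review mentions are not shown in this excerpt. As context, a minimal sketch of what an SDPA-based ground truth for prefix prefill could look like, assuming a layout where each new query token attends to the full prefix plus a causal window over the new tokens (function names `make_prefix_causal_mask` and `sdpa_reference` are hypothetical, not vLLM's actual test helpers):

```python
import torch
import torch.nn.functional as F

def make_prefix_causal_mask(query_len: int, context_len: int) -> torch.Tensor:
    """Boolean mask of shape [query_len, context_len + query_len].

    Query position i may attend to every prefix (context) token and to
    new tokens up to and including position i. Built vectorized with
    tril (diagonal offset by context_len) rather than a Python double
    loop, which is the kind of speedup the review suggested.
    """
    total = context_len + query_len
    return torch.ones(query_len, total, dtype=torch.bool).tril(diagonal=context_len)

def sdpa_reference(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                   context_len: int) -> torch.Tensor:
    """Reference attention output via PyTorch SDPA.

    q: [num_heads, query_len, head_dim]
    k, v: [num_heads, context_len + query_len, head_dim]
    """
    mask = make_prefix_causal_mask(q.shape[-2], context_len)
    # A boolean attn_mask marks positions that ARE allowed to attend.
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

Using SDPA as the reference removes the xformers import from the test entirely, since `scaled_dot_product_attention` ships with PyTorch.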
Member:
b6f852f saves CI about 10 seconds (better than nothing, I guess)
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request on Nov 29, 2025:
…ormers to SDPA (vllm-project#28424)
Signed-off-by: zhewenli <zhewenli@meta.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Purpose
Remove the `xformers` dependency (#28287 (comment)) by changing the ground-truth attention backend from xformers to PyTorch SDPA. This also fixes two ROCm compatibility issues:
a. The ROCm paged attention kernel expects 32-bit `int` tensors, but the test passes 64-bit `torch.long` tensors.
b. The ROCm paged attention kernel only supports `auto`, `fp8`, and `fp8_e4m3` KV cache dtypes.
Test Plan
H100 (https://gist.github.com/zhewenl/5ada1e5f360c4d230bc8b6eff47effcc):
MI300 (failing with some numeric issues; will be tracked in a separate issue/PR): https://gist.github.com/zhewenl/3224057e57aad300341c8a0d66bd9878
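The dtype mismatch described in Purpose item (a) comes from PyTorch's default: integer tensors created from Python ints are `torch.int64` (`torch.long`), while the ROCm paged attention kernel expects `int32` index tensors. A minimal sketch of the kind of explicit cast the fix implies (the helper name `to_kernel_index_dtype` is hypothetical, not the PR's actual code):

```python
import torch

def to_kernel_index_dtype(block_table: torch.Tensor,
                          seq_lens: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Cast index tensors to int32 as the ROCm paged attention kernel expects.

    torch.tensor([...]) with Python ints defaults to torch.int64, which
    the kernel rejects; casting with .to(torch.int32) fixes that.
    """
    return block_table.to(torch.int32), seq_lens.to(torch.int32)

block_table = torch.tensor([[0, 1, 2]])  # dtype defaults to torch.int64
seq_lens = torch.tensor([3])
block_table_i32, seq_lens_i32 = to_kernel_index_dtype(block_table, seq_lens)
```

The cast is cheap relative to the attention kernel itself, so doing it unconditionally in the test keeps the CUDA and ROCm paths identical.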