
[CI/Build] Refactor Attention backend for test_prefix_prefill from xformers to SDPA#28424

Merged
DarkLight1337 merged 3 commits into vllm-project:main from zhewenl:refactor-test-prefix-prefill
Nov 11, 2025
Conversation

@zhewenl (Collaborator) commented Nov 10, 2025

Purpose

  1. This PR helps [Core] Remove xformers dependency #28287 (comment) by changing the ground-truth attention backend from xformers to PyTorch SDPA.
  2. Fixes some incompatibilities on AMD:
    a. The ROCm paged attention kernel expects 32-bit int tensors, but the test was passing 64-bit torch.long tensors.
    b. The ROCm paged attention kernel only supports the auto, fp8, and fp8_e4m3 KV cache dtypes.

Test Plan

H100 (https://gist.github.com/zhewenl/5ada1e5f360c4d230bc8b6eff47effcc):

pytest -s -v 'tests/kernels/attention/test_prefix_prefill.py'
...
================================================================== warnings summary ==================================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================== 192 passed, 192 skipped, 2 warnings in 208.65s (0:03:28) ==============================================

MI300 (failing with some numeric issues; will be tracked in a separate issue/PR): https://gist.github.com/zhewenl/3224057e57aad300341c8a0d66bd9878

pytest -s -v 'tests/kernels/attention/test_prefix_prefill.py'
...
================================================================= short test summary info ==================================================================
FAILED tests/kernels/attention/test_prefix_prefill.py::test_contexted_kv_attention[chunked_prefill_paged_decode-0-cuda:0-auto-dtype0-128-1-64] - AssertionError: Tensor-likes are not close!
FAILED tests/kernels/attention/test_prefix_prefill.py::test_contexted_kv_attention[chunked_prefill_paged_decode-0-cuda:0-fp8-dtype0-128-1-64] - AssertionError: Tensor-likes are not close!
FAILED tests/kernels/attention/test_prefix_prefill.py::test_contexted_kv_attention[chunked_prefill_paged_decode-0-cuda:1-auto-dtype0-128-1-64] - AssertionError: Tensor-likes are not close!
FAILED tests/kernels/attention/test_prefix_prefill.py::test_contexted_kv_attention[chunked_prefill_paged_decode-0-cuda:1-fp8-dtype0-128-1-64] - AssertionError: Tensor-likes are not close!

Results (299.93s (0:04:59)):
     156 passed
       4 failed
         - tests/kernels/attention/test_prefix_prefill.py:101 test_contexted_kv_attention[chunked_prefill_paged_decode-0-cuda:0-auto-dtype0-128-1-64]
         - tests/kernels/attention/test_prefix_prefill.py:101 test_contexted_kv_attention[chunked_prefill_paged_decode-0-cuda:0-fp8-dtype0-128-1-64]
         - tests/kernels/attention/test_prefix_prefill.py:101 test_contexted_kv_attention[chunked_prefill_paged_decode-0-cuda:1-auto-dtype0-128-1-64]
         - tests/kernels/attention/test_prefix_prefill.py:101 test_contexted_kv_attention[chunked_prefill_paged_decode-0-cuda:1-fp8-dtype0-128-1-64]
     224 skipped

Signed-off-by: zhewenli <zhewenli@meta.com>
@zhewenl zhewenl force-pushed the refactor-test-prefix-prefill branch from 7553a5f to 5fa92e6 Compare November 11, 2025 05:03
@zhewenl zhewenl changed the title [WIP] Refactor test prefix prefill with SDPA [CI/Build] Refactor Attention backend for test_prefix_prefill from xformers to SDPA Nov 11, 2025
@zhewenl zhewenl marked this pull request as ready for review November 11, 2025 05:13
@zhewenl zhewenl mentioned this pull request Nov 11, 2025
5 tasks
@DarkLight1337 (Member)

cc @ywang96

@ywang96 ywang96 self-assigned this Nov 11, 2025
@ywang96 (Member) commented Nov 11, 2025

/gemini review

@gemini-code-assist bot (Contributor) left a comment
Code Review

This pull request refactors the attention backend for test_prefix_prefill from xformers to PyTorch's Scaled Dot Product Attention (SDPA), which is a great move for standardization and potentially performance. The changes also include important compatibility fixes for ROCm, such as using int32 for certain tensors and skipping unsupported fp8_e5m2 configurations. The implementation of the SDPA reference is well-structured. I've included one suggestion to improve the performance of the newly added test utility function for creating attention masks, which should help reduce the overall test execution time.
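As an illustration of the kind of vectorization such a suggestion typically entails (this is a sketch with a hypothetical helper name, not the reviewer's actual patch), a prefix-prefill attention mask can be built with a single broadcast comparison instead of per-element Python loops:

```python
import torch


def make_prefix_mask(query_len: int, ctx_len: int) -> torch.Tensor:
    """Hypothetical helper: additive attention mask of shape
    [query_len, ctx_len + query_len].

    Query token i attends to all ctx_len prefix tokens plus new
    tokens 0..i (block-causal over the suffix); disallowed slots
    are filled with -inf.
    """
    total = ctx_len + query_len
    key_pos = torch.arange(total).unsqueeze(0)                # [1, total]
    limit = (ctx_len + torch.arange(query_len)).unsqueeze(1)  # [query_len, 1]
    mask = torch.full((query_len, total), float("-inf"))
    mask[key_pos <= limit] = 0.0  # broadcast comparison, no loops
    return mask
```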

@ywang96 (Member) commented Nov 11, 2025

b6f852f saves CI about 10 seconds (better than nothing I guess)

@ywang96 (Member) left a comment

Thanks for working on this!

@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 11, 2025
@DarkLight1337 DarkLight1337 merged commit e553424 into vllm-project:main Nov 11, 2025
20 checks passed
@zhewenl zhewenl deleted the refactor-test-prefix-prefill branch November 11, 2025 17:10
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
[CI/Build] Refactor Attention backend for test_prefix_prefill from xformers to SDPA (vllm-project#28424)

Signed-off-by: zhewenli <zhewenli@meta.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>

Labels

ready ONLY add when PR is ready to merge/full CI is needed
