[Attention] Refactor CUDA attention backend selection logic #24794

MatthewBonanni · 2025-09-13T05:58:31Z

Purpose

CudaPlatformBase.get_attention_backend_cls has gotten complex and messy over time. This PR cleans up the logic (without changing the behavior) and standardizes the interface.
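To make "standardizes the interface" concrete, here is a minimal sketch of one way a backend selector can be organized: every backend answers the same question ("why can't you serve this request?"), and the selector walks a priority list. This is an illustration under stated assumptions, not the code in this PR. The names `Backend`, `SelectionRequest`, `invalid_reasons`, `PRIORITY`, and `select_backend` are hypothetical, and the support rules are invented for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Backend(Enum):
    # Hypothetical backend names, chosen so no real kernel's limits are implied.
    FAST = "fast"
    PAGED = "paged"
    REFERENCE = "reference"


@dataclass(frozen=True)
class SelectionRequest:
    # Hypothetical subset of the inputs a selector might look at.
    head_size: int
    dtype: str
    use_mla: bool = False


def invalid_reasons(backend: Backend, req: SelectionRequest) -> list[str]:
    """Return why `backend` cannot serve `req`; an empty list means it can."""
    reasons = []
    # The rules below are made up for illustration.
    if backend is Backend.FAST and req.dtype not in ("float16", "bfloat16"):
        reasons.append(f"dtype {req.dtype} not supported")
    if backend is Backend.PAGED and req.head_size not in (64, 128, 256):
        reasons.append(f"head_size {req.head_size} not supported")
    if backend is not Backend.REFERENCE and req.use_mla:
        reasons.append("MLA not supported")
    return reasons


# Backends are tried in this order when the user does not request one.
PRIORITY = [Backend.FAST, Backend.PAGED, Backend.REFERENCE]


def select_backend(
    req: SelectionRequest, requested: Optional[Backend] = None
) -> Backend:
    """Honor an explicit request if valid, else pick the first valid backend."""
    if requested is not None:
        reasons = invalid_reasons(requested, req)
        if reasons:
            raise ValueError(f"{requested.name} is not valid: {reasons}")
        return requested
    rejected = {}
    for backend in PRIORITY:
        reasons = invalid_reasons(backend, req)
        if not reasons:
            return backend
        rejected[backend.name] = reasons
    raise ValueError(f"No valid backend: {rejected}")
```

Keeping the per-backend checks in one uniform function means the selector itself contains no backend-specific branches, and a failed selection can report every rejection reason at once.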

Test Plan

Test Result

mergify · 2025-09-13T05:59:10Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @MatthewBonanni.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

vllm/platforms/cuda.py

Signed-off-by: Matthew Bonanni <[email protected]>

LucasWilkinson · 2025-11-10T18:15:20Z

@MatthewBonanni how hard would it be to keep backwards compatibility between _Backend and AttentionBackendEnum for a version with a warning?

LucasWilkinson · 2025-11-10T18:34:01Z

With #26487 potentially in the pipeline, what do we think about having a get_mla_attn_backend_cls instead of is_mla? @Yikun?

Signed-off-by: Matthew Bonanni <[email protected]>

MatthewBonanni · 2025-11-10T18:48:50Z

> @MatthewBonanni how hard would it be to keep backwards compatibility between _Backend and AttentionBackendEnum for a version with a warning?

@LucasWilkinson done in d0f4698

NickLucche · 2025-11-10T18:55:00Z

Discussed offline. Thanks for the work, @MatthewBonanni!

wangxiyuan · 2025-11-11T10:56:23Z

vllm/attention/backends/registry.py

+        return AttentionBackendEnum[name]
+
+
+class _Backend(metaclass=_BackendMeta):


Nice change

mgoin

Release has been cut, let's go for it on main

hmellor · 2025-11-11T13:44:53Z

The merge commit of this PR failed pre-commit because the base of the branch was out of date

…ject#24794) Signed-off-by: Matthew Bonanni <[email protected]> Signed-off-by: Matthew Bonanni <[email protected]> Co-authored-by: Luka Govedič <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>

Signed-off-by: leo-pony <[email protected]>

…ject#24794) Signed-off-by: Matthew Bonanni <[email protected]> Signed-off-by: Matthew Bonanni <[email protected]> Co-authored-by: Luka Govedič <[email protected]>

mergify bot added the rocm, speculative-decoding, v1, and tpu labels Sep 13, 2025

mergify bot added the needs-rebase label Sep 13, 2025

njhill reviewed Sep 13, 2025

vllm/platforms/cuda.py (outdated, resolved)

MatthewBonanni changed the title ~~[Attention] Refactor CUDA attention backend selection logic~~ [WIP][Attention] Refactor CUDA attention backend selection logic Sep 13, 2025

MatthewBonanni changed the title ~~[WIP][Attention] Refactor CUDA attention backend selection logic~~ [Attention] Refactor CUDA attention backend selection logic Sep 16, 2025

MatthewBonanni marked this pull request as ready for review September 16, 2025 13:22

MatthewBonanni requested review from LucasWilkinson, NickLucche, WoosukKwon, alexm-redhat, benchislett, bigPYJ1151, comaniac, gshtras, jikunshang, luccafong, mgoin, robertgshaw2-redhat, tdoublep, youkaichao, ywang96 and zhuohan123 as code owners September 16, 2025 13:22

mergify bot removed the needs-rebase label Sep 16, 2025

MatthewBonanni requested review from sighingnow, tlrmchlsmth and yewentao256 as code owners September 17, 2025 19:23

Merge branch 'main' into backend_selection_refactor

58e0639

Signed-off-by: Matthew Bonanni <[email protected]>

MatthewBonanni requested a review from tjtanaa as a code owner November 10, 2025 17:31

LucasWilkinson mentioned this pull request Nov 10, 2025

[CI/Test Fix] Fix CP tests on Blackwell #28404

Merged

add _Backend backward compatibility

d0f4698

Signed-off-by: Matthew Bonanni <[email protected]>

wangxiyuan approved these changes Nov 11, 2025

mergify bot added the nvidia label Nov 11, 2025

github-project-automation bot added this to NVIDIA Nov 11, 2025

mgoin approved these changes Nov 11, 2025

github-project-automation bot moved this to Ready in NVIDIA Nov 11, 2025

mgoin merged commit b30dfa0 into vllm-project:main Nov 11, 2025
68 checks passed

github-project-automation bot moved this from Ready to Done in gpt-oss Issues & Enhancements Nov 11, 2025

github-project-automation bot moved this from Ready to Done in NVIDIA Nov 11, 2025

github-project-automation bot moved this to Done in Structured Output Nov 11, 2025

MatthewBonanni deleted the backend_selection_refactor branch November 11, 2025 14:29

MatthewBonanni mentioned this pull request Nov 11, 2025

[Model][Mamba] Add selector for mamba attention backend and make it pluggable for other device #26487

Merged

5 tasks

hl475 mentioned this pull request Nov 12, 2025

[CI Failure] Fix backend selection for encoder-only models #28534

Merged

5 tasks

mgoin mentioned this pull request Nov 12, 2025

[Bugfix] Fix SM100 gpt-oss regression due to faulty attn sink support #28561

Merged

5 tasks

njhill mentioned this pull request Nov 12, 2025

[BugFix] Fix mm_encoder_attn_backend arg type checking #28599

Merged

MatthewBonanni mentioned this pull request Nov 13, 2025

[Attention][Bugfix] Fix FA sink support #28660

Merged

5 tasks

leo-pony added a commit to 22dimensions/vllm-ascend that referenced this pull request Nov 19, 2025

vllm break of PR:vllm-project/vllm#24794

111f445

Signed-off-by: leo-pony <[email protected]>

NickLucche mentioned this pull request Nov 20, 2025

[Attention] Refactor FA block_size limitations to hybrid models only #29084

Merged

wangxiyuan pushed a commit to wangxiyuan/vllm-ascend that referenced this pull request Nov 24, 2025

vllm break of PR:vllm-project/vllm#24794

1fe271a

Signed-off-by: leo-pony <[email protected]>

wangxiyuan pushed a commit to wangxiyuan/vllm-ascend that referenced this pull request Nov 24, 2025

vllm break of PR:vllm-project/vllm#24794

58303c8

Signed-off-by: leo-pony <[email protected]>
