@MatthewBonanni (Contributor) commented Sep 13, 2025

Purpose

CudaPlatformBase.get_attention_backend_cls has grown complex and messy over time. This PR cleans up the logic without changing its behavior and standardizes the interface.

Test Plan

Test Result



@mergify mergify bot added the rocm, speculative-decoding, v1, and tpu labels Sep 13, 2025
@mergify mergify bot commented Sep 13, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @MatthewBonanni.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 13, 2025
@MatthewBonanni MatthewBonanni changed the title [Attention] Refactor CUDA attention backend selection logic [WIP][Attention] Refactor CUDA attention backend selection logic Sep 13, 2025
@MatthewBonanni MatthewBonanni changed the title [WIP][Attention] Refactor CUDA attention backend selection logic [Attention] Refactor CUDA attention backend selection logic Sep 16, 2025
@MatthewBonanni MatthewBonanni marked this pull request as ready for review September 16, 2025 13:22
@mergify mergify bot removed the needs-rebase label Sep 16, 2025
@LucasWilkinson (Collaborator)

@MatthewBonanni how hard would it be to keep backward compatibility between _Backend and AttentionBackendEnum for one version, with a warning?

@LucasWilkinson (Collaborator)

With #26487 potentially in the pipeline, what do we think about having a get_mla_attn_backend_cls instead of is_mla? @Yikun?
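To make the question above concrete, here is a minimal sketch of the two interface shapes being compared. Only the names get_mla_attn_backend_cls and is_mla come from the comment; the class names, the signatures, and the returned strings are hypothetical and do not reflect the actual vLLM platform API.

```python
class FlagStylePlatform:
    """Hypothetical platform: one entry point, MLA requested via a flag."""

    @classmethod
    def get_attn_backend_cls(cls, is_mla: bool) -> str:
        # The branch on is_mla lives inside the single selection method.
        return "example.MLABackend" if is_mla else "example.StandardBackend"


class SplitStylePlatform:
    """Hypothetical platform: a separate entry point for MLA backends."""

    @classmethod
    def get_attn_backend_cls(cls) -> str:
        return "example.StandardBackend"

    @classmethod
    def get_mla_attn_backend_cls(cls) -> str:
        # MLA selection can evolve without touching the standard path.
        return "example.MLABackend"
```

Under these assumptions, the split form lets a platform override MLA selection on its own, while the flag form keeps a single method to maintain.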

@MatthewBonanni (Contributor, Author)

> @MatthewBonanni how hard would it be to keep backward compatibility between _Backend and AttentionBackendEnum for one version, with a warning?

@LucasWilkinson done in d0f4698

@NickLucche (Collaborator)

Discussed offline. Thanks for the work, @MatthewBonanni!

return AttentionBackendEnum[name]


class _Backend(metaclass=_BackendMeta):
Contributor

Nice change

@mgoin (Member) left a comment

The release has been cut, so let's go for it on main.

@github-project-automation github-project-automation bot moved this to Ready in NVIDIA Nov 11, 2025
@mgoin mgoin merged commit b30dfa0 into vllm-project:main Nov 11, 2025
68 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Nov 11, 2025
@hmellor (Member) commented Nov 11, 2025

The merge commit of this PR failed pre-commit because the base of the branch was out of date.

@MatthewBonanni MatthewBonanni deleted the backend_selection_refactor branch November 11, 2025 14:29
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Nov 13, 2025
…ject#24794)

Signed-off-by: Matthew Bonanni <[email protected]>
Signed-off-by: Matthew Bonanni <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
Signed-off-by: xuebwang-amd <[email protected]>
leo-pony added a commit to 22dimensions/vllm-ascend that referenced this pull request Nov 19, 2025
wangxiyuan pushed a commit to wangxiyuan/vllm-ascend that referenced this pull request Nov 24, 2025
wangxiyuan pushed a commit to wangxiyuan/vllm-ascend that referenced this pull request Nov 24, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…ject#24794)

Signed-off-by: Matthew Bonanni <[email protected]>
Signed-off-by: Matthew Bonanni <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>

Labels

ci/build, deepseek (Related to DeepSeek models), documentation (Improvements or additions to documentation), frontend, gpt-oss (Related to GPT-OSS models), kv-connector, multi-modality (Related to multi-modality (#4194)), new-model (Requests to new models), nvidia, performance (Performance-related issues), qwen (Related to Qwen models), ready (ONLY add when PR is ready to merge/full CI is needed), rocm (Related to AMD ROCm), speculative-decoding, structured-output, tpu (Related to Google TPUs), v1

Projects

Status: Done

9 participants