[Platform] Refactor Platform attention backend selection to avoid breakpoint for OOT platform#30212
Conversation
Code Review
This pull request refactors the attention backend selection mechanism by introducing an AttentionSelectorConfig NamedTuple. This new configuration object encapsulates various attention parameters such as head size, data type, KV cache data type, block size, MLA usage, sink token presence, sparse attention usage, and attention type. The get_attn_backend and _cached_get_attn_backend functions in vllm/attention/selector.py are updated to create and pass this single config object instead of multiple individual arguments. This change propagates throughout the platform-specific attention backend selection logic in vllm/platforms/cpu.py, vllm/platforms/cuda.py, and vllm/platforms/interface.py, where relevant methods like get_attn_backend_cls and get_valid_backends are modified to accept and utilize the AttentionSelectorConfig object, simplifying their signatures and improving parameter management. Additionally, logging for attention configurations is updated to use the __repr__ method of the new config object.
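For orientation, here is a minimal sketch of such a config object. The field names are taken from the AttentionSelectorConfig repr that appears in an error log later in this thread; the types and defaults are assumptions, not vLLM's actual definition:

```python
from typing import NamedTuple, Optional

import torch


class AttentionSelectorConfig(NamedTuple):
    # Field names match the repr shown in the error log below;
    # types and defaults are assumed for illustration only.
    head_size: int
    dtype: torch.dtype
    kv_cache_dtype: str = "auto"
    block_size: Optional[int] = None
    use_mla: bool = False
    has_sink: bool = False
    use_sparse: bool = False
    use_mm_prefix: bool = False
    attn_type: str = "decoder"


cfg = AttentionSelectorConfig(head_size=64, dtype=torch.bfloat16)
```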
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
/gemini review
Code Review
This pull request refactors the attention backend selection by introducing AttentionSelectorConfig to encapsulate the configuration parameters. This is a great improvement as it simplifies the function signatures in platform-specific modules and makes the interface more stable for out-of-tree platforms. The changes are applied consistently across all relevant files. I've found one minor issue with an incomplete __repr__ implementation which could affect debugging.
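To illustrate the `__repr__` point: for a NamedTuple, a repr that iterates over `_fields` cannot silently omit a field. A minimal sketch with an abridged field list (this is not the PR's actual code):

```python
from typing import NamedTuple


class AttentionSelectorConfig(NamedTuple):
    head_size: int
    use_mla: bool = False
    # ...remaining fields as in the sketch above

    def __repr__(self) -> str:
        # Join every declared field so none can be left out of the output.
        fields = ", ".join(f"{name}={getattr(self, name)}" for name in self._fields)
        return f"AttentionSelectorConfig({fields})"


print(AttentionSelectorConfig(head_size=64))
# -> AttentionSelectorConfig(head_size=64, use_mla=False)
```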
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Hi @Isotr0py, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
DarkLight1337 left a comment
LGTM, thanks for cleaning this up! cc @tjtanaa
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
head_size,
dtype,
kv_cache_dtype,
None,
Just found that block_size is set to None here to bypass attention backend validation for state space models; otherwise the validation fails:
[2025-12-15T14:12:44Z] models/language/generation/test_hybrid.py::test_models[5-64-hmellor/tiny-random-BambaForCausalLM] The fast path for Bamba will be used when running the model on a GPU
...
[2025-12-15T14:13:30Z] (EngineCore_DP0 pid=2108) ERROR 12-15 06:13:30 [core.py:866] ValueError: No valid attention backend found for cuda with AttentionSelectorConfig(head_size=64, dtype=torch.bfloat16, kv_cache_dtype=auto, block_size=48, use_mla=False, has_sink=False, use_sparse=False, use_mm_prefix=False, attn_type=decoder). Reasons: {FLASH_ATTN: [block_size not supported], FLASHINFER: [block_size not supported], TRITON_ATTN: [block_size not supported], FLEX_ATTENTION: [block_size not supported]}.
Perhaps we need to update the FlashAttention backend's get_supported_kernel_block_sizes for state space models? @tdoublep
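For clarity, a hypothetical sketch of why passing None works, assuming the validator treats an unset block_size as "skip the check" (an illustration of the behavior described above, not vLLM's actual validation code):

```python
def block_size_supported(block_size, supported_block_sizes):
    # An unset block_size bypasses the kernel block-size check entirely,
    # which is why the state-space-model call site above passes None.
    if block_size is None:
        return True
    return block_size in supported_block_sizes


assert block_size_supported(None, [16, 32, 64])      # bypassed
assert not block_size_supported(48, [16, 32, 64])    # the failure in the log
```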
[Platform] Refactor Platform attention backend selection to avoid breakpoint for OOT platform (vllm-project#30212)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
…os_emb (#725)
Fix for vllm-project/vllm#30212 + cherry pick #724
Signed-off-by: Paweł Olejniczak <polejniczakx@habana.ai>
### What this PR does / why we need it?
Upstream vLLM PR vllm-project/vllm#30212 refactored the attention backend selection interface. This PR adapts vllm-ascend's get_attn_backend_cls to align with the new upstream standard, ensuring compatibility and reducing maintenance overhead.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Co-author: leo-pony <nengjunma@outlook.com>
- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e
Signed-off-by: zxwang <1476209578@qq.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>
Purpose
Currently, many individual arguments are passed to each platform's get_attn_backend_cls, while not all platforms use all of them. It also easily breaks OOT platforms whenever an attention feature introduces a new argument like use_mla or use_sink. This PR wraps these arguments into a single AttentionSelectorConfig, so that platforms can use them on demand and no longer need to update the interface for each feature update (a before/after sketch follows this section).
Test Plan
Test Result
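To make the Purpose concrete, a hypothetical before/after of an out-of-tree platform hook (the class names and backend paths are made up, and the real vLLM signatures may differ):

```python
# Before: a positional-argument interface; every new feature flag upstream
# (use_mla, use_sink, ...) changes the signature and breaks OOT overrides.
class MyOOTPlatformBefore:
    @classmethod
    def get_attn_backend_cls(cls, selected_backend, head_size, dtype,
                             kv_cache_dtype, block_size, use_mla):
        ...


# After: a single config object; each platform reads only the fields it
# needs, and new fields no longer change the method signature.
class MyOOTPlatformAfter:
    @classmethod
    def get_attn_backend_cls(cls, selected_backend, attn_selector_config):
        if attn_selector_config.use_mla:
            return "my_oot_platform.attention.MLABackend"
        return "my_oot_platform.attention.DefaultBackend"
```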