Skip to content

[ROCm] Attention selector reordering#36702

Open
gshtras wants to merge 4 commits intovllm-project:mainfrom
ROCm:rocm_attn_revisit
Open

[ROCm] Attention selector reordering#36702
gshtras wants to merge 4 commits intovllm-project:mainfrom
ROCm:rocm_attn_revisit

Conversation

@gshtras
Copy link
Collaborator

@gshtras gshtras commented Mar 10, 2026

With the unit tests now able to handle this change following
#36025 #35334 and others
Changing the priorities of ROCm attention backends to

  1. ROCM_ATTN - when applicable it is the most performant backend today
  2. AITER_MHA - when explicitly selected
  3. AITER_UNIFIED - a variation of TRITON_ATTN, specifically tuned for ROCm. Will fall back here when the model requires sinks (GPT-OSS)
  4. TRITON_ATTN - the most versatile and as of now the least performant backend

Additionally, even though ROCM_ATTN supports sinks, it would fall back from the custom attention HIP kernel to a triton implementation, so even though technically the support is there, changing the backend to report that it doesn't have support. Actual triton backends (aiter and unified) are better.

Removing VLLM_ROCM_CUSTOM_PAGED_ATTN. If ROCM_ATTN is selected, the idea is to use this kernel anyway.

As a bonus, fixing the AITER supported condition. AITER is not built for and doesn't support gfx90a

gshtras added 2 commits March 10, 2026 20:08
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
@mergify
Copy link

mergify bot commented Mar 10, 2026

Documentation preview: https://vllm--36702.org.readthedocs.build/en/36702/

@mergify mergify bot added documentation Improvements or additions to documentation ci/build rocm Related to AMD ROCm v1 labels Mar 10, 2026
@github-project-automation github-project-automation bot moved this to Todo in AMD Mar 10, 2026
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the attention backend selection for ROCm to prioritize the ROCM_ATTN backend, which is now considered the most performant. It also removes the VLLM_ROCM_CUSTOM_PAGED_ATTN environment variable, simplifying configuration. As part of these changes, ROCM_ATTN now correctly reports that it does not support attention sinks, ensuring that more suitable backends like AITER_UNIFIED are chosen when sinks are required. My review identifies a potential issue with the new backend priority order which may not align with the intended logic.

@mergify
Copy link

mergify bot commented Mar 11, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @gshtras.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 11, 2026
@micah-wil
Copy link
Contributor

AMD CI build with this PR to compare against nightly: https://buildkite.com/vllm/amd-ci/builds/5975/steps/canvas
Everything checks out, so this change should be good to merge.

@gshtras gshtras marked this pull request as ready for review March 12, 2026 20:56
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
@mergify mergify bot removed the needs-rebase label Mar 12, 2026
@mergify
Copy link

mergify bot commented Mar 16, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @gshtras.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 16, 2026
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
@mergify mergify bot removed the needs-rebase label Mar 17, 2026
@tjtanaa tjtanaa added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci/build documentation Improvements or additions to documentation ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm v1

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

3 participants