[ROCm] Attention selector reordering #36702
gshtras wants to merge 4 commits into vllm-project:main
Conversation
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Documentation preview: https://vllm--36702.org.readthedocs.build/en/36702/
Code Review
This pull request refactors the attention backend selection for ROCm to prioritize the ROCM_ATTN backend, which is now considered the most performant. It also removes the VLLM_ROCM_CUSTOM_PAGED_ATTN environment variable, simplifying configuration. As part of these changes, ROCM_ATTN now correctly reports that it does not support attention sinks, ensuring that more suitable backends like AITER_UNIFIED are chosen when sinks are required. My review identifies a potential issue with the new backend priority order which may not align with the intended logic.
This pull request has merge conflicts that must be resolved before it can be merged.
AMD CI build with this PR to compare against nightly: https://buildkite.com/vllm/amd-ci/builds/5975/steps/canvas |
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
With the unit tests now able to handle this change (following #36025, #35334, and others),
this PR changes the priorities of the ROCm attention backends to prefer ROCM_ATTN.
Additionally, even though ROCM_ATTN technically supports sinks, it would fall back from the custom HIP attention kernel to a Triton implementation when sinks are used. The backend is therefore changed to report that it does not support sinks, so that the actual Triton backends (AITER and unified), which perform better, are selected instead.
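To illustrate the selection logic described above, here is a minimal sketch of a priority-ordered backend selector that skips backends reporting no sink support. The class, attribute, and backend names other than ROCM_ATTN and AITER_UNIFIED are assumptions for illustration, not vLLM's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """Illustrative backend descriptor (not vLLM's real class)."""
    name: str
    supports_sinks: bool


# Assumed priority order: ROCM_ATTN first, Triton-based backends after it.
PRIORITY = [
    Backend("ROCM_ATTN", supports_sinks=False),    # now reports no sink support
    Backend("AITER_UNIFIED", supports_sinks=True),
    Backend("TRITON_ATTN", supports_sinks=True),   # hypothetical fallback
]


def select_backend(needs_sinks: bool) -> Backend:
    """Return the first backend in priority order that meets the requirements."""
    for backend in PRIORITY:
        if needs_sinks and not backend.supports_sinks:
            # ROCM_ATTN is skipped here, so a Triton-based backend is chosen.
            continue
        return backend
    raise RuntimeError("no suitable attention backend")
```

With this ordering, a model without sinks gets ROCM_ATTN, while a model that requires sinks falls through to AITER_UNIFIED.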
Removing VLLM_ROCM_CUSTOM_PAGED_ATTN: if ROCM_ATTN is selected, the intent is to use this kernel anyway.
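A before/after sketch of dropping the environment-variable gate. The function names and the assumed default value are illustrative, not vLLM's actual code.

```python
import os


def use_custom_paged_attn_before() -> bool:
    # Old behavior (sketch): the custom paged-attention kernel was gated
    # behind an environment variable; "1" as the default is an assumption.
    return os.environ.get("VLLM_ROCM_CUSTOM_PAGED_ATTN", "1") == "1"


def use_custom_paged_attn_after() -> bool:
    # New behavior (sketch): once ROCM_ATTN has been selected, the custom
    # kernel is always used, so no environment check is needed.
    return True
```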
As a bonus, fixing the AITER supported-platform condition: AITER is not built for, and does not support, gfx90a.
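The gfx90a exclusion can be sketched as a small architecture check. The helper name and the list of supported architectures are assumptions; on ROCm the architecture string would typically come from `torch.cuda.get_device_properties(...).gcnArchName`.

```python
# Assumed list of architectures AITER is built for (illustrative only).
SUPPORTED_AITER_ARCHS = ("gfx942", "gfx950")


def aiter_supported(gcn_arch: str) -> bool:
    """Sketch of the fixed condition: AITER is never used on gfx90a."""
    if gcn_arch.startswith("gfx90a"):
        # AITER is not built for gfx90a, so exclude it explicitly.
        return False
    return any(gcn_arch.startswith(arch) for arch in SUPPORTED_AITER_ARCHS)
```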