Use paged_attention_v1 for sliding window decode in rocm_aiter_fa #34378
zhuohan123 merged 1 commit into vllm-project:main
Conversation
Code Review
This pull request refactors the sliding window decode path within the AiterFlashAttentionImpl for ROCm. By removing the separate unified_attention (Triton) kernel and instead leveraging the native sliding window support in paged_attention_v1, the change unifies the decode logic for the sliding window and non-sliding window cases, improving code clarity and maintainability. The sliding window parameter is passed correctly, so behavior is preserved while the implementation is streamlined. The changes look solid and are a good improvement.
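To make the unification concrete, here is a minimal sketch with stubbed kernels. The function names mirror the PR description, but the signatures are hypothetical simplifications, not the real vLLM kernel APIs:

```python
# Stub standing in for the vLLM paged_attention_v1 kernel; per the PR, the
# real kernel supports sliding window natively (0 means disabled).
def paged_attention_v1(sliding_window: int = 0) -> str:
    return f"paged_attention_v1(sliding_window={sliding_window})"

# Stub for the separate Triton kernel that the refactor removes.
def unified_attention_triton(window: int) -> str:
    return f"unified_attention(window={window})"

def decode_before(sliding_window: int) -> str:
    # Old path: sliding-window decode dispatched to a separate kernel.
    if sliding_window > 0:
        return unified_attention_triton(sliding_window)
    return paged_attention_v1()

def decode_after(sliding_window: int) -> str:
    # New path: a single kernel handles both cases.
    return paged_attention_v1(sliding_window=sliding_window)
```

The design win is that decode_after has no branch at all: passing 0 disables the window inside the kernel, so one code path covers both configurations.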
This PR caused a recent regression: https://buildkite.com/vllm/amd-ci/builds/4773/steps/canvas?sid=019c5af3-274c-4971-937e-636c0be82f12&tab=output I'm working on a fix right now. But in the future, let's run tests on AMD hardware before making changes like this :)
…lm-project#34378) Signed-off-by: Martin Yuan <myuan@meta.com> Co-authored-by: Martin Yuan <myuan@meta.com> Signed-off-by: Eldar Kurtic <research@neuralmagic.com>
Summary: Replace unified_attention (Triton) with paged_attention_v1 for the sliding window decode path in AiterFlashAttentionImpl. paged_attention_v1 already supports sliding window natively via its sliding_window parameter, so this unifies the NHD decode path for both sliding window and non-sliding window cases. The sliding window value is recovered from the flash-attn convention (self.sliding_window[0] + 1), which yields 0 (disabled) when no sliding window is configured.
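The window recovery described above can be sketched as follows. This assumes the flash-attn convention stated in the summary: the left bound of self.sliding_window is the window size minus one when enabled, and -1 when no window is configured, so adding 1 yields either the window size or 0 (disabled):

```python
def recover_sliding_window(fa_sliding_window: tuple[int, int]) -> int:
    """Convert a flash-attn-style (left, right) window tuple into the
    single sliding_window value expected by paged_attention_v1.

    Returns 0 (disabled) when no sliding window is configured, because
    flash-attn represents a disabled window with a left bound of -1.
    """
    return fa_sliding_window[0] + 1

# No sliding window configured: flash-attn stores (-1, -1) -> 0 (disabled).
print(recover_sliding_window((-1, -1)))   # 0
# A 4096-token window: flash-attn stores (4095, 0) -> 4096.
print(recover_sliding_window((4095, 0)))  # 4096
```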
Test Plan: Requires ROCm GPU with a sliding window model (e.g., Mistral) to validate end-to-end. Verified that unified_attention is no longer referenced in the file.
Differential Revision: D93009177