[ROCm][WIP]: Fused aiter rope kvcache mla#35245

Draft
Rohan138 wants to merge 38 commits into vllm-project:main from ROCm:fused_aiter_rope_kvcache_mla

@Rohan138 Rohan138 commented Feb 24, 2026

Fuse the RoPE, KV cache write, and concat ops for MLA models (e.g. DeepSeek, Kimi) on ROCm with AITER, similar to #33443.
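
For readers unfamiliar with the pattern, the following is a minimal pure-Python sketch of what fusing these three ops means. It is not the AITER kernel or the vLLM code path. The names `rope_rotate`, `unfused_step`, and `fused_step`, the half-split rotation pairing, and the cache layout (latent vector followed by the rotated rope part of the key) are all assumptions made for illustration.

```python
import math

def rope_rotate(x, position, base=10000.0):
    """Rotate one even-length vector by its position (half-split pairing, assumed)."""
    half = len(x) // 2
    out = list(x)
    for i in range(half):
        angle = position * base ** (-i / half)
        c, s = math.cos(angle), math.sin(angle)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i] * s + x[i + half] * c
    return out

def unfused_step(q_nope, q_pe, kv_c, k_pe, position, slot, kv_cache):
    """Reference path: three separate passes over the data."""
    q_pe_rot = rope_rotate(q_pe, position)  # pass 1: RoPE on query and key rope parts
    k_pe_rot = rope_rotate(k_pe, position)
    kv_cache[slot] = kv_c + k_pe_rot        # pass 2: write latent + rotated key to its slot
    return q_nope + q_pe_rot                # pass 3: concatenate the two query parts

def fused_step(q_nope, q_pe, kv_c, k_pe, position, slot, kv_cache, base=10000.0):
    """Same result in a single loop; assumes len(q_pe) == len(k_pe)."""
    half = len(q_pe) // 2
    q_out = list(q_nope) + [0.0] * len(q_pe)
    entry = list(kv_c) + [0.0] * len(k_pe)
    q_off, k_off = len(q_nope), len(kv_c)
    for i in range(half):
        angle = position * base ** (-i / half)
        c, s = math.cos(angle), math.sin(angle)
        # Rotate and place query and key elements directly in their final positions.
        for src, dst, off in ((q_pe, q_out, q_off), (k_pe, entry, k_off)):
            dst[off + i] = src[i] * c - src[i + half] * s
            dst[off + i + half] = src[i] * s + src[i + half] * c
    kv_cache[slot] = entry
    return q_out

# Hypothetical toy inputs: q_nope, q_pe, kv_c, k_pe
args = ([1.0, 2.0], [0.5, -0.5, 1.5, 2.5], [3.0, 4.0, 5.0], [1.0, 0.0, 0.0, 1.0])
cache_a, cache_b = [None] * 4, [None] * 4
q_a = unfused_step(*args, position=7, slot=2, kv_cache=cache_a)
q_b = fused_step(*args, position=7, slot=2, kv_cache=cache_b)
print(q_a == q_b, cache_a[2] == cache_b[2])
```

On a GPU the benefit would come from launching one kernel and touching memory once instead of three times. The sketch only demonstrates that the two paths compute the same values.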

Requires:

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

ElizaWszola and others added 12 commits February 16, 2026 15:20
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
@mergify mergify bot added the rocm Related to AMD ROCm label Feb 24, 2026
@github-project-automation github-project-automation bot moved this to Todo in AMD Feb 24, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a fusion for RoPE, KV cache update, and concatenation operations for MLA on ROCm with AITER, which is a significant performance optimization. The changes include adding a new custom op for rotary embedding, refactoring the MLA attention layer to separate KV cache updates, and updating tests and configurations. The implementation looks mostly correct, but I've found a potential logic bug in the configuration and a minor inconsistency in function signatures that should be addressed.

Comment on lines +148 to +149
rocm_aiter_ops.is_rmsnorm_enabled()
and not rocm_aiter_ops.is_triton_gemm_enabled()

critical

There seems to be a logic inversion here. The docstring on line 143 states that this fusion should be enabled when using 'AITER Triton GEMMs', but the code on line 149 checks for not rocm_aiter_ops.is_triton_gemm_enabled(). This appears to be a bug that would incorrectly disable the fusion when Triton GEMMs are in use. Should this be rocm_aiter_ops.is_triton_gemm_enabled() to match the docstring and the previous logic?

Suggested change
- rocm_aiter_ops.is_rmsnorm_enabled()
- and not rocm_aiter_ops.is_triton_gemm_enabled()
+ rocm_aiter_ops.is_rmsnorm_enabled()
+ and rocm_aiter_ops.is_triton_gemm_enabled()
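
To see where the two conditions disagree, the truth table can be enumerated with a small standalone sketch. The helper names `as_written` and `as_suggested` are hypothetical stand-ins for the quoted expressions and do not call the real `rocm_aiter_ops` API. Which variant is actually intended is not settled in this thread.

```python
from itertools import product

def as_written(rmsnorm_enabled, triton_gemm_enabled):
    # Stand-in for the condition quoted from the diff.
    return rmsnorm_enabled and not triton_gemm_enabled

def as_suggested(rmsnorm_enabled, triton_gemm_enabled):
    # Stand-in for the reviewer's suggested condition.
    return rmsnorm_enabled and triton_gemm_enabled

print("rmsnorm  triton_gemm  as_written  as_suggested")
for rmsnorm, triton in product([False, True], repeat=2):
    print(rmsnorm, triton, as_written(rmsnorm, triton), as_suggested(rmsnorm, triton))
```

The two variants agree only when RMSNorm is disabled (both return False). Whenever RMSNorm is enabled they give opposite answers, so the choice fully determines whether the fusion runs alongside Triton GEMMs.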

Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>

mergify bot commented Feb 26, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Rohan138.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 26, 2026
ElizaWszola and others added 3 commits February 26, 2026 06:32
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
@mergify mergify bot added v1 and removed needs-rebase labels Feb 26, 2026
Rohan138 and others added 4 commits February 26, 2026 01:11
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>

Co-authored-by: Di Wu <dw2761@nyu.edu>

mergify bot commented Mar 2, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Rohan138.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 2, 2026
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
@mergify mergify bot removed the needs-rebase label Mar 2, 2026
Rohan138 and others added 5 commits March 2, 2026 15:25
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
@Rohan138 Rohan138 force-pushed the fused_aiter_rope_kvcache_mla branch from 968c0e6 to 652c032 Compare March 6, 2026 23:01
@Rohan138 Rohan138 force-pushed the fused_aiter_rope_kvcache_mla branch 2 times, most recently from 8b5ed5d to 4d6956f Compare March 12, 2026 17:51
Labels

rocm (Related to AMD ROCm), v1

Projects

Status: Todo

3 participants