[ROCm][WIP]: Fused aiter rope kvcache mla#35245
[ROCm][WIP]: Fused aiter rope kvcache mla#35245Rohan138 wants to merge 38 commits intovllm-project:mainfrom
Conversation
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
…ROCm/vllm into fused_aiter_triton_rope_kvcache
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a fusion for RoPE, KV cache update, and concatenation operations for MLA on ROCm with AITER, which is a significant performance optimization. The changes include adding a new custom op for rotary embedding, refactoring the MLA attention layer to separate KV cache updates, and updating tests and configurations. The implementation looks mostly correct, but I've found a potential logic bug in the configuration and a minor inconsistency in function signatures that should be addressed.
| rocm_aiter_ops.is_rmsnorm_enabled() | ||
| and not rocm_aiter_ops.is_triton_gemm_enabled() |
There was a problem hiding this comment.
There seems to be a logic inversion here. The docstring on line 143 states that this fusion should be enabled when using 'AITER Triton GEMMs', but the code on line 149 checks for not rocm_aiter_ops.is_triton_gemm_enabled(). This appears to be a bug that would incorrectly disable the fusion when Triton GEMMs are in use. Should this be rocm_aiter_ops.is_triton_gemm_enabled() to match the docstring and the previous logic?
| rocm_aiter_ops.is_rmsnorm_enabled() | |
| and not rocm_aiter_ops.is_triton_gemm_enabled() | |
| rocm_aiter_ops.is_rmsnorm_enabled() | |
| and rocm_aiter_ops.is_triton_gemm_enabled() |
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com> Co-authored-by: Di Wu <dw2761@nyu.edu>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
968c0e6 to
652c032
Compare
8b5ed5d to
4d6956f
Compare
Fuse RoPE+cache+cat ops for MLA e.g. DeepSeek, Kimi on ROCm + AITER similar to #33443.
Requires:
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.