[FLA] Introduce Kimi Delta Attention (KDA) to vLLM #27654
youkaichao merged 2 commits into vllm-project:main
Conversation
Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn>
Code Review
This pull request introduces Kimi Delta Attention (KDA) into vLLM by adding new kernels and modifying existing ones. The changes are extensive and add a significant new feature. I have identified a couple of critical bugs related to incorrect tensor shapes and memory strides in the Triton kernels, which could lead to incorrect outputs. Additionally, there's a performance-related issue in the autotuning configuration of one of the kernels. Addressing these points will be crucial for the correctness and efficiency of the new implementation.
```python
num_stages = 3
num_warps = 1
```

```python
o = torch.empty_like(k)
```
The output tensor `o` is allocated with the shape of the key tensor `k` (`torch.empty_like(k)`). However, the output of an attention operation should have the shape of the value tensor `v`. The shape of `k` is `[B, T, H, K]` while `v` is `[B, T, HV, V]`, and these can differ, which would cause a shape mismatch and incorrect output. Please allocate `o` with the shape of `v`.
Suggested change:
```diff
- o = torch.empty_like(k)
+ o = torch.empty_like(v)
```
In KDA models, `q`, `k`, `v`, and `o` all share the same shape, so it is safe to use `empty_like(k)` here.
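The exchange above can be made concrete with a minimal sketch (the shapes below are hypothetical illustrations, not taken from the PR): allocating the output buffer like `k` is only correct when `k` and `v` have identical shapes, which the author states is guaranteed for KDA models but does not hold for attention variants with differing value head counts or head dimensions.

```python
# Shapes follow the review discussion: q/k are [B, T, H, K], v is [B, T, HV, V].
# An output allocated via empty_like(k) matches the attention output's shape
# only when k and v coincide in shape.

def output_alloc_like_k_is_safe(k_shape: tuple, v_shape: tuple) -> bool:
    """True iff a buffer shaped like k can hold the attention output,
    whose shape must match v."""
    return k_shape == v_shape

# Generic attention with a different value head dim: empty_like(k) would be wrong.
assert not output_alloc_like_k_is_safe((2, 128, 8, 64), (2, 128, 4, 128))

# KDA case per the author's reply: q, k, v share one shape, so it is safe.
assert output_alloc_like_k_is_safe((2, 128, 8, 64), (2, 128, 8, 64))
```

This is why the suggested change to `empty_like(v)` is the defensive default for general kernels, while the KDA-specific code can keep `empty_like(k)` under its stated invariant.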
💡 Codex Review
Here are some automated review suggestions for this pull request.
youkaichao left a comment:
looking forward to the new model 👍
### What this PR does / why we need it?
Adapt the vllm-ascend main branch to vLLM releases/v0.11.1:
- fix `forward context not set` in test_vlm.py, caused by vllm-project/vllm#23207
- fix failed import of `cdiv round`, caused by vllm-project/vllm#27188
- fix failed import of `init_cached_hf_modules`, caused by vllm-project/vllm#27567
- adapt the Triton kernel `fused_recurrent_gated_delta_rule_fwd_kernel`, caused by vllm-project/vllm#27654
- remove unused code in sigmoid_gating.py: `class FusedRecurrentFunction`, `fused_recurrent_gated_delta_rule`, `fused_recurrent_gated_delta_rule_fwd`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI
- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@83f478b

Signed-off-by: 22dimensions <waitingwind@foxmail.com>
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.