[0.9.1][Perf]Remove NZ of kv_b_proj in Deepseek MLA.#1872

Merged
ganyi1996ppo merged 1 commit into vllm-project:v0.9.1-dev from whx-sjtu:nz_opt_091 on Jul 19, 2025
Conversation

@whx-sjtu
Collaborator

@whx-sjtu whx-sjtu commented Jul 18, 2025

This PR removes the NZ transformation of the kv_b_proj weights. We found that this matmul weight is not quantized, so at runtime it falls back to ND computation (float bmm in NZ format is not currently supported in the torchair graph). The fallback inserts two redundant transData operations that convert the weight from NZ back to ND. Removing them saves roughly 40 us per layer.

Signed-off-by: whx-sjtu <2952154980@qq.com>
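The idea behind the change can be sketched as a layout decision at weight-preparation time: only quantized weights are cast to NZ, while unquantized float weights stay in ND so the runtime never has to convert them back. This is a hypothetical minimal sketch, not the actual vllm-ascend code; `cast_to_nz`, `prepare_proj_weight`, and the dict-based "layout" representation are illustrative stand-ins.

```python
def cast_to_nz(weight):
    """Stand-in for the Ascend NZ format cast (hypothetical helper)."""
    return {"layout": "NZ", "data": weight}


def prepare_proj_weight(weight, is_quantized):
    """Choose the storage layout for a projection weight.

    Unquantized (float) weights stay in ND: float bmm in NZ is not
    supported in the torchair graph, so an NZ-formatted weight would be
    converted back to ND at runtime via two extra transData operations.
    Quantized weights keep the NZ cast, where it is actually used.
    """
    if not is_quantized:
        return {"layout": "ND", "data": weight}  # skip the NZ cast
    return cast_to_nz(weight)


# kv_b_proj in DeepSeek MLA falls into the unquantized branch:
kv_b = prepare_proj_weight([0.1, 0.2, 0.3], is_quantized=False)
```

In this sketch `kv_b["layout"]` is `"ND"`, mirroring the PR's behavior: the weight is left in its default format and the round-trip transData pair disappears from the per-layer graph.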
@ttanzhiqiang
Contributor

#1131 does this as well.

@ganyi1996ppo ganyi1996ppo merged commit 5be1d8c into vllm-project:v0.9.1-dev Jul 19, 2025
16 checks passed
NNUCJ pushed a commit to NNUCJ/vllm-ascend that referenced this pull request Jul 21, 2025
@whx-sjtu whx-sjtu deleted the nz_opt_091 branch October 20, 2025 11:50

4 participants