[0.9.1][Perf] Port MLA multistream optimazition and prefetch to v0.9.1 #1750

Merged
ganyi1996ppo merged 1 commit into vllm-project:v0.9.1-dev from whx-sjtu:mla_cv_prefetch_091
Jul 13, 2025

@whx-sjtu
Collaborator

This PR ports the optimization from PR #1353 to v0.9.1-dev.

@wangxiyuan wangxiyuan changed the title [Perf] Port MLA multistream optimazition and prefetch to v0.9.1 [0.9.1][Perf] Port MLA multistream optimazition and prefetch to v0.9.1 Jul 11, 2025
@whx-sjtu whx-sjtu force-pushed the mla_cv_prefetch_091 branch 2 times, most recently from 559c84f to 2d6640b on July 12, 2025 06:37
Signed-off-by: whx-sjtu <2952154980@qq.com>
@whx-sjtu whx-sjtu force-pushed the mla_cv_prefetch_091 branch from 2d6640b to 36ee7ec on July 12, 2025 06:50
0,
enabled=use_multistream_mla):
hidden_states_or_q_c = self.q_a_layernorm(ckq)
hidden_states_or_q_c = self.q_a_layernorm(ckq)
Collaborator

Compared to #1353, there is a missing line of code here.

forward_kwargs['ckq'] = ckq

Is there any special consideration for not adding this line of code, or was it simply forgotten?

Collaborator Author

The ckq tensor is passed into MLA only to add an npu_wait_tensor control edge before kv_a_proj_with_mqa on stream2. In my test scenario, however, removing this control edge gave slightly better performance.
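
To make the trade-off concrete, here is a minimal, hardware-free sketch of how a cross-stream control edge affects when an op can start. It does not call torch_npu or npu_wait_tensor; it only models the ordering effect. The `Op` class, the `schedule` and `build` helpers, the `produce_ckq` op name, and all costs are hypothetical and chosen purely for illustration; they are not measurements from this PR.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    stream: str
    cost: int  # hypothetical duration in arbitrary time units
    waits_on: list = field(default_factory=list)  # control edges (op names)


def schedule(ops):
    """Return {op name: (start, end)} for ops issued in list order.

    Ops on the same stream run back to back. An op also waits for every
    op listed in waits_on, which models a cross-stream control edge.
    """
    stream_free = {}  # stream -> time at which it becomes idle
    done = {}         # op name -> (start, end)
    for op in ops:
        start = stream_free.get(op.stream, 0)
        for dep in op.waits_on:
            start = max(start, done[dep][1])
        end = start + op.cost
        done[op.name] = (start, end)
        stream_free[op.stream] = end
    return done


def build(with_control_edge):
    # Costs and layout are made up; only the op names echo the discussion.
    return [
        Op("produce_ckq", "stream1", 3),  # hypothetical producer of ckq
        Op("kv_a_proj_with_mqa", "stream2", 2,
           waits_on=["produce_ckq"] if with_control_edge else []),
        Op("q_a_layernorm", "stream1", 1),
    ]


with_edge = schedule(build(with_control_edge=True))
without_edge = schedule(build(with_control_edge=False))

print(with_edge["kv_a_proj_with_mqa"])     # (3, 5): held back until ckq exists
print(without_edge["kv_a_proj_with_mqa"])  # (0, 2): overlaps with stream1 work
```

In this toy model the edge simply delays the stream2 op until ckq is ready, so dropping it lets the two streams overlap sooner. Whether that helps on real hardware depends on the workload, which is consistent with the author reporting the result for one test scenario only.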

@ganyi1996ppo ganyi1996ppo merged commit 9a5e650 into vllm-project:v0.9.1-dev Jul 13, 2025
16 checks passed
@Yikun Yikun added the no-test label Jul 16, 2025
@whx-sjtu whx-sjtu deleted the mla_cv_prefetch_091 branch October 20, 2025 11:50