[Refactor] cache cos/sin in mla & remove parameter model in builder. by weijinqian0 · Pull Request #5277 · vllm-project/vllm-ascend

weijinqian0 · 2025-12-23T05:34:11Z

RFC: #4629

1. Cache cos/sin in mla
2. AttentionBuilder inherits from the original class of vllm.
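
A minimal sketch of the first item, assuming the intent is to compute the rotary-embedding cos/sin tables once and reuse them across attention layers instead of rebuilding them per layer. `CosSinCache`, `get`, `builds`, `rotary_dim`, and `base` are hypothetical names for illustration, not identifiers from vllm-ascend, and plain Python lists stand in for device tensors.

```python
import math
from typing import Dict, List, Tuple

Table = List[List[float]]  # indexed as [position][rotary_dim // 2]


class CosSinCache:
    """Hypothetical cache of rotary cos/sin tables, keyed by their shape parameters."""

    def __init__(self) -> None:
        self._tables: Dict[Tuple[int, int, float], Tuple[Table, Table]] = {}
        self.builds = 0  # counts how often the tables were actually computed

    def get(self, max_pos: int, rotary_dim: int,
            base: float = 10000.0) -> Tuple[Table, Table]:
        key = (max_pos, rotary_dim, base)
        if key not in self._tables:
            self._tables[key] = self._build(max_pos, rotary_dim, base)
            self.builds += 1
        return self._tables[key]

    @staticmethod
    def _build(max_pos: int, rotary_dim: int, base: float) -> Tuple[Table, Table]:
        # Standard RoPE inverse frequencies: base^(-2i/d) for i in [0, d/2).
        inv_freq = [base ** (-2.0 * i / rotary_dim) for i in range(rotary_dim // 2)]
        cos = [[math.cos(p * f) for f in inv_freq] for p in range(max_pos)]
        sin = [[math.sin(p * f) for f in inv_freq] for p in range(max_pos)]
        return cos, sin


cache = CosSinCache()
cos_a, sin_a = cache.get(max_pos=8, rotary_dim=4)  # first caller builds the tables
cos_b, sin_b = cache.get(max_pos=8, rotary_dim=4)  # later callers reuse them
```

In this sketch the second `get` call returns the same objects as the first, so the trigonometric work happens once per `(max_pos, rotary_dim, base)` combination.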

version: release/v0.13.0

vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

gemini-code-assist

Code Review

This pull request effectively resolves a bug in xlite caused by an attempt to access a non-existent attribute query_start_loc_cpu on AscendMetadata. The fix correctly moves the calculation of query_lens into the AscendAttentionMetadataBuilder, storing the result in AscendMetadata. This not only fixes the AttributeError but also centralizes the logic, improving code structure. The changes are accurate and well-implemented.
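
The change the review describes can be sketched as follows, assuming `query_start_loc_cpu` is a cumulative offset array in which request `i` owns tokens `[loc[i], loc[i+1])`. `Metadata` and `MetadataBuilder` are simplified, hypothetical stand-ins for `AscendMetadata` and `AscendAttentionMetadataBuilder`, and plain lists replace tensors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Metadata:
    """Hypothetical stand-in for AscendMetadata; only the field of interest."""
    query_lens: List[int]


class MetadataBuilder:
    """Hypothetical stand-in for AscendAttentionMetadataBuilder."""

    def build(self, query_start_loc_cpu: List[int]) -> Metadata:
        # Adjacent differences of the cumulative offsets give per-request lengths.
        query_lens = [
            end - start
            for start, end in zip(query_start_loc_cpu, query_start_loc_cpu[1:])
        ]
        # Consumers read query_lens from the metadata and never need
        # query_start_loc_cpu on the metadata object itself.
        return Metadata(query_lens=query_lens)


meta = MetadataBuilder().build([0, 3, 5, 9])
print(meta.query_lens)  # [3, 2, 4]
```

Computing the lengths in the builder means downstream code depends only on a field the metadata actually carries.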

github-actions · 2025-12-23T05:38:05Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

wangxiyuan · 2025-12-26T09:08:25Z

Let's merge this after 0.13.0 is released.

…to eplb_refactor

* 'main' of https://github.com/vllm-project/vllm-ascend: (46 commits)
  [Feature] Support to use fullgraph with eagle (vllm-project#5118)
  [EPLB][refactor] Modification of the initialization logic for expert_map and log2phy（depend on pr5285） (vllm-project#5311)
  [Refactor]6/N Extract common code of class AscendMLAImpl (vllm-project#5314)
  [Refactor] cache cos/sin in mla & remove parameter model in builder. (vllm-project#5277)
  update vllm pin to 12.27 (vllm-project#5412)
  [ReleaseNote] Add release note for v0.13.0rc1 (vllm-project#5334)
  [Bugfix] Correctly handle the output shape in multimodal attention (vllm-project#5443)
  Fix nightly (vllm-project#5413)
  [bugfix] fix typo of _skip_all_reduce_across_dp_group (vllm-project#5435)
  [Doc]modify pcp tutorial doc (vllm-project#5440)
  [Misc] fast fail for exiting if tools/install_flash_infer_attention_score_ops_a2.sh (vllm-project#5422)
  [Doc] Update DeepSeek V3.1/R1 2P1D doc (vllm-project#5387)
  [DOC]Fix model weight download links (vllm-project#5436)
  [Doc] Modify DeepSeek-R1/V3.1 documentation (vllm-project#5426)
  Revert "[feat] enable hierarchical mc2 ops on A2 by default (vllm-project#5300)" (vllm-project#5434)
  [Bugfix] fix greedy temperature detection (vllm-project#5417)
  [doc] Update Qwen3-235B doc for reproducing latest performance (vllm-project#5323)
  [feat] enable hierarchical mc2 ops on A2 by default (vllm-project#5300)
  [Doc] delete environment variable HCCL_OP_EXPANSION_MODE in DeepSeekV3.1/R1 (vllm-project#5419)
  [Doc] add long_sequence feature user guide (vllm-project#5343)
  ...

…llm-project#5277)

RFC: vllm-project#4629

1. Cache cos/sin in mla
2. AttentionBuilder inherits from the original class of vllm.

version: release/v0.13.0

- vLLM main: vllm-project/vllm@ad32e3e

---------

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: Che Ruan <cr623@ic.ac.uk>

…to FIA_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (88 commits)
  [1/N] Refactor nightly test structure (vllm-project#5479)
  Docs: Remove deprecated --task parameter for embedding models (vllm-project#5257)
  Revert "moe_gating_top_k" (vllm-project#5512)
  [Doc] Fix issue link for 0.12.0 (vllm-project#5500)
  [CI]update triton ascend version (vllm-project#5392)
  moe_gating_top_k (vllm-project#5271)
  [refactor] refactor model runner capture model (vllm-project#5230)
  Update corresponding vllm commit ID to 12 29 (vllm-project#5475)
  [Kernel]update csrc cmakelist for open-source cann (vllm-project#5458)
  [OP] add custom op aclnnMoeInitRoutingCustom (vllm-project#5251)
  [Refactor][EAGLE] 1/N delete __init__ in mtp_proposer (vllm-project#5176)
  [Refactor][Triton] Move reject sample triton kernels into ops/triton (vllm-project#5324)
  [Feature] support eager mode in model runner v2 (vllm-project#5210)
  [feature] fia support sliding windows (vllm-project#5239)
  Optimize some rejectsampler functions to make npu op launch non-blocking (vllm-project#4587)
  [Feature] Support to use fullgraph with eagle (vllm-project#5118)
  [EPLB][refactor] Modification of the initialization logic for expert_map and log2phy（depend on pr5285） (vllm-project#5311)
  [Refactor]6/N Extract common code of class AscendMLAImpl (vllm-project#5314)
  [Refactor] cache cos/sin in mla & remove parameter model in builder. (vllm-project#5277)
  update vllm pin to 12.27 (vllm-project#5412)
  ...

…llm-project#5277)

RFC: vllm-project#4629

1. Cache cos/sin in mla
2. AttentionBuilder inherits from the original class of vllm.

version: release/v0.13.0

- vLLM main: vllm-project/vllm@ad32e3e

---------

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>

weijinqian_v1 added 2 commits December 23, 2025 13:32

[bugfix] fix xlite error: has no attribute 'query_start_loc_cpu'

7766f8d

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

[bugfix] fix xlite error: has no attribute 'query_start_loc_cpu'

2015711

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

gemini-code-assist bot reviewed Dec 23, 2025

weijinqian0 and others added 7 commits December 24, 2025 10:45

Merge branch 'vllm-project:main' into refactor_attention

32a1cfc

[Refactor] use cos_sin_cache

5432405

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

[Refactor] use cos_sin_cache

e58e977

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

[Refactor] use cos_sin_cache

09ea370

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

[Refactor] use cos_sin_cache & remove parameter like model in builder.

5bfb03a

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

[Refactor] use cos_sin_cache & remove parameter like model in builder.

03630b8

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

Merge branch 'vllm-project:main' into refactor_attention

e6dd46c

weijinqian0 changed the title ~~[bugfix] fix xlite error: has no attribute 'query_start_loc_cpu'~~ [Refactor] use cos_sin_cache & remove parameter model in builder. Dec 24, 2025

weijinqian0 changed the title ~~[Refactor] use cos_sin_cache & remove parameter model in builder.~~ [Refactor] cache cos/sin in mla & remove parameter model in builder. Dec 24, 2025

weijinqian_v1 added 3 commits December 24, 2025 15:54

[Refactor] use cos_sin_cache & remove parameter like model in builder.

946971e

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

[Refactor] use cos_sin_cache & remove parameter like model in builder.

4e3095c

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

[Refactor] use cos_sin_cache & remove parameter like model in builder.

45e184f

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

weijinqian0 mentioned this pull request Dec 24, 2025

[RFC]: Refactor Attention module #4629

Closed

[Refactor] use cos_sin_cache & remove parameter like model in builder.

31abe7a

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

weijinqian0 added the ready (ready for review) and ready-for-test (start test by label for PR) labels Dec 24, 2025

weijinqian_v1 and others added 10 commits December 25, 2025 13:04

[Refactor] use cos_sin_cache & remove parameter like model in builder.

607384d

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

[Refactor] use cos_sin_cache & remove parameter like model in builder.

cef4b3f

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

[Refactor] use cos_sin_cache & remove parameter like model in builder.

35d4c89

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

[Refactor] use cos_sin_cache & remove parameter like model in builder.

c79b784

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

[Refactor] use cos_sin_cache & remove parameter like model in builder.

f5a795e

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

[Refactor] use cos_sin_cache & remove parameter like model in builder.

2604468

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

Merge branch 'main' into refactor_attention

6aef9b3

Merge branch 'main' into refactor_attention

2546349

Merge branch 'main' into refactor_attention

1814fc8

[Refactor] use cos_sin_cache & remove parameter like model in builder.

e1cf263

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

[Refactor] use cos_sin_cache & remove parameter like model in builder.

ac20bdf

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

wangxiyuan approved these changes Dec 26, 2025

Merge branch 'main' into refactor_attention

dd13b53

weijinqian0 merged commit dbe4c33 into vllm-project:main Dec 28, 2025
8 of 10 checks passed

Debonex mentioned this pull request Dec 30, 2025

[Bugfix] Fix no attribute 'cos_cached' situation when running Moonlight-16B-A3B-Instruct #5421

Closed

weijinqian0 merged 25 commits into vllm-project:main from weijinqian0:refactor_attention
