[Bugfix] record cos and sin cache in AscendRotaryEmbedding #5516
wangxiyuan merged 1 commit into vllm-project:main
Conversation
Signed-off-by: Debonex <719893090@qq.com>
Code Review
This pull request addresses a bug where _cos_cache and _sin_cache for rotary embeddings were not being correctly recorded in certain scenarios. The fix involves introducing a new function, _record_cos_and_sin_cache_interleaved, to derive these caches from the combined cos_sin_cache and calling it during the initialization of AscendRotaryEmbedding. My review focuses on improving the implementation of this new function for better readability and maintainability, as the current version is overly complex for its task.
```python
def _record_cos_and_sin_cache_interleaved(cos_sin_cache):
    global _cos_cache
    global _sin_cache
    if _cos_cache is not None or _sin_cache is not None:
        return
    hidden_dim = cos_sin_cache.shape[-1] // 2
    cos_cache, sin_cache = cos_sin_cache.view(-1, 2, hidden_dim).repeat(
        1, 1, 2).chunk(2, dim=1)
    _cos_cache = cos_cache.squeeze(1)
    _sin_cache = sin_cache.squeeze(1)
```
The implementation of this function is overly complex and hard to follow. It can be significantly simplified for better readability and maintainability. The current logic is equivalent to splitting the cos_sin_cache and repeating each part, which can be expressed more directly using chunk and repeat.
Additionally, the function name _record_cos_and_sin_cache_interleaved is misleading because the cos_sin_cache tensor is concatenated ([cos, sin]), not interleaved ([c0, s0, c1, s1, ...]). Consider renaming it to something like _record_cos_and_sin_cache_from_combined to more accurately reflect its purpose.
Suggested change:

```python
def _record_cos_and_sin_cache_interleaved(cos_sin_cache):
    global _cos_cache
    global _sin_cache
    if _cos_cache is not None or _sin_cache is not None:
        return
    # cos_sin_cache is concatenated from cos and sin, each of size rotary_dim/2.
    cos_part, sin_part = cos_sin_cache.chunk(2, dim=-1)
    # For neox style, cos and sin are duplicated to match rotary_dim.
    _cos_cache = cos_part.repeat(1, 2)
    _sin_cache = sin_part.repeat(1, 2)
```
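The reviewer's claim that the original `view`/`repeat`/`chunk` logic is equivalent to the simpler `chunk`/`repeat` form can be checked directly. This is a standalone sketch (the free functions and the sample cache shape are illustrative, not the actual module), comparing both derivations on the same concatenated `[cos | sin]` cache:

```python
import torch


def record_interleaved(cos_sin_cache):
    # Original approach: view into (tokens, 2, rotary_dim/2), duplicate the
    # last dim, then split along dim 1 into the cos and sin halves.
    hidden_dim = cos_sin_cache.shape[-1] // 2
    cos, sin = cos_sin_cache.view(-1, 2, hidden_dim).repeat(
        1, 1, 2).chunk(2, dim=1)
    return cos.squeeze(1), sin.squeeze(1)


def record_from_combined(cos_sin_cache):
    # Suggested approach: split the concatenated halves, then duplicate each
    # to match rotary_dim (neox style).
    cos_part, sin_part = cos_sin_cache.chunk(2, dim=-1)
    return cos_part.repeat(1, 2), sin_part.repeat(1, 2)


# Small dummy cache of shape (max_positions, rotary_dim).
cache = torch.arange(24, dtype=torch.float32).reshape(4, 6)
c1, s1 = record_interleaved(cache)
c2, s2 = record_from_combined(cache)
assert torch.equal(c1, c2) and torch.equal(s1, s2)
```

Both versions produce identical tensors; the suggested form just expresses the split-then-duplicate intent without the intermediate 3-D view.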
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
…ect#5516)

### What this PR does / why we need it?
In scenarios where models like [Moonlight](https://modelscope.cn/models/moonshotai/Moonlight-16B-A3B-Instruct) (using MLA but without `rope_scaling` in config.json) invoke `AscendRotaryEmbedding`, `_cos_cache` and `_sin_cache` are not recorded correctly.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@45c1ca1

Signed-off-by: Debonex <719893090@qq.com>
What this PR does / why we need it?
In scenarios where models like Moonlight (using MLA but without
`rope_scaling` in config.json) invoke `AscendRotaryEmbedding`, `_cos_cache` and `_sin_cache` are not recorded correctly.

Does this PR introduce any user-facing change?
No
How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@45c1ca1
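For context on the reviewer's naming point (the cache is concatenated `[cos | sin]`, not interleaved), here is an illustrative sketch of how such a combined cache is typically built for rotary embeddings. The function name and parameters are hypothetical, not the actual vLLM implementation:

```python
import torch


def build_cos_sin_cache(max_positions, rotary_dim, base=10000.0):
    # Inverse frequencies for each pair of rotary dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, rotary_dim, 2).float() / rotary_dim))
    t = torch.arange(max_positions).float()
    freqs = torch.outer(t, inv_freq)  # (max_positions, rotary_dim // 2)
    # Concatenate along the last dim: first half is cos, second half is sin.
    return torch.cat((freqs.cos(), freqs.sin()), dim=-1)


cache = build_cos_sin_cache(8, 16)
assert cache.shape == (8, 16)
# At position 0 the angles are all zero: cos half is 1, sin half is 0.
assert torch.all(cache[0, :8] == 1) and torch.all(cache[0, 8:] == 0)
```

Because the halves are concatenated rather than interleaved as `[c0, s0, c1, s1, ...]`, recovering `_cos_cache` and `_sin_cache` is a `chunk(2, dim=-1)` followed by duplication, which is what the fix records.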