[Bugfix] record cos and sin cache in AscendRotaryEmbedding #5516
wangxiyuan merged 1 commit into vllm-project:main
Conversation
Signed-off-by: Debonex <719893090@qq.com>
Code Review
This pull request addresses a bug where _cos_cache and _sin_cache for rotary embeddings were not being correctly recorded in certain scenarios. The fix involves introducing a new function, _record_cos_and_sin_cache_interleaved, to derive these caches from the combined cos_sin_cache and calling it during the initialization of AscendRotaryEmbedding. My review focuses on improving the implementation of this new function for better readability and maintainability, as the current version is overly complex for its task.
```python
def _record_cos_and_sin_cache_interleaved(cos_sin_cache):
    global _cos_cache
    global _sin_cache
    if _cos_cache is not None or _sin_cache is not None:
        return
    hidden_dim = cos_sin_cache.shape[-1] // 2
    cos_cache, sin_cache = cos_sin_cache.view(-1, 2, hidden_dim).repeat(
        1, 1, 2).chunk(2, dim=1)
    _cos_cache = cos_cache.squeeze(1)
    _sin_cache = sin_cache.squeeze(1)
```
The implementation of this function is overly complex and hard to follow. It can be significantly simplified for better readability and maintainability. The current logic is equivalent to splitting the cos_sin_cache and repeating each part, which can be expressed more directly using chunk and repeat.
Additionally, the function name _record_cos_and_sin_cache_interleaved is misleading because the cos_sin_cache tensor is concatenated ([cos, sin]), not interleaved ([c0, s0, c1, s1, ...]). Consider renaming it to something like _record_cos_and_sin_cache_from_combined to more accurately reflect its purpose.
Suggested change:

```python
def _record_cos_and_sin_cache_interleaved(cos_sin_cache):
    global _cos_cache
    global _sin_cache
    if _cos_cache is not None or _sin_cache is not None:
        return
    # cos_sin_cache is concatenated from cos and sin, each of size rotary_dim/2.
    cos_part, sin_part = cos_sin_cache.chunk(2, dim=-1)
    # For neox style, cos and sin are duplicated to match rotary_dim.
    _cos_cache = cos_part.repeat(1, 2)
    _sin_cache = sin_part.repeat(1, 2)
```
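The reviewer's claim that the original `view`/`repeat`/`chunk` logic is equivalent to the simpler `chunk`/`repeat` form can be checked directly. This is a standalone sketch (the free functions and the sample cache shape are illustrative, not the actual module), comparing both derivations on the same concatenated `[cos | sin]` cache:

```python
import torch


def record_interleaved(cos_sin_cache):
    # Original approach: view into (tokens, 2, rotary_dim/2), duplicate the
    # last dim, then split along dim 1 into the cos and sin halves.
    hidden_dim = cos_sin_cache.shape[-1] // 2
    cos, sin = cos_sin_cache.view(-1, 2, hidden_dim).repeat(
        1, 1, 2).chunk(2, dim=1)
    return cos.squeeze(1), sin.squeeze(1)


def record_from_combined(cos_sin_cache):
    # Suggested approach: split the concatenated halves, then duplicate each
    # to match rotary_dim (neox style).
    cos_part, sin_part = cos_sin_cache.chunk(2, dim=-1)
    return cos_part.repeat(1, 2), sin_part.repeat(1, 2)


# Small dummy cache of shape (max_positions, rotary_dim).
cache = torch.arange(24, dtype=torch.float32).reshape(4, 6)
c1, s1 = record_interleaved(cache)
c2, s2 = record_from_combined(cache)
assert torch.equal(c1, c2) and torch.equal(s1, s2)
```

Both versions produce identical tensors; the suggested form just expresses the split-then-duplicate intent without the intermediate 3-D view.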
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
…ect#5516)

### What this PR does / why we need it?
In scenarios where models like [Moonlight](https://modelscope.cn/models/moonshotai/Moonlight-16B-A3B-Instruct) (using MLA but without `rope_scaling` in config.json) invoke `AscendRotaryEmbedding`, `_cos_cache` and `_sin_cache` are not recorded correctly.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@45c1ca1

Signed-off-by: Debonex <719893090@qq.com>
What this PR does / why we need it?
In scenarios where models like Moonlight (using MLA but without
`rope_scaling` in config.json) invoke `AscendRotaryEmbedding`, `_cos_cache` and `_sin_cache` are not recorded correctly.

Does this PR introduce any user-facing change?
No
How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@45c1ca1
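For context on the reviewer's naming point (the cache is concatenated `[cos | sin]`, not interleaved), here is an illustrative sketch of how such a combined cache is typically built for rotary embeddings. The function name and parameters are hypothetical, not the actual vLLM implementation:

```python
import torch


def build_cos_sin_cache(max_positions, rotary_dim, base=10000.0):
    # Inverse frequencies for each pair of rotary dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, rotary_dim, 2).float() / rotary_dim))
    t = torch.arange(max_positions).float()
    freqs = torch.outer(t, inv_freq)  # (max_positions, rotary_dim // 2)
    # Concatenate along the last dim: first half is cos, second half is sin.
    return torch.cat((freqs.cos(), freqs.sin()), dim=-1)


cache = build_cos_sin_cache(8, 16)
assert cache.shape == (8, 16)
# At position 0 the angles are all zero: cos half is 1, sin half is 0.
assert torch.all(cache[0, :8] == 1) and torch.all(cache[0, 8:] == 0)
```

Because the halves are concatenated rather than interleaved as `[c0, s0, c1, s1, ...]`, recovering `_cos_cache` and `_sin_cache` is a `chunk(2, dim=-1)` followed by duplication, which is what the fix records.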