[Bugfix]: Correct handling of cos_sin_cache length#1900
jianzs wants to merge 2 commits into vllm-project:main
Conversation
@whx-sjtu PTAL
Pull Request Overview
This PR fixes performance issues in rotary embedding cos/sin cache handling by correcting variable usage and preventing unnecessary cache recreation. The fix ensures that the cache, which is already initialized with maximum context length, is not unnecessarily recreated during processing.
- Replaces cache recreation logic with an error when `max_seq_len` exceeds the initialized maximum
- Corrects variable assignment in `_set_cos_sin_cache` from `max_seq_len_cached` to `max_seq_len`
- Removes redundant `max_seq_len` assignment during initialization, since the cache setup handles this
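A minimal sketch of the corrected behavior described above, using hypothetical class and method names rather than the actual vllm-ascend implementation: the cos/sin cache is built once for the maximum context length, and a longer request raises an error instead of triggering a rebuild.

```python
import math


class RotaryEmbeddingCache:
    """Illustrative sketch only; not the actual vllm-ascend code."""

    def __init__(self, dim: int, max_seq_len: int, base: float = 10000.0):
        self.max_seq_len = max_seq_len  # fixed maximum context length
        inv_freq = [base ** (-i / dim) for i in range(0, dim, 2)]
        # Build the cos/sin cache once, sized for the full context length.
        self.cos_cache = [[math.cos(t * f) for f in inv_freq]
                          for t in range(max_seq_len)]
        self.sin_cache = [[math.sin(t * f) for f in inv_freq]
                          for t in range(max_seq_len)]

    def get(self, seq_len: int):
        # After the fix: never recreate the cache; error out instead of
        # silently rebuilding when seq_len exceeds the initialized maximum.
        if seq_len > self.max_seq_len:
            raise ValueError(
                f"seq_len {seq_len} exceeds initialized max_seq_len "
                f"{self.max_seq_len}")
        return self.cos_cache[:seq_len], self.sin_cache[:seq_len]
```

Slicing a prebuilt cache keeps the hot path allocation-free, which is the performance point of the fix.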
whx-sjtu left a comment
I reviewed the code of branch v0.9.1-dev and found that this problem has already been solved in that branch, but the fix hasn't been ported to main. Thanks for finding and fixing this. LGTM.
Codecov Report

❌ Your patch check has failed because the patch coverage (40.00%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

```
@@           Coverage Diff           @@
##             main    #1900   +/-   ##
=======================================
  Coverage   76.09%   76.09%
=======================================
  Files         114      114
  Lines       13103    13100    -3
=======================================
- Hits         9971     9969    -2
+ Misses       3132     3131    -1
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
@whx-sjtu Which PR fixed this issue in the 0.9.1-dev branch?
```python
def _set_cos_sin_cache(self, seq_len, device, dtype):
    self.max_seq_len_cached = seq_len
    self.max_seq_len = seq_len * self.scaling_factor
```
There is no problem in v0.9.1. What happened?
So can we just cherry-pick this one?
Ok, I've cherry-picked #1551 into the current pr.
force-pushed from 20683f4 to 3eb5b75
@zzzzwwjj PTAL
force-pushed from 2fee7a0 to d07d61a
fix OOM error when `chunked_prefill_for_mla` is enabled in long-input scenarios. Signed-off-by: zzzzwwjj <1183291235@qq.com> Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
force-pushed from d07d61a to e25dee4
@wangxiyuan @Yikun @ganyi1996ppo ready to merge.
```python
    position=positions,
    attn_state=attn_state)
# NOTE: when use ring_mla, attn_mask don't need to generate here.
if not self.use_ring_mla or attn_state == AscendAttentionState.PrefillNoCache:
```
According to the comment, should this be `and` rather than `or`?
When ring_mla is in use and attn_state == PrefillNoCache, it will not use the ring-attn op and still needs attn_mask, so `or` is right.
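A small illustrative sketch of that boolean condition (the enum here is a simplified stand-in, not the real vllm-ascend `AscendAttentionState`), showing why `or` is correct while `and` would wrongly skip mask generation:

```python
from enum import Enum, auto


class AscendAttentionState(Enum):
    # Simplified stand-in for illustration only.
    PrefillNoCache = auto()
    PrefillCacheHit = auto()
    DecodeOnly = auto()


def needs_attn_mask(use_ring_mla: bool,
                    attn_state: AscendAttentionState) -> bool:
    # `or` is intentional: even when ring_mla is enabled, the PrefillNoCache
    # state does not go through the ring-attention op and still needs a mask.
    return (not use_ring_mla
            or attn_state is AscendAttentionState.PrefillNoCache)
```

With `and`, the condition would be false whenever `use_ring_mla` is true, so the ring_mla + PrefillNoCache case would skip the mask it still requires.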
This pull request has conflicts, please resolve those before we can evaluate the pull request.
What this PR does / why we need it?
This PR addresses the performance issue related to cos/sin cache handling:
The cos/sin cache is already initialized with the maximum context length. However, because `max_seq_len_cached` was stored as `seq_len`, the condition check was incorrect, leading to unnecessary cache recreation. Since the cache is already sized for the maximum context length, it should never trigger recreation during processing.
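A toy reproduction of that incorrect condition check (names are illustrative only, not the actual vllm-ascend code): because the guard compares against the last `seq_len` seen rather than the initialized maximum, each longer request retriggers a costly rebuild even though the cache already covers the full context length.

```python
class CosSinCacheDemo:
    """Toy reproduction of the bug; illustrative names only."""

    def __init__(self, max_position: int):
        self.max_seq_len = max_position  # cache is sized for this at init
        self.max_seq_len_cached = 0      # buggy bound, updated per call
        self.rebuilds = 0

    def forward_buggy(self, seq_len: int) -> None:
        # Bug: compares against the last seq_len seen, not the initialized
        # maximum, so every longer request rebuilds the cache.
        if seq_len > self.max_seq_len_cached:
            self.max_seq_len_cached = seq_len
            self.rebuilds += 1


demo = CosSinCacheDemo(max_position=4096)
for s in (128, 256, 512):
    demo.forward_buggy(s)
print(demo.rebuilds)  # 3 needless rebuilds; the cache already covers 4096
```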
Fixed variable naming: `max_seq_len_cached` was never used and should be `max_seq_len`. This is also the correct variable to check against the maximum context length.

Does this PR introduce any user-facing change?
No
How was this patch tested?
CI pass.