
[Bugfix]: Correct handling of cos_sin_cache length #1900

Closed
jianzs wants to merge 2 commits into vllm-project:main from jianzs:fix/cos_sin_cache_len

Conversation

@jianzs
Collaborator

@jianzs jianzs commented Jul 21, 2025

What this PR does / why we need it?

This PR addresses the performance issue related to cos/sin cache handling:

  1. The cos/sin cache is already initialized with the maximum context length during construction. However, because the length was stored in max_seq_len_cached rather than max_seq_len, the condition check compared against the wrong variable, leading to unnecessary cache recreation.

  2. Since the cos/sin cache already covers the maximum context length, recreation should never be triggered at runtime.

  3. Fixed variable naming: max_seq_len_cached was never read and should be max_seq_len, which is the variable actually checked against the maximum context length.
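The fix described above can be sketched as follows. This is a minimal, hypothetical Python illustration assuming a simplified RotaryEmbedding class (not the actual vllm_ascend code): the cache is built once with the maximum context length, and later calls only validate against it instead of rebuilding.

```python
# Hypothetical sketch of the fix, not the actual vllm_ascend implementation.
import torch


class RotaryEmbedding:
    def __init__(self, head_dim: int, max_seq_len: int):
        self.head_dim = head_dim
        # Build the cache once, sized for the maximum context length.
        self._set_cos_sin_cache(max_seq_len, device="cpu", dtype=torch.float32)

    def _set_cos_sin_cache(self, seq_len: int, device, dtype):
        # Fix: store the length under max_seq_len (previously the unused
        # max_seq_len_cached), so the later bounds check reads the right value.
        self.max_seq_len = seq_len
        inv_freq = 1.0 / (10000.0 ** (
            torch.arange(0, self.head_dim, 2, dtype=torch.float32) / self.head_dim))
        t = torch.arange(seq_len, dtype=torch.float32)
        freqs = torch.outer(t, inv_freq)
        self.cos_cached = freqs.cos().to(device=device, dtype=dtype)
        self.sin_cached = freqs.sin().to(device=device, dtype=dtype)

    def forward(self, positions: torch.Tensor):
        # Fix: never recreate the cache here; it already covers the maximum
        # context length, so an out-of-range position is a hard error.
        if positions.max().item() >= self.max_seq_len:
            raise ValueError("position exceeds pre-built cos/sin cache length")
        return self.cos_cached[positions], self.sin_cached[positions]
```

With this shape, a position beyond the pre-built cache raises immediately instead of silently rebuilding the cache on the hot path.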

Does this PR introduce any user-facing change?

No

How was this patch tested?

CI pass.

@jianzs
Collaborator Author

jianzs commented Jul 21, 2025

@whx-sjtu PTAL

@jianzs jianzs requested review from Copilot and ganyi1996ppo July 21, 2025 03:45

Copilot AI left a comment


Pull Request Overview

This PR fixes performance issues in rotary embedding cos/sin cache handling by correcting variable usage and preventing unnecessary cache recreation. The fix ensures that the cache, which is already initialized with maximum context length, is not unnecessarily recreated during processing.

  • Replaces cache recreation logic with an error when max_seq_len exceeds the initialized maximum
  • Corrects variable assignment in _set_cos_sin_cache from max_seq_len_cached to max_seq_len
  • Removes redundant max_seq_len assignment during initialization since the cache setup handles this

Comment thread vllm_ascend/ops/rotary_embedding.py Outdated
Collaborator

@whx-sjtu whx-sjtu left a comment


I reviewed the code of the v0.9.1-dev branch and found that this problem has already been solved there but hasn't been ported to main. Thanks for finding and fixing this. LGTM.

@jianzs jianzs requested review from ApsarasX, Yikun and wangxiyuan July 21, 2025 04:11
@jianzs jianzs added the performance-test, accuracy-test, and ready-for-test labels Jul 21, 2025
@codecov

codecov Bot commented Jul 21, 2025

Codecov Report

❌ Patch coverage is 40.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.09%. Comparing base (0bd5ff5) to head (d07d61a).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
vllm_ascend/ops/rotary_embedding.py 25.00% 3 Missing ⚠️

❌ Your patch check has failed because the patch coverage (40.00%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1900   +/-   ##
=======================================
  Coverage   76.09%   76.09%           
=======================================
  Files         114      114           
  Lines       13103    13100    -3     
=======================================
- Hits         9971     9969    -2     
+ Misses       3132     3131    -1     
Flag Coverage Δ
unittests 76.09% <40.00%> (+<0.01%) ⬆️


@ApsarasX
Collaborator

I reviewed the code of the v0.9.1-dev branch and found that this problem has already been solved there but hasn't been ported to main. Thanks for finding and fixing this. LGTM.

@whx-sjtu Which PR fixed this issue in the 0.9.1-dev branch?

@jianzs jianzs added the ready label Jul 21, 2025
Comment thread vllm_ascend/ops/rotary_embedding.py Outdated

def _set_cos_sin_cache(self, seq_len, device, dtype):
    self.max_seq_len_cached = seq_len
    self.max_seq_len = seq_len * self.scaling_factor
Collaborator


There is no problem in v0.9.1. What happened here?

Collaborator Author


#1551 fixed this problem in v0.9.1-dev

Collaborator


So can we just cherry-pick this one?

Collaborator Author


Ok, I've cherry-picked #1551 into the current PR.

@jianzs
Collaborator Author

jianzs commented Jul 21, 2025

I reviewed the code of the v0.9.1-dev branch and found that this problem has already been solved there but hasn't been ported to main. Thanks for finding and fixing this. LGTM.

@whx-sjtu Which PR fixed this issue in the 0.9.1-dev branch?

@ApsarasX #1551

@jianzs jianzs force-pushed the fix/cos_sin_cache_len branch 2 times, most recently from 20683f4 to 3eb5b75 on July 29, 2025 11:45
@jianzs
Collaborator Author

jianzs commented Jul 29, 2025

@zzzzwwjj PTAL

@jianzs jianzs force-pushed the fix/cos_sin_cache_len branch 3 times, most recently from 2fee7a0 to d07d61a on August 8, 2025 06:19
zzzzwwjj and others added 2 commits August 11, 2025 11:36
fix OOM error when `chunked_prefill_for_mla` is enabled and the input is long.

Signed-off-by: zzzzwwjj <1183291235@qq.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
@jianzs jianzs force-pushed the fix/cos_sin_cache_len branch from d07d61a to e25dee4 on August 11, 2025 03:36
@jianzs
Collaborator Author

jianzs commented Aug 11, 2025

@wangxiyuan @Yikun @ganyi1996ppo ready to merge.

position=positions,
attn_state=attn_state)
# NOTE: when use ring_mla, attn_mask don't need to generate here.
if not self.use_ring_mla or attn_state == AscendAttentionState.PrefillNoCache:
Collaborator


According to the comment, shouldn't this be `and` rather than `or`?

Collaborator Author


Indeed, it looks as you described. This PR is a cherry-pick from #1551. @zzzzwwjj could you please verify this?

Collaborator


According to the comment, shouldn't this be `and` rather than `or`?

When ring_mla is used and attn_state == PrefillNoCache, it will not use the ring-attention op and still needs attn_mask, so `or` is right.
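The behavior of the condition can be sketched with a small, self-contained illustration. The enum below is mocked for demonstration and is not the real AscendAttentionState; the predicate mirrors the condition in the snippet above.

```python
# Hypothetical sketch of the mask-generation condition discussed above.
# The enum is mocked for illustration; names mirror the snippet.
from enum import Enum, auto


class AscendAttentionState(Enum):
    PrefillNoCache = auto()
    PrefillCacheHit = auto()
    DecodeOnly = auto()


def needs_attn_mask(use_ring_mla: bool, attn_state: AscendAttentionState) -> bool:
    # `or`, not `and`: even with ring_mla enabled, the PrefillNoCache state
    # falls back to a non-ring kernel and still needs the mask.
    return not use_ring_mla or attn_state == AscendAttentionState.PrefillNoCache
```

Without ring_mla the mask is always generated; with ring_mla only the PrefillNoCache fallback needs it, which is exactly what `or` expresses.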

Collaborator


should be considered with #2319

@github-actions github-actions Bot added the merge-conflicts label and removed the ready label Aug 21, 2025
@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@jianzs jianzs closed this Sep 25, 2025

Labels

accuracy-test, merge-conflicts, module:ops, module:tests, performance-test, ready-for-test


8 participants