[bugfix][torchair] fix kv_nz accuracy problem and remove redundant reshape_and_cache operation #3066

linfeng-yuan · 2025-09-20T12:53:20Z

What this PR does / why we need it?

This PR removes the redundant call to the reshape_and_cache operation in the prefill stage under torchair graph mode. This reduces prefill latency and fixes an accuracy problem when enable_kv_nz is enabled. Although #2988 fixed one enable_kv_nz accuracy problem, the output tokens with DeepSeek are still inaccurate, leading to lower benchmark scores.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

We ran e2e online serving and accuracy tests covering eager mode with enable_shared_expert_dp and torchair graph mode with enable_kv_nz.

vLLM version: v0.10.2
vLLM main: vllm-project/vllm@6d8246a

github-actions · 2025-09-20T12:53:28Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

gemini-code-assist

Code Review

This pull request correctly removes a redundant _npu_reshape_and_cache operation that was being called in torchair graph mode. This is a good simplification, as caching is already handled by npu_kv_rmsnorm_rope_cache in that scenario. However, an important assertion checking the kv_cache size was also removed. I've recommended re-adding it to ensure code robustness and prevent potential runtime errors.
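The control flow discussed in this review can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in torchair_mla.py: every function and variable name below is hypothetical, the cache is modeled as a pair of Python lists, and the assertion message is invented. It assumes the graph-mode path uses a fused op that already writes the cache, so a second explicit write would be redundant, while the size check on kv_cache stays in place for both paths.

```python
# Illustrative sketch only: all names here are hypothetical stand-ins,
# not vllm-ascend or torch_npu APIs.

def fused_norm_rope_and_cache(latents, slots, kv_cache, write_log):
    """Stand-in for a fused op that writes the KV cache as a side effect."""
    nope_cache, rope_cache = kv_cache
    for slot, (nope, rope) in zip(slots, latents):
        nope_cache[slot], rope_cache[slot] = nope, rope
    write_log.append("fused")


def explicit_reshape_and_cache(latents, slots, kv_cache, write_log):
    """Stand-in for a separate, explicit cache-write op."""
    nope_cache, rope_cache = kv_cache
    for slot, (nope, rope) in zip(slots, latents):
        nope_cache[slot], rope_cache[slot] = nope, rope
    write_log.append("explicit")


def update_cache_for_prefill(latents, slots, kv_cache, graph_mode, write_log):
    # Keep the size check regardless of mode (assumed form of the assertion).
    assert len(kv_cache) > 1, "kv_cache must hold both the nope and rope caches"
    if graph_mode:
        # The fused op already filled the cache; no second write is issued.
        fused_norm_rope_and_cache(latents, slots, kv_cache, write_log)
        return
    explicit_reshape_and_cache(latents, slots, kv_cache, write_log)
```

With this shape, each prefill step performs exactly one cache write per mode, and a malformed kv_cache fails early with a clear message instead of surfacing later as an unpacking error.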

vllm_ascend/torchair/torchair_mla.py

codecov · 2025-09-20T13:37:30Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.96%. Comparing base (1bbb20e) to head (d745a8e).
⚠️ Report is 81 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3066      +/-   ##
==========================================
- Coverage   74.76%   71.96%   -2.81%     
==========================================
  Files         150      168      +18     
  Lines       20891    23544    +2653     
==========================================
+ Hits        15620    16943    +1323     
- Misses       5271     6601    +1330

Flag	Coverage Δ
unittests	`71.96% <100.00%> (-2.81%)`	⬇️

jianzs · 2025-09-21T12:52:10Z

@linfeng-yuan Thanks a lot!!!

jianzs · 2025-09-22T07:17:39Z

@linfeng-yuan This pull request fixes only one accuracy problem. Tests show accuracy is fine with KV NZ disabled, but it is still problematic with KV NZ enabled, even after applying this change. The GSM-8K benchmark scores are still too low with KV NZ active.

jianzs · 2025-09-22T08:17:54Z

@linfeng-yuan The torch_npu.atb.npu_ring_mla and torch_npu._npu_flash_attention functions were used in the prefill stage, but the code doesn't seem to have any adaptations for KV NZ. This might be the cause of the problem?

jianzs

This PR cannot be merged until kv_nz is supported during the prefill phase.

gemini-code-assist bot reviewed Sep 20, 2025

vllm_ascend/torchair/torchair_mla.py (resolved)

linfeng-yuan force-pushed the fix_torchair_kv_nz branch 2 times, most recently from 1aace42 to 3893cd1 Compare September 20, 2025 13:19

[bugfix] fix kv_nz accuracy problem and delete redundant reshape_and_…

d745a8e

…cache operation Signed-off-by: linfeng-yuan <[email protected]>

linfeng-yuan force-pushed the fix_torchair_kv_nz branch from 3893cd1 to d745a8e Compare September 20, 2025 13:21

linfeng-yuan changed the title ~~[bugfix] fix kv_nz accuracy problem and delete redundant reshape_and_cache operation~~ [bugfix] fix kv_nz accuracy problem and remove redundant reshape_and_cache operation Sep 20, 2025

linfeng-yuan added the ready and ready-for-test labels Sep 20, 2025

linfeng-yuan changed the title ~~[bugfix] fix kv_nz accuracy problem and remove redundant reshape_and_cache operation~~ [bugfix][torchair] fix kv_nz accuracy problem and remove redundant reshape_and_cache operation Sep 20, 2025

linfeng-yuan requested review from ApsarasX and jianzs September 21, 2025 02:29

wangxiyuan approved these changes Sep 22, 2025

wangxiyuan added and removed the ready-for-test label Sep 22, 2025

linfeng-yuan removed the ready and ready-for-test labels Sep 22, 2025

jianzs requested changes Sep 27, 2025
