[bugfix] Fix mooncake kvpool accuracy issue by LCAIZJ · Pull Request #4976 · vllm-project/vllm-ascend

LCAIZJ · 2025-12-12T15:25:22Z

What this PR does / why we need it?

The current KVPool has an accuracy issue (#4412). This PR aims to fix the precision problem without impacting prefill performance.

Note: Due to a bug in ADXL, calling current_event.synchronize() may occasionally hang. This issue will be fixed in CANN version 8.5.RC1. Until that release, you can work around it by manually building the master branch of the project at https://gitcode.com/cann/hixl.
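
For readers unfamiliar with the pattern, the ordering guarantee that an event provides can be sketched on the host side with the standard library. This is an analogy only: `threading.Event` stands in for the device event, and `kv_block`, `compute`, and `send` are hypothetical names, not code from this PR. The point is that the sender waits on the event before reading the buffer, so it should never transfer a partially written block.

```python
import threading

kv_block = [0] * 4                   # stand-in for a KV cache block (assumption)
current_event = threading.Event()    # stand-in for the device event (assumption)
sent = []

def compute() -> None:
    # Fill the block, then signal that all writes are complete.
    for i in range(len(kv_block)):
        kv_block[i] = i + 1
    current_event.set()

def send() -> None:
    # Wait until the producer has finished writing, then copy the data out.
    current_event.wait()
    sent.extend(kv_block)

# Start the sender first to show that it blocks until the event is set.
sender = threading.Thread(target=send)
producer = threading.Thread(target=compute)
sender.start()
producer.start()
sender.join()
producer.join()
```

Without the `wait()` call the sender could read the block while it is still being filled, which is the kind of race that shows up as an accuracy problem rather than a crash.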

vLLM version: v0.12.0
vLLM main: vllm-project/vllm@ad32e3e

github-actions · 2025-12-12T15:25:30Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

gemini-code-assist

Code Review

This pull request aims to resolve a KVPool accuracy issue by introducing event synchronization. While the approach is sound for the non-layerwise path, the implementation has several critical flaws. There are typos in a new parameter name and its usage which will lead to runtime errors. More significantly, the changes for the layerwise saving path are incomplete and buggy. An attempt is made to access a dataclass object as a dictionary, and the necessary current_event is not properly propagated through the call stack, meaning the fix won't apply to layerwise operations. These issues must be addressed to ensure the fix is effective and the code is robust.

gemini-code-assist · 2025-12-12T15:27:04Z

        req_id: str,
        token_len: int,
        block_ids: list[int],
+        currnet_event: None,


There's a typo in the parameter name currnet_event; it should be current_event. Additionally, the type hint None is incorrect. It should be Optional[torch.npu.Event] to accurately represent the type of the event object being passed.

Suggested change

-        currnet_event: None,
+        current_event: Optional[torch.npu.Event],
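
A minimal sketch of what the corrected signature could look like in context. `NpuEvent` is a placeholder for `torch.npu.Event` so the snippet runs without an NPU, `build_request` is a hypothetical function name rather than the one in the PR, and the `= None` default is an assumption for callers that do not record an event.

```python
from typing import Any, Optional

class NpuEvent:
    """Placeholder for torch.npu.Event (assumption, for illustration only)."""

def build_request(
    req_id: str,
    token_len: int,
    block_ids: list[int],
    current_event: Optional[NpuEvent] = None,
) -> dict[str, Any]:
    # Pass the event through unchanged so the consumer can synchronize on it.
    return {
        "req_id": req_id,
        "token_len": token_len,
        "block_ids": block_ids,
        "current_event": current_event,
    }

request = build_request("req-0", 16, [0, 1])
```

With `Optional[...]` in the annotation, a type checker accepts both an event and `None`, whereas the original `None` annotation would reject any real event object.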

gemini-code-assist · 2025-12-12T15:27:04Z

            "req_id": req_id,
            "token_len": token_len,
            "block_ids": block_ids,
+            "current_event": currnet_event,


There's a typo in the variable name currnet_event. It should be current_event to match the corrected parameter name in the function signature.

Suggested change

-            "current_event": currnet_event,
+            "current_event": current_event,

gemini-code-assist · 2025-12-12T15:27:04Z

        addr_list = []
        size_list = []
        key_list = []
+        current_event = req_meta["current_event"]


req_meta is an instance of the LasyerMultiBlockReqMeta dataclass, not a dictionary. Accessing it with ["current_event"] will raise a TypeError. It should be accessed as an attribute, e.g., req_meta.current_event.

Furthermore, this reveals a larger issue: the LasyerMultiBlockReqMeta dataclass (defined in vllm_ascend/distributed/kvpool/config_data.py) is missing a current_event field. This field must be added. Consequently, the creation of LasyerMultiBlockReqMeta instances in pool_worker.py's store_layer method needs to be updated to pass the current_event, and the save_kv_layer method needs logic to create this event, similar to what was done for wait_for_save. Without these changes, the accuracy fix will not apply to the layerwise saving path.

Suggested change

-        current_event = req_meta["current_event"]
+        current_event = req_meta.current_event
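
The difference between the two access styles can be reproduced with a small stand-in dataclass. `LayerReqMetaSketch` and its fields are hypothetical and much smaller than the real `LasyerMultiBlockReqMeta`; the sketch only shows that subscripting a dataclass instance raises `TypeError`, while attribute access works once the field is declared.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class LayerReqMetaSketch:
    req_id: str
    block_ids: list[int]
    current_event: Optional[Any] = None  # the field that has to be declared

# A plain string stands in for the event object here.
req_meta = LayerReqMetaSketch(req_id="req-0", block_ids=[0, 1], current_event="event")

try:
    req_meta["current_event"]  # dict-style access is not supported
    subscript_error = None
except TypeError as exc:
    subscript_error = exc

current_event = req_meta.current_event  # attribute access works
```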

wangxiyuan · 2025-12-15T13:56:41Z

please fix the merge conflict

github-actions · 2025-12-15T14:00:27Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: LCAIZJ <leichao139636@163.com>

LCAIZJ · 2025-12-16T03:11:42Z

> please fix the merge conflict

Already fixed.

gemini-code-assist bot reviewed Dec 12, 2025

LCAIZJ force-pushed the dev branch from 71ff671 to 99148a2 on December 15, 2025 08:36

wangxiyuan approved these changes Dec 15, 2025

github-actions bot added the merge-conflicts label Dec 15, 2025

LCAIZJ added 2 commits December 16, 2025 09:50

fix mooncake kvpool accuracy issue

e95a11b

Signed-off-by: LCAIZJ <leichao139636@163.com>

fix conflict and layerwise add current_event

dd9ac46

Signed-off-by: LCAIZJ <leichao139636@163.com>

LCAIZJ force-pushed the dev branch from e6397e9 to dd9ac46 on December 16, 2025 03:09

github-actions bot removed the merge-conflicts label Dec 16, 2025

wangxiyuan merged commit 9c02fa9 into vllm-project:main Dec 16, 2025
23 checks passed

LCAIZJ mentioned this pull request Dec 24, 2025

[P/D]Improve the performance of Layerwise Connector #5303

Merged
