[bugfix] Fix mooncake kvpool accuracy issue #4976

Merged

wangxiyuan merged 2 commits into vllm-project:main from LCAIZJ:dev on Dec 16, 2025

LCAIZJ (Collaborator) commented Dec 12, 2025

What this PR does / why we need it?

The current KVPool has an accuracy issue (#4412). This PR aims to fix the precision problem without impacting prefill performance.

Note: Due to a bug in ADXL, calling `current_event.synchronize()` may occasionally hang. This issue will be fixed in CANN 8.5.RC1. Until that release, you can work around it by manually building the master branch of https://gitcode.com/cann/hixl.
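The fix hinges on `current_event.synchronize()`. Presumably the accuracy bug was a race in which the KV-pool saving path could read KV-cache blocks before the compute path had finished writing them. The sketch below is a CPU-only analogy, not the PR's code: `threading.Event` stands in for `torch.npu.Event`, `kv_ready.wait()` stands in for `current_event.synchronize()`, and `forward_pass`, `store_to_pool` and `run_once` are hypothetical names. Blocking only the saving thread, rather than the forward pass, is one plausible way to get correct data without slowing prefill; whether the PR is structured exactly this way is an assumption.

```python
import threading

# CPU-only analogy (assumption): threading.Event stands in for torch.npu.Event.


def forward_pass(kv_buffer: list[int], kv_ready: threading.Event) -> None:
    """Hypothetical stand-in for the compute path writing KV-cache blocks."""
    for i in range(len(kv_buffer)):
        kv_buffer[i] = i * 10  # fake KV values
    # Signal completion, analogous to an NPU event completing once all work
    # recorded before it on the compute stream has finished.
    kv_ready.set()


def store_to_pool(kv_buffer: list[int], kv_ready: threading.Event,
                  pool: list[int]) -> None:
    """Hypothetical stand-in for the KV-pool saving thread."""
    # Analogous to current_event.synchronize(): only this thread blocks.
    kv_ready.wait()
    pool.extend(kv_buffer)


def run_once(num_blocks: int = 8) -> list[int]:
    kv_buffer = [0] * num_blocks
    kv_ready = threading.Event()
    pool: list[int] = []
    # Start the saver first: without the wait() it could copy stale zeros.
    saver = threading.Thread(target=store_to_pool,
                             args=(kv_buffer, kv_ready, pool))
    saver.start()
    compute = threading.Thread(target=forward_pass, args=(kv_buffer, kv_ready))
    compute.start()
    compute.join()
    saver.join()
    return pool


if __name__ == "__main__":
    print(run_once(4))  # [0, 10, 20, 30]
```

Removing the `kv_ready.wait()` call reintroduces the race: the saver may then copy zeros instead of the written values.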

github-actions (Contributor)

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request aims to resolve a KVPool accuracy issue by introducing event synchronization. While the approach is sound for the non-layerwise path, the implementation has several critical flaws. There are typos in a new parameter name and its usage which will lead to runtime errors. More significantly, the changes for the layerwise saving path are incomplete and buggy. An attempt is made to access a dataclass object as a dictionary, and the necessary current_event is not properly propagated through the call stack, meaning the fix won't apply to layerwise operations. These issues must be addressed to ensure the fix is effective and the code is robust.

req_id: str,
token_len: int,
block_ids: list[int],
currnet_event: None,

critical

There's a typo in the parameter name currnet_event; it should be current_event. Additionally, the type hint None is incorrect. It should be Optional[torch.npu.Event] to accurately represent the type of the event object being passed.

Suggested change
currnet_event: None,
current_event: Optional[torch.npu.Event],
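Putting this suggestion together with the dictionary-key fix in the next comment, the corrected code might look like the sketch below. The snippet does not show the enclosing function's name, so `build_req_meta` is a hypothetical placeholder. `torch.npu.Event` is assumed to be provided by the `torch_npu` extension; the `TYPE_CHECKING` guard plus postponed annotations are an assumption made here so the sketch runs without `torch` or `torch_npu` installed.

```python
from __future__ import annotations  # annotations are not evaluated at runtime

from typing import TYPE_CHECKING, Any, Optional

if TYPE_CHECKING:
    import torch  # only needed for the type hint below


def build_req_meta(  # hypothetical name for the enclosing function
    req_id: str,
    token_len: int,
    block_ids: list[int],
    current_event: Optional[torch.npu.Event],
) -> dict[str, Any]:
    """Package per-request metadata for the KV-pool saving path."""
    return {
        "req_id": req_id,
        "token_len": token_len,
        "block_ids": block_ids,
        # Parameter name and dictionary key now share one correct spelling.
        "current_event": current_event,
    }
```

Because the parameter and the key now use the same spelling, callers can pass `current_event=` by keyword; against the misspelled signature that keyword would raise a `TypeError`.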

"req_id": req_id,
"token_len": token_len,
"block_ids": block_ids,
"current_event": currnet_event,

critical

There's a typo in the variable name currnet_event. It should be current_event to match the corrected parameter name in the function signature.

Suggested change
"current_event": currnet_event,
"current_event": current_event,

addr_list = []
size_list = []
key_list = []
current_event = req_meta["current_event"]

critical

req_meta is an instance of the LasyerMultiBlockReqMeta dataclass, not a dictionary. Accessing it with ["current_event"] will raise a TypeError. It should be accessed as an attribute, e.g., req_meta.current_event.

Furthermore, this reveals a larger issue: the LasyerMultiBlockReqMeta dataclass (defined in vllm_ascend/distributed/kvpool/config_data.py) is missing a current_event field. This field must be added. Consequently, the creation of LasyerMultiBlockReqMeta instances in pool_worker.py's store_layer method needs to be updated to pass the current_event, and the save_kv_layer method needs logic to create this event, similar to what was done for wait_for_save. Without these changes, the accuracy fix will not apply to the layerwise saving path.

Suggested change
current_event = req_meta["current_event"]
current_event = req_meta.current_event
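A minimal sketch of the dataclass change the review asks for: adding a `current_event` field and reading it as an attribute. Only `current_event` comes from the review; `req_id`, `layer_id` and `block_ids` are hypothetical placeholder fields, not the real definition in `vllm_ascend/distributed/kvpool/config_data.py`. `Any` stands in for `torch.npu.Event` so the sketch runs without `torch_npu`, and `read_event` and `wait_for_kv` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class LasyerMultiBlockReqMeta:
    # Hypothetical subset of fields; the real class may differ.
    req_id: str
    layer_id: int
    block_ids: list[int]
    # Field the review says is missing; Any stands in for torch.npu.Event.
    current_event: Optional[Any] = None


def read_event(req_meta: LasyerMultiBlockReqMeta) -> Optional[Any]:
    # Dataclass fields are attributes; req_meta["current_event"] raises TypeError.
    return req_meta.current_event


def wait_for_kv(req_meta: LasyerMultiBlockReqMeta) -> None:
    # Hypothetical helper: block until the producing work has finished,
    # or do nothing if no event was attached to this request.
    if req_meta.current_event is not None:
        req_meta.current_event.synchronize()
```

Giving the new field a default of `None` keeps existing constructor calls valid, so call sites such as `store_layer` can be updated to pass the event one at a time.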

wangxiyuan (Collaborator)

please fix the merge conflict

github-actions (Contributor)

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: LCAIZJ <leichao139636@163.com>
Signed-off-by: LCAIZJ <leichao139636@163.com>
LCAIZJ (Collaborator, Author) commented Dec 16, 2025

> please fix the merge conflict

Already fixed.

wangxiyuan merged commit 9c02fa9 into vllm-project:main on Dec 16, 2025
23 checks passed
chenaoxuan pushed a commit to chenaoxuan/vllm-ascend that referenced this pull request Dec 20, 2025
### What this PR does / why we need it?

The current KVPool has an accuracy issue
(vllm-project#4412). This PR aims to
fix the precision problem without impacting prefill performance.

Note: Due to a bug in ADXL, calling `current_event.synchronize()` may
occasionally hang. This issue will be fixed in CANN 8.5.RC1. Until that
release, you can work around it by manually building the master branch of
https://gitcode.com/cann/hixl.

- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

---------

Signed-off-by: LCAIZJ <leichao139636@163.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>

2 participants