[bugfix] Fix mooncake kvpool accuracy issue#4976
[bugfix] Fix mooncake kvpool accuracy issue#4976wangxiyuan merged 2 commits intovllm-project:mainfrom
Conversation
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
There was a problem hiding this comment.
Code Review
This pull request aims to resolve a KVPool accuracy issue by introducing event synchronization. While the approach is sound for the non-layerwise path, the implementation has several critical flaws. There are typos in a new parameter name and its usage which will lead to runtime errors. More significantly, the changes for the layerwise saving path are incomplete and buggy. An attempt is made to access a dataclass object as a dictionary, and the necessary current_event is not properly propagated through the call stack, meaning the fix won't apply to layerwise operations. These issues must be addressed to ensure the fix is effective and the code is robust.
| req_id: str, | ||
| token_len: int, | ||
| block_ids: list[int], | ||
| currnet_event: None, |
There was a problem hiding this comment.
There's a typo in the parameter name currnet_event; it should be current_event. Additionally, the type hint None is incorrect. It should be Optional[torch.npu.Event] to accurately represent the type of the event object being passed.
| currnet_event: None, | |
| current_event: Optional[torch.npu.Event], |
| "req_id": req_id, | ||
| "token_len": token_len, | ||
| "block_ids": block_ids, | ||
| "current_event": currnet_event, |
| addr_list = [] | ||
| size_list = [] | ||
| key_list = [] | ||
| current_event = req_meta["current_event"] |
There was a problem hiding this comment.
req_meta is an instance of the LasyerMultiBlockReqMeta dataclass, not a dictionary. Accessing it with ["current_event"] will raise a TypeError. It should be accessed as an attribute, e.g., req_meta.current_event.
Furthermore, this reveals a larger issue: the LasyerMultiBlockReqMeta dataclass (defined in vllm_ascend/distributed/kvpool/config_data.py) is missing a current_event field. This field must be added. Consequently, the creation of LasyerMultiBlockReqMeta instances in pool_worker.py's store_layer method needs to be updated to pass the current_event, and the save_kv_layer method needs logic to create this event, similar to what was done for wait_for_save. Without these changes, the accuracy fix will not apply to the layerwise saving path.
| current_event = req_meta["current_event"] | |
| current_event = req_meta.current_event |
|
please fix the merge conflict |
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
Signed-off-by: LCAIZJ <leichao139636@163.com>
Signed-off-by: LCAIZJ <leichao139636@163.com>
Already fixed. |
### What this PR does / why we need it? The current KVPool has a accuracy issue vllm-project#4412. This PR aims to fix the precision problem without impacting prefill performance. Note:Due to a bug in ADXL, calling `current_event.synchronize()` may occasionally hang. This issue will be fixed in Cann version 8.5.rc1. You can manually build the master branch of the project at https://gitcode.com/cann/hixl to resolve this issue before the 8.5.RC1 release. - vLLM version: v0.12.0 - vLLM main: vllm-project/vllm@ad32e3e --------- Signed-off-by: LCAIZJ <leichao139636@163.com>
### What this PR does / why we need it? The current KVPool has a accuracy issue vllm-project#4412. This PR aims to fix the precision problem without impacting prefill performance. Note:Due to a bug in ADXL, calling `current_event.synchronize()` may occasionally hang. This issue will be fixed in Cann version 8.5.rc1. You can manually build the master branch of the project at https://gitcode.com/cann/hixl to resolve this issue before the 8.5.RC1 release. - vLLM version: v0.12.0 - vLLM main: vllm-project/vllm@ad32e3e --------- Signed-off-by: LCAIZJ <leichao139636@163.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
### What this PR does / why we need it? The current KVPool has a accuracy issue vllm-project#4412. This PR aims to fix the precision problem without impacting prefill performance. Note:Due to a bug in ADXL, calling `current_event.synchronize()` may occasionally hang. This issue will be fixed in Cann version 8.5.rc1. You can manually build the master branch of the project at https://gitcode.com/cann/hixl to resolve this issue before the 8.5.RC1 release. - vLLM version: v0.12.0 - vLLM main: vllm-project/vllm@ad32e3e --------- Signed-off-by: LCAIZJ <leichao139636@163.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
What this PR does / why we need it?
The current KVPool has a accuracy issue #4412. This PR aims to fix the precision problem without impacting prefill performance.
Note:Due to a bug in ADXL, calling
current_event.synchronize()may occasionally hang. This issue will be fixed in Cann version 8.5.rc1. You can manually build the master branch of the project at https://gitcode.com/cann/hixl to resolve this issue before the 8.5.RC1 release.