[Bugfix][V1][P/D]Fix the issue where repeated requests for the same input produce abnormal outputs for P2pNcclConnector#23403
Conversation
There was a problem hiding this comment.
Code Review
This pull request addresses a bug where repeated identical requests could lead to abnormal outputs. The fix involves reverting a change that offloaded KVCache extraction to a background thread. By moving the extraction logic back into the main thread within P2pNcclConnector.save_kv_layer, it resolves a potential race condition where the KV cache could be modified before being sent, ensuring data integrity. The changes in P2pNcclEngine are consistent with this revert.
I've identified a potential critical issue where attn_metadata could be None, leading to a crash. Please see my detailed comment.
| if isinstance(attn_metadata, MLACommonMetadata): | ||
| num_pages, page_size = layer.shape[0], layer.shape[1] | ||
| return layer.reshape(num_pages * page_size, -1)[slot_mapping, | ||
| ...] | ||
| num_pages, page_size = layer.shape[1], layer.shape[2] | ||
| return layer.reshape(2, num_pages * page_size, -1)[:, slot_mapping, | ||
| ...] |
There was a problem hiding this comment.
There is a potential None value for attn_metadata that is not handled here, which could lead to a runtime crash.
The start_load_kv method in this class (line 122) checks if attn_metadata is None and returns early. However, save_kv_layer does not have this check. If attn_metadata is None, isinstance(attn_metadata, MLACommonMetadata) on line 257 will evaluate to False. For an MLA model, this will cause the code to incorrectly take the non-MLA path, resulting in an IndexError on line 261 because layer.shape will have 3 dimensions instead of the expected 4.
To prevent this crash, it's recommended to add a check for attn_metadata is None at the beginning of the save_kv_layer method, similar to the implementation in start_load_kv.
|
@KuntaiDu PTAL! |
…nput produce abnormal outputs for P2pNcclConnector (vllm-project#23403) Signed-off-by: Abatom <abzhonghua@gmail.com> Signed-off-by: Terrencezzj <terrence@cohere.ai>
…nput produce abnormal outputs for P2pNcclConnector (vllm-project#23403) Signed-off-by: Abatom <abzhonghua@gmail.com> Signed-off-by: tc-mb <caitianchi@modelbest.cn>
…nput produce abnormal outputs for P2pNcclConnector (vllm-project#23403) Signed-off-by: Abatom <abzhonghua@gmail.com>
…nput produce abnormal outputs for P2pNcclConnector (vllm-project#23403) Signed-off-by: Abatom <abzhonghua@gmail.com> Signed-off-by: Xiao Yu <xiao.yu@amd.com>
…nput produce abnormal outputs for P2pNcclConnector (vllm-project#23403) Signed-off-by: Abatom <abzhonghua@gmail.com>
…nput produce abnormal outputs for P2pNcclConnector (vllm-project#23403) Signed-off-by: Abatom <abzhonghua@gmail.com>
…nput produce abnormal outputs for P2pNcclConnector (vllm-project#23403) Signed-off-by: Abatom <abzhonghua@gmail.com>
Purpose
Test Plan
Test Result
(Optional) Documentation Update
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.