[Bugfix] Fix DeepseekV32 AssertionError: num_kv_heads == 1#33086
chaunceyjiang wants to merge 2 commits into vllm-project:main
Conversation
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Code Review
This pull request addresses a bug by removing a restrictive assertion related to num_kv_heads in the DeepseekV32IndexerBackend. While this resolves the immediate AssertionError, it's crucial to ensure that the backend's get_kv_cache_shape and underlying attention kernels correctly handle scenarios where num_kv_heads is greater than 1. The PR description could benefit from more details regarding the specific DeepseekV3.2 configurations that triggered the error and how the backend is expected to behave with varying num_kv_heads values. Additionally, filling out the 'Purpose', 'Test Plan', and 'Test Result' sections in the PR body would greatly enhance clarity and reviewability.
```python
    head_size: int,
    cache_dtype_str: str = "auto",
) -> tuple[int, ...]:
    assert num_kv_heads == 1
```
vllm/vllm/distributed/kv_transfer/kv_connector/utils.py
Lines 322 to 326 in 64e3d67
/cc @NickLucche
I’m not sure whether this fix is correct. PTAL.
@chaunceyjiang I think the issue is in how we're using the function, let me look into it
Hi @NickLucche, for a change like #33090, I’m inclined to remove this assert. This would make the use of get_kv_cache_shape more flexible.
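For context, here is a minimal sketch of what the shape helper under discussion could look like once the assertion is removed. The function name matches the snippet above, but the parameter set and the returned shape layout are illustrative assumptions, not vLLM's actual implementation:

```python
# Hypothetical sketch: a kv-cache shape helper that no longer assumes
# a single KV head. Names and shape layout are illustrative only.
def get_kv_cache_shape(
    num_blocks: int,
    block_size: int,
    num_kv_heads: int,
    head_size: int,
    cache_dtype_str: str = "auto",
) -> tuple[int, ...]:
    # Previously: `assert num_kv_heads == 1`. Dropping it lets callers
    # (e.g. KV-transfer utilities) pass num_kv_heads > 1, at the cost of
    # relying on the underlying kernels to handle that case correctly.
    return (num_blocks, block_size, num_kv_heads, head_size)
```

The trade-off raised in the review still applies: without the assert, any caller that passes `num_kv_heads > 1` must be sure the backend's kernels actually support it.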
The same applies to FlashMLASparseBackend.
Purpose
Fix #33074
Introduced in #30207
vllm/vllm/distributed/kv_transfer/kv_connector/utils.py
Lines 322 to 326 in 64e3d67
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.