[Bugfix] Fix DeepseekV32 AssertionError: num_kv_heads == 1 #33086

Closed
chaunceyjiang wants to merge 2 commits into vllm-project:main from chaunceyjiang:pd_dsv32_2
Conversation

@chaunceyjiang (Collaborator) commented Jan 26, 2026

Purpose

Fixes #33074.

Introduced by #30207:

# Figure out whether the first dimension of the cache is K/V
# or num_blocks. This is used to register the memory regions correctly.
kv_cache_shape = self.attn_backend.get_kv_cache_shape(
    num_blocks=1, block_size=16, num_kv_heads=4, head_size=1
)
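
A minimal sketch of the failure mode, assuming a backend whose `get_kv_cache_shape` asserts `num_kv_heads == 1`. `ToyIndexerBackend`, its returned shape, and the helper functions are hypothetical stand-ins for illustration, not vLLM code; only the probe arguments are taken from the snippet above.

```python
class ToyIndexerBackend:
    """Hypothetical stand-in for a backend with a strict head-count check."""

    @staticmethod
    def get_kv_cache_shape(
        num_blocks: int,
        block_size: int,
        num_kv_heads: int,
        head_size: int,
        cache_dtype_str: str = "auto",
    ) -> tuple[int, ...]:
        assert num_kv_heads == 1, "num_kv_heads == 1"
        # Assumed layout, for illustration only.
        return (num_blocks, block_size, head_size)


def probe_shape(backend) -> tuple[int, ...]:
    # Same dummy arguments as the probe quoted above.
    return backend.get_kv_cache_shape(
        num_blocks=1, block_size=16, num_kv_heads=4, head_size=1
    )


def probe_fails(backend) -> bool:
    """Return True if the dummy probe trips the backend's assertion."""
    try:
        probe_shape(backend)
    except AssertionError:
        return True
    return False


print(probe_fails(ToyIndexerBackend))
```

The probe only wants to learn the layout of the returned shape, but the dummy `num_kv_heads=4` trips the assertion before any shape is returned.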

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@mergify bot added the deepseek (Related to DeepSeek models), v1, and bug (Something isn't working) labels Jan 26, 2026
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request addresses a bug by removing a restrictive assertion related to num_kv_heads in the DeepseekV32IndexerBackend. While this resolves the immediate AssertionError, it's crucial to ensure that the backend's get_kv_cache_shape and underlying attention kernels correctly handle scenarios where num_kv_heads is greater than 1. The PR description could benefit from more details regarding the specific DeepseekV3.2 configurations that triggered the error and how the backend is expected to behave with varying num_kv_heads values. Additionally, filling out the 'Purpose', 'Test Plan', and 'Test Result' sections in the PR body would greatly enhance clarity and reviewability.

@chaunceyjiang chaunceyjiang marked this pull request as ready for review January 26, 2026 11:03
    head_size: int,
    cache_dtype_str: str = "auto",
) -> tuple[int, ...]:
    assert num_kv_heads == 1

@chaunceyjiang (Collaborator, Author) commented Jan 26, 2026

# Figure out whether the first dimension of the cache is K/V
# or num_blocks. This is used to register the memory regions correctly.
kv_cache_shape = self.attn_backend.get_kv_cache_shape(
    num_blocks=1, block_size=16, num_kv_heads=4, head_size=1
)

/cc @NickLucche

I’m not sure whether this fix is correct. PTAL.

Collaborator

@chaunceyjiang I think the issue is in how we're using the function, let me look into it

Collaborator Author

Hi @NickLucche, for a change like #33090, I’m inclined to remove this assert. This would make the use of get_kv_cache_shape more flexible.
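
A rough sketch of what removing the assert would allow, under stated assumptions: `RelaxedToyBackend`, its returned layout, and `first_dim_is_num_blocks` are hypothetical and do not reproduce the actual backend or caller logic. Without the assertion, the dummy probe returns a shape that the caller can inspect.

```python
class RelaxedToyBackend:
    """Hypothetical backend: same signature, no num_kv_heads assertion."""

    @staticmethod
    def get_kv_cache_shape(
        num_blocks: int,
        block_size: int,
        num_kv_heads: int,
        head_size: int,
        cache_dtype_str: str = "auto",
    ) -> tuple[int, ...]:
        # num_kv_heads is accepted but unused in this assumed layout.
        return (num_blocks, block_size, head_size)


def first_dim_is_num_blocks(backend, probe_num_blocks: int = 1) -> bool:
    """Illustrative check: does the first dimension track num_blocks?"""
    shape = backend.get_kv_cache_shape(
        num_blocks=probe_num_blocks,
        block_size=16,
        num_kv_heads=4,
        head_size=1,
    )
    return shape[0] == probe_num_blocks


print(first_dim_is_num_blocks(RelaxedToyBackend))
```

The trade-off is that the backend no longer rejects an unexpected `num_kv_heads` at this call site, so any such validation would have to live elsewhere.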

Collaborator Author

same as: FlashMLASparseBackend

Collaborator Author

@NickLucche WDYT?

Labels

bug (Something isn't working), deepseek (Related to DeepSeek models), v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: PD report DeepseekV32 AssertionError: num_kv_heads == 1

2 participants