[Bugfix] Fix DeepSeek V3.2 OOM during CG memory profiling #36691

Merged
MatthewBonanni merged 2 commits into vllm-project:main from MatthewBonanni:fix_dsv32_oom
Mar 11, 2026
@MatthewBonanni (Collaborator) commented Mar 10, 2026

Purpose

The CUDA graph memory profiler added in #30515 did not account for UniformTypeKVCacheSpecs in init_minimal_kv_cache_for_profiling, so page_size was incorrectly multiplied by group_size, producing an allocation that was 61x too large. This PR fixes that and uses the existing num_blocks override mechanism instead of spoofing the available memory, which should be more robust.
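
To illustrate the overcount described above, here is a minimal toy sketch. It is not vLLM code: `ToyLayerSpec`, `ToyUniformGroupSpec`, and both helper functions are hypothetical names, and it assumes that the page size reported by a uniform-type group spec already covers every layer in the group, with a group of 61 layers to match the 61x figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyLayerSpec:
    """Hypothetical stand-in for a single layer's KV cache spec."""
    page_size_bytes: int


@dataclass(frozen=True)
class ToyUniformGroupSpec:
    """Hypothetical stand-in for a spec that bundles same-type layers.

    Assumption: its page size is already the sum over all bundled layers.
    """
    layer_specs: tuple

    @property
    def page_size_bytes(self) -> int:
        return sum(spec.page_size_bytes for spec in self.layer_specs)


def buggy_bytes_per_block(spec: ToyUniformGroupSpec, group_size: int) -> int:
    # Multiplies an already-aggregated page size by the group size again.
    return spec.page_size_bytes * group_size


def fixed_bytes_per_block(spec: ToyUniformGroupSpec) -> int:
    # Uses the aggregated page size as-is.
    return spec.page_size_bytes


# Assumed group of 61 layers with an arbitrary per-layer page size.
group = ToyUniformGroupSpec(tuple(ToyLayerSpec(1024) for _ in range(61)))
group_size = len(group.layer_specs)

buggy = buggy_bytes_per_block(group, group_size)
fixed = fixed_bytes_per_block(group)
overcount_factor = buggy // fixed
```

Under these assumptions the buggy path requests `group_size` times more memory per block than the fixed path, which is consistent with the OOM seen at startup.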

Test Plan

vllm serve deepseek-ai/DeepSeek-V3.2 -tp 8

Test Result

main: OOM during startup
PR: starts up successfully


Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@MatthewBonanni requested a review from njhill as a code owner March 10, 2026 18:03
@MatthewBonanni added the ready label Mar 10, 2026
@mergify bot added the deepseek, v1, and bug labels Mar 10, 2026
@robertgshaw2-redhat (Collaborator) commented

do we need a bugfix in 0.17 for this?

@MatthewBonanni (Collaborator, Author) commented

@robertgshaw2-redhat no, it was introduced by #30515, which isn't in 0.17. Updated the PR description to clarify

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request addresses an out-of-memory issue during CUDA graph memory profiling for DeepSeek V3.2. The fix correctly initializes the minimal KV cache by using the num_gpu_blocks_override mechanism, which is a more robust approach than the previous memory calculation that was incorrect for UniformTypeKVCacheSpecs. The change is sound, but I've suggested an improvement to ensure the configuration is always restored to its original state, even in the case of an exception, by using a try...finally block.
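
The try...finally suggestion above can be sketched as follows. This is a minimal illustration rather than the code in this PR: `ToyCacheConfig` and `temporary_num_blocks` are hypothetical names, and the only detail taken from the discussion is the idea of temporarily setting a `num_gpu_blocks_override` field and restoring it afterwards.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class ToyCacheConfig:
    """Hypothetical stand-in for a cache config with a block-count override."""
    num_gpu_blocks_override: Optional[int] = None


@contextmanager
def temporary_num_blocks(
    config: ToyCacheConfig, num_blocks: int
) -> Iterator[ToyCacheConfig]:
    """Set the override for the duration of the block, then restore it."""
    saved = config.num_gpu_blocks_override
    config.num_gpu_blocks_override = num_blocks
    try:
        yield config
    finally:
        # Runs on both normal exit and exceptions.
        config.num_gpu_blocks_override = saved


config = ToyCacheConfig()
seen_inside = None
try:
    with temporary_num_blocks(config, 1) as cfg:
        seen_inside = cfg.num_gpu_blocks_override
        raise RuntimeError("simulated profiling failure")
except RuntimeError:
    pass

restored_value = config.num_gpu_blocks_override
```

Wrapping the restore in `finally` means a failure during profiling does not leave the minimal override in place for the real KV cache initialization that follows.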

Note: Security Review did not run due to the size of the PR.

@njhill (Member) left a comment

Thanks @MatthewBonanni, maybe just add a short comment?

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@MatthewBonanni enabled auto-merge (squash) March 11, 2026 02:42
@MatthewBonanni merged commit 8ab3d74 into vllm-project:main Mar 11, 2026
56 checks passed
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026
Labels

bug, deepseek, ready, v1
