[BugFix] Fix memory spike in workspace allocation by LucasWilkinson · Pull Request #30744 · vllm-project/vllm

LucasWilkinson · 2025-12-16T05:09:26Z

Attempt to fix: https://buildkite.com/vllm/ci/builds/43469#019b1ba9-b250-451b-8125-dc941489fe04

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

gemini-code-assist

Code Review

This pull request addresses a memory spike during workspace allocation by replacing torch.Tensor.resize_ with explicit deallocation followed by reallocation. The change correctly identifies that resize_ can temporarily double memory usage and cause out-of-memory errors. The new implementation drops its reference to the old tensor so that its memory can be reclaimed before the new, larger tensor is allocated. This should effectively mitigate the memory spikes. The logic is sound and the implementation is correct.
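The peak-memory argument in this review can be illustrated with a toy model. The sketch below is not vLLM's code and does not use PyTorch. ToyAllocator, grow_in_place, grow_by_replacing, and peak_after_growth are hypothetical names, and the model assumes, as the review describes, that a content-preserving resize holds the old and new buffers at the same time.

```python
class ToyAllocator:
    """Hypothetical stand-in for a device allocator: tracks live bytes and the peak."""

    def __init__(self):
        self.live = 0
        self.peak = 0

    def alloc(self, nbytes):
        self.live += nbytes
        self.peak = max(self.peak, self.live)

    def free(self, nbytes):
        self.live -= nbytes


def grow_in_place(allocator, old_size, new_size):
    # Assumed resize behavior: the new buffer is allocated while the old
    # one is still live (so contents could be copied), then the old one is freed.
    allocator.alloc(new_size)
    allocator.free(old_size)


def grow_by_replacing(allocator, old_size, new_size):
    # Pattern described in the review: release the old buffer first,
    # then allocate the larger one. Old contents are not preserved.
    allocator.free(old_size)
    allocator.alloc(new_size)


def peak_after_growth(grow, old_size, new_size):
    """Return the high-water mark after growing a buffer with the given strategy."""
    allocator = ToyAllocator()
    allocator.alloc(old_size)
    grow(allocator, old_size, new_size)
    return allocator.peak


if __name__ == "__main__":
    print("in place: ", peak_after_growth(grow_in_place, 100, 150))
    print("replacing:", peak_after_growth(grow_by_replacing, 100, 150))
```

In this model the first strategy peaks at the sum of the old and new sizes, while the second peaks at the new size alone. The trade-off is that releasing first discards the old contents, which is acceptable only for scratch buffers such as a workspace.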

jeejeelee · 2025-12-16T08:23:38Z

.buildkite/test-pipeline.yaml

    # FIXIT: find out which code initialize cuda before running the test
    # before the fix, we need to use spawn to test it
    - export VLLM_WORKER_MULTIPROC_METHOD=spawn
+    # Alot of these tests are on the edge of OOMing


NIT

Suggested change

-    # Alot of these tests are on the edge of OOMing
+    # A lot of these tests are on the edge of OOMing

jeejeelee

Thank you for fixing

DarkLight1337 · 2025-12-16T14:47:14Z

Will open a separate fix for the failing fusion tests; the failure is related to the recent deprecation in #30396.

DarkLight1337 · 2025-12-16T14:50:03Z

Fixed by #30787

fix memory spike

8d79eb1

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

mergify bot added the v1 label Dec 16, 2025

gemini-code-assist bot reviewed Dec 16, 2025

fix

3a101a4

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

mergify bot added the ci/build label Dec 16, 2025

LucasWilkinson added the ready label (ONLY add when PR is ready to merge/full CI is needed) Dec 16, 2025

LucasWilkinson mentioned this pull request Dec 16, 2025

[Bugfix] Fix RequestOutput miss lora_request #30636

Merged

5 tasks

jeejeelee reviewed Dec 16, 2025

jeejeelee approved these changes Dec 16, 2025

DarkLight1337 enabled auto-merge (squash) December 16, 2025 11:58

Merge branch 'main' into lwilkinson/fix-memory-spike

6c61f3a

vllm-bot merged commit 00a8d76 into main Dec 16, 2025
48 of 51 checks passed

vllm-bot deleted the lwilkinson/fix-memory-spike branch December 16, 2025 14:46

khluu pushed a commit that referenced this pull request Dec 17, 2025

[BugFix] Fix memory spike in workspace allocation (#30744)

761b730

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> (cherry picked from commit 00a8d76)

LucasWilkinson mentioned this pull request Dec 17, 2025

[Do not merge][Test] Revert "[Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2" #30715

Closed

fort726 pushed a commit to fort726/vllm that referenced this pull request Jan 6, 2026

[BugFix] Fix memory spike in workspace allocation (vllm-project#30744)

fa79626

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

[BugFix] Fix memory spike in workspace allocation (vllm-project#30744)

ee50ada

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

vllm-bot merged 3 commits into main from lwilkinson/fix-memory-spike
