[Bugfix] Fix test_whisper distributed test stability: torch.compile flakiness and memory utilization #42092
Closed
dzhengAP wants to merge 2 commits into vllm-project:main from
Conversation
…y_utilization and enforce_eager Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>
Code Review
This pull request modifies the Whisper generation tests by removing the process spawning decorator for distributed tests and adjusting GPU memory utilization settings. Feedback focuses on restoring the spawn decorator and its import to maintain CUDA stability, as well as further reducing the gpu_memory_utilization to 0.65 to ensure reliable execution within CI memory constraints.
… test Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>
Force-pushed b82cab3 to 38d129e
Follow-up to #41423 and #42038.
test_models_distributed in test_whisper.py was failing in CI build #65117 due to two issues:
torch.compile flakiness — with enforce_eager=False, the test triggers torch.compile/AOT cache setup which can fail non-deterministically.
Fix: use enforce_eager=True for the distributed correctness test.
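For illustration, a hedged sketch of what the test's engine construction might look like with this fix applied. This is not the actual test code: the model checkpoint, tensor_parallel_size, and exact argument values are assumptions, though enforce_eager and gpu_memory_utilization are real vLLM engine arguments.

```python
from vllm import LLM

# Hypothetical sketch, not the actual test_whisper code.
# enforce_eager=True skips the torch.compile / AOT cache setup path,
# removing the non-deterministic compilation failures described above.
llm = LLM(
    model="openai/whisper-large-v3",  # assumed; the PR does not name the checkpoint
    enforce_eager=True,               # avoid flaky torch.compile paths in CI
    tensor_parallel_size=2,           # assumed size for the distributed test
    gpu_memory_utilization=0.7,       # the memory fix discussed in the PR
)
```

Since this is a correctness test rather than a performance test, trading compiled-graph speed for deterministic eager execution is a reasonable choice.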
Leftover GPU memory — the Whisper test runs last (command 7/7) in the CI job. Earlier tests leave ~6.6 GiB of GPU memory occupied, causing vLLM's startup memory check to fail: `Free memory on device cuda:0 (15.41/22.05 GiB) is less than desired GPU memory utilization (0.92, 20.28 GiB)`.
Fix: lower gpu_memory_utilization to 0.7 — sufficient for max_model_len=448. The same issue was also caught by @SoluMilken.
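The numbers in the error message above make the headroom concrete. The following is a back-of-the-envelope sketch (my own arithmetic, not vLLM's actual startup-check implementation) of why 0.92 fails with ~6.6 GiB already occupied, and why the reviewer-suggested 0.65 leaves more margin than 0.7:

```python
TOTAL_GIB = 22.05  # device total, from the CI error message
FREE_GIB = 15.41   # free memory after earlier tests leave ~6.6 GiB occupied

def startup_check_passes(gpu_memory_utilization: float) -> bool:
    """Simplified model of the check: the requested fraction of total
    device memory must still be free at engine startup."""
    desired_gib = gpu_memory_utilization * TOTAL_GIB
    return FREE_GIB >= desired_gib

print(startup_check_passes(0.92))  # False: wants ~20.29 GiB, only 15.41 free
print(startup_check_passes(0.65))  # True: wants ~14.33 GiB
# 0.7 wants ~15.44 GiB, essentially equal to the 15.41 GiB free in this run,
# which is why the review suggests dropping further to 0.65 for headroom.
```

In other words, 0.7 only barely clears (or just misses) the observed free memory, so a CI run with slightly more leftover allocation would still fail; 0.65 builds in roughly 1 GiB of slack.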
@ProExpertProg @DarkLight1337