
[Bugfix] Fix test_whisper distributed test stability: torch.compile flakiness and memory utilization#42092

Closed
dzhengAP wants to merge 2 commits into vllm-project:main from dzhengAP:bugfix/fix-whisper-distributed-stability

Conversation

@dzhengAP
Contributor

@dzhengAP dzhengAP commented May 8, 2026

Follow-up to #41423 and #42038.

test_models_distributed in test_whisper.py was failing in CI build #65117 due to two issues:

  1. torch.compile flakiness — with enforce_eager=False, the test triggers torch.compile/AOT cache setup which can fail non-deterministically.
    Fix: use enforce_eager=True for the distributed correctness test.

  2. Leftover GPU memory — the Whisper test runs last (command 7/7) in the CI job. Earlier tests leave ~6.6 GiB of GPU memory occupied, causing vLLM's startup memory check to fail: Free memory on device cuda:0 (15.41/22.05 GiB) is less than desired GPU memory utilization (0.92, 20.28 GiB).
  Fix: lower gpu_memory_utilization to 0.7, which is sufficient for max_model_len=448. The same issue was also caught by @SoluMilken.
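The startup check behind that error can be modeled in plain Python (a minimal sketch of the idea, not vLLM's actual code): vLLM treats gpu_memory_utilization * total device memory as the amount that must be free at startup.

```python
# Minimal model of vLLM's startup memory check (illustrative only):
# the engine requires free_gib >= gpu_memory_utilization * total_gib.
def memory_check_passes(free_gib: float, total_gib: float, utilization: float) -> bool:
    """Return True if enough free GPU memory is available for the
    requested utilization fraction."""
    desired_gib = utilization * total_gib
    return free_gib >= desired_gib

# Numbers from the CI failure message: 15.41 GiB free of 22.05 GiB total.
print(memory_check_passes(15.41, 22.05, 0.92))  # False: 0.92 * 22.05 ≈ 20.28 GiB needed
print(memory_check_passes(15.41, 22.05, 0.65))  # True: 0.65 * 22.05 ≈ 14.33 GiB needed
```

Note that by this model 0.7 * 22.05 ≈ 15.44 GiB, barely above the 15.41 GiB free in the failing run, which is consistent with the reviewer's later suggestion to drop further to 0.65.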

@ProExpertProg @DarkLight1337

…y_utilization and enforce_eager

Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@dzhengAP dzhengAP changed the title from "[Bugfix] Fix test_whisper distributed test stability: torch.complie flakiness and memory utilization" to "[Bugfix] Fix test_whisper distributed test stability: torch.compile flakiness and memory utilization" May 8, 2026
@mergify mergify Bot added multi-modality Related to multi-modality (#4194) bug Something isn't working labels May 8, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request modifies the Whisper generation tests by removing the process spawning decorator for distributed tests and adjusting GPU memory utilization settings. Feedback focuses on restoring the spawn decorator and its import to maintain CUDA stability, as well as further reducing the gpu_memory_utilization to 0.65 to ensure reliable execution within CI memory constraints.

4 comment threads on tests/models/multimodal/generation/test_whisper.py
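The reviewer's point about restoring the process-spawning decorator rests on isolation: running each distributed test in a fresh process so CUDA state accumulated by earlier tests cannot leak into it. A generic sketch of that idea (this helper is hypothetical; vLLM's test suite has its own utilities for this):

```python
import subprocess
import sys

def run_isolated(snippet: str) -> subprocess.CompletedProcess:
    """Run a Python snippet in a brand-new interpreter process, so any
    GPU/CUDA state held by the current process cannot leak into it.
    (Hypothetical helper, not vLLM's actual decorator.)"""
    return subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True, text=True, timeout=120,
    )

result = run_isolated("print('fresh process ok')")
assert result.returncode == 0
```

A fresh interpreter starts with no CUDA context at all, which is the stability property the removed decorator provided.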
… test

Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>
@dzhengAP dzhengAP force-pushed the bugfix/fix-whisper-distributed-stability branch from b82cab3 to 38d129e Compare May 8, 2026 17:20
@mergify
Contributor

mergify Bot commented May 8, 2026

⚠️ The sha of the head commit of this PR conflicts with #42038. Mergify cannot evaluate rules on this PR. Once #42038 is merged or closed, Mergify will resume processing this PR. ⚠️

@dzhengAP
Contributor Author

dzhengAP commented May 8, 2026

Closing this PR because its changes have been folded into #42038. Please review #42038 instead.

@dzhengAP dzhengAP closed this May 8, 2026
