[Bugfix] Fix test_whisper distributed test process handling#42038

Merged
jikunshang merged 6 commits into vllm-project:main from dzhengAP:bugfix/fix-whisper-distributed-double-spawn
May 9, 2026

Conversation

@dzhengAP
Contributor

@dzhengAP dzhengAP commented May 8, 2026

Follow-up to #41423

  • Add method parameter to multi_gpu_test — allows callers to explicitly specify spawn/fork instead of always defaulting to platform detection. This avoids the need for a separate @create_new_process_for_each_test("spawn") decorator on top of @multi_gpu_test.
  • Use method="spawn" for Whisper distributed test — replaces the double-decorator pattern with a single @multi_gpu_test(num_gpus=2, method="spawn"), ensuring a clean CUDA environment without double-wrapping (see the sketch after this list).
  • Lower gpu_memory_utilization=0.7 — the Whisper test runs last in CI (command 7/7), by which point earlier tests leave ~6.6 GiB of GPU memory occupied. max_model_len=448 doesn't need the default 0.92 reservation.
  • enforce_eager=True — avoids torch.compile/AOT cache flakiness in the distributed correctness test.
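For reference, here is a minimal sketch of the single-decorator form described above, assuming the helper names from tests/utils; the test signature and body are placeholders, not the actual test code:

```python
from tests.utils import multi_gpu_test


# Explicitly request "spawn" so the test runs in a fresh process with a clean
# CUDA context, without stacking a second process decorator on top.
@multi_gpu_test(num_gpus=2, method="spawn")
def test_models_distributed(vllm_runner, distributed_executor_backend):
    # Placeholder fixtures and body, for illustration only.
    ...
```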

cc @ProExpertProg


@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added the multi-modality (Related to multi-modality, #4194) and bug (Something isn't working) labels on May 8, 2026
@mergify
Contributor

mergify Bot commented May 8, 2026

Hi @dzhengAP, the pre-commit checks have failed. Please run:

uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request removes the create_new_process_for_each_test utility import and its corresponding decorator from the test_models_distributed function within the Whisper model tests. I have no feedback to provide as no review comments were included in the request.

@DarkLight1337 DarkLight1337 added the ready (ONLY add when PR is ready to merge/full CI is needed) label on May 8, 2026
@DarkLight1337
Member

The tests in https://buildkite.com/vllm/ci/builds/65117/canvas?jid=019e067c-c2a9-467e-831a-dfe3e0bdb43e&tab=output are failing

@ProExpertProg ProExpertProg enabled auto-merge (squash) May 8, 2026 13:57
@ProExpertProg ProExpertProg disabled auto-merge May 8, 2026 13:58
@ProExpertProg
Collaborator

Oh yeah the failure is not fixed yet

@dzhengAP
Contributor Author

dzhengAP commented May 8, 2026 via email

@SoluMilken
Contributor

Hi @dzhengAP @ProExpertProg @DarkLight1337,

I noticed that in the CI build #65117, the Whisper distributed test seems to start with leftover GPU memory already in use. The failure is:

Free memory on device cuda:0 (15.41/22.05 GiB) on startup is less than desired GPU memory utilization (0.92, 20.28 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.

So even before the test really runs, vLLM fails the startup memory check.

I can think of two possible ways to handle this:

  1. Lower gpu_memory_utilization for this Whisper distributed correctness test, since it uses max_model_len=448/max_tokens=200 and probably does not need the default 0.92 reservation (a rough sketch follows this list).
  2. Investigate/clean up the leftover GPU memory from earlier distributed/Ray/vLLM tests, so this test starts with a clean GPU state.
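For option 1, a rough sketch of what the test-side change could look like, assuming the test builds the model through the vllm_runner fixture (the exact call site and argument list may differ):

```python
# Illustrative fragment of the test body; values come from the discussion above.
with vllm_runner(
    "openai/whisper-large-v3-turbo",
    tensor_parallel_size=2,
    max_model_len=448,
    # Reserve well under the default 0.92 so the test tolerates GPU memory
    # left behind by earlier CI commands.
    gpu_memory_utilization=0.7,
) as vllm_model:
    ...  # run the distributed correctness checks as before
```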

Which direction do reviewers think is better here?

Thanks.

…y_utilization and enforce_eager

Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>
@dzhengAP
Contributor Author

dzhengAP commented May 8, 2026

Hi @SoluMilken, this is a good point. The memory part was also included when I fixed the torch.compile flakiness issue, and Gemini also suggested some other memory-related settings; please check #42092. I will also mention there that you caught the same issue.

@ProExpertProg @DarkLight1337 would you check the fix? Thanks.

@SoluMilken
Contributor

@dzhengAP @ProExpertProg @DarkLight1337 I'm not entirely sure, but should we just combine this PR and #42092? That way we can run the CI together and see if it actually passes. Thanks.

… test

Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>
@dzhengAP dzhengAP force-pushed the bugfix/fix-whisper-distributed-double-spawn branch from 061c236 to 38d129e on May 8, 2026 17:22
@dzhengAP
Contributor Author

dzhengAP commented May 8, 2026

Changes

  1. Add method parameter to multi_gpu_test — allows callers to explicitly specify spawn/fork instead of always defaulting to platform detection. This avoids the need for a separate @create_new_process_for_each_test("spawn") decorator on top of @multi_gpu_test (a rough utility-side sketch follows this list).
  2. Use method="spawn" for Whisper distributed test — replaces the double-decorator pattern with a single @multi_gpu_test(num_gpus=2, method="spawn"), ensuring a clean CUDA environment without double-wrapping.
  3. Lower gpu_memory_utilization=0.7 — the Whisper test runs last in CI (command 7/7), by which point earlier tests leave ~6.6 GiB of GPU memory occupied. max_model_len=448 doesn't need the default 0.92 reservation.
  4. enforce_eager=True — avoids torch.compile/AOT cache flakiness in the distributed correctness test.
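To make item 1 concrete, here is a rough sketch of how a method argument could be threaded through the utility. This is an illustration only, not the actual tests/utils implementation (and a later comment in this thread keeps the explicit decorator instead):

```python
from typing import Callable, Literal, Optional

import pytest

from tests.utils import create_new_process_for_each_test
from vllm.utils import cuda_device_count_stateless


def multi_gpu_test(*, num_gpus: int = 2,
                   method: Optional[Literal["spawn", "fork"]] = None) -> Callable:
    """Skip unless num_gpus GPUs are available and run the test in a new
    process; an explicit method overrides the default platform detection."""

    def wrapper(f: Callable) -> Callable:
        # Assumed behavior: method=None keeps the helper's existing
        # platform-detection default.
        f = create_new_process_for_each_test(method)(f)
        return pytest.mark.skipif(
            cuda_device_count_stateless() < num_gpus,
            reason=f"Need at least {num_gpus} GPUs to run the test.",
        )(f)

    return wrapper
```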

Fixes failures in CI builds #64792 and #65117.

Closes #42092 (combined here).

cc @jikunshang @ProExpertProg @SoluMilken

@mergify
Contributor

mergify Bot commented May 8, 2026

Hi @dzhengAP, the pre-commit checks have failed. Please run:

uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

@dzhengAP
Contributor Author

dzhengAP commented May 8, 2026

Checking the new CI failures now.
Updated the Whisper distributed test fix.

Instead of passing method="spawn" into multi_gpu_test, which caused mypy failures because multi_gpu_test only accepts num_gpus, I restored the explicit process decorator (sketched below):

  • Kept @multi_gpu_test(num_gpus=2)
  • Added @create_new_process_for_each_test("spawn")
  • Imported create_new_process_for_each_test from tests.utils
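For clarity, the restored decoration looks roughly like the sketch below; the real fixtures, parametrization, and test body are omitted:

```python
from tests.utils import create_new_process_for_each_test, multi_gpu_test


@multi_gpu_test(num_gpus=2)
@create_new_process_for_each_test("spawn")  # fresh spawned process -> clean CUDA state
def test_models_distributed(vllm_runner, distributed_executor_backend):
    # Placeholder signature and body, for illustration only.
    ...
```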

Validation:
PYTHONPATH= MYPYPATH= pre-commit run mypy-3.10 --hook-stage manual --files tests/models/multimodal/generation/test_whisper.py

Result: passed.

Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>
@mergify
Contributor

mergify Bot commented May 8, 2026

Hi @dzhengAP, the pre-commit checks have failed. Please run:

uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

@dzhengAP dzhengAP changed the title from "[Bugfix] Remove redundant @create_new_process_for_each_test from test_whisper distributed test" to "[Bugfix] Fix test_whisper distributed test process handling" on May 8, 2026
@dzhengAP
Contributor Author

dzhengAP commented May 8, 2026

Failure: the 2-GPU distributed Whisper tests failed during vLLM startup because gpu_memory_utilization=0.7 requested ~15.43 GiB while CI had only ~15.41 GiB free. Lowering this test to 0.65 provides startup memory headroom.
Root cause: the distributed Whisper CI tests requested slightly more GPU memory than was available at startup. With gpu_memory_utilization=0.7, vLLM tried to reserve about 15.43 GiB, but CI only had about 15.41 GiB free on each GPU, so engine initialization failed before the test could run.
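A quick check of those numbers, assuming the 22.05 GiB device total reported in the startup error:

```python
total_gib = 22.05  # total device memory from the CI error message
free_gib = 15.41   # free memory at test startup

for util in (0.70, 0.65):
    needed = total_gib * util
    print(f"gpu_memory_utilization={util}: needs {needed:.2f} GiB, "
          f"{'fails' if needed > free_gib else 'fits'} with {free_gib} GiB free")
```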

Failing tests:
tests/models/multimodal/generation/test_whisper.py::test_models_distributed[5-200-half-ray-openai/whisper-large-v3-turbo]
tests/models/multimodal/generation/test_whisper.py::test_models_distributed[5-200-half-mp-openai/whisper-large-v3-turbo]

Fix: lower the Whisper distributed test’s memory utilization margin:

  • before: gpu_memory_utilization=0.7
  • after: gpu_memory_utilization=0.65
    

@jikunshang
Collaborator

May I know why you say:
"enforce_eager=True — avoids torch.compile/AOT cache flakiness in the distributed correctness test."
The torch.compile cache is empty when the container launches.

@dzhengAP
Contributor Author

dzhengAP commented May 9, 2026 via email

@dzhengAP
Contributor Author

dzhengAP commented May 9, 2026

@DarkLight1337 @ProExpertProg the distributed tests all passed, can we merge?

(Outdated comment thread on tests/models/multimodal/generation/test_whisper.py)
Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>
@dzhengAP dzhengAP force-pushed the bugfix/fix-whisper-distributed-double-spawn branch from cf102df to 5b7447a on May 9, 2026 01:39
@dzhengAP
Contributor Author

dzhengAP commented May 9, 2026

@ProExpertProg Hi Luka, all passed with eager mode off.

@jikunshang
Collaborator

thanks for fixing! merged

@jikunshang jikunshang merged commit 845ca32 into vllm-project:main May 9, 2026
25 checks passed

Labels

bug (Something isn't working), multi-modality (Related to multi-modality, #4194), ready (ONLY add when PR is ready to merge/full CI is needed)


5 participants