[bugfix][async scheduling] fix extra cuda context in device 0 with EP/DP#37449
youkaichao merged 1 commit into vllm-project:main
Conversation
Signed-off-by: youkaichao <youkaichao@gmail.com>
self.worker.load_model()
...
scheduler_config = vllm_config.scheduler_config
Move this part of the code here, after self.worker.init_device(), so that self.worker.device is initialized properly.
Code Review
This pull request addresses a critical issue where asynchronous scheduling threads could implicitly create an extra CUDA context on device 0, leading to unnecessary memory consumption, especially in Expert Parallel (EP) and Data Parallel (DP) setups. The changes correctly relocate the asynchronous output copy thread initialization to ensure the worker is fully loaded before the thread starts. Additionally, the async_output_busy_loop method now explicitly sets the CUDA device for the thread to match the worker's assigned device, preventing the creation of unintended contexts. This is a well-targeted fix that directly resolves the described bug and improves resource management.
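The fix the review describes can be sketched as follows. This is an illustrative, simplified version, not vLLM's exact code: the function and argument names are assumptions, and the sketch falls back to CPU so it also runs on machines without a GPU. The key point is that the async thread calls torch.cuda.set_device before doing any other CUDA work.

```python
import threading
import torch

def async_output_busy_loop(device: torch.device, started: threading.Event):
    # The fix: explicitly bind this thread to the worker's device before
    # its first CUDA runtime call. A fresh thread has no CUDA context,
    # and the first runtime call would otherwise lazily create a primary
    # context on device 0.
    if device.type == "cuda":
        torch.cuda.set_device(device)
    started.set()
    # ... loop: copy finished model outputs from device to host ...

# Fall back to CPU so the sketch is runnable without a GPU.
device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
started = threading.Event()
t = threading.Thread(target=async_output_busy_loop, args=(device, started), daemon=True)
t.start()
started.wait(timeout=5)
t.join()
```

Starting this thread only after self.worker.init_device() also guarantees that self.worker.device holds the correct device index by the time the thread binds to it.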
njhill left a comment
Nice find, thanks @youkaichao!
…/DP (vllm-project#37449) Signed-off-by: youkaichao <youkaichao@gmail.com>
Purpose
See https://forums.developer.nvidia.com/t/when-a-thread-has-a-primary-cuda-context-does-the-child-thread-it-creates-automatically-inherit-the-cuda-context/362810 : a new thread does not have any CUDA context, and a later CUDA runtime call may implicitly create a context on device 0.
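The root cause can be modeled with the standard library alone (the names here are illustrative, not the real CUDA API): the CUDA runtime tracks a "current device" per thread, defaulting to device 0, so a brand-new thread that makes a runtime call without first calling set_device ends up creating a context on device 0.

```python
import threading

# Toy model of CUDA's per-thread "current device" state.
_state = threading.local()
contexts_created = set()  # which devices ended up with a context

def set_device(index: int):
    _state.index = index

def runtime_call():
    # Lazily create a context on this thread's current device,
    # which defaults to 0 for a thread that never set one.
    contexts_created.add(getattr(_state, "index", 0))

def worker_thread(worker_device: int, fixed: bool):
    if fixed:
        set_device(worker_device)  # the fix: bind the thread first
    runtime_call()

# Without the fix, a worker pinned to device 1 still creates a
# context on device 0 from its async thread:
t = threading.Thread(target=worker_thread, args=(1, False))
t.start(); t.join()
print(contexts_created)  # {0}

contexts_created.clear()
t = threading.Thread(target=worker_thread, args=(1, True))
t.start(); t.join()
print(contexts_created)  # {1}
```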
Test Plan
Run vLLM serve with EP/DP:
vllm serve Qwen/Qwen3-30B-A3B-Instruct-2507 -dp 2 -ep --port 8899
then test with multiple requests:
Test Result
Before the fix, after running the benchmark script, worker 1 takes around 800 MiB of memory on device 0.
After the fix, each worker resides on only one GPU.