[bugfix][async scheduling] fix extra cuda context in device 0 with EP/DP#37449

Merged
youkaichao merged 1 commit into vllm-project:main from youkaichao:fix_async_context
Mar 18, 2026
Conversation

Member

@youkaichao youkaichao commented Mar 18, 2026

Purpose

See https://forums.developer.nvidia.com/t/when-a-thread-has-a-primary-cuda-context-does-the-child-thread-it-creates-automatically-inherit-the-cuda-context/362810 : a newly created thread has no CUDA context of its own, so a later CUDA runtime call made in that thread may implicitly create a primary context on device 0.
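The failure mode can be illustrated without a GPU using a toy stand-in for the CUDA runtime's lazy context creation (`FakeCudaRuntime`, `worker_loop`, and the device indices below are invented for illustration, not vLLM or CUDA APIs): a child thread starts with no current device, so its first runtime call falls back to device 0 unless the thread pins its device first.

```python
import threading

class FakeCudaRuntime:
    """Toy stand-in for the CUDA runtime's lazy context creation."""
    def __init__(self):
        self.contexts = set()             # devices that got a primary context
        self.current = threading.local()  # per-thread "current device"

    def set_device(self, index):
        self.current.device = index

    def runtime_call(self):
        # Like cudaMalloc etc.: if the calling thread has no current
        # device, the runtime falls back to device 0 and creates a
        # context there.
        device = getattr(self.current, "device", 0)
        self.contexts.add(device)

rt = FakeCudaRuntime()

def worker_loop(device_index, pin_device):
    if pin_device:
        rt.set_device(device_index)  # the fix: bind the thread to its GPU first
    rt.runtime_call()

# Buggy pattern: worker 1's helper thread never pins a device,
# so the runtime call lands on device 0.
t = threading.Thread(target=worker_loop, args=(1, False))
t.start(); t.join()
print(rt.contexts)  # {0} — an unintended context on device 0

# Fixed pattern: pin the device before any runtime call.
rt = FakeCudaRuntime()
t = threading.Thread(target=worker_loop, args=(1, True))
t.start(); t.join()
print(rt.contexts)  # {1} — only the worker's own device
```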

Test Plan

Run vLLM serve with EP/DP:

vllm serve Qwen/Qwen3-30B-A3B-Instruct-2507 -dp 2 -ep --port 8899

test with multiple requests:

vllm bench serve \
  --backend openai \
  --base-url http://127.0.0.1:8899 \
  --model Qwen/Qwen3-30B-A3B-Instruct-2507 \
  --dataset-name random \
  --num-prompts 100 \
  --input-len 1024 \
  --output-len 256

Test Result

Before the fix, after running the benchmark script, worker 1 holds around 800 MiB of memory on device 0.

After the fix, each worker only resides on one GPU.

Signed-off-by: youkaichao <youkaichao@gmail.com>
@youkaichao youkaichao requested a review from njhill as a code owner March 18, 2026 15:40
)
self.worker.load_model()

scheduler_config = vllm_config.scheduler_config
Member Author


This block is moved here, after self.worker.init_device(), so that self.worker.device is initialized properly.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a critical issue where asynchronous scheduling threads could implicitly create an extra CUDA context on device 0, leading to unnecessary memory consumption, especially in Expert Parallel (EP) and Data Parallel (DP) setups. The changes correctly relocate the asynchronous output copy thread initialization to ensure the worker is fully loaded before the thread starts. Additionally, the async_output_busy_loop method now explicitly sets the CUDA device for the thread to match the worker's assigned device, preventing the creation of unintended contexts. This is a well-targeted fix that directly resolves the described bug and improves resource management.
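The pattern the review describes can be sketched as follows (a minimal illustration; `Worker`, `_async_output_busy_loop`, and the queue plumbing here are simplified stand-ins, not vLLM's actual code, and `set_device` is injected so the sketch runs without a GPU — in real code it would be `torch.cuda.set_device`): the copy thread's very first action is to pin itself to the worker's device before any other work.

```python
import queue
import threading

class Worker:
    def __init__(self, device_index, set_device):
        self.device_index = device_index
        self._set_device = set_device     # e.g. torch.cuda.set_device
        self._outputs = queue.Queue()
        # Start the copy thread only after device setup has run, so
        # self.device_index is already valid when the thread reads it.
        self._thread = threading.Thread(
            target=self._async_output_busy_loop, daemon=True)
        self._thread.start()

    def _async_output_busy_loop(self):
        # First statement in the child thread: pin it to this worker's
        # GPU. Without this, the thread's first CUDA call would create
        # a context on device 0.
        self._set_device(self.device_index)
        while True:
            item = self._outputs.get()
            if item is None:
                return
            # ... copy model outputs to the CPU here ...

pinned = []
w = Worker(device_index=1, set_device=pinned.append)
w._outputs.put(None)   # shut the loop down for this demo
w._thread.join()
print(pinned)  # [1] — the thread pinned itself to the worker's device
```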

@mergify mergify bot added the nvidia, v1, and bug (Something isn't working) labels Mar 18, 2026
Member

@njhill njhill left a comment


Nice find, thanks @youkaichao!

@github-project-automation github-project-automation bot moved this to Ready in NVIDIA Mar 18, 2026
@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 18, 2026
@youkaichao youkaichao enabled auto-merge (squash) March 18, 2026 16:33
@youkaichao youkaichao merged commit 70b81c4 into vllm-project:main Mar 18, 2026
49 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Mar 18, 2026
@youkaichao youkaichao deleted the fix_async_context branch March 19, 2026 02:32
fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026
SouthWest7 pushed a commit to SouthWest7/vllm that referenced this pull request Mar 27, 2026
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026
Monishver11 pushed a commit to Monishver11/vllm that referenced this pull request Mar 27, 2026
…/DP (vllm-project#37449)

Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
JiantaoXu pushed a commit to JiantaoXu/vllm that referenced this pull request Mar 28, 2026
vrdn-23 pushed a commit to vrdn-23/vllm that referenced this pull request Mar 30, 2026
…/DP (vllm-project#37449)

Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Vinay Damodaran <vrdn@hey.com>

Labels

bug — Something isn't working
nvidia
ready — ONLY add when PR is ready to merge/full CI is needed
v1

Projects

Status: Done


2 participants