[Core][WIP] Check for GPU<->CPU sync during CI #40561

Open
njhill wants to merge 71 commits into vllm-project:main from njhill:sync-check

@njhill
Member

@njhill njhill commented Apr 21, 2026

vLLM now uses asynchronous scheduling by default and in the majority of cases. Performance relies on the main CUDA stream being free of GPU<->CPU synchronizations, but such syncs can be opaque, and it is easy for them to creep in accidentally.

This change adds a VLLM_GPU_SYNC_CHECK env var which enables torch.cuda.set_sync_debug_mode for the model forward pass and sampler, so that we can easily check for such syncs.

I'm first trying to enable it globally in CI to flush out syncs that either need to be fixed or are unavoidable, in which case the check needs to be suppressed. I will then probably split the fixes into separate PR(s).

Update

Started to open separate PRs fixing identified sync points:


@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added ci/build rocm Related to AMD ROCm v1 labels Apr 21, 2026
@github-project-automation github-project-automation Bot moved this to Todo in AMD Apr 21, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a GPU-CPU synchronization check mechanism via the VLLM_GPU_SYNC_CHECK environment variable, which is set to "error" by default in the Dockerfiles. The check is applied to the sample_tokens and execute_model methods in the V1 GPU worker using a new decorator. Feedback indicates two improvements to the with_gpu_sync_check decorator: it should restore the previous synchronization mode rather than resetting to the default, and it should check the environment variable at runtime so that the check can be disabled dynamically.
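
As an illustration of the two suggestions, here is a minimal sketch of a decorator that reads the environment variable on every call and restores whatever mode was active beforehand. It is not the PR's actual implementation: the factory signature, the injected `get_mode`/`set_mode` callables, and the assumption that the variable holds `"warn"` or `"error"` are all hypothetical. The callables are injected so that the sketch runs without a GPU.

```python
import functools
import os
from typing import Any, Callable

ENV_VAR = "VLLM_GPU_SYNC_CHECK"  # name taken from the PR description


def with_gpu_sync_check(
    get_mode: Callable[[], Any],
    set_mode: Callable[[Any], None],
) -> Callable:
    """Hypothetical decorator factory, not the code from this PR.

    get_mode / set_mode are injected so the sketch needs no GPU. With
    PyTorch they would be torch.cuda.get_sync_debug_mode and
    torch.cuda.set_sync_debug_mode.
    """

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Read the variable on every call so the check can be
            # switched on or off while the process is running.
            mode = os.environ.get(ENV_VAR, "").strip().lower()
            # Assumption: only "warn" and "error" enable the check.
            if mode not in ("warn", "error"):
                return fn(*args, **kwargs)

            previous = get_mode()  # remember the caller's mode
            set_mode(mode)
            try:
                return fn(*args, **kwargs)
            finally:
                # Restore the previous mode instead of forcing "default",
                # even if fn raised.
                set_mode(previous)

        return wrapper

    return decorator
```

With PyTorch available, the factory would be applied as `@with_gpu_sync_check(torch.cuda.get_sync_debug_mode, torch.cuda.set_sync_debug_mode)`. Restoring the saved value matters when an outer scope has already chosen a mode, since resetting to the default would silently switch off that outer check.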

Comment thread vllm/v1/worker/utils.py Outdated
@njhill njhill added ready ONLY add when PR is ready to merge/full CI is needed and removed ready ONLY add when PR is ready to merge/full CI is needed labels Apr 21, 2026
@njhill njhill added ready ONLY add when PR is ready to merge/full CI is needed and removed ready ONLY add when PR is ready to merge/full CI is needed labels Apr 22, 2026
njhill added 26 commits April 30, 2026 09:00
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Labels

ci/build, kv-connector, multi-modality (Related to multi-modality (#4194)), nvidia, qwen (Related to Qwen models), ready (ONLY add when PR is ready to merge/full CI is needed), rocm (Related to AMD ROCm), speculative-decoding, v1

Projects

Status: Todo
Status: No status

3 participants