[BugFix] Add sleep to fix tight loop and release GIL by alec-flowers · Pull Request #29476 · vllm-project/vllm

alec-flowers · 2025-11-26T03:58:13Z

Potential fix for #29369.

While not very elegant, it does the job of releasing the GIL.

Purpose

Test Plan

Test Result

Signed-off-by: alec-flowers <aflowers@nvidia.com>

gemini-code-assist

Code Review

This pull request introduces a time.sleep(0.001) to address a potential tight loop in the engine's busy-wait cycle. This occurs when there are pending requests (e.g., waiting for remote KV cache) but no model execution can be scheduled. By yielding the Global Interpreter Lock (GIL), this change prevents the starvation of background threads, which is critical for operations like distributed handshakes. The fix is well-targeted, conditional, and a pragmatic solution to a potential deadlock or performance degradation issue.
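As a rough illustration of the pattern described in this review (not the actual change to vllm/v1/engine/core.py), the sketch below shows a loop that sleeps briefly only when requests are pending but nothing could be executed. The names `run_loop`, `try_step`, `has_pending_requests`, and `background_handshake` are hypothetical and exist only for this example; the 0.001 s interval is the value mentioned in the review.

```python
import threading
import time


def run_loop(try_step, has_pending_requests, idle_sleep_s=0.001):
    """Hypothetical busy loop: keep stepping while requests are pending."""
    idle_sleeps = 0
    while has_pending_requests():
        model_executed = try_step()
        if not model_executed:
            # Requests are pending but nothing could be scheduled.
            # time.sleep() releases the GIL, so background threads can run
            # instead of being starved by this loop.
            time.sleep(idle_sleep_s)
            idle_sleeps += 1
    return idle_sleeps


def demo():
    kv_ready = threading.Event()
    pending = ["req-0"]

    def background_handshake():
        # Pure-Python work that needs the GIL to make progress.
        total = 0
        for i in range(200_000):
            total += i
        kv_ready.set()

    def try_step():
        # The request cannot run until the background thread finishes.
        if not kv_ready.is_set():
            return False
        pending.pop()
        return True

    worker = threading.Thread(target=background_handshake)
    worker.start()
    idle_sleeps = run_loop(try_step, lambda: bool(pending))
    worker.join()
    return idle_sleeps, len(pending)
```

Calling `demo()` returns the number of idle sleeps taken and the number of requests left, which should be zero once the background thread has signalled readiness.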

njhill

Thanks @alec-flowers. I think this might actually be the best fix given the current design.

Ideally we should avoid a hot loop altogether in this scenario but that would require more significant rework.
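For context, one generic way to avoid a hot loop is to block until a background thread reports that work is ready, rather than polling. The sketch below uses the standard library `queue.Queue` for that; it is only an assumption about what such a rework could look like, not vLLM's design, and `run_blocking_loop` and `background_transfer` are hypothetical names.

```python
import queue
import threading


def run_blocking_loop(ready_queue, expected):
    """Hypothetical loop: block until each pending request is reported ready."""
    finished = []
    while len(finished) < expected:
        # Queue.get() blocks without spinning and releases the GIL while
        # waiting, so no polling or sleep is needed.
        request_id = ready_queue.get()
        finished.append(request_id)
    return finished


def demo():
    ready_queue = queue.Queue()

    def background_transfer():
        # Stand-in for a background thread completing remote KV transfers.
        for request_id in ("req-0", "req-1", "req-2"):
            ready_queue.put(request_id)

    worker = threading.Thread(target=background_transfer)
    worker.start()
    finished = run_blocking_loop(ready_queue, expected=3)
    worker.join()
    return finished
```

The trade-off is that every source of new work has to post to the queue, which is the kind of larger rework the comment above alludes to.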

robertgshaw2-redhat · 2025-12-08T15:25:51Z

I have a feeling this is an ARM vs x86 issue: #30228

njhill · 2025-12-17T16:43:19Z

> I have a feeling this is an ARM vs x86 issue: #30228

The sched_yield thing only applies to the multiproc executor case, where we spin on the shm queue. This issue is kind of separate from that and would apply in the uniproc case too.

Co-authored-by: Nick Hill <nhill@redhat.com> Signed-off-by: Alec <35311602+alec-flowers@users.noreply.github.com>

njhill

Thanks @alec-flowers!

- Remove UniProcExecutor GIL contention workaround (fixed upstream in vllm-project/vllm#29476)
- Fix _uses_nixl_connector() and _uses_dynamo_connector() to detect connectors nested inside PdConnector's kv_connector_extra_config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: alec-flowers <aflowers@nvidia.com>

add sleep to fix tight loop and release GIL

bf07ad6

Signed-off-by: alec-flowers <aflowers@nvidia.com>

mergify Bot added the v1 label Nov 26, 2025

gemini-code-assist Bot reviewed Nov 26, 2025

njhill reviewed Dec 6, 2025

Comment thread vllm/v1/engine/core.py Outdated

njhill changed the title ~~add sleep to fix tight loop and release GIL~~ [BugFix] Add sleep to fix tight loop and release GIL Dec 17, 2025

alec-flowers and others added 2 commits December 17, 2025 10:00

Update vllm/v1/engine/core.py

f66be46

Co-authored-by: Nick Hill <nhill@redhat.com> Signed-off-by: Alec <35311602+alec-flowers@users.noreply.github.com>

Merge branch 'main' into fix-tight-loop

9107dce

njhill approved these changes Dec 17, 2025

njhill added the ready label Dec 17, 2025

njhill merged commit 62be367 into vllm-project:main Dec 18, 2025
45 of 46 checks passed

njhill merged 3 commits into vllm-project:main from alec-flowers:fix-tight-loop
