Skip to content

[BugFix] Add sleep to fix tight loop and release GIL#29476

Merged
njhill merged 3 commits intovllm-project:mainfrom
alec-flowers:fix-tight-loop
Dec 18, 2025
Merged

[BugFix] Add sleep to fix tight loop and release GIL#29476
njhill merged 3 commits intovllm-project:mainfrom
alec-flowers:fix-tight-loop

Conversation

@alec-flowers
Copy link
Copy Markdown
Contributor

@alec-flowers alec-flowers commented Nov 26, 2025

Potential Fix for - #29369

While not very elegant it does do the job of releasing the GIL.

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: alec-flowers <aflowers@nvidia.com>
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@mergify mergify Bot added the v1 label Nov 26, 2025
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a time.sleep(0.001) to address a potential tight loop in the engine's busy-wait cycle. This occurs when there are pending requests (e.g., waiting for remote KV cache) but no model execution can be scheduled. By yielding the Global Interpreter Lock (GIL), this change prevents the starvation of background threads, which is critical for operations like distributed handshakes. The fix is well-targeted, conditional, and a pragmatic solution to a potential deadlock or performance degradation issue.

Copy link
Copy Markdown
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alec-flowers. I think this might actually be the best fix given the current design.

Ideally we should avoid a hot loop altogether in this scenario but that would require more significant rework.

Comment thread vllm/v1/engine/core.py Outdated
@robertgshaw2-redhat
Copy link
Copy Markdown
Collaborator

I have a feeling this is an ARM vs x86 issue: #30228

@njhill
Copy link
Copy Markdown
Member

njhill commented Dec 17, 2025

I have a feeling this is an ARM vs x86 issue: #30228

The sched_yield thing only applies to multiproc executor case where we spin on the shm queue, this issue is kind of separate to that and would apply in uniproc case too.

@njhill njhill changed the title add sleep to fix tight loop and release GIL [BugFix] Add sleep to fix tight loop and release GIL Dec 17, 2025
alec-flowers and others added 2 commits December 17, 2025 10:00
Co-authored-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Alec <35311602+alec-flowers@users.noreply.github.com>
Copy link
Copy Markdown
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alec-flowers!

@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 17, 2025
@njhill njhill merged commit 62be367 into vllm-project:main Dec 18, 2025
45 of 46 checks passed
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Dec 22, 2025
)

Signed-off-by: alec-flowers <aflowers@nvidia.com>
Signed-off-by: Alec <35311602+alec-flowers@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Majid-Taheri pushed a commit to Majid-Taheri/vllm that referenced this pull request Dec 23, 2025
)

Signed-off-by: alec-flowers <aflowers@nvidia.com>
Signed-off-by: Alec <35311602+alec-flowers@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Ubuntu <mjtaheri68@gmail.com>
alec-flowers added a commit to ai-dynamo/dynamo that referenced this pull request Feb 24, 2026
- Remove UniProcExecutor GIL contention workaround (fixed upstream
  in vllm-project/vllm#29476)
- Fix _uses_nixl_connector() and _uses_dynamo_connector() to detect
  connectors nested inside PdConnector's kv_connector_extra_config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: alec-flowers <aflowers@nvidia.com>
alec-flowers added a commit to ai-dynamo/dynamo that referenced this pull request Feb 24, 2026
- Remove UniProcExecutor GIL contention workaround (fixed upstream
  in vllm-project/vllm#29476)
- Fix _uses_nixl_connector() and _uses_dynamo_connector() to detect
  connectors nested inside PdConnector's kv_connector_extra_config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: alec-flowers <aflowers@nvidia.com>
mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026
)

Signed-off-by: alec-flowers <aflowers@nvidia.com>
Signed-off-by: Alec <35311602+alec-flowers@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants