[DO NOT MERGE][Core] Revert "Fix benign error log during normal shutdown (#36270)" by markmc · Pull Request #36646 · vllm-project/vllm

markmc · 2026-03-10T11:43:22Z

Possible simpler alternative to #36628 fixing #36624

…ect#36270)" This reverts commit 6a18d87. Signed-off-by: Mark McLoughlin <markmc@redhat.com>

markmc · 2026-03-10T11:48:11Z

Looking at:

Shutdown timeout, failed - https://buildkite.com/vllm/ci/builds/54863
Busy loop fixes, passed when I triggered it today - https://buildkite.com/vllm/ci/builds/54834
Nick "fix benign error log", failed when I triggered it today - https://buildkite.com/vllm/ci/builds/55015

Strongly suggests simply reverting #36270 might be sufficient to fix the test

(Even if the revert gets us green again, though, that doesn't mean the code and tests are correct! But we can follow up with fixes that don't break the job.)

gemini-code-assist

Code Review

This pull request reverts a previous fix for a benign error log during shutdown. While this may be a step towards an alternative solution, the current changes re-introduce two critical issues. First, the shutdown method is no longer idempotent, which could cause errors if called multiple times. Second, a race condition is re-introduced in monitor_engine_cores, which will likely cause spurious error logs during a normal shutdown. I've added specific comments with suggestions to restore the previous, safer behavior.

gemini-code-assist · 2026-03-10T12:40:39Z

vllm/v1/engine/core_client.py

+        self._finalizer.detach()
+        if self.resources.engine_manager is not None:
+            self.resources.engine_manager.shutdown(timeout=timeout)
+        self.resources()


By removing the check on the return value of self._finalizer.detach(), the shutdown method is no longer idempotent. If this method is called multiple times, it will attempt to shut down the engine manager and clean up resources repeatedly. This could lead to unexpected errors if the underlying shutdown and cleanup operations are not idempotent. It's safer to restore the check to ensure shutdown logic runs only once.

Suggested change

-        self._finalizer.detach()
-        if self.resources.engine_manager is not None:
-            self.resources.engine_manager.shutdown(timeout=timeout)
-        self.resources()
+        if self._finalizer.detach() is not None:
+            if self.resources.engine_manager is not None:
+                self.resources.engine_manager.shutdown(timeout=timeout)
+            self.resources()
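To illustrate the idempotency point, here is a minimal sketch of the guard. It relies only on the documented behaviour of `weakref.finalize.detach()`, which returns a tuple while the finalizer is alive and `None` once it is dead. The `Client` class, the `_noop` callback and the `shutdown_runs` counter are hypothetical stand-ins, not vLLM code.

```python
import weakref


def _noop():
    """Placeholder cleanup callback; it must not hold a reference to the client."""


class Client:
    """Hypothetical client, not vLLM's class; only the guard mirrors the suggestion."""

    def __init__(self):
        self.shutdown_runs = 0
        self._finalizer = weakref.finalize(self, _noop)

    def shutdown(self):
        # detach() returns a tuple while the finalizer is alive and None once it
        # is dead, so only the first call gets past this check.
        if self._finalizer.detach() is not None:
            self.shutdown_runs += 1  # stands in for the real cleanup work


client = Client()
client.shutdown()
client.shutdown()  # second call finds the finalizer already dead and does nothing
```

Without the `is not None` check, the body would run on every call, which is the behaviour the review comment warns about.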

gemini-code-assist · 2026-03-10T12:40:39Z

vllm/v1/engine/core_client.py

            died = multiprocessing.connection.wait(sentinels)
            _self = self_ref()
-            if not _self or not _self._finalizer.alive or _self.resources.engine_dead:
+            if not _self or _self.resources.engine_dead:


Removing the not _self._finalizer.alive check re-introduces a race condition. During a normal shutdown, engine processes are terminated. This monitor thread can wake up and incorrectly interpret this as an unexpected crash, leading to spurious error logs. The check for _self._finalizer.alive is crucial to distinguish between a controlled shutdown and an actual failure.

Suggested change

-            if not _self or _self.resources.engine_dead:
+            if not _self or not _self._finalizer.alive or _self.resources.engine_dead:
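The condition in the suggestion can be exercised in isolation. The sketch below assumes a shutdown path that detaches the finalizer before engine processes are terminated, as in the diff above. `Client`, `Resources` and `is_controlled_exit` are hypothetical names for illustration, and the real monitor thread (which blocks in `multiprocessing.connection.wait`) is left out.

```python
import weakref


def _noop():
    """Placeholder cleanup callback; it must not hold a reference to the client."""


class Resources:
    """Hypothetical stand-in for the client's resource holder."""

    def __init__(self):
        self.engine_dead = False


class Client:
    """Hypothetical client exposing the two attributes the check reads."""

    def __init__(self):
        self.resources = Resources()
        self._finalizer = weakref.finalize(self, _noop)

    def shutdown(self):
        # Detaching marks the finalizer as no longer alive.
        self._finalizer.detach()


def is_controlled_exit(self_ref):
    """Return True when a process exit should not be reported as a crash."""
    _self = self_ref()
    return bool(
        not _self                          # client already garbage collected
        or not _self._finalizer.alive      # shutdown has started
        or _self.resources.engine_dead     # failure already recorded
    )


client = Client()
before_shutdown = is_controlled_exit(weakref.ref(client))
client.shutdown()
after_shutdown = is_controlled_exit(weakref.ref(client))
```

With the middle clause removed, `after_shutdown` would be `False` here, so a monitor thread waking up during a normal shutdown would treat the exit as unexpected.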

markmc · 2026-03-10T12:51:54Z

Looking at:

Shutdown timeout, failed - https://buildkite.com/vllm/ci/builds/54863

Busy loop fixes, passed when I triggered it today - https://buildkite.com/vllm/ci/builds/54834

This only passed because it is the PR branch, which did not include the shutdown timeout change

The tests were still broken on main with this "busy loop fixes" PR - https://buildkite.com/vllm/ci/builds/54959

And sure enough, this revert did not help - https://buildkite.com/vllm/ci/builds/55478

Revert "[Core] Fix benign error log during normal shutdown (vllm-proj…

8e45e03

…ect#36270)" This reverts commit 6a18d87. Signed-off-by: Mark McLoughlin <markmc@redhat.com>

markmc requested a review from njhill as a code owner March 10, 2026 11:43

markmc added the ready label (ONLY add when PR is ready to merge/full CI is needed) Mar 10, 2026

mergify bot added the v1 label Mar 10, 2026

markmc mentioned this pull request Mar 10, 2026

[Bug] External LB test_external_lb_dp[4] failing since shutdown timeout PR #34730 #36624

Closed

markmc mentioned this pull request Mar 10, 2026

[Frontend][Core] Revert "Add shutdown timeout" (#34730 and #36270) #36628

Merged

gemini-code-assist bot reviewed Mar 10, 2026


markmc closed this Mar 10, 2026

markmc wants to merge 1 commit into vllm-project:main from markmc:revert-fix-benign-error-log
