[CI] Fix BackgroundResources double-cleanup crash by adding guard by AndreasKaratzas · Pull Request #36299 · vllm-project/vllm

AndreasKaratzas · 2026-03-07T00:59:51Z

Fixes regression after: #34730

BackgroundResources.__call__() crashes with AttributeError: 'BackgroundResources' object has no attribute 'output_queue_task' when invoked more than once. This happens because the cleanup path uses del self.output_queue_task which removes the attribute entirely, so a second call fails.
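
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the vLLM code: `Resources` is a hypothetical stand-in for BackgroundResources, and the placeholder object takes the place of the real asyncio task.

```python
class Resources:
    """Hypothetical stand-in for BackgroundResources, not the vLLM class."""

    def __init__(self) -> None:
        # Placeholder for the real background task object.
        self.output_queue_task = object()

    def __call__(self) -> None:
        # `del` removes the attribute itself, not just the reference it holds.
        del self.output_queue_task


resources = Resources()
resources()  # first cleanup succeeds

try:
    resources()  # second cleanup: the attribute no longer exists
    second_call_error = None
except AttributeError as exc:
    second_call_error = exc
```

After the first call the instance no longer has the attribute at all, so the second `del` raises AttributeError.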

This is triggered in practice when the engine monitor thread detects a dead engine and calls shutdown(), followed by the caller (e.g. a test) also calling shutdown() explicitly. Both paths end up invoking self.resources().

- Add a _cleaned_up bool guard so that subsequent calls to __call__ are no-ops.
- Replace del self.output_queue_task / del self.stats_update_task with = None assignments, which clear the references without removing the attributes.
- Set engine_manager and coordinator to None after shutdown to prevent a double shutdown when MPClient.shutdown() calls engine_manager.shutdown(timeout=...) followed by self.resources().
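
As a rough illustration of the first two changes, the sketch below combines the guard with the = None assignments. It is written under assumptions: `Resources` and `FakeTask` are hypothetical names, and the real class holds sockets and other state that is omitted here.

```python
from dataclasses import dataclass
from typing import Optional


class FakeTask:
    """Hypothetical stand-in for an asyncio task; counts cancel() calls."""

    def __init__(self) -> None:
        self.cancel_calls = 0

    def cancel(self) -> None:
        self.cancel_calls += 1


@dataclass
class Resources:
    """Hypothetical stand-in for BackgroundResources, not the vLLM class."""

    output_queue_task: Optional[FakeTask] = None
    stats_update_task: Optional[FakeTask] = None
    # Guard against double-cleanup
    _cleaned_up: bool = False

    def __call__(self) -> None:
        if self._cleaned_up:
            return  # later calls are no-ops
        self._cleaned_up = True
        for name in ("output_queue_task", "stats_update_task"):
            task = getattr(self, name)
            if task is not None:
                task.cancel()
            # Clear the reference but keep the attribute on the instance.
            setattr(self, name, None)


out_task, stats_task = FakeTask(), FakeTask()
resources = Resources(out_task, stats_task)
resources()
resources()  # second call returns early instead of raising
```

Each task is cancelled exactly once, and both attributes still exist (as None) after the second call.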

cc @kenroche

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

gemini-code-assist

Code Review

This pull request addresses a crash caused by double-cleanup in BackgroundResources by introducing an idempotency guard. It also correctly replaces del with None assignments to prevent AttributeError on subsequent cleanup attempts. The changes are logical and directly fix the described issue. I've added one suggestion to make the idempotency guard thread-safe.

gemini-code-assist · 2026-03-07T01:07:37Z

vllm/v1/engine/core_client.py

+        if self._cleaned_up:
+            return
+        self._cleaned_up = True


The current idempotency check is not thread-safe. A race condition can occur where two threads both check self._cleaned_up before it's set to True, leading to the cleanup logic running twice. Using a threading.Lock ensures that the check-and-set operation is atomic, preventing this race.

I'm suggesting a change that introduces a lock to the BackgroundResources class and uses it within __call__ to safely manage the cleanup state. You'll need to add from threading import Lock and from dataclasses import field at the top of the file to apply this suggestion.

Suggested change

-        if self._cleaned_up:
-            return
-        self._cleaned_up = True
+        with self._cleanup_lock:
+            if self._cleaned_up:
+                return
+            self._cleaned_up = True

Python's GIL makes bool read/write atomic, and the underlying cleanup operations (close(linger=0), task.cancel(), setting attrs to None) are all individually idempotent. The _cleaned_up flag is an optimization to skip redundant work, not a correctness gate. Adding a Lock to a weakref finalizer target introduces complexity without practical benefit here.
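
To make the argument about idempotent steps concrete, here is a small sketch of a cleanup callable that is reached both explicitly and through a weakref finalizer. `Client`, `Resources`, and the `calls` counter are hypothetical and only illustrate the pattern; how the real client wires its finalizer is not shown in this thread.

```python
import weakref


class Resources:
    """Hypothetical cleanup callable whose only step is safe to repeat."""

    def __init__(self) -> None:
        self.task = object()
        self.calls = 0

    def __call__(self) -> None:
        self.calls += 1
        self.task = None  # assigning None a second time is harmless


class Client:
    """Hypothetical owner that can reach cleanup through two paths."""

    def __init__(self) -> None:
        self.resources = Resources()
        # The finalizer keeps a strong reference to the callable, not to self.
        self._finalizer = weakref.finalize(self, self.resources)

    def shutdown(self) -> None:
        self.resources()  # explicit path


client = Client()
resources = client.resources
client.shutdown()    # path 1: explicit shutdown
client._finalizer()  # path 2: what the finalizer would run at collection time
```

Cleanup runs twice here without error because the single step is repeatable; a guard flag would only skip the redundant second pass.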

gemini-code-assist · 2026-03-07T01:07:37Z

vllm/v1/engine/core_client.py

    engine_dead: bool = False

+    # Guard against double-cleanup
+    _cleaned_up: bool = False


To support the thread-safe idempotency guard in __call__, a lock should be added to the BackgroundResources dataclass. This requires importing Lock from threading and field from dataclasses.

Suggested change

-    _cleaned_up: bool = False
+    _cleaned_up: bool = False
+    _cleanup_lock: "Lock" = field(default_factory=Lock, init=False, repr=False)
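
For reference, the lock-based variant suggested in this review could look roughly like the sketch below. The field declaration follows the suggestion; the `Resources` class and the `cleanup_runs` counter are hypothetical additions used only to observe how often the cleanup body executes.

```python
import threading
from dataclasses import dataclass, field
from threading import Lock


@dataclass
class Resources:
    """Hypothetical stand-in for BackgroundResources, not the vLLM class."""

    cleanup_runs: int = 0  # hypothetical counter, for illustration only
    _cleaned_up: bool = False
    # init=False keeps the lock out of __init__; each instance gets its own.
    _cleanup_lock: "Lock" = field(default_factory=Lock, init=False, repr=False)

    def __call__(self) -> None:
        # The check and the set happen under the lock, so only one caller
        # can ever observe _cleaned_up as False.
        with self._cleanup_lock:
            if self._cleaned_up:
                return
            self._cleaned_up = True
        self.cleanup_runs += 1  # stands in for the real cleanup work


resources = Resources()
threads = [threading.Thread(target=resources) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

With the lock in place, only one of the eight threads gets past the guard, so the cleanup body runs once.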

AndreasKaratzas · 2026-03-07T01:11:17Z

cc @njhill @markmc

AndreasKaratzas · 2026-03-07T01:11:47Z

Adding the ready label for tests to start.

njhill · 2026-03-07T01:14:19Z

Thanks @AndreasKaratzas, I think this may already be fixed by #36270

AndreasKaratzas · 2026-03-07T01:28:29Z

> Thanks @AndreasKaratzas, I think this may already be fixed by #36270

@njhill Oh let me test again without this then and let you know.

AndreasKaratzas · 2026-03-07T01:30:13Z

@njhill Indeed, closing.

njhill · 2026-03-07T01:48:47Z

Thanks @AndreasKaratzas 🙏

[CI] Fix BackgroundResources double-cleanup crash by adding guard

fb0ad88

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas requested a review from njhill as a code owner March 7, 2026 00:59

mergify bot added the v1 label Mar 7, 2026

AndreasKaratzas mentioned this pull request Mar 7, 2026

[CI Failure]: mi325_1: Async Engine, Inputs, Utils, Worker, Config Test (CPU) #34365

Closed

3 tasks

gemini-code-assist bot reviewed Mar 7, 2026

View reviewed changes

AndreasKaratzas added the rocm Related to AMD ROCm label Mar 7, 2026

github-project-automation bot added this to AMD Mar 7, 2026

AndreasKaratzas added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 7, 2026

github-project-automation bot moved this to Todo in AMD Mar 7, 2026

Merge remote-tracking branch 'origin/main' into akaratza_fix_cleanup

23e7532

AndreasKaratzas closed this Mar 7, 2026

github-project-automation bot moved this from Todo to Done in AMD Mar 7, 2026

AndreasKaratzas deleted the akaratza_fix_cleanup branch March 7, 2026 02:00

AndreasKaratzas wants to merge 2 commits into vllm-project:main from ROCm:akaratza_fix_cleanup

2 participants
