Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 12 additions & 2 deletions vllm/v1/engine/core_client.py
Original file line number Diff line number Diff line change
Expand Up @@ -384,14 +384,22 @@ class BackgroundResources:
# processing threads can access it without holding a ref to the client.
engine_dead: bool = False

# Guard against double-cleanup
_cleaned_up: bool = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

To support the thread-safe idempotency guard in __call__, a lock should be added to the BackgroundResources dataclass. This requires importing Lock from threading and field from dataclasses.

Suggested change
_cleaned_up: bool = False
_cleaned_up: bool = False
_cleanup_lock: "Lock" = field(default_factory=Lock, init=False, repr=False)


def __call__(self):
"""Clean up background resources."""
if self._cleaned_up:
return
self._cleaned_up = True
Comment on lines +392 to +394
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current idempotency check is not thread-safe. A race condition can occur where two threads both check self._cleaned_up before it's set to True, leading to the cleanup logic running twice. Using a threading.Lock ensures that the check-and-set operation is atomic, preventing this race.

I'm suggesting a change that introduces a lock to the BackgroundResources class and uses it within __call__ to safely manage the cleanup state. You'll need to add from threading import Lock and from dataclasses import field at the top of the file to apply this suggestion.

Suggested change
if self._cleaned_up:
return
self._cleaned_up = True
with self._cleanup_lock:
if self._cleaned_up:
return
self._cleaned_up = True

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python's GIL makes bool read/write atomic, and the underlying cleanup operations (close(linger=0), task.cancel(), setting attrs to None) are all individually idempotent. The _cleaned_up flag is an optimization to skip redundant work, not a correctness gate. Adding a Lock to a weakref finalizer target introduces complexity without practical benefit here.


self.engine_dead = True
if self.engine_manager is not None:
self.engine_manager.shutdown()
self.engine_manager = None
if self.coordinator is not None:
self.coordinator.shutdown()
self.coordinator = None

if isinstance(self.output_socket, zmq.asyncio.Socket):
# Async case.
Expand Down Expand Up @@ -424,8 +432,10 @@ def close_sockets_and_tasks():
del tasks
del close_sockets_and_tasks
close_sockets(sockets)
del self.output_queue_task
del self.stats_update_task

# Clear references instead of deleting attributes
self.output_queue_task = None
self.stats_update_task = None
else:
# Sync case.

Expand Down