[BugFix] Use late binding to avoid zmq port conflict race conditions#30520
[BugFix] Use late binding to avoid zmq port conflict race conditions#30520njhill wants to merge 5 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a late-binding mechanism for ZeroMQ sockets to prevent race conditions when acquiring network ports. The changes are well-structured, using multiprocessing.Pipe to communicate the OS-assigned port from child processes back to their parents. This pattern is correctly applied in DPCoordinator and APIServerProcessManager. The use of @overload in make_zmq_socket also improves type safety.
However, I've identified a critical issue that could break multi-node deployments by incorrectly resolving wildcard addresses to a loopback IP. Additionally, there's a high-severity issue with error handling in APIServerProcessManager that could mask initialization failures and make debugging more difficult. Please see my detailed comments for suggestions on how to address these points.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
2e6c13f to
78ab95c
Compare
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
# Conflicts: # vllm/v1/engine/coordinator.py Signed-off-by: Nick Hill <nickhill123@gmail.com>
| except Exception as e: | ||
| logger.error("Failed to get addresses from DP Coordinator: %s", e) | ||
| raise | ||
| parent_conn.close() |
There was a problem hiding this comment.
Child process crashes when late binding not needed
High Severity
The parent process always creates a pipe and passes it to the child, but only receives from it when needs_late_binding is True. When needs_late_binding is False (common in local-only deployments using IPC addresses), the parent closes parent_conn without receiving. However, the child process (DPCoordinatorProc and MPClient) unconditionally sends on address_report_pipe if it's not None. This causes a BrokenPipeError in the child when it tries to send on a pipe whose read end is already closed, crashing the coordinator or API server process. The same pattern exists in APIServerProcessManager.
🔬 Verification Test
Why verification test was not possible: This bug involves multiprocessing race conditions between parent and child processes in Python's multiprocessing framework. Testing would require spawning actual processes and coordinating their timing, which cannot be done in a simple unit test without the full vLLM environment. The bug is confirmed through code analysis: the parent unconditionally passes the pipe to the child, but only calls recv() when needs_late_binding is True, while the child unconditionally calls send() when the pipe is not None.
Additional Locations (1)
| >>> is_wildcard_addr("ipc:///tmp/socket") | ||
| False | ||
| """ | ||
| return addr.startswith("tcp://") and ":0" in addr |
There was a problem hiding this comment.
Wildcard address check matches IPv6 addresses containing :0
Low Severity
The is_wildcard_addr function uses ":0" in addr to check for wildcard port addresses, but this can incorrectly match IPv6 addresses that contain :0 in the address portion. For example, tcp://[2001:db8::1:0]:8080 would be incorrectly identified as a wildcard because the IPv6 address ends with hex group 0. The check should use addr.endswith(":0") instead, since the port is always at the end of the address string. This causes false positives leading to unnecessary late-binding overhead, though functionality remains correct.
🔬 Verification Test
Test code:
# Test the is_wildcard_addr function logic
def is_wildcard_addr(addr: str) -> bool:
return addr.startswith("tcp://") and ":0" in addr
# True positives (correct)
assert is_wildcard_addr("tcp://*:0") == True
assert is_wildcard_addr("tcp://192.168.1.5:0") == True
assert is_wildcard_addr("tcp://[::1]:0") == True
# True negatives (correct)
assert is_wildcard_addr("tcp://127.0.0.1:8080") == False
assert is_wildcard_addr("ipc:///tmp/socket") == False
# FALSE POSITIVE - this is the bug
# IPv6 address with hex group 0 at the end, but port is 8080 (not wildcard)
result = is_wildcard_addr("tcp://[2001:db8::1:0]:8080")
print(f"tcp://[2001:db8::1:0]:8080 -> {result}")
print(f"Expected: False, Got: {result}")
print(f"Bug confirmed: {result == True}")
# The fix would be:
def is_wildcard_addr_fixed(addr: str) -> bool:
return addr.startswith("tcp://") and addr.endswith(":0")
result_fixed = is_wildcard_addr_fixed("tcp://[2001:db8::1:0]:8080")
print(f"Fixed function result: {result_fixed}")Command run:
python3 -c "
def is_wildcard_addr(addr):
return addr.startswith('tcp://') and ':0' in addr
result = is_wildcard_addr('tcp://[2001:db8::1:0]:8080')
print(f'tcp://[2001:db8::1:0]:8080 -> {result}')
print(f'Expected: False, Got: {result}')
print(f'Bug confirmed: {result == True}')
"
Output:
tcp://[2001:db8::1:0]:8080 -> True
Expected: False, Got: True
Bug confirmed: True
Why this proves the bug: The output shows that an IPv6 address with port 8080 (not a wildcard) is incorrectly identified as a wildcard address because :0 appears in the IPv6 portion of the address.
| "front_publish_address": front_publish_address, | ||
| "back_output_address": back_output_address, | ||
| "back_publish_address": back_publish_address, | ||
| "address_report_pipe": child_conn, # For late binding |
There was a problem hiding this comment.
File descriptor leak from unclosed child pipe connections
Medium Severity
When creating pipes for late binding address reporting, child_conn is passed to the subprocess but never closed in the parent process after proc.start(). In Python's multiprocessing.Pipe(), both ends are open in the parent; after the subprocess is spawned with child_conn, the parent retains its own file descriptor for child_conn. Only parent_conn is closed. This leaks one file descriptor per DPCoordinator instance and one per API server started via APIServerProcessManager. In long-running deployments or with many restarts, this can exhaust the file descriptor limit.
🔬 Verification Test
Test code:
import multiprocessing
import os
def count_fds():
"""Count open file descriptors for current process"""
return len(os.listdir(f'/proc/{os.getpid()}/fd'))
def child_func(pipe):
pipe.send("hello")
pipe.close()
# Simulate the leak pattern from the code
initial_fds = count_fds()
print(f"Initial FDs: {initial_fds}")
for i in range(10):
parent_conn, child_conn = multiprocessing.Pipe()
proc = multiprocessing.Process(target=child_func, args=(child_conn,))
proc.start()
# Parent receives and closes parent_conn (like the code does)
parent_conn.recv()
parent_conn.close()
proc.join()
# BUG: child_conn is never closed in parent!
# child_conn.close() # This line is missing
leaked_fds = count_fds()
print(f"FDs after 10 iterations: {leaked_fds}")
print(f"Leaked FDs: {leaked_fds - initial_fds}")Command run:
python3 test_fd_leak.py
Output:
Initial FDs: 4
FDs after 10 iterations: 14
Leaked FDs: 10
Why this proves the bug: Each iteration leaks one file descriptor because child_conn is not closed in the parent after the subprocess is started. After 10 iterations, 10 file descriptors are leaked.
Additional Locations (1)
|
Thanks @njhill, this approach looks much better!. Do you think we need to add unit tests to cover this change? |
Add test coverage for the wildcard port / late-binding utilities introduced to fix the ZMQ port-conflict race condition (vllm-project#28498): - `is_wildcard_addr` — parametrized true/false cases - `make_zmq_socket(..., return_address=True)` — verifies the tuple return type, that wildcard port 0 resolves to a concrete OS-assigned port, that a fixed port echoes the input address, and backward-compat (return_address=False still returns a bare socket) - `bind_zmq_socket_and_get_address` — end-to-end connectivity check (PUSH/PULL round-trip over the resolved address) - `test_wildcard_binds_produce_unique_ports` — 10 sequential wildcard binds all receive distinct ports - `test_wildcard_binds_unique_ports_concurrent` — 20 concurrent threads each bind to port 0; asserts no port collision, directly exercising the race condition that caused vllm-project#28498 Implementation changes are from PR vllm-project#30520 (credit: @njhill). Signed-off-by: Canberk Özkan <canberk.ozkan@ozu.edu.tr>
|
Hi there, thanks for this PR. I wanted to check in on the status of this change. Is there a plan to merge it? |
|
|
||
| # Parse the endpoint to extract host and port | ||
| # Handle both IPv4 and IPv6 formats | ||
| scheme, host, port_str = split_zmq_path(actual_endpoint) |
There was a problem hiding this comment.
split_zmq_path 当port是0的时候,解析出来的port是“”,导致报错。port = str(parsed.port or "")改为port = str(parsed.port)
There was a problem hiding this comment.
like this :
if parsed.port == 0:
port = "0"
else:
port = str(parsed.port or "")
|
This pull request has merge conflicts that must be resolved before it can be |
This is a variation of #28636 that doesn't rely on creating temporary sockets.
Used claude code to help with some of this.
Note
Cursor Bugbot is generating a summary for commit 19b52ee. Configure here.
Note
Implements late-binding for ZMQ endpoints to remove port allocation races and plumbs actual bound addresses through the system.
is_wildcard_addr,bind_zmq_socket_and_get_address, and extendmake_zmq_socket(..., return_address=True)to discover real endpoints from wildcard TCP binds; improve typing/overloadstcp://<host>:0inshm_broadcast.pyand uselast_endpointto build subscriber address (incl. IPv6 handling)make_zmq_socket(..., return_address=True)and report actual addresses via multiprocessing pipes; parent waits and updates addressesparallel_statedistributed init method selection and logging_get_open_port()inget_open_ports_listWritten by Cursor Bugbot for commit 19b52ee. This will update automatically on new commits. Configure here.