[vllm, rollout] fix: implement deferred ZMQ futures for non-blocking executor calls#4875
[vllm, rollout] fix: implement deferred ZMQ futures for non-blocking executor calls#4875jreiml wants to merge 1 commit intoverl-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request correctly implements the deferred execution for non-blocking calls in ExternalZeroMQDistributedExecutor by introducing _DeferredZmqFuture. This is a good fix that aligns with the vLLM executor contract and allows for overlapping computation. The addition of the assertion for max_concurrent_batches=1 is also a great defensive measure to ensure thread safety with ZMQ REQ/REP sockets. However, I've identified a critical security vulnerability. The implementation uses pickle.loads() to deserialize data received over the network. This is unsafe and can lead to remote code execution if the network is not completely secure. My review includes a comment with details on this issue.
| try: | ||
| outputs = [] | ||
| for socket in self._sockets: | ||
| outputs.append(pickle.loads(socket.recv())) |
There was a problem hiding this comment.
The use of pickle.loads() on data received from a network socket introduces a critical security vulnerability. Deserializing data with pickle can lead to arbitrary code execution if the data is crafted maliciously. While this communication is likely between trusted internal workers, it's a significant security risk if the network is not completely isolated and secure. An attacker who can intercept or inject traffic on this ZMQ channel could compromise the worker process.
It is strongly recommended to replace pickle with a safer serialization format, such as JSON. If complex Python objects must be transferred, consider using a library that provides cryptographically signed serialization to ensure data integrity and authenticity.
There was a problem hiding this comment.
The current change follows the existing logic. This would have to be done in another PR.
- Add `_DeferredZmqFuture` class that defers ZMQ `recv()` until `result()` is called - This properly implements the vLLM executor contract for `non_block=True`, allowing EngineCore to overlap work (e.g., grammar bitmask computation for structured output) with remote model execution - Add assertion to enforce `max_concurrent_batches=1`, required for thread-safe ZMQ REQ/REP operation
7447b1f to
013a7a9
Compare
|
Close as we switch to vllm MP Executor #4280 |
What does this PR do?
Fixes the
non_block=Truebehavior inExternalZeroMQDistributedExecutorto properly implement the vLLM executor contract.The previous implementation immediately called
recv()and wrapped the result in an already-resolvedFuture. This PR adds a_DeferredZmqFuturethat defersrecv()untilresult()is called, allowing vLLM's EngineCore to overlap work (e.g., grammar bitmask computation for structured output) with remote model execution.Related: #3934 added
non_blockparameter compatibility but didn't implement actual non-blocking behavior.Checklist Before Starting
[{modules}] {type}: {description}(This will be checked by the CI)Test
The
non_block=Truecode path is called by vLLM v1's EngineCore (seevllm/v1/engine/core.py). Existing CI tests that run vLLM v1 (tests/experimental/agent_loop,test_vllm_abort.py) exercise this code path through the full inference stack.API and Usage Example
No API changes. Internal behavior change only.
Design & Code Changes
_DeferredZmqFutureclass that stores sockets and defersrecv()untilresult()is callednon_blockandunique_reply_rankparameters tocollective_rpc()max_concurrent_batches=1(required for thread-safe ZMQ REQ/REP)Checklist Before Submitting
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=alwaystests/experimental/agent_loop,test_vllm_abort.py)ci-requestchannel in theverlSlack workspace. (If not accessible, please try the Feishu group (飞书群).)