[Bugfix] Fix streaming tool call args blanked when entire call arrives in one delta #39615
alexbi29 wants to merge 2 commits into
…s in one delta
When a tool call's arguments arrive in a single streaming delta (common
with compact formats like Gemma4), the finish-reason path in serving.py
blanks out the arguments:
1. `actual_call = streamed_args[:-latest_delta_len]` becomes empty when
`latest_delta_len == len(streamed_args)` (all args in one delta).
2. `str.replace("", "", 1)` is a no-op, so `remaining_call` equals the
full expected args — appearing as if nothing was streamed.
3. `_create_remaining_args_delta()` unconditionally overwrites the
parser's delta with `arguments=remaining_call`, but when remaining
should be empty, this replaces valid args with "".
The client receives `arguments: ""` instead of the actual JSON, causing
tool call validation failures ("expected string, received undefined").
Fix: guard against the empty-actual_call case and only call
_create_remaining_args_delta when remaining_call is non-empty.
Signed-off-by: Alex Bilichenko <alexbi29@users.noreply.github.com>
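The failure mode described in the commit message can be reproduced with plain string operations. The following is a minimal sketch, not the code in `serving.py`: the function names `remaining_buggy` and `remaining_guarded` are hypothetical, and the sample arguments are made up for illustration. Only the slicing, the `replace(..., "", 1)` call, and the guard condition are taken from the description above.

```python
def remaining_buggy(expected_call: str, streamed_args: str, latest_delta_len: int) -> str:
    """Remainder computation without the guard (hypothetical helper)."""
    actual_call = streamed_args
    if latest_delta_len > 0:
        # Strip the latest delta from what has been streamed so far.
        actual_call = actual_call[:-latest_delta_len]
    # If actual_call is "", replace("", "", 1) returns expected_call unchanged.
    return expected_call.replace(actual_call, "", 1)


def remaining_guarded(expected_call: str, streamed_args: str, latest_delta_len: int) -> str:
    """Same computation with the empty-actual_call guard (hypothetical helper)."""
    actual_call = streamed_args
    if latest_delta_len > 0:
        actual_call = actual_call[:-latest_delta_len]
    if not actual_call and latest_delta_len > 0:
        # All arguments arrived in the latest delta: nothing remains to flush.
        return ""
    return expected_call.replace(actual_call, "", 1)


# Illustrative arguments (assumed, not taken from the PR).
args = '{"city": "Paris"}'

# Entire call in one delta: the unguarded version reports everything as remaining.
print(repr(remaining_buggy(args, args, len(args))))    # '{"city": "Paris"}'
print(repr(remaining_guarded(args, args, len(args))))  # ''

# Incremental streaming: both versions agree on the remainder.
partial = '{"city": "Pa'
print(repr(remaining_buggy(args, partial, 2)))    # 'Paris"}'
print(repr(remaining_guarded(args, partial, 2)))  # 'Paris"}'
```

The guard only changes the result when the slice leaves nothing behind, so ordinary incremental streaming is unaffected in this sketch.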
Code Review
This pull request fixes a bug in the streaming tool call logic where arguments could be sent twice or blanked out when they arrive in a single delta. It introduces a guard in serving.py to correctly compute the remaining arguments and ensures that the delta message is only updated when there are actual remaining arguments to flush. Additionally, a comprehensive suite of unit tests has been added to test_serving_chat.py to verify various streaming scenarios and edge cases. I have no feedback to provide.
Root Cause Analysis + Fix (Autonomous Agent)
I analyzed this bug and have a verified fix.
Root Cause
Two bugs interact in
Fix
```python
# Bug location: vllm/entrypoints/openai/responses/serving.py
# around line ~1390 in _process_simple_streaming_events

# Fix 1: Guard against empty actual_call
if latest_delta_len > 0:
    actual_call = actual_call[:-latest_delta_len]
# ADD: if actual_call is empty but we streamed something, all args were in this delta
if not actual_call and latest_delta_len > 0:
    remaining_call = ""  # Nothing remaining
else:
    remaining_call = expected_call.replace(actual_call, "", 1)

# Fix 2: Only overwrite if there is actually remaining content
if remaining_call:
    delta_message = self._create_remaining_args_delta(delta_message, remaining_call, index)
```
Test Plan
`pytest tests/entrypoints/openai/chat_completion/test_serving_chat.py::TestRemainingCallComputation -v`
12 test cases covering: normal incremental streaming, all-args-in-one-delta (bug case), multi-param variant, multiple tool calls with second arriving all-at-once, flush-all, replace with repeated substrings, parser/state mismatch, and edge cases.
AI-assisted analysis (autonomous agent). Human must review and submit PR per vLLM AGENTS.md policy.
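Summary
- Fixes tool call arguments being blanked to `""` when the entire tool call arrives in 1-2 streaming deltas
- Common with compact formats like Gemma4, where `<|tool_call>call:func{args}<tool_call|>` fits in ~3 tokens
Root cause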
When a streaming tool call finishes, `serving.py` computes the un-streamed argument remainder. Two bugs interact:
1. `replace("", "", 1)` is a no-op: When all arguments arrive in one delta, `latest_delta_len == len(streamed)`, so `actual_call = ""`. `str.replace("", "", 1)` returns the original string unchanged, making `remaining_call` equal to the full expected args.
2. `_create_remaining_args_delta` always overwrites: It unconditionally replaces the parser's `delta_message`, even when `remaining_call` is empty (`""`). This blanks out the arguments that the parser had correctly set.
Combined result: the client receives `arguments: ""` instead of the actual args, causing tool call validation failures like `"expected string, received undefined"`.
Fix
1. Guard against the empty `actual_call` case: when `actual_call` is empty but `latest_delta_len > 0` (meaning all args were in this delta), set `remaining_call = ""`.
2. Only call `_create_remaining_args_delta` when `remaining_call` is non-empty, preserving the parser's original delta.
Test plan
Added `TestRemainingCallComputation` with 12 test cases covering:
- `replace()` with repeated substrings
- `arguments=""`
- Parser/state mismatch (`actual_call` not in `expected`)
- `latest_delta_len > len(streamed)` edge case
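To make the listed scenarios concrete, here is a table-driven sketch of the guarded remainder computation. It does not reproduce the PR's `TestRemainingCallComputation` tests: `compute_remaining` is a hypothetical standalone helper, the inputs are invented, and the expected values are simply what this sketch returns, which may differ from what the actual tests assert (especially for the mismatch and oversized-delta cases).

```python
def compute_remaining(expected_call: str, streamed: str, latest_delta_len: int) -> str:
    """Guarded remainder computation (hypothetical helper, modeled on the fix above)."""
    actual_call = streamed
    if latest_delta_len > 0:
        actual_call = actual_call[:-latest_delta_len]
    if not actual_call and latest_delta_len > 0:
        return ""  # everything arrived in the latest delta
    return expected_call.replace(actual_call, "", 1)


# (label, expected_call, streamed, latest_delta_len, remaining per this sketch)
CASES = [
    ("incremental streaming", '{"a": 1}', '{"a": ', 0, "1}"),
    ("all args in one delta", '{"a": 1}', '{"a": 1}', 8, ""),
    ("repeated substring, first match only", "11", "1", 0, "1"),
    ("nothing streamed yet (flush all)", '{"a": 1}', "", 0, '{"a": 1}'),
    ("actual_call not in expected", '{"a": 1}', '{"b"', 0, '{"a": 1}'),
    ("latest_delta_len > len(streamed)", '{"a": 1}', '{"a"', 10, ""),
]

results = {}
for label, expected_call, streamed, delta_len, want in CASES:
    got = compute_remaining(expected_call, streamed, delta_len)
    results[label] = got
    assert got == want, f"{label}: got {got!r}, want {want!r}"

print(f"{len(results)} cases passed")
```

The repeated-substring case shows why the `count=1` argument to `str.replace` matters: without it, both occurrences of `"1"` would be removed and the remainder would be empty.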