[Bugfix] Fix streaming tool call args blanked when entire call arrives in one delta by alexbi29 · Pull Request #39615 · vllm-project/vllm

alexbi29 · 2026-04-12T07:16:56Z

Summary

Fix tool call arguments being sent as empty string "" when the entire tool call arrives in 1-2 streaming deltas
Affects all tool parsers (Gemma4, Qwen3Coder, Hermes, etc.) but most visible with compact formats like Gemma4 where the complete <|tool_call>call:func{args}<tool_call|> fits in ~3 tokens

Root cause

When a streaming tool call finishes, serving.py computes the un-streamed remainder of the arguments:

actual_call = tool_parser.streamed_args_for_tool[index]
if latest_delta_len > 0:
    actual_call = actual_call[:-latest_delta_len]
remaining_call = expected_call.replace(actual_call, "", 1)
delta_message = self._create_remaining_args_delta(delta_message, remaining_call, index)

Two bugs interact:

replace("", "", 1) is a no-op: When all arguments arrive in one delta, latest_delta_len == len(streamed), so actual_call = "". str.replace("", "", 1) returns the original string unchanged, making remaining_call equal to the full expected args.
_create_remaining_args_delta always overwrites: It unconditionally replaces the parser's delta_message — even when remaining_call is empty "". This blanks out the arguments that the parser had correctly set.

Combined result: the client receives arguments: "" instead of the actual args, causing tool call validation failures like "expected string, received undefined".
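The remainder arithmetic described above can be reproduced outside vLLM. This is a minimal sketch, assuming `streamed` stands in for `tool_parser.streamed_args_for_tool[index]`; the function name `remaining_before_fix` is hypothetical and not part of vLLM. It only illustrates the slicing and `str.replace` behavior, not the full finish-reason path.

```python
def remaining_before_fix(expected_call: str, streamed: str, latest_delta_len: int) -> str:
    """Mirror the pre-fix remainder computation quoted above (illustrative only)."""
    actual_call = streamed
    if latest_delta_len > 0:
        # Drop the newest delta from what has been recorded as streamed.
        actual_call = actual_call[:-latest_delta_len]
    # Remove the first occurrence of the already-streamed prefix.
    return expected_call.replace(actual_call, "", 1)


args = '{"city": "Paris"}'

# Incremental case: the newest delta was the 4 characters '"Par'.
incremental = remaining_before_fix(args, '{"city": "Par', 4)

# Single-delta case: the newest delta is the whole argument string,
# so actual_call becomes "" and replace("", "", 1) leaves args unchanged.
single = remaining_before_fix(args, args, len(args))

print(repr(incremental))
print(repr(single))
```

In the single-delta case the computed remainder is the full argument string, which matches the first point in the list above.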

Fix

Guard against the empty-actual_call case: when actual_call is empty but latest_delta_len > 0 (meaning all args were in this delta), set remaining_call = "".
Only call _create_remaining_args_delta when remaining_call is non-empty, preserving the parser's original delta.
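The two guards can be sketched as standalone functions. This is an illustration under stated assumptions, not the patched vLLM code: `remaining_after_fix` and `apply_remaining` are hypothetical names, and the delta message is modelled as a plain argument string rather than vLLM's delta object.

```python
def remaining_after_fix(expected_call: str, streamed: str, latest_delta_len: int) -> str:
    """Remainder computation with the empty-actual_call guard (illustrative only)."""
    actual_call = streamed
    if latest_delta_len > 0:
        actual_call = actual_call[:-latest_delta_len]
    if not actual_call and latest_delta_len > 0:
        # Everything recorded as streamed arrived in the newest delta: nothing remains.
        return ""
    return expected_call.replace(actual_call, "", 1)


def apply_remaining(parser_delta_args: str, remaining_call: str) -> str:
    """Overwrite the parser's delta only when there is a non-empty remainder.

    Assumption: the delta is represented here by its argument string alone.
    """
    return remaining_call if remaining_call else parser_delta_args


args = '{"city": "Paris"}'

# Single-delta case: the remainder is empty, so the parser's own delta is kept.
single = apply_remaining(args, remaining_after_fix(args, args, len(args)))

# Incremental case: the remainder is non-empty and replaces the parser's delta.
incremental = apply_remaining('"Par', remaining_after_fix(args, '{"city": "Par', 4))

print(repr(single))
print(repr(incremental))
```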

Test plan

Added TestRemainingCallComputation with 12 test cases. Run them with:

pytest tests/entrypoints/openai/chat_completion/test_serving_chat.py::TestRemainingCallComputation -v

…s in one delta

When a tool call's arguments arrive in a single streaming delta (common with compact formats like Gemma4), the finish-reason path in serving.py blanks out the arguments:

1. `actual_call = streamed_args[:-latest_delta_len]` becomes empty when `latest_delta_len == len(streamed_args)` (all args in one delta).
2. `str.replace("", "", 1)` is a no-op, so `remaining_call` equals the full expected args, appearing as if nothing was streamed.
3. `_create_remaining_args_delta()` unconditionally overwrites the parser's delta with `arguments=remaining_call`, but when remaining should be empty, this replaces valid args with "".

The client receives `arguments: ""` instead of the actual JSON, causing tool call validation failures ("expected string, received undefined").

Fix: guard against the empty-actual_call case and only call _create_remaining_args_delta when remaining_call is non-empty.

Signed-off-by: Alex Bilichenko <alexbi29@users.noreply.github.com>

gemini-code-assist

Code Review

This pull request fixes a bug in the streaming tool call logic where arguments could be sent twice or blanked out when they arrive in a single delta. It introduces a guard in serving.py to correctly compute the remaining arguments and ensures that the delta message is only updated when there are actual remaining arguments to flush. Additionally, a comprehensive suite of unit tests has been added to test_serving_chat.py to verify various streaming scenarios and edge cases. I have no feedback to provide.

umbra-sh · 2026-04-13T02:12:46Z

Root Cause Analysis + Fix (Autonomous Agent)

I analyzed this bug and have a verified fix.

Root Cause

Two bugs interact in serving.py when computing remaining args after streaming:

replace("", "", 1) is a no-op: When all args arrive in one delta, latest_delta_len == len(streamed), so actual_call = "". Then expected_call.replace("", "", 1) returns the original string unchanged — remaining_call gets the full expected args.
_create_remaining_args_delta always overwrites: It unconditionally replaces the parser's delta_message even when remaining_call is "". This blanks out the arguments the parser correctly set.

Fix

# Bug location: vllm/entrypoints/openai/responses/serving.py
# around line ~1390 in _process_simple_streaming_events

# Fix 1: Guard against empty actual_call
if latest_delta_len > 0:
    actual_call = actual_call[:-latest_delta_len]
# ADD: if actual_call is empty but we streamed something, all args were in this delta
if not actual_call and latest_delta_len > 0:
    remaining_call = ""  # Nothing remaining
else:
    remaining_call = expected_call.replace(actual_call, "", 1)

# Fix 2: Only overwrite if there is actually remaining content
if remaining_call:
    delta_message = self._create_remaining_args_delta(delta_message, remaining_call, index)

Test Plan

pytest tests/entrypoints/openai/chat_completion/test_serving_chat.py::TestRemainingCallComputation -v

12 test cases covering: normal incremental streaming, all-args-in-one-delta (bug case), multi-param variant, multiple tool calls with second arriving all-at-once, flush-all, replace with repeated substrings, parser/state mismatch, and edge cases.
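A few of the listed scenarios can be sketched as plain assertion-style tests. The actual contents of `TestRemainingCallComputation` are not shown in this thread, so the class, method names, and helper below are hypothetical; the helper restates the guarded computation from the fix snippet above so the example is self-contained.

```python
def compute_remaining(expected_call: str, streamed: str, latest_delta_len: int) -> str:
    """Hypothetical helper restating the guarded remainder computation."""
    actual_call = streamed
    if latest_delta_len > 0:
        actual_call = actual_call[:-latest_delta_len]
    if not actual_call and latest_delta_len > 0:
        return ""
    return expected_call.replace(actual_call, "", 1)


class TestRemainingCallSketch:
    """Illustrative cases only; not the test class added by the PR."""

    def test_all_args_in_one_delta(self):
        args = '{"a": 1}'
        assert compute_remaining(args, args, len(args)) == ""

    def test_flush_all_when_nothing_streamed(self):
        # Nothing recorded and no newest delta: the whole call remains.
        assert compute_remaining('{"a": 1}', "", 0) == '{"a": 1}'

    def test_repeated_substring_removes_first_occurrence_only(self):
        expected = '{"a": {"a": 1}}'
        # Newest delta is ' {"a":' (6 chars), so actual_call is '{"a":'.
        assert compute_remaining(expected, '{"a": {"a":', 6) == ' {"a": 1}}'

    def test_parser_state_mismatch_returns_expected(self):
        # actual_call '{"y":' does not occur in expected, so replace changes nothing.
        assert compute_remaining('{"x": 1}', '{"y": 2', 2) == '{"x": 1}'
```

Because the methods use bare `assert`, pytest would collect them as written, and they can also be called directly.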

AI-assisted analysis (autonomous agent). Human must review and submit PR per vLLM AGENTS.md policy.

alexbi29 requested review from DarkLight1337, NickLucche, aarnphm, chaunceyjiang, robertgshaw2-redhat and russellb as code owners April 12, 2026 07:17

mergify Bot added the frontend and bug (Something isn't working) labels Apr 12, 2026

Merge branch 'main' into fix/streaming-tool-args-single-delta

924caa1

gemini-code-assist Bot reviewed Apr 12, 2026

View reviewed changes

KimuGenie mentioned this pull request Apr 13, 2026

[Bugfix] Fix Gemma4 tool parser converting bare null to string "null" #39679

Merged

joninco mentioned this pull request May 4, 2026

[Bugfix] Fix GLM zero-arg streaming tool names #41654

Open

SoluMilken mentioned this pull request May 9, 2026

[Bug]: GLM5.1 tool call (with MTP) in streaming mode, arguments cannot be combined as a complete dict #42167

Closed

1 task

This was referenced May 17, 2026

[Bugfix] Fix Gemma4 streaming tool calls lost when entire call arrives in one delta #42875

Open

[Bugfix] Qwen3Coder streaming: emit args when whole tool body lands in one delta #43074

Open

alexbi29 closed this Jun 1, 2026

alexbi29 deleted the fix/streaming-tool-args-single-delta branch June 1, 2026 05:01

alexbi29 mentioned this pull request Jun 1, 2026

[Frontend] Streaming: don't re-send tool args when whole call lands in one delta alexbi29/vllm#10

Closed

alexbi29 wants to merge 2 commits into vllm-project:main from alexbi29:fix/streaming-tool-args-single-delta


2 participants
