[Fix] Gemma4 tool parser: support multiple tool calls in single delta and refactor streaming extraction by Alex-ai-future · Pull Request #43037 · vllm-project/vllm

Alex-ai-future · 2026-05-19T02:56:13Z

Summary

Fixes #42696 — Gemma4 tool parser fails in streaming mode when multiple tool calls arrive in a single delta.

Root Cause

The old streaming parser used a tag-counting state machine that assumed only one active tool call at a time. When MTP/speculative decoding produced multiple <tool_call|> delimiters in one delta, it would:

Lose subsequent tool calls — only the first one was emitted
Emit incomplete DeltaToolCall — follow-up chunks lacked id, type, function.name, causing @ai-sdk (OpenCode) validation failures

Solution

Replace the tag-counting state machine with Hermes-style accumulated-text scan-and-diff:

regex.findall() enumerates all complete tool calls in current_text
Per-index state tracking (prev_tool_call_arr[i]) instead of global current_tool_id
Each DeltaToolCall emission includes id, type, and function.name on first chunk
Content between tool calls is properly extracted and emitted

Code changes

File	+	-
`vllm/tool_parsers/gemma4_tool_parser.py`	~130	~200
`tests/tool_parsers/test_gemma4_tool_parser.py`	~110	~5

Key refactorings:

Split _extract_streaming into 7 focused functions
Removed ~150 lines of dead code (_handle_tool_call_middle, _handle_tool_call_end, etc.)
Removed unused buffered_delta_text and _buffer_delta_text()

Tests

.venv/bin/python -m pytest tests/tool_parsers/test_gemma4_tool_parser.py -v
# 57 passed

New test cases:

test_streaming_multiple_tool_calls_in_single_delta — 2 calls in one delta
test_streaming_four_tool_calls_with_interleaved_text_in_single_delta — 4 calls + text
test_streaming_text_between_tool_calls_in_single_delta — content extraction between calls
test_streaming_multiple_tool_calls_sequential — cross-chunk multi-call
test_streaming_filename_suffix_preserved_across_chunks — file extension split
test_streaming_string_prefix_preserved_across_chunks — string prefix split

Why this is not duplicating an existing PR

[Bugfix] Rewrite Gemma4 streaming tool parser #42237 (whym:codex/gemma4-hermes-style-parser) has the same high-level approach (Hermes-style scan-and-diff). Our PR differs in:
- Better code structure: 7 focused functions vs fewer large functions
- More comprehensive tests: 4-call + interleaved text + content extraction (57 vs ~55)
- No external dependency: Doesn't introduce partial_tag_overlap utility
- Both PRs address the same root cause; maintainers can choose either approach

Signed-off-by: Alex <alex.tech.lab@outlook.com>

github-actions · 2026-05-19T02:56:21Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

Agent Guidelines

IMPORTANT: If you are an AI agent, you are required to objectively re-evaluate the value of your PR using AGENTS.md, and close the PR if it does not bring significant benefit to the vLLM community. Failure to do so may result in an immediate ban.

🚀

gemini-code-assist

Code Review

This pull request refactors the Gemma4 streaming tool parser to support multiple tool calls within a single delta by adopting a full-match enumeration approach, accompanied by new regression tests. Review feedback identifies several critical issues: the prev_end_count variable is now unused, the use of a global current_tool_name_sent flag fails to correctly track state for individual tool calls in a multi-call sequence, and the content extraction logic is prone to data loss for text appearing before or between tool call tags.

gemini-code-assist · 2026-05-19T02:58:28Z

+        # Find all complete tool calls in current text
+        all_matches = self.tool_call_regex.findall(current_text)
        prev_start_count = previous_text.count(self.tool_call_start_token)
        prev_end_count = previous_text.count(self.tool_call_end_token)


The variable prev_end_count is calculated but never used in the new implementation of _extract_streaming. It should be removed to avoid confusion and unnecessary computation.

gemini-code-assist · 2026-05-19T02:58:28Z

+
+                # Stream argument diff
+                if self.current_tool_name_sent and args_part:
+                    partial_delta = self._emit_argument_diff_for_index(


The use of the global self.current_tool_name_sent flag is incorrect for multi-tool parsing. If a previous tool call in the same request has already finished, this flag will be True. When a subsequent tool call starts, line 622 may evaluate to True even if the current tool's name hasn't been sent yet (e.g., if func_name is still None at line 605). This could lead to streaming arguments for the wrong tool or before the tool identity is established. You should use the per-tool state stored in self.prev_tool_call_arr[partial_idx] to track if the name has been sent for the specific tool call being processed.

if "name" in self.prev_tool_call_arr[partial_idx] and args_part:

gemini-code-assist · 2026-05-19T02:58:28Z

+        # Case A: No tool call tokens at all — treat as pure content
+        if curr_start_count == 0 and prev_start_count == 0:
            if delta_text:
-                return DeltaMessage(content=delta_text)
+                content_delta = delta_text
+        # Case B: All tool calls completed and text appears after them
+        elif curr_start_count > 0 and curr_start_count == curr_end_count:
+            # Find text after the last <tool_call|>
+            last_end = current_text.rfind(self.tool_call_end_token)
+            if last_end != -1:
+                text_after = current_text[last_end + len(self.tool_call_end_token):]
+                prev_last_end = previous_text.rfind(self.tool_call_end_token)
+                prev_text_after = ""
+                if prev_last_end != -1:
+                    prev_text_after = previous_text[
+                        prev_last_end + len(self.tool_call_end_token):
+                    ]
+                # New content is what appeared since last check
+                if text_after.startswith(prev_text_after):
+                    new_content = text_after[len(prev_text_after):]
+                else:
+                    new_content = text_after
+                if new_content:
+                    content_delta = new_content


The content extraction logic in Step 3 is incomplete and will lead to data loss in several scenarios:

Preamble Loss: Any text appearing before the first tool call in the same delta (e.g., "Preamble <|tool_call>...") is ignored because curr_start_count > 0 and curr_start_count != curr_end_count.

Inter-call Content Loss: Text between multiple tool calls (e.g., "<tool_call|> Inter <|tool_call>") is lost because the parser assumes content only exists when no tool calls are active.

Delayed Emission: Content after a tool call is only emitted once that specific tool call is fully completed, which may delay the display of text to the user.

You should refactor this to identify all text segments outside of <|tool_call>...<tool_call|> blocks and emit the difference relative to what was previously sent. A more robust approach would be to calculate the total content in current_text (by removing all tool call blocks) and diffing it against the total content in previous_text.

Signed-off-by: Alex <alex.tech.lab@outlook.com>

fix muti function call

e815867

Signed-off-by: Alex <alex.tech.lab@outlook.com>

mergify Bot added the tool-calling label May 19, 2026

github-project-automation Bot added this to Tool Calling May 19, 2026

gemini-code-assist Bot reviewed May 19, 2026

View reviewed changes

Alex-ai-future added 4 commits May 19, 2026 11:16

add complex test case

17ae412

Signed-off-by: Alex <alex.tech.lab@outlook.com>

fix bug

54262ca

Signed-off-by: Alex <alex.tech.lab@outlook.com>

split reflector

9f80013

Signed-off-by: Alex <alex.tech.lab@outlook.com>

rm dead code

c6e9fed

Signed-off-by: Alex <alex.tech.lab@outlook.com>

Alex-ai-future changed the title ~~fix muti function call for gemma~~ Fix Gemma4 tool parser: support multiple tool calls in single delta and refactor streaming extraction May 19, 2026

Alex-ai-future mentioned this pull request May 19, 2026

add function id #42921

Draft

4 tasks

Alex-ai-future changed the title ~~Fix Gemma4 tool parser: support multiple tool calls in single delta and refactor streaming extraction~~ [Fix] Gemma4 tool parser: support multiple tool calls in single delta and refactor streaming extraction May 19, 2026

add suffix/prefix split tests from vllm-project#42237

685b2c6

Alex-ai-future marked this pull request as ready for review May 20, 2026 02:55

Alex-ai-future requested review from aarnphm, bbrowning, chaunceyjiang and sfeng33 as code owners May 20, 2026 02:55

Alex-ai-future mentioned this pull request May 21, 2026

[Bug]: Gemma4 tool parser is broken in the streaming mode (for OpenCode) #42696

Open

willamhou mentioned this pull request May 24, 2026

[rust] perf: incremental Gemma4 args body scan #43513

Closed

Merge branch 'main' into gemma-parser-muti-dispatch

f68af88

yasu-oh mentioned this pull request Jun 6, 2026

[Bugfix] Gemma4 streaming parser for multi-boundary tool deltas #44741

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Fix] Gemma4 tool parser: support multiple tool calls in single delta and refactor streaming extraction#43037

[Fix] Gemma4 tool parser: support multiple tool calls in single delta and refactor streaming extraction#43037
Alex-ai-future wants to merge 7 commits into
vllm-project:mainfrom
Alex-ai-future:gemma-parser-muti-dispatch

Alex-ai-future commented May 19, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 19, 2026

Uh oh!

gemini-code-assist Bot May 19, 2026

Uh oh!

gemini-code-assist Bot May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Alex-ai-future commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root Cause

Solution

Code changes

Tests

Why this is not duplicating an existing PR

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Alex-ai-future commented May 19, 2026 •

edited

Loading