[Bugfix] Robust Qwen3 Coder streaming: fragmented tags, speculative decoding fixes, and content tracking by ExtReMLapin · Pull Request #40785 · vllm-project/vllm

ExtReMLapin · 2026-04-24T07:46:35Z

Purpose

This PR addresses several reliability issues in the qwen3_coder tool parser during streaming. Similar to the reasoning parser fix, it handles cases where tool call tags are output as raw text fragments rather than special tokens. It also ensures data integrity during high-speed generation (speculative decoding) where multiple parts of a tool call (or content + tool call) arrive in a single delta.

Key Changes

Fragmented Tag Detection: Switched from delta_text to current_text analysis to reliably detect <tool_call> and other tags even when they are split across multiple small deltas (e.g., < then tool then _call>).
Content Tracking with _sent_content_idx: Introduced a cursor to track exactly which parts of the stream have been emitted as content. This prevents duplicate content delivery and ensures that raw text tags don't "leak" into the user-visible content before being parsed.
Combined Content & Tool Calls: Refactored extract_tool_calls_streaming to allow a single DeltaMessage to contain both plain text content and tool call fragments. This is essential when a model transitions from speaking to calling a tool in the middle of a streaming chunk.
Speculative Decoding Robustness: Fixed a race condition where arguments could be lost if the parameter value and the closing </function> tag arrived in the same delta. The parser now guarantees that all accumulated arguments are emitted before closing the JSON object.
Inter-tool Call Whitespace: Improved handling of whitespace and transitions between multiple tool calls in a single stream.
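
The first two changes above can be illustrated with a minimal sketch. It is not the code from this PR: `ContentCursor`, `next_content`, and `partial_tag_overlap` are hypothetical names, and only the idea of a `_sent_content_idx`-style cursor over `current_text` comes from the description. The sketch assumes that text looking like the start of `<tool_call>` should be held back until it either completes the tag or turns out to be ordinary content.

```python
TOOL_CALL_START = "<tool_call>"


def partial_tag_overlap(text: str, tag: str) -> int:
    """Length of the longest suffix of `text` that is a proper prefix of `tag`."""
    for size in range(min(len(text), len(tag) - 1), 0, -1):
        if text.endswith(tag[:size]):
            return size
    return 0


class ContentCursor:
    """Toy stand-in for the parser state (hypothetical, not the vLLM class)."""

    def __init__(self) -> None:
        # Plays the role the PR describes for _sent_content_idx.
        self.sent_content_idx = 0

    def next_content(self, current_text: str) -> str:
        """Return plain text not yet emitted that is safe to show the user."""
        tag_pos = current_text.find(TOOL_CALL_START, self.sent_content_idx)
        if tag_pos != -1:
            # A full tag is present: content stops right before it.
            safe_end = tag_pos
        else:
            # Hold back a trailing fragment such as "<" or "<tool".
            held = partial_tag_overlap(current_text, TOOL_CALL_START)
            safe_end = len(current_text) - held
        safe_end = max(safe_end, self.sent_content_idx)
        content = current_text[self.sent_content_idx:safe_end]
        self.sent_content_idx = safe_end
        return content


# The tag arrives in three fragments, as in the "<", "tool", "_call>" example.
cursor = ContentCursor()
current_text = ""
emitted = []
for delta in ["Sure. ", "<", "tool", "_call>", "\n<function=get_weather>"]:
    current_text += delta
    emitted.append(cursor.next_content(current_text))
```

Because the cursor only moves forward, the same characters are never emitted twice, and a lone `<` that is later followed by ordinary text is released on the next call instead of being dropped.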

Test Plan

Added test_extract_tool_calls_streaming_split_tag to verify fragmented tag handling.
Added test_extract_tool_calls_streaming_speculative_decode_loss to verify no data loss during fast generation.
Added test_extract_tool_calls_streaming_various_chunk_sizes using the exact Qwen 3.6 template to ensure stability across different streaming behaviors.
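
As a rough sketch of how a chunk-size test can simulate different streaming behaviors, the helper below splits one model output into fixed-size deltas. The names `iter_deltas` and `tag_is_fragmented`, the `get_weather` function, and the sample output are assumptions made for illustration; they are not taken from the PR's test file.

```python
from typing import Iterator


def iter_deltas(text: str, chunk_size: int) -> Iterator[str]:
    """Yield `text` in consecutive pieces of at most `chunk_size` characters."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


def tag_is_fragmented(text: str, tag: str, chunk_size: int) -> bool:
    """True if no single delta contains the complete tag."""
    return all(tag not in delta for delta in iter_deltas(text, chunk_size))


# Illustrative output only; the exact template used by the PR is not shown here.
MODEL_OUTPUT = (
    "Let me check.\n"
    "<tool_call>\n<function=get_weather>\n"
    "<parameter=city>\nParis\n</parameter>\n"
    "</function>\n</tool_call>"
)

# Whatever the chunk size, the deltas must add up to the same text, so a
# parser fed with them should produce the same final result.
for size in (1, 3, 7, len(MODEL_OUTPUT)):
    assert "".join(iter_deltas(MODEL_OUTPUT, size)) == MODEL_OUTPUT
```

Small chunk sizes exercise the fragmented-tag path, while a chunk size equal to the full output approximates the case where content and a complete tool call arrive in a single delta.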

Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

gemini-code-assist

Code Review

This pull request improves the Qwen3CoderToolParser streaming logic by using current_text and a tracking index (_sent_content_idx) to prevent leaking tool call tags into the content stream, supported by a new test case for split tags. Review feedback identifies a missing initialization for current_tool_index which would cause an AttributeError, suggests using find instead of rfind for more reliable tag detection, and notes that the logic for skipping content between tool calls is currently fragile.
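
The review summary does not spell out why `find` is preferred over `rfind`. One plausible reason, sketched below with made-up text, is that `rfind` locates the last `<tool_call>` in the accumulated text, so slicing content up to that position can include an earlier tool call.

```python
TOOL_CALL_START = "<tool_call>"

# Hypothetical accumulated text containing two tool calls.
current_text = (
    "First call: <tool_call>A</tool_call> "
    "second call: <tool_call>B</tool_call>"
)

first = current_text.find(TOOL_CALL_START)   # position of the first tag
last = current_text.rfind(TOOL_CALL_START)   # position of the last tag

content_before_first = current_text[:first]
content_before_last = current_text[:last]

# Slicing at the first tag yields only plain text; slicing at the last tag
# still contains the whole first tool call.
leaks_tag = TOOL_CALL_START in content_before_last
```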

ExtReMLapin · 2026-04-24T08:23:00Z

/gemini review

gemini-code-assist

Code Review

This pull request improves the Qwen3 tool parser's handling of streaming tool calls by using current text tracking and partial tag overlap detection to prevent tag fragments from leaking into the content. It also adds a test case for split tags. A review comment identifies a redundant variable initialization in the streaming state reset logic.

ExtReMLapin · 2026-04-24T13:54:26Z

oh boy I found more bugs with param parsing in streamed mode, fucking hell, MTP is guilty there

ExtReMLapin · 2026-04-25T05:00:58Z

superseded by #40861

fix split tag detection in tool parser : qwen3_coder (streaming mode)

7fc99ed

Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>

ExtReMLapin requested review from aarnphm, bbrowning, chaunceyjiang and sfeng33 as code owners April 24, 2026 07:46

claude Bot reviewed Apr 24, 2026

mergify Bot added the qwen (Related to Qwen models), tool-calling, and bug (Something isn't working) labels Apr 24, 2026

github-project-automation Bot added this to Tool Calling Apr 24, 2026

gemini-code-assist Bot reviewed Apr 24, 2026

Comment thread vllm/tool_parsers/qwen3coder_tool_parser.py

Comment thread vllm/tool_parsers/qwen3coder_tool_parser.py Outdated

Comment thread vllm/tool_parsers/qwen3coder_tool_parser.py

ExtReMLapin and others added 3 commits April 24, 2026 10:01

Update vllm/tool_parsers/qwen3coder_tool_parser.py

f1785b3

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com>

Update vllm/tool_parsers/qwen3coder_tool_parser.py

fcd8783

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com>

applied gemini suggestion

f4ee86c

Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>

gemini-code-assist Bot reviewed Apr 24, 2026

Comment thread vllm/tool_parsers/qwen3coder_tool_parser.py

Fixed edge case streamed tool call started in delta1 (tool call start + function name only) + delta2 (params + tool call end) was dropping params

77d9e95

Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>

ExtReMLapin changed the title ~~[Bugfix] Support fragmented <tool_call> tags in qwen3_coder (streaming mode)~~ [Bugfix] Robust Qwen3 Coder streaming: fragmented tags, speculative decoding fixes, and content tracking Apr 24, 2026

ExtReMLapin marked this pull request as draft April 24, 2026 22:16

ExtReMLapin closed this Apr 25, 2026

github-project-automation Bot moved this to Done in Tool Calling Apr 25, 2026

ExtReMLapin wants to merge 5 commits into vllm-project:main from ExtReMLapin:qwen3_coder_incomplete_delta_text
