[Bugfix] Robust Qwen3 Coder streaming: fragmented tags, speculative decoding fixes, and content tracking #40785

Closed
ExtReMLapin wants to merge 5 commits into vllm-project:main from ExtReMLapin:qwen3_coder_incomplete_delta_text

Conversation

@ExtReMLapin ExtReMLapin commented Apr 24, 2026

Purpose

This PR addresses several reliability issues in the qwen3_coder tool parser during streaming. Similar to the reasoning parser fix, it handles cases where tool call tags are output as raw text fragments rather than special tokens. It also prevents data loss when speculative decoding delivers several parts of a tool call (or content plus a tool call) in a single delta.

Key Changes

  • Fragmented Tag Detection: Switched from delta_text to current_text analysis to reliably detect <tool_call> and other tags even when they are split across multiple small deltas (e.g., < then tool then _call>).
  • Content Tracking with _sent_content_idx: Introduced a cursor to track exactly which parts of the stream have been emitted as content. This prevents duplicate content delivery and ensures that raw text tags don't "leak" into the user-visible content before being parsed.
  • Combined Content & Tool Calls: Refactored extract_tool_calls_streaming to allow a single DeltaMessage to contain both plain text content and tool call fragments. This is essential when a model transitions from speaking to calling a tool in the middle of a streaming chunk.
  • Speculative Decoding Robustness: Fixed a bug where arguments could be lost when the parameter value and the closing </function> tag arrived in the same delta. The parser now emits all accumulated arguments before closing the JSON object.
  • Inter-tool Call Whitespace: Improved handling of whitespace and transitions between multiple tool calls in a single stream.
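
The first two changes can be sketched in isolation. This is a minimal illustration, not the parser's actual code: `partial_tag_overlap` and `next_content` are hypothetical helper names, and `sent_idx` stands in for the `_sent_content_idx` cursor. The idea is to scan the accumulated `current_text` from the cursor and to hold back any trailing characters that could still grow into `<tool_call>`.

```python
TOOL_CALL_START = "<tool_call>"


def partial_tag_overlap(text: str, tag: str = TOOL_CALL_START) -> int:
    """Length of the longest suffix of `text` that is a proper prefix of `tag`."""
    for n in range(min(len(text), len(tag) - 1), 0, -1):
        if text.endswith(tag[:n]):
            return n
    return 0


def next_content(current_text: str, sent_idx: int) -> tuple[str, int]:
    """Return (content that is safe to emit, new cursor position)."""
    tag_pos = current_text.find(TOOL_CALL_START, sent_idx)
    if tag_pos != -1:
        # A full tag is present: content ends where the tag starts.
        end = tag_pos
    else:
        # No full tag yet: withhold a possible tag fragment at the end.
        end = len(current_text) - partial_tag_overlap(current_text[sent_idx:])
    return current_text[sent_idx:end], end


# Replay the fragmented case from the description: "<", "tool", "_call>".
text, idx, emitted = "", 0, []
for delta in ["Hi <", "tool", "_call>"]:
    text += delta
    content, idx = next_content(text, idx)
    emitted.append(content)
# emitted == ["Hi ", "", ""]: the tag never reaches user-visible content.
```

A held-back fragment that turns out not to be a tag (for example `a < b`) is released on the next call, because the cursor only advances past text that was actually emitted.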

Test Plan

  • Added test_extract_tool_calls_streaming_split_tag to verify fragmented tag handling.
  • Added test_extract_tool_calls_streaming_speculative_decode_loss to verify no data loss during fast generation.
  • Added test_extract_tool_calls_streaming_various_chunk_sizes using the exact Qwen 3.6 template to ensure stability across different streaming behaviors.
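
The chunk-size idea behind these tests can be sketched as a small replay harness. This is an illustrative stand-in, not the tests added in the PR: `new_params`, `replay`, and `SAMPLE` are hypothetical, and the `<function=...>` / `<parameter=...>` layout is assumed here for the example. Parameters are parsed from the accumulated text rather than from each delta, so a value and its closing tags arriving together cannot be dropped.

```python
import re

# Assumed tag layout, for illustration only.
PARAM_RE = re.compile(r"<parameter=([^>]+)>\n?(.*?)\n?</parameter>", re.DOTALL)


def new_params(current_text: str, emitted: int) -> list[tuple[str, str]]:
    """Return complete parameters that have not been emitted yet."""
    return PARAM_RE.findall(current_text)[emitted:]


def replay(full_text: str, chunk_size: int) -> list[tuple[str, str]]:
    """Feed full_text in fixed-size chunks and collect every emitted parameter."""
    current_text = ""
    emitted: list[tuple[str, str]] = []
    for i in range(0, len(full_text), chunk_size):
        current_text += full_text[i:i + chunk_size]
        emitted.extend(new_params(current_text, len(emitted)))
    return emitted


SAMPLE = (
    "<tool_call>\n<function=get_weather>\n"
    "<parameter=city>\nParis\n</parameter>\n"
    "<parameter=unit>\ncelsius\n</parameter>\n"
    "</function>\n</tool_call>"
)
expected = [("city", "Paris"), ("unit", "celsius")]

# Every chunk size, from one character to the whole text in a single delta,
# must yield the same parameters.
for size in range(1, len(SAMPLE) + 1):
    assert replay(SAMPLE, size) == expected
```

The largest chunk size corresponds to the speculative decoding case, where the function name, all parameters, and the closing tags land in one delta.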

Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added the qwen, tool-calling, and bug labels Apr 24, 2026
@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request improves the Qwen3CoderToolParser streaming logic by using current_text and a tracking index (_sent_content_idx) to prevent leaking tool call tags into the content stream, supported by a new test case for split tags. Review feedback identifies a missing initialization for current_tool_index which would cause an AttributeError, suggests using find instead of rfind for more reliable tag detection, and notes that the logic for skipping content between tool calls is currently fragile.
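
One situation where the choice between `find` and `rfind` matters can be shown with plain string methods. This is an illustration under the assumption that two tool calls arrive in the same accumulated text; it is not necessarily the case the reviewer had in mind, and `tag_positions` is a hypothetical helper.

```python
TAG = "<tool_call>"
text = "<tool_call>A</tool_call>\n<tool_call>B</tool_call>"

# rfind jumps straight to the last opening tag and skips the first call.
last = text.rfind(TAG)   # 25

# find with an advancing cursor visits every opening tag in order.
def tag_positions(text: str, tag: str) -> list[int]:
    positions = []
    cursor = 0
    while (pos := text.find(tag, cursor)) != -1:
        positions.append(pos)
        cursor = pos + len(tag)
    return positions

starts = tag_positions(text, TAG)   # [0, 25]
```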

Comment thread vllm/tool_parsers/qwen3coder_tool_parser.py
Comment thread vllm/tool_parsers/qwen3coder_tool_parser.py Outdated
Comment thread vllm/tool_parsers/qwen3coder_tool_parser.py
ExtReMLapin and others added 3 commits April 24, 2026 10:01
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com>
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
@ExtReMLapin

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request improves the Qwen3 tool parser's handling of streaming tool calls by using current text tracking and partial tag overlap detection to prevent tag fragments from leaking into the content. It also adds a test case for split tags. A review comment identifies a redundant variable initialization in the streaming state reset logic.

Comment thread vllm/tool_parsers/qwen3coder_tool_parser.py
@ExtReMLapin

oh boy I found more bugs with param parsing in streamed mode, fucking hell, MTP is guilty there

… + function name only) + delta2 (params + tool call end) was dropping params

Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
@ExtReMLapin ExtReMLapin changed the title from "[Bugfix] Support fragmented <tool_call> tags in qwen3_coder (streaming mode)" to "[Bugfix] Robust Qwen3 Coder streaming: fragmented tags, speculative decoding fixes, and content tracking" Apr 24, 2026
@ExtReMLapin ExtReMLapin marked this pull request as draft April 24, 2026 22:16
@ExtReMLapin

superseded by #40861

Labels

bug (Something isn't working), qwen (Related to Qwen models), tool-calling

Projects

Status: Done