[Bugfix] Robust Qwen3 Coder streaming: fragmented tags, speculative decoding fixes, and content tracking #40785
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Code Review
This pull request improves the Qwen3CoderToolParser streaming logic by using current_text and a tracking index (_sent_content_idx) to prevent leaking tool call tags into the content stream, supported by a new test case for split tags. Review feedback identifies a missing initialization for current_tool_index which would cause an AttributeError, suggests using find instead of rfind for more reliable tag detection, and notes that the logic for skipping content between tool calls is currently fragile.
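To illustrate the `find` versus `rfind` point from the review, here is a minimal sketch. The sample text and the `cursor` variable are made up for illustration and are not taken from the PR diff; the sketch only shows why a forward search from a known position is the safer choice once a response contains more than one tool call.

```python
TAG = "<tool_call>"

# Hypothetical accumulated stream text containing two tool calls.
text = "Sure. <tool_call>A</tool_call><tool_call>B</tool_call>"

# Hypothetical cursor: everything before this index was already handled.
cursor = 0

# find() searches forward from the cursor and returns the FIRST unhandled tag.
first = text.find(TAG, cursor)

# rfind() returns the LAST tag in the text, silently skipping the first call.
last = text.rfind(TAG)

print(first, last)
```

With `rfind`, a parser that has not yet processed the first tool call would jump straight to the second one, which is presumably the kind of fragility the review comment refers to.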
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com>
/gemini review
Code Review
This pull request improves the Qwen3 tool parser's handling of streaming tool calls by using current text tracking and partial tag overlap detection to prevent tag fragments from leaking into the content. It also adds a test case for split tags. A review comment identifies a redundant variable initialization in the streaming state reset logic.
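The "partial tag overlap detection" mentioned above can be sketched roughly as follows. This is an assumption about the general technique, not the code from the PR: the function name `partial_tag_overlap` is hypothetical, and the real parser may compute the overlap differently.

```python
TOOL_CALL_START = "<tool_call>"


def partial_tag_overlap(text: str, tag: str = TOOL_CALL_START) -> int:
    """Return the length of the longest proper prefix of `tag` that ends `text`.

    A non-zero result means the tail of `text` might be the beginning of
    `tag`, so those characters should be held back instead of being
    streamed out as content.
    """
    # Only proper prefixes are considered; a complete tag is a separate case.
    max_len = min(len(text), len(tag) - 1)
    for n in range(max_len, 0, -1):
        if text.endswith(tag[:n]):
            return n
    return 0
```

For example, `partial_tag_overlap("Hello <tool")` reports that the last five characters could still grow into `<tool_call>`, so a parser using this check would emit only `"Hello "` for now.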
oh boy I found more bugs with param parsing in streamed mode, fucking hell, MTP is guilty there
… + function name only) + delta2 (params + tool call end) was dropping params
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
<tool_call> tags in qwen3_coder (streaming mode)
superseded by #40861
Purpose
This PR addresses several reliability issues in the `qwen3_coder` tool parser during streaming. Similar to the reasoning parser fix, it handles cases where tool call tags are output as raw text fragments rather than special tokens. It also ensures data integrity during high-speed generation (speculative decoding), where multiple parts of a tool call (or content plus a tool call) can arrive in a single delta.

Key Changes
- Switched from `delta_text` to `current_text` analysis to reliably detect `<tool_call>` and other tags even when they are split across multiple small deltas (e.g., `<` then `tool` then `_call>`).
- `_sent_content_idx`: Introduced a cursor to track exactly which parts of the stream have been emitted as content. This prevents duplicate content delivery and ensures that raw text tags don't "leak" into the user-visible content before being parsed.
- `extract_tool_calls_streaming` now allows a single `DeltaMessage` to contain both plain text content and tool call fragments. This is essential when a model transitions from speaking to calling a tool in the middle of a streaming chunk.
- Parameters are no longer dropped when they arrive in the same delta as the `</function>` tag. The parser now guarantees that all accumulated arguments are emitted before closing the JSON object.

Test Plan
- `test_extract_tool_calls_streaming_split_tag`: verifies fragmented tag handling.
- `test_extract_tool_calls_streaming_speculative_decode_loss`: verifies no data loss during fast generation.
- `test_extract_tool_calls_streaming_various_chunk_sizes`: uses the exact Qwen 3.6 template to ensure stability across different streaming behaviors.
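
As a rough illustration of the cursor idea and of testing across chunk sizes, here is a self-contained sketch. It is not the PR's implementation: `ContentCursor`, `held_back`, and `stream_content` are hypothetical names, and the sketch only covers content that precedes the first `<tool_call>` tag.

```python
TOOL_CALL_START = "<tool_call>"


def held_back(tail: str, tag: str = TOOL_CALL_START) -> int:
    """Length of the longest proper prefix of `tag` that ends `tail`."""
    for n in range(min(len(tail), len(tag) - 1), 0, -1):
        if tail.endswith(tag[:n]):
            return n
    return 0


class ContentCursor:
    """Hypothetical stand-in for the `_sent_content_idx` bookkeeping."""

    def __init__(self) -> None:
        self.sent_idx = 0  # everything before this index was already emitted

    def next_content(self, current_text: str) -> str:
        tag_at = current_text.find(TOOL_CALL_START, self.sent_idx)
        if tag_at != -1:
            # A complete tag is present: emit content up to it, never the tag.
            end = tag_at
        else:
            # No complete tag yet: hold back a trailing fragment like "<tool".
            end = len(current_text) - held_back(current_text[self.sent_idx:])
        chunk = current_text[self.sent_idx:end]
        self.sent_idx = end
        return chunk


def stream_content(full_text: str, chunk_size: int) -> str:
    """Feed `full_text` in fixed-size deltas and collect the emitted content."""
    cursor = ContentCursor()
    emitted = []
    for stop in range(chunk_size, len(full_text) + chunk_size, chunk_size):
        emitted.append(cursor.next_content(full_text[:stop]))
    return "".join(emitted)


sample = "Let me check. <tool_call>\n<function=get_weather>\n</function>\n</tool_call>"
for size in range(1, len(sample) + 1):
    # Whatever the chunk size, only the plain text before the tag is emitted.
    assert stream_content(sample, size) == "Let me check. "
```

Because the cursor works on the accumulated text rather than on each delta, the result does not depend on where the chunk boundaries fall. A real parser would additionally need to flush any held-back fragment at end of stream and resume content emission after `</tool_call>`, which this sketch leaves out.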