fix: enable tool call parsing in streaming when reasoning parser is active #163
Closed
Thump604 wants to merge 2 commits into waybarrios:main from
When both --reasoning-parser and --tool-call-parser are configured, streaming tool calls are completely broken. The reasoning parser branch in stream_chat_completion() yields chunks directly without ever passing content through the tool parser. Tool call XML appears as raw text in the content field, tool_calls is null in all chunks, and finish_reason is "stop" instead of "tool_calls".

Root cause: The reasoning parser branch (if _reasoning_parser) and the tool parser branch are on opposite sides of an if/else, so they are mutually exclusive. Every model that uses both reasoning and tool calling (Qwen3, Qwen3.5, DeepSeek-R1 with tools, etc.) is affected in streaming mode.

Fix: After reasoning extraction produces delta_msg.content, pass that content through the tool parser before emitting the chunk. This creates a sequential pipeline matching upstream vLLM's architecture:

Phase 1: Reasoning extraction (separate thinking from content)
Phase 2: Tool call parsing on the content portion
Phase 3: Emit the chunk with reasoning + structured tool_calls

Handles three tool parser states:
- None (inside XML markup): suppress content, still emit reasoning
- tool_calls detected: emit structured tool_calls with reasoning
- Normal content: pass through as before

The non-streaming path is unaffected (it already works correctly).

Tested with Qwen3.5-122B-A10B + qwen3_xml parser + qwen3 reasoning parser on Apple Silicon (M2 Ultra). Streaming now returns structured tool_calls and finish_reason: "tool_calls", with no raw XML in content.

Fixes waybarrios#107
Line-length wrap on finish_reason ternary expression.
Closing: this fix was incorporated upstream in v0.2.6. Tool call parsing in the streaming reasoning path now works in main.
Problem
When both --reasoning-parser and --tool-call-parser are configured, streaming tool calls are completely broken. The reasoning parser branch in stream_chat_completion() yields chunks directly without ever passing content through the tool parser.

Symptoms:
- tool_calls is null in every SSE chunk
- Tool call XML (<tool_call>, <function=...>) appears as raw text in content
- finish_reason is "stop" instead of "tool_calls"

Impact: Every CLI agent using streaming (Claude Code, OpenCode, Roo, etc.) cannot execute tool calls when a reasoning parser is active. This affects all models that use both reasoning and tool calling: Qwen3, Qwen3.5, DeepSeek-R1 with tools, etc.
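To check a stream for these symptoms, a small client-side helper can scan the decoded SSE payloads. This is a sketch under stated assumptions: diagnose_stream and RAW_TOOL_MARKUP are hypothetical names, chunks are assumed to follow the OpenAI chat.completion.chunk shape, and the check is only meaningful for a request where the model is expected to call a tool.

```python
import re
from typing import Dict, List

# Hypothetical marker pattern taken from the symptoms above (not exhaustive).
RAW_TOOL_MARKUP = re.compile(r"<tool_call>|<function=")


def diagnose_stream(chunks: List[Dict]) -> List[str]:
    """Return the symptoms found in decoded SSE chunk payloads.

    Assumes OpenAI-style chunks: {"choices": [{"delta": {...}, "finish_reason": ...}]}.
    """
    content = ""
    saw_tool_calls = False
    finish_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            content += delta.get("content") or ""
            if delta.get("tool_calls"):
                saw_tool_calls = True
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]

    problems = []
    if not saw_tool_calls:
        problems.append("tool_calls is null in every chunk")
    if RAW_TOOL_MARKUP.search(content):
        problems.append("raw tool call XML in content")
    if finish_reason != "tool_calls":
        problems.append(f"finish_reason is {finish_reason!r}, not 'tool_calls'")
    return problems


# A stream shaped like the broken behavior described above.
broken = [
    {"choices": [{"delta": {"content": "<tool_call>\n<function=get_weather>"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "</function>\n</tool_call>"}, "finish_reason": "stop"}]},
]
for problem in diagnose_stream(broken):
    print(problem)
```

Run against a broken stream like the one above, all three symptoms are reported; a correctly parsed stream returns an empty list.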
Root Cause
In stream_chat_completion() (lines 1926-1952), the reasoning parser branch and the tool parser branch sit on opposite sides of an if/else, so they are mutually exclusive. When _reasoning_parser is set, the else branch (containing all tool parsing logic) is never reached.

Fix
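The change can be modeled in a few lines. The following is a simplified, runnable sketch rather than the server's code: Delta, ToyReasoningParser, ToyToolParser, stream_before and stream_after are hypothetical names, and the toy parsers assume that markers such as </think> and <tool_call> arrive as whole tokens and that the tool call body is JSON (the real qwen3_xml format uses XML tags). stream_before mirrors the mutually exclusive branch described under Root Cause; stream_after applies the sequential pipeline and the three tool parser states.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Delta:
    """Hypothetical stand-in for the delta message of one streamed chunk."""
    reasoning: Optional[str] = None
    content: Optional[str] = None
    tool_calls: Optional[list] = None


class ToyReasoningParser:
    """Toy parser: every token before '</think>' is reasoning, the rest is content."""

    def __init__(self) -> None:
        self.thinking = True

    def extract(self, token: str) -> Delta:
        if self.thinking:
            if token == "</think>":
                self.thinking = False
                return Delta()  # the marker itself is dropped
            return Delta(reasoning=token)
        return Delta(content=token)


class ToyToolParser:
    """Toy parser for JSON wrapped in '<tool_call>' ... '</tool_call>' tokens.

    Returns None while inside the markup, {"tool_calls": [...]} when a call
    closes, and {"content": text} for ordinary text.
    """

    def __init__(self) -> None:
        self.buffer: Optional[List[str]] = None  # None means "outside markup"

    def extract(self, text: str) -> Optional[dict]:
        if text == "<tool_call>":
            self.buffer = []
            return None
        if self.buffer is not None:
            if text == "</tool_call>":
                call = json.loads("".join(self.buffer))
                self.buffer = None
                return {"tool_calls": [call]}
            self.buffer.append(text)
            return None
        return {"content": text}


def stream_before(tokens: List[str], reasoning_parser: ToyReasoningParser) -> Tuple[List[Delta], str]:
    # Old reasoning branch: chunks come straight from the reasoning parser and
    # the tool parser (only in the else branch) is never consulted.
    return [reasoning_parser.extract(t) for t in tokens], "stop"


def stream_after(tokens, reasoning_parser, tool_parser) -> Tuple[List[Delta], str]:
    out: List[Delta] = []
    saw_tool_call = False
    for tok in tokens:
        delta = reasoning_parser.extract(tok)        # Phase 1: split reasoning/content
        if delta.content is not None:                # Phase 2: tool parsing on content
            result = tool_parser.extract(delta.content)
            if result is None:                       # inside markup: hide content
                delta.content = None
            elif "tool_calls" in result:             # complete call: structured output
                delta.content = None
                delta.tool_calls = result["tool_calls"]
                saw_tool_call = True
            else:                                    # ordinary content passes through
                delta.content = result["content"]
        if delta.reasoning or delta.content or delta.tool_calls:
            out.append(delta)                        # Phase 3: emit the chunk
    return out, "tool_calls" if saw_tool_call else "stop"


tokens = ["Need the weather.", "</think>", "<tool_call>",
          '{"name": "get_weather", ', '"arguments": {"city": "Paris"}}', "</tool_call>"]
before, finish_before = stream_before(tokens, ToyReasoningParser())
after, finish_after = stream_after(tokens, ToyReasoningParser(), ToyToolParser())
```

With these tokens, stream_before leaks <tool_call> and the JSON body into content and reports "stop", while stream_after emits the reasoning chunk followed by one structured tool call and reports "tool_calls".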
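After reasoning extraction produces delta_msg.content, pass that content through the tool parser before emitting the chunk. This creates a sequential pipeline matching upstream vLLM's architecture:
- Phase 1: Reasoning extraction (separate thinking from content)
- Phase 2: Tool call parsing on the content portion
- Phase 3: Emit the chunk with reasoning + structured tool_calls

The change handles three tool parser states:
- None (inside XML markup): suppress content, still emit reasoning
- tool_calls detected: emit structured tool_calls with reasoning
- Normal content: pass through as before

The else branch (no reasoning parser) is unchanged.

Testing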
Tested on Apple Silicon (M2 Ultra, 128GB) with:
- --tool-call-parser qwen3_xml --reasoning-parser qwen3
- --enable-auto-tool-choice --mllm

Before (broken)
After (fixed)
Non-streaming regression test: still works correctly.
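OpenAI-compatible streaming clients typically reassemble tool_calls fragments before running a tool, which makes reassembly a quick way to confirm the fixed output is usable. A minimal sketch, assuming OpenAI-style deltas in which each fragment carries an index and function.arguments arrives as string pieces; accumulate_tool_calls is a hypothetical helper, not part of this repository.

```python
import json
from typing import Dict, List


def accumulate_tool_calls(chunks: List[Dict]) -> List[Dict]:
    """Merge streamed tool_call fragments (keyed by "index") into complete calls."""
    calls: Dict[int, Dict] = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for frag in (choice.get("delta") or {}).get("tool_calls") or []:
                call = calls.setdefault(frag["index"], {"id": None, "name": "", "arguments": ""})
                if frag.get("id"):
                    call["id"] = frag["id"]
                fn = frag.get("function") or {}
                call["name"] += fn.get("name") or ""
                call["arguments"] += fn.get("arguments") or ""
    # Parse the concatenated argument strings only once the stream is complete.
    return [
        {"id": c["id"], "name": c["name"], "arguments": json.loads(c["arguments"] or "{}")}
        for _, c in sorted(calls.items())
    ]


chunks = [
    {"choices": [{"delta": {"tool_calls": [{"index": 0, "id": "call_0", "type": "function",
                                            "function": {"name": "get_weather", "arguments": '{"city": '}}]},
                  "finish_reason": None}]},
    {"choices": [{"delta": {"tool_calls": [{"index": 0, "function": {"arguments": '"Paris"}'}}]},
                  "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]},
]
print(accumulate_tool_calls(chunks))
```

Deferring json.loads until the stream ends matters because individual argument fragments are usually not valid JSON on their own.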
Related