fix: enable tool call parsing in streaming when reasoning parser is active #163

Closed
Thump604 wants to merge 2 commits into waybarrios:main from Thump604:fix/streaming-reasoning-tool-coexistence

Conversation

@Thump604
Collaborator

Problem

When both --reasoning-parser and --tool-call-parser are configured, streaming tool calls are completely broken. The reasoning parser branch in stream_chat_completion() yields chunks directly without ever passing content through the tool parser.

Symptoms:

  • tool_calls is null in every SSE chunk
  • Tool call XML (<tool_call>, <function=...>) appears as raw text in content
  • finish_reason is "stop" instead of "tool_calls"
  • Non-streaming works correctly

Impact: Every CLI agent using streaming (Claude Code, OpenCode, Roo, etc.) cannot execute tool calls when a reasoning parser is active. This affects all models that use both reasoning and tool calling: Qwen3, Qwen3.5, DeepSeek-R1 with tools, etc.

Root Cause

In stream_chat_completion(), lines 1926-1952:

if _reasoning_parser and delta_text:
    # ... reasoning extraction ...
    yield chunk  # <-- emits directly, NEVER runs tool parser
else:
    # ... tool parsing only runs HERE ...

The reasoning parser branch and the tool parser branch are on opposite sides of an if/else, so they are mutually exclusive. When _reasoning_parser is set, the else branch (containing all tool parsing logic) is never reached for any non-empty delta.
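The control flow described above can be reproduced in isolation. This is a toy sketch, not the code from stream_chat_completion(): `route_delta` and the stage names are hypothetical and only mirror the shape of the if/else.

```python
def route_delta(reasoning_parser_active: bool, delta_text: str) -> list[str]:
    """Return the stages that run for one streamed delta (pre-fix control flow)."""
    stages = []
    if reasoning_parser_active and delta_text:
        # Mirrors the reasoning branch: the chunk is emitted from here,
        # so tool parsing is skipped for this delta.
        stages.append("reasoning")
    else:
        # Mirrors the else branch: the only place tool parsing happens.
        stages.append("tool_parser")
    return stages


with_reasoning = route_delta(True, "<tool_call>")
without_reasoning = route_delta(False, "<tool_call>")
print(with_reasoning)
print(without_reasoning)
```

With the reasoning parser active, the tool parser stage never appears in the result, which is why the tool call markup falls through as plain content.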

Fix

After reasoning extraction produces delta_msg.content, pass that content through the tool parser before emitting the chunk. This creates a sequential pipeline matching upstream vLLM's architecture:

  1. Phase 1: Reasoning extraction (separate thinking from content)
  2. Phase 2: Tool call parsing on the content portion
  3. Phase 3: Emit chunk with reasoning + structured tool_calls

Handles three tool parser states:

  • None (inside XML markup): suppress content, still emit reasoning
  • tool_calls detected: emit structured tool_calls with reasoning
  • Normal content: pass through as before

The else branch (no reasoning parser) is unchanged.
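A minimal sketch of how the three tool parser states might be folded into the emitted delta. It assumes the tool parser returns None, a dict with tool_calls, or a dict with content. The `Delta` class and `merge_tool_result` are hypothetical names for illustration, not the identifiers used in the patch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Delta:
    reasoning: Optional[str] = None
    content: Optional[str] = None
    tool_calls: Optional[list] = None


def merge_tool_result(reasoning: Optional[str],
                      content: Optional[str],
                      tool_result: Optional[dict]) -> Delta:
    """Combine the reasoning output (phase 1) with the tool parser result (phase 2).

    Assumed shapes for tool_result (hypothetical):
      None                  -> parser is inside tool markup
      {"tool_calls": [...]} -> structured calls detected
      {"content": "text"}   -> ordinary content
    """
    if tool_result is None:
        # Inside XML markup: suppress content, still emit reasoning.
        return Delta(reasoning=reasoning)
    if "tool_calls" in tool_result:
        # Tool calls detected: emit them alongside any reasoning.
        return Delta(reasoning=reasoning, tool_calls=tool_result["tool_calls"])
    # Normal content: pass through unchanged.
    return Delta(reasoning=reasoning, content=tool_result.get("content", content))


inside = merge_tool_result("thinking", "<tool_call>", None)
detected = merge_tool_result(
    None, "</tool_call>",
    {"tool_calls": [{"name": "write_file", "arguments": "{}"}]},
)
plain = merge_tool_result(None, "Hello", {"content": "Hello"})
```

Keeping the merge step separate from both parsers leaves the reasoning parser untouched and confines the new behaviour to one function.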

Testing

Tested on Apple Silicon (M2 Ultra, 128GB) with:

  • Model: Qwen3.5-122B-A10B-5bit
  • Parsers: --tool-call-parser qwen3_xml --reasoning-parser qwen3
  • Flags: --enable-auto-tool-choice --mllm

Before (broken)

tool_calls: null (all chunks)
content: "\n\n<tool_call>\n<function=write_file>..."
finish_reason: "stop"

After (fixed)

tool_calls: [{"name": "write_file", "arguments": "{\"path\": \"hello.txt\", ...}"}]
content: null (tool XML consumed by parser)
finish_reason: "tool_calls"

Non-streaming regression test: still works correctly.
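The before/after comparison can be checked mechanically on the client side. The sketch below assumes chunks have already been decoded from SSE into dicts following the OpenAI chat completion chunk layout (choices[].delta, choices[].finish_reason). `summarize_stream` is a hypothetical helper, and the sample chunks are hand-written illustrations, not captured server output.

```python
def summarize_stream(chunks: list[dict]) -> dict:
    """Collapse a list of decoded streaming chunks into three observations."""
    saw_tool_calls = False
    content_parts = []
    finish_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("tool_calls"):
                saw_tool_calls = True
            if delta.get("content"):
                content_parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    content = "".join(content_parts)
    return {
        "saw_tool_calls": saw_tool_calls,
        "raw_tool_xml": "<tool_call>" in content,
        "finish_reason": finish_reason,
    }


# Illustrative chunks shaped like the broken and fixed behaviour.
broken_chunks = [
    {"choices": [{"delta": {"content": "\n\n<tool_call>\n<function=write_file>"},
                  "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]
fixed_chunks = [
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "type": "function",
         "function": {"name": "write_file", "arguments": "{}"}}]},
        "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]},
]

before = summarize_stream(broken_chunks)
after = summarize_stream(fixed_chunks)
```

A passing stream should report structured tool calls, no raw tool XML in content, and finish_reason "tool_calls".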

Related

…ctive

When both --reasoning-parser and --tool-call-parser are configured,
streaming tool calls are completely broken. The reasoning parser branch
in stream_chat_completion() yields chunks directly without ever passing
content through the tool parser. Tool call XML appears as raw text in
the content field, tool_calls is null in all chunks, and finish_reason
is "stop" instead of "tool_calls".

Root cause: The reasoning parser branch (if _reasoning_parser) and tool
parser branch are in opposite sides of an if/else — mutually exclusive.
Every model that uses both reasoning and tool calling (Qwen3, Qwen3.5,
DeepSeek-R1 with tools, etc.) is affected in streaming mode.

Fix: After reasoning extraction produces delta_msg.content, pass that
content through the tool parser before emitting the chunk. This creates
a sequential pipeline matching upstream vLLM's architecture:
  Phase 1: Reasoning extraction (separate thinking from content)
  Phase 2: Tool call parsing on content portion
  Phase 3: Emit chunk with reasoning + structured tool_calls

Handles three tool parser states:
- None (inside XML markup): suppress content, still emit reasoning
- tool_calls detected: emit structured tool_calls with reasoning
- Normal content: pass through as before

Non-streaming path is unaffected (already works correctly).

Tested with Qwen3.5-122B-A10B + qwen3_xml parser + qwen3 reasoning
parser on Apple Silicon (M2 Ultra). Streaming now returns structured
tool_calls, finish_reason: "tool_calls", with no raw XML in content.

Fixes waybarrios#107
Line-length wrap on finish_reason ternary expression.
@Thump604
Collaborator Author

Closing — this fix was incorporated upstream in v0.2.6. Tool call parsing in the streaming reasoning path is now working in main.


Development

Successfully merging this pull request may close these issues.

Streaming mode returns tool calls as raw content text instead of structured tool_calls delta
