fix: streaming tool calls drop for Qwen3.6 bracket format #374

Open
mikepixelmagic-dev wants to merge 1 commit into waybarrios:main from mikepixelmagic-dev:fix/streaming-bracket-tool-calls
@mikepixelmagic-dev

Summary

Fixes two bugs that cause Qwen3.6 [Calling tool: name({...})] streaming tool calls to leak into text content instead of emitting structured tool_calls.

Bug 1 — server.py _stream_responses_request fast-path gate

The Responses API streaming path still uses the old "<" not in delta_text gate to decide whether to engage the tool parser. That gate only matches <tool_call> / <function= shapes — bracket-format deltas start with [, so they skip the parser entirely and get emitted as plain text.

The other 4 streaming paths (Anthropic messages and OpenAI chat completions, each with a reasoning and a non-reasoning branch) were already refactored to use _streaming_tool_markup_possible() in prior PRs (see refs below), but this path was missed.

Fix: use _streaming_tool_markup_possible(tool_accumulated_text + delta_text), matching the other 4 paths.
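
To illustrate why the old gate misses bracket-format deltas, here is a minimal sketch. It is not the code from server.py: `old_gate`, `markup_possible`, and `TOOL_MARKERS` are hypothetical stand-ins, and the real `_streaming_tool_markup_possible()` may be implemented differently. The sketch only assumes that the helper looks at accumulated text and recognizes the bracket marker as well as the XML-style ones.

```python
# Hypothetical stand-ins for illustration; not the actual server.py code.
TOOL_MARKERS = ("<tool_call>", "<function=", "[Calling tool:")


def old_gate(delta_text: str) -> bool:
    """Old fast path: engage the tool parser only if the delta contains '<'."""
    return "<" in delta_text


def markup_possible(text: str) -> bool:
    """Return True if text contains a tool marker or ends with a prefix of one."""
    for marker in TOOL_MARKERS:
        if marker in text:
            return True
        # A marker may be split across deltas, so also accept a partial
        # marker sitting at the very end of the accumulated text.
        for i in range(1, len(marker)):
            if text.endswith(marker[:i]):
                return True
    return False


if __name__ == "__main__":
    delta = '[Calling tool: create_file({"path": "a.txt"'
    print(old_gate(delta))         # bracket delta never reaches the parser
    print(markup_possible(delta))  # bracket delta is routed to the parser
```

Passing `tool_accumulated_text + delta_text` rather than `delta_text` alone matters for the same reason: a marker that starts in one delta and finishes in the next is only visible in the concatenation.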

Bug 2 — qwen_tool_parser.extract_tool_calls_streaming closing-marker check

Detection of a completed bracket-format tool call looks for )] or </tool_call> in delta_text only. In practice those closing markers routinely span token boundaries (e.g. ) arrives in one delta and ] in the next), so the check never fires: the parser returns None for every chunk, and the entire call is suppressed without ever being emitted as structured tool_calls.

Fix: check current_text (accumulated) instead of delta_text, so the close is detected reliably regardless of token splits.
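
The token-split failure can be reproduced in isolation. The sketch below is a simplified model, not the parser itself: `closed_in_delta`, `closed_in_accumulated`, and `simulate` are hypothetical names, and the delta sequence is hand-written to show `)` and `]` arriving separately.

```python
# Simplified model of the closing-marker check; names are hypothetical.
CLOSERS = ("</tool_call>", ")]")


def closed_in_delta(delta_text: str) -> bool:
    """Pre-fix behaviour: look for a closing marker in the latest delta only."""
    return any(closer in delta_text for closer in CLOSERS)


def closed_in_accumulated(current_text: str) -> bool:
    """Post-fix behaviour: look for a closing marker in the accumulated text."""
    return any(closer in current_text for closer in CLOSERS)


def simulate(deltas):
    """Feed deltas one at a time and record what each check reports."""
    current_text = ""
    delta_hits, accumulated_hits = [], []
    for delta_text in deltas:
        current_text += delta_text
        delta_hits.append(closed_in_delta(delta_text))
        accumulated_hits.append(closed_in_accumulated(current_text))
    return delta_hits, accumulated_hits


if __name__ == "__main__":
    # Hand-written deltas where ")" and "]" land in different chunks.
    deltas = ['[Calling tool: create_file(', '{"path": "a.txt"}', ')', ']']
    delta_hits, accumulated_hits = simulate(deltas)
    print(delta_hits)        # the delta-only check never fires
    print(accumulated_hits)  # the accumulated check fires on the last chunk
```

A plain substring test on accumulated text would also match a `)]` that appears inside a string argument; the sketch ignores that case.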

Reproduction

Serve Qwen3.6-35B-A3B-8bit with:

vllm-mlx serve mlx-community/Qwen3.6-35B-A3B-8bit \
  --enable-auto-tool-choice \
  --tool-call-parser qwen \
  --reasoning-parser qwen3

Run a streaming chat completion with multiple tools (e.g. a create_file tool with path and content parameters). The model emits a bracket-format tool call. Without these fixes, the client receives [Calling tool: create_file({"path": "...", "content": "..."})] as content with finish_reason: null and zero tool_calls deltas. With these fixes, a proper structured tool_calls delta is emitted and finish_reason: tool_calls is set.
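
When reproducing, it can help to classify a captured stream mechanically. The sketch below assumes the chunks have already been decoded into dicts in the OpenAI chat-completion chunk shape (`choices[].delta.content`, `choices[].delta.tool_calls`, `choices[].finish_reason`). `summarize_stream` is a hypothetical helper, and the two sample streams are hand-written illustrations of the pre-fix and post-fix behaviour described above, not captured server output.

```python
def summarize_stream(chunks):
    """Summarize decoded chat-completion chunks: text, tool-call deltas, finish reason."""
    content_parts = []
    tool_call_deltas = 0
    finish_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                content_parts.append(delta["content"])
            tool_call_deltas += len(delta.get("tool_calls") or [])
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    content = "".join(content_parts)
    return {
        "content": content,
        "tool_call_deltas": tool_call_deltas,
        "finish_reason": finish_reason,
        # True when the bracket-format call ended up in plain text.
        "leaked": "[Calling tool:" in content,
    }


# Illustrative pre-fix stream: the call arrives as text content.
leaky = [
    {"choices": [{"delta": {"content": "[Calling tool: create_file("}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": '{"path": "a.txt"})]'}, "finish_reason": None}]},
]

# Illustrative post-fix stream: the call arrives as a structured delta.
fixed = [
    {"choices": [{"delta": {"tool_calls": [{"index": 0, "function": {
        "name": "create_file", "arguments": '{"path": "a.txt"}'}}]}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]},
]

if __name__ == "__main__":
    print(summarize_stream(leaky))
    print(summarize_stream(fixed))
```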

A 40-turn drift test cycling through 5 tools (get_weather, read_file, web_search, calculator, create_file) held for only 4 turns before the fix, drifting at turn 5. After the fix it held for all 40 turns with no drift.

Related work

This PR completes the bracket-format coverage by fixing the one remaining Responses API path and the qwen tool parser closing-marker detection.

Test plan

  • Streaming chat completion with Qwen3.6 and a create_file-style tool emits structured tool_calls
  • Multi-turn drift test (40 turns, 5 tools) completes without format drift
  • Other streaming paths (Anthropic messages, OpenAI chat, Responses API) still pass existing tool-call tests

Two bugs caused Qwen3.6 [Calling tool: name({...})] streaming tool calls
to leak into text content instead of emitting structured tool_calls:

1. server.py _stream_responses_request: the fast-path gate checked
   `"<" not in delta_text`, which skips the tool parser for bracket-format
   deltas (they start with "["). Refactored to use the existing
   `_streaming_tool_markup_possible()` helper, matching the 4 other
   streaming paths that already use it.

2. qwen_tool_parser.extract_tool_calls_streaming: the closing-marker
   check looked for `</tool_call>` or `)]` in `delta_text` only. Those
   markers routinely span token boundaries (e.g. `)` and `]` arrive in
   separate deltas), so the check never fires and the parser returns
   None for every chunk, suppressing the whole call. Check `current_text`
   (accumulated) instead so the close is detected reliably.

Reproduction: multi-turn tool-calling session with Qwen3.6-35B-A3B-8bit
and --tool-call-parser qwen --reasoning-parser qwen3. Without these
fixes, streaming emits `[Calling tool: create_file({...})]` as content.
With fixes, structured tool_calls are emitted and a 40-turn drift test
passes cleanly (was failing at turn 5 before).