fix: streaming tool calls drop for Qwen3.6 bracket format by mikepixelmagic-dev · Pull Request #374 · waybarrios/vllm-mlx

mikepixelmagic-dev · 2026-04-19T08:45:13Z

Summary

Fixes two bugs that cause Qwen3.6 [Calling tool: name({...})] streaming tool calls to leak into text content instead of emitting structured tool_calls.

Bug 1 — `server.py` `_stream_responses_request` fast-path gate

The Responses API streaming path still uses the old "<" not in delta_text gate to decide whether to engage the tool parser. That gate only matches <tool_call> / <function= shapes — bracket-format deltas start with [, so they skip the parser entirely and get emitted as plain text.

The other 4 streaming paths (Anthropic messages and OpenAI chat completions, reasoning and non-reasoning branches) were already refactored to use _streaming_tool_markup_possible() in prior PRs (see refs below) but this path was missed.

Fix: use _streaming_tool_markup_possible(tool_accumulated_text + delta_text), matching the other 4 paths.

Bug 2 — `qwen_tool_parser.extract_tool_calls_streaming` closing-marker check

Detection of a completed bracket-format tool call looks for )] or </tool_call> in delta_text only. In practice those closing markers routinely span token boundaries — e.g. ) arrives in one delta and ] in the next — so the check never fires, the parser returns None for every chunk, and the entire call gets suppressed without ever being emitted as structured tool_calls.

Fix: check current_text (accumulated) instead of delta_text, so the close is detected reliably regardless of token splits.

Reproduction

Serve Qwen3.6-35B-A3B-8bit with:

vllm-mlx serve mlx-community/Qwen3.6-35B-A3B-8bit \
  --enable-auto-tool-choice \
  --tool-call-parser qwen \
  --reasoning-parser qwen3

Run a streaming chat completion with multiple tools (e.g. a create_file tool with path and content parameters). The model emits a bracket-format tool call. Without these fixes, the client receives [Calling tool: create_file({"path": "...", "content": "..."})] as content with finish_reason: null and zero tool_calls deltas. With these fixes, a proper structured tool_calls delta is emitted and finish_reason: tool_calls is set.

A 40-turn drift test cycling through 5 tools (get_weather, read_file, web_search, calculator, create_file) held 4 turns before drifting pre-fix, and held all 40 turns post-fix with no drift.

Related work

auto tool parser: bare bracket format [func({...})] not recognized #146 / fix(auto-parser): support bare bracket tool calls #305 added bare-bracket recognition to the auto parser
fix(server): stream structured tool calls without parser flags #304 refactored generic streaming tool calls
fix: suppress tool call XML from streaming text content (#129) #232 added XML suppression from streaming text

This PR completes the bracket-format coverage by fixing the one remaining Responses API path and the qwen tool parser closing-marker detection.

Test plan

Streaming chat completion with Qwen3.6 and a create_file-style tool emits structured tool_calls
Multi-turn drift test (40 turns, 5 tools) completes without format drift
Other streaming paths (Anthropic messages, OpenAI chat, Responses API) still pass existing tool-call tests

Two bugs caused Qwen3.6 [Calling tool: name({...})] streaming tool calls to leak into text content instead of emitting structured tool_calls: 1. server.py _stream_responses_request: the fast-path gate checked `"<" not in delta_text`, which skips the tool parser for bracket-format deltas (they start with "["). Refactored to use the existing `_streaming_tool_markup_possible()` helper, matching the 4 other streaming paths that already use it. 2. qwen_tool_parser.extract_tool_calls_streaming: the closing-marker check looked for `</tool_call>` or `)]` in `delta_text` only. Those markers routinely span token boundaries (e.g. `)` and `]` arrive in separate deltas), so the check never fires and the parser returns None for every chunk, suppressing the whole call. Check `current_text` (accumulated) instead so the close is detected reliably. Reproduction: multi-turn tool-calling session with Qwen3.6-35B-A3B-8bit and --tool-call-parser qwen --reasoning-parser qwen3. Without these fixes, streaming emits `[Calling tool: create_file({...})]` as content. With fixes, structured tool_calls are emitted and a 40-turn drift test passes cleanly (was failing at turn 5 before).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: streaming tool calls drop for Qwen3.6 bracket format#374

fix: streaming tool calls drop for Qwen3.6 bracket format#374
mikepixelmagic-dev wants to merge 1 commit intowaybarrios:mainfrom
mikepixelmagic-dev:fix/streaming-bracket-tool-calls

mikepixelmagic-dev commented Apr 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mikepixelmagic-dev commented Apr 19, 2026

Summary

Bug 1 — server.py _stream_responses_request fast-path gate

Bug 2 — qwen_tool_parser.extract_tool_calls_streaming closing-marker check

Reproduction

Related work

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Bug 1 — `server.py` `_stream_responses_request` fast-path gate

Bug 2 — `qwen_tool_parser.extract_tool_calls_streaming` closing-marker check