fix(server): stream structured tool calls without parser flags by Thump604 · Pull Request #304 · waybarrios/vllm-mlx

Thump604 · 2026-04-14T02:58:35Z

Summary

- add a generic streaming tool-parser fallback when tools are present but no explicit parser flags are configured
- keep configured parser behavior unchanged when an explicit parser flag is set
- add regression coverage for streaming tool-call parsing without parser flags and for plain streamed text with tools present

Why

Issue #107 shows a real mismatch between non-streaming and streaming behavior: non-streaming chat completions already fall back to generic tool parsing when no parser flags are configured, but streaming skipped tool parsing entirely and leaked raw tool markup as content.

This change makes the streaming path match the existing generic non-streaming behavior for the same request shape.
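To make the mismatch concrete, here is a minimal sketch of the two chunk shapes a streaming client might see. It assumes an OpenAI-style chat completion chunk layout (`choices[].delta`), which this thread does not spell out; the tool name `get_weather`, the `<tool_call>` markup, and the helper `has_structured_tool_calls` are all hypothetical and only serve the illustration.

```python
import json

# Before the fix (assumed shape): raw tool markup leaks through as ordinary content.
leaked_chunk = {
    "choices": [
        {
            "index": 0,
            "delta": {
                "content": '<tool_call>{"name": "get_weather", '
                           '"arguments": {"city": "Paris"}}</tool_call>'
            },
        }
    ]
}

# After the fix (assumed shape): the same call arrives as a structured tool_calls delta.
structured_chunk = {
    "choices": [
        {
            "index": 0,
            "delta": {
                "tool_calls": [
                    {
                        "index": 0,
                        "id": "call_0",  # hypothetical id
                        "type": "function",
                        "function": {
                            "name": "get_weather",
                            "arguments": json.dumps({"city": "Paris"}),
                        },
                    }
                ]
            },
        }
    ]
}


def has_structured_tool_calls(chunk: dict) -> bool:
    """Return True if any choice in the chunk carries a non-empty tool_calls delta."""
    return any(
        choice.get("delta", {}).get("tool_calls")
        for choice in chunk.get("choices", [])
    )
```

A client that only inspects `delta.tool_calls` would miss the call entirely in the first shape, which is the behavior reported in the issue.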

Validation

============================= test session starts ==============================
platform darwin -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0
rootdir: /tmp/vllm-mlx-issue107
configfile: pytest.ini (WARNING: ignoring pytest config in pyproject.toml!)
plugins: asyncio-1.3.0, anyio-4.13.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 42 items / 3 deselected / 39 selected

../../../tmp/vllm-mlx-issue107/tests/test_server.py .................... [ 51%]
................... [100%]

======================= 39 passed, 3 deselected in 2.51s =======================

janhilgard

Review: fix(server): stream structured tool calls without parser flags

Overall: Good feature that fills a gap, but the approach has some complexity concerns.

What this does

When the server has no explicit --tool-call-parser or --enable-auto-tool-choice flags, but the request includes tools, this PR auto-instantiates an AutoToolParser as a fallback. This means streaming responses get structured tool_calls instead of raw markup leaking as content -- matching the existing non-streaming behavior.

Strengths

Correct diagnosis of the asymmetry. Non-streaming requests already fall through to generic parse_tool_calls() which handles tool markup. Streaming had no equivalent fallback.
_get_streaming_tool_parser is well-structured. The function checks tool_choice == "none", tries the configured parser first, then falls back to AutoToolParser. This layering is clean.
_streaming_tool_markup_possible consolidates the heuristic. Replacing "<" in content with a proper marker tuple check is an improvement. The markers cover XML-based formats, Mistral, Qwen bracket format, MiniMax, and Anthropic invoke format.
Two good tests. One verifies structured tool calls appear in streaming output; the other verifies plain text is not interfered with.
_tool_choice_disabled is a useful extraction. Encapsulating the tool_choice == "none" check prevents duplicated logic.
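The layering described in the points above can be sketched roughly as follows. This is not the PR's code: the function names are de-underscored stand-ins, `GenericToolParser` is a placeholder for the generic "auto" parser, and the marker tuple only contains the two markers quoted in this review plus one assumed XML-style marker.

```python
from typing import Any, Optional

# Illustrative subset; "[TOOL_CALLS]" and "[Calling tool:" are quoted in the review,
# "<tool_call>" is an assumed example of an XML-style marker.
ILLUSTRATIVE_MARKERS = ("<tool_call>", "[TOOL_CALLS]", "[Calling tool:")


class GenericToolParser:
    """Placeholder for the generic fallback parser (hypothetical stand-in)."""


def tool_choice_disabled(request: dict[str, Any]) -> bool:
    """Single place for the tool_choice == "none" check."""
    return request.get("tool_choice") == "none"


def tool_markup_possible(text: str) -> bool:
    """Check for known start markers instead of a bare "<" test."""
    return any(marker in text for marker in ILLUSTRATIVE_MARKERS)


def select_streaming_parser(
    request: dict[str, Any], configured_parser: Optional[object] = None
) -> Optional[object]:
    """Pick a parser: disabled -> None, configured -> configured, tools -> generic."""
    if tool_choice_disabled(request):
        return None
    if configured_parser is not None:
        return configured_parser
    if request.get("tools"):
        return GenericToolParser()
    return None
```

With this shape, plain text such as `a < b` no longer trips the markup check, while a request that carries tools but no configured parser still gets a parser.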

Potential issues

Performance regression for all streaming requests with tools. Previously, the fast path only checked "<" in text. Now _streaming_tool_markup_possible calls any(marker in text for marker in _STREAMING_TOOL_MARKERS) on tool_accumulated_text + content for every chunk. For long conversations with tools, tool_accumulated_text can grow to tens of thousands of characters. The O(n * m) substring search on every token could add measurable latency. Consider checking only on content/delta_text (the new portion) rather than the full accumulated text.
Auto-parser instantiation happens on every request. In _get_streaming_tool_parser, when there is no configured parser but tools are present, a new AutoToolParser is created per request. For a hot path under load this could be a concern. Consider caching or reusing the parser instance.
The marker list is missing the bare bracket pattern from PR #305. _STREAMING_TOOL_MARKERS includes [Calling tool: and [TOOL_CALLS] but not bare [func( patterns. If #305 merges first, this will need updating.
Duplicated marker-check logic across two PRs. PR #305 adds _STREAMING_BARE_BRACKET_MARKER and _STREAMING_BARE_BRACKET_PARTIAL, while this PR adds _STREAMING_TOOL_MARKERS and _streaming_tool_markup_possible. These two PRs will conflict. It would be good to coordinate which one lands first and have the other rebase.
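One possible way to act on the first point, scanning only the new portion of the stream, is sketched below. It is an assumption about how such a check could be written, not code from either PR: the class `IncrementalMarkerScanner` and its marker tuple are hypothetical. The scanner keeps a short tail of previous text so that a marker split across two chunks is still detected, and each call costs time proportional to the delta rather than to the full accumulated text.

```python
# Illustrative markers only; the real tuple in the PR may differ.
MARKERS = ("[TOOL_CALLS]", "[Calling tool:", "<tool_call>")


class IncrementalMarkerScanner:
    """Detect tool markers by scanning each delta plus a small overlap tail."""

    def __init__(self, markers: tuple[str, ...] = MARKERS) -> None:
        self._markers = markers
        # A marker split across chunks has at most len(marker) - 1 chars in the past.
        self._overlap = max(len(m) for m in markers) - 1
        self._tail = ""
        self.seen = False

    def feed(self, delta: str) -> bool:
        """Consume one streamed delta; return True once any marker has appeared."""
        if self.seen:
            return True  # latch: no further scanning needed
        window = self._tail + delta
        if any(marker in window for marker in self._markers):
            self.seen = True
        # Keep only the trailing characters that could start a split marker.
        self._tail = window[-self._overlap:] if self._overlap else ""
        return self.seen
```

Usage would be one scanner per streaming response, with `feed(delta_text)` called for each chunk. Whether this is worth the extra state depends on how large `tool_accumulated_text` gets in practice, which would need measuring.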

Minor

The global _tool_parser_instance removal from both streaming paths is good cleanup, matching the move to _get_streaming_tool_parser.
The fallback end-of-stream check now uses _streaming_tool_markup_possible(tool_accumulated_text) instead of a hardcoded list of 3 patterns -- this is a nice consolidation.

Solid improvement that closes a real gap in streaming tool call handling. The main concern is the coordination with PR #305 and the performance characteristics of marker scanning on accumulated text.

When tools are present but no explicit parser flags (--enable-auto-tool-choice, --tool-call-parser) are configured, non-streaming chat completions already fall back to generic tool parsing. Streaming skipped this fallback entirely and leaked raw tool markup as content.

Changes:
- Add _get_streaming_tool_parser() that mirrors the non-streaming fallback: use the configured parser when auto tool choice is on, otherwise instantiate the generic "auto" parser when tools are present.
- Replace the old "<" in text heuristic with _streaming_tool_markup_possible(), which checks for known tool call start markers across model families.
- Extract _tool_choice_disabled() to centralise the tool_choice=="none" check.
- Add regression tests for streaming tool calls without parser flags and for plain text streaming with tools present.

Fixes waybarrios#107

janhilgard reviewed Apr 15, 2026

This was referenced Apr 15, 2026

fix(auto-parser): support bare bracket tool calls #305

Merged

feat: add registry-backed multi-model serving #307

Open

Thump604 force-pushed the codex/issue107-generic-streaming-tools branch from c487623 to b888645 Compare April 16, 2026 18:38

Thump604 merged commit a8a3024 into waybarrios:main Apr 18, 2026
9 checks passed

mikepixelmagic-dev mentioned this pull request Apr 19, 2026

fix: streaming tool calls drop for Qwen3.6 bracket format #374

Open

3 tasks

Thump604 merged 1 commit into waybarrios:main from Thump604:codex/issue107-generic-streaming-tools