fix(server): stream structured tool calls without parser flags #304

Merged
Thump604 merged 1 commit into waybarrios:main from Thump604:codex/issue107-generic-streaming-tools
Apr 18, 2026

@Thump604
Collaborator

Summary

  • add a generic streaming tool-parser fallback when tools are present but no explicit parser flags are configured
  • keep configured parser behavior unchanged when is set
  • add regression coverage for streaming tool-call parsing without parser flags and for plain streamed text with tools present

Why

Issue #107 shows a real mismatch between non-streaming and streaming behavior: non-streaming chat completions already fall back to generic tool parsing when no parser flags are configured, but streaming skipped tool parsing entirely and leaked raw tool markup as content.

This change makes the streaming path match the existing generic non-streaming behavior for the same request shape.

Validation

  • ============================= test session starts ==============================
    platform darwin -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0
    rootdir: /tmp/vllm-mlx-issue107
    configfile: pytest.ini (WARNING: ignoring pytest config in pyproject.toml!)
    plugins: asyncio-1.3.0, anyio-4.13.0
    asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
    collected 42 items / 3 deselected / 39 selected

    ../../../tmp/vllm-mlx-issue107/tests/test_server.py .................... [ 51%]
    ................... [100%]

    ======================= 39 passed, 3 deselected in 2.51s =======================

Collaborator

@janhilgard janhilgard left a comment

Review: fix(server): stream structured tool calls without parser flags

Overall: Good feature that fills a gap, but the approach has some complexity concerns.

What this does

When the server has no explicit --tool-call-parser or --enable-auto-tool-choice flags, but the request includes tools, this PR auto-instantiates an AutoToolParser as a fallback. This means streaming responses get structured tool_calls instead of raw markup leaking as content -- matching the existing non-streaming behavior.
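
To make the selection order concrete, here is a minimal sketch of the fallback described above. It is not the code from this PR: `ServerConfig`, `GenericAutoParser`, and `select_streaming_parser` are hypothetical stand-ins, and the real `_get_streaming_tool_parser` may differ in signature and detail.

```python
from dataclasses import dataclass
from typing import Any, Optional


class GenericAutoParser:
    """Hypothetical stand-in for the generic "auto" parser used as the fallback."""


@dataclass
class ServerConfig:
    """Hypothetical container for the server-level parser settings."""

    enable_auto_tool_choice: bool = False
    configured_parser: Optional[Any] = None


def tool_choice_disabled(tool_choice: Any) -> bool:
    # The request opted out of tool calling entirely.
    return tool_choice == "none"


def select_streaming_parser(
    config: ServerConfig, tools: Optional[list], tool_choice: Any = None
) -> Optional[Any]:
    # 1. tool_choice == "none" always wins: no parsing at all.
    if tool_choice_disabled(tool_choice):
        return None
    # 2. An explicitly configured parser keeps its existing behavior.
    if config.enable_auto_tool_choice and config.configured_parser is not None:
        return config.configured_parser
    # 3. No parser flags, but the request carries tools: generic fallback.
    if tools:
        return GenericAutoParser()
    # 4. No tools in the request: stream plain content.
    return None
```

Note that step 3 builds a fresh parser per request, which is the behavior potential issue 2 below asks about.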

Strengths

  1. Correct diagnosis of the asymmetry. Non-streaming requests already fall through to generic parse_tool_calls() which handles tool markup. Streaming had no equivalent fallback.

  2. _get_streaming_tool_parser is well-structured. The function checks tool_choice == "none", tries the configured parser first, then falls back to AutoToolParser. This layering is clean.

  3. _streaming_tool_markup_possible consolidates the heuristic. Replacing "<" in content with a proper marker tuple check is an improvement. The markers cover XML-based formats, Mistral, Qwen bracket format, MiniMax, and Anthropic invoke format.

  4. Two good tests. One verifies structured tool calls appear in streaming output; the other verifies plain text is not interfered with.

  5. _tool_choice_disabled is a useful extraction. Encapsulating the tool_choice == "none" check prevents duplicated logic.
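
As an illustration of strength 3, the sketch below contrasts the old `"<" in text` check with a marker-tuple check. The tuple is illustrative only: `[TOOL_CALLS]` and `[Calling tool:` are named in this review, while `<tool_call>` and `<invoke` are assumptions added for the example and may not match the actual contents of `_STREAMING_TOOL_MARKERS`.

```python
from typing import Tuple

# Illustrative markers. The first two are named in this review; the last two
# are assumptions for the example, not a copy of _STREAMING_TOOL_MARKERS.
ILLUSTRATIVE_MARKERS: Tuple[str, ...] = (
    "[TOOL_CALLS]",
    "[Calling tool:",
    "<tool_call>",
    "<invoke",
)


def old_heuristic(text: str) -> bool:
    # Previous fast path: any "<" was treated as possible tool markup.
    return "<" in text


def markup_possible(text: str, markers: Tuple[str, ...] = ILLUSTRATIVE_MARKERS) -> bool:
    # Marker-based check: only known tool-call start markers count.
    return any(marker in text for marker in markers)
```

With these definitions, plain text such as `if a < b then` trips the old heuristic but not the marker check, and a bracket-style call such as `[TOOL_CALLS][{"name": "f"}]` is caught by the marker check even though it contains no `<`.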

Potential issues

  1. Performance regression for all streaming requests with tools. Previously, the fast path only checked "<" in text. Now _streaming_tool_markup_possible calls any(marker in text for marker in _STREAMING_TOOL_MARKERS) on tool_accumulated_text + content for every chunk. For long conversations with tools, tool_accumulated_text can grow to tens of thousands of characters. The O(n * m) substring search on every token could add measurable latency. Consider checking only on content/delta_text (the new portion) rather than the full accumulated text.

  2. Auto-parser instantiation happens on every request. In _get_streaming_tool_parser, when there is no configured parser but tools are present, a new AutoToolParser is created per request. For a hot path under load this could be a concern. Consider caching or reusing the parser instance.

  3. The marker list is missing the bare bracket pattern from PR #305. _STREAMING_TOOL_MARKERS includes [Calling tool: and [TOOL_CALLS] but not bare [func( patterns. If #305 merges first, this will need updating.

  4. Duplicated marker-check logic across two PRs. PR #305 adds _STREAMING_BARE_BRACKET_MARKER and _STREAMING_BARE_BRACKET_PARTIAL, while this PR adds _STREAMING_TOOL_MARKERS and _streaming_tool_markup_possible. These two PRs will conflict. It would be good to coordinate which one lands first and have the other rebase.
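
One possible way to act on issue 1 is to scan only the new delta plus a short tail of earlier text, so that a marker split across two chunks is still found. This is a sketch under assumptions: `IncrementalMarkerScanner` is a hypothetical helper, not something in this PR, and the marker tuple passed to it is up to the caller.

```python
from typing import Iterable


class IncrementalMarkerScanner:
    """Hypothetical helper: detects a marker without rescanning all prior text."""

    def __init__(self, markers: Iterable[str]) -> None:
        self._markers = tuple(markers)
        # A marker of length L can have at most L - 1 characters in earlier
        # chunks, so keeping that many trailing characters is sufficient.
        self._tail_len = max(len(m) for m in self._markers) - 1
        self._tail = ""
        self.seen = False

    def feed(self, delta: str) -> bool:
        """Consume one streamed chunk; return True once any marker has appeared."""
        if self.seen:
            return True
        window = self._tail + delta
        if any(marker in window for marker in self._markers):
            self.seen = True
        else:
            # Keep only the trailing characters that could start a marker.
            self._tail = window[-self._tail_len:] if self._tail_len else ""
        return self.seen
```

Per-chunk work is then bounded by the delta length plus the longest marker length, rather than by the size of `tool_accumulated_text`.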

Minor

  • The global _tool_parser_instance removal from both streaming paths is good cleanup, matching the move to _get_streaming_tool_parser.
  • The fallback end-of-stream check now uses _streaming_tool_markup_possible(tool_accumulated_text) instead of a hardcoded list of 3 patterns -- this is a nice consolidation.

Solid improvement that closes a real gap in streaming tool call handling. The main concern is the coordination with PR #305 and the performance characteristics of marker scanning on accumulated text.

When tools are present but no explicit parser flags (--enable-auto-tool-choice,
--tool-call-parser) are configured, non-streaming chat completions already fall
back to generic tool parsing.  Streaming skipped this fallback entirely and
leaked raw tool markup as content.

Changes:
- Add _get_streaming_tool_parser() that mirrors the non-streaming fallback:
  use the configured parser when auto tool choice is on, otherwise instantiate
  the generic "auto" parser when tools are present.
- Replace the old "<" in text heuristic with _streaming_tool_markup_possible(),
  which checks for known tool call start markers across model families.
- Extract _tool_choice_disabled() to centralise the tool_choice=="none" check.
- Add regression tests for streaming tool calls without parser flags and for
  plain text streaming with tools present.

Fixes waybarrios#107
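
The two regression scenarios can be illustrated with a toy streaming loop. This is a simplified sketch, not the server code or the tests added here: `stream_events` is hypothetical, the `<tool_call>` markup is an assumed format, and the loop assumes the start marker arrives whole within a single chunk.

```python
import json
from typing import Dict, Iterable, Iterator

# Assumed markup for the example only.
START, END = "<tool_call>", "</tool_call>"


def stream_events(chunks: Iterable[str]) -> Iterator[Dict]:
    """Toy loop: pass text through, buffer from START, emit one call at END."""
    buffer = ""
    buffering = False
    for chunk in chunks:
        if not buffering and START not in chunk:
            # Plain text with tools present must stream through untouched.
            yield {"content": chunk}
            continue
        buffering = True
        buffer += chunk
        if END in buffer:
            before, _, rest = buffer.partition(START)
            payload, _, after = rest.partition(END)
            if before:
                yield {"content": before}
            # Structured output instead of raw markup leaking as content.
            yield {"tool_calls": [json.loads(payload)]}
            if after:
                yield {"content": after}
            buffer, buffering = "", False
    if buffer:
        # Incomplete markup at end of stream falls back to plain content.
        yield {"content": buffer}
```

A test in the spirit of the first scenario would assert that no event's `content` contains the raw markup; one for the second would assert that every chunk of plain text comes back unchanged.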
Thump604 force-pushed the codex/issue107-generic-streaming-tools branch from c487623 to b888645 on April 16, 2026 18:38
Thump604 merged commit a8a3024 into waybarrios:main on Apr 18, 2026
9 checks passed