Promote tool_call blocks from reasoning to content by Thump604 · Pull Request #433 · waybarrios/vllm-mlx

Thump604 · 2026-04-25T17:09:50Z

Summary

Teach BaseThinkingReasoningParser to detect <tool_call> blocks inside reasoning and reclassify them as content so the downstream tool parser can find them.
Fixes Failure Mode B (tool calls inside <think> blocks are lost by the parser pipeline), confirmed on Qwen 3.5 27B/35B/122B/397B and Qwen 3.6 30B/35B.
Covers both the non-streaming and streaming paths.
Related: vllm-project/vllm#39055, "Fix Qwen3 reasoning tool calls embedded inside think" (non-streaming only, known bugs).
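To make Failure Mode B concrete, here is a minimal sketch of a sequential pipeline that drops the call. The helpers split_reasoning and find_tool_calls are hypothetical stand-ins, not the vllm-mlx parsers, and the sample model output is invented for illustration.

```python
import re

# Hypothetical tool parser pattern: it only ever runs on the content part.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def split_reasoning(text):
    """Naive reasoning parser: everything before </think> is reasoning."""
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No think block at all: the whole output is content.
        return None, text
    return head.replace("<think>", "", 1).strip(), tail.strip()


def find_tool_calls(content):
    """Naive tool parser: scans content only, never reasoning."""
    return [body.strip() for body in TOOL_CALL_RE.findall(content or "")]


# Invented sample: the model emits its tool call inside the think block.
output = (
    "<think>I need the weather.\n"
    '<tool_call>\n{"name": "get_weather"}\n</tool_call>\n'
    "</think>"
)

reasoning, content = split_reasoning(output)
calls = find_tool_calls(content)  # empty: the call is stuck in `reasoning`
```

The tool call ends up in reasoning and content is empty, so the tool parser has nothing to find. This is the gap the promotion step is meant to close.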

Non-streaming

_promote_tool_calls(reasoning, content) scans reasoning after _extract_complete_reasoning splits it from content:

Closed blocks (<tool_call>...</tool_call>): appended to content
Unclosed blocks (<tool_call>\s*[\{<]... spanning </think> boundary): prepended to content to reassemble with continuation
Structural \s*[\{<] guard on unclosed regex prevents false positives on prose that mentions <tool_call>
content = (content or "") handles the content=None case
Logs warning when promotion fires (model misbehavior recovery path)
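The non-streaming rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: the two regexes are reconstructed from the description, the function is a free-standing promote_tool_calls rather than the parser method, and the newline used when appending closed blocks is a guess.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Assumed patterns, reconstructed from the PR description.
_TOOL_CALL_CLOSED_RE = re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL)
# Structural guard: the open tag must be followed by "{" or "<".
_TOOL_CALL_UNCLOSED_RE = re.compile(r"<tool_call>\s*[\{<].*\Z", re.DOTALL)


def promote_tool_calls(reasoning, content):
    """Move tool_call blocks out of reasoning and into content."""
    if not reasoning or "<tool_call>" not in reasoning:
        return reasoning, content

    # Closed blocks are removed from reasoning and collected in order.
    closed = _TOOL_CALL_CLOSED_RE.findall(reasoning)
    remaining = _TOOL_CALL_CLOSED_RE.sub("", reasoning)

    # An unclosed block runs to the end of reasoning (it spans </think>).
    unclosed = None
    match = _TOOL_CALL_UNCLOSED_RE.search(remaining)
    if match:
        unclosed = match.group(0)
        remaining = remaining[: match.start()]

    if not closed and unclosed is None:
        return reasoning, content  # prose mention only, nothing to promote

    logger.warning("Promoting tool_call block(s) from reasoning to content")
    content = content or ""  # handles content=None
    if unclosed is not None:
        # Prepend so the block reassembles with its continuation in content.
        content = unclosed + content
    if closed:
        # Append after existing content (separator is an assumption).
        content = "\n".join(([content] if content else []) + closed)
    return (remaining.strip() or None), content
```

For example, with reasoning 'Thinking\n<tool_call>\n{"name": "f", ' and content '"arguments": {}}\n</tool_call>', the unclosed fragment is prepended and content becomes one complete block, while a sentence that merely mentions <tool_call> leaves both values untouched.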

Streaming

Tool-call buffering state machine in the thinking phase:

<tool_call> detected: start buffering, stop emitting as reasoning
</tool_call> detected: emit buffer as content, resume reasoning
</think> while buffering: flush buffer as content, transition to content phase
Stream ends while buffering: finalize_stream() flushes as content
_transition_to_content applies non-streaming promotion as catch-all for single-delta cases
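The transitions listed above can be illustrated with a small state machine. This is a simplified sketch, not the parser from this PR: the class ThinkingStreamSketch and the helper run_stream are hypothetical, the opening <think> tag is assumed to have been consumed already, and every tag is assumed to arrive whole inside a single delta (a real parser must also cope with tags split across deltas).

```python
TOOL_OPEN, TOOL_CLOSE, THINK_CLOSE = "<tool_call>", "</tool_call>", "</think>"


class ThinkingStreamSketch:
    """Hypothetical three-phase stream splitter: thinking, buffering, content."""

    def __init__(self):
        self.phase = "thinking"
        self.buffer = ""

    def feed(self, delta):
        """Return (reasoning_piece, content_piece) for one delta."""
        reasoning, content, rest = "", "", delta
        while rest:
            if self.phase == "content":
                content, rest = content + rest, ""
            elif self.phase == "thinking":
                tc, te = rest.find(TOOL_OPEN), rest.find(THINK_CLOSE)
                if te != -1 and (tc == -1 or te < tc):
                    # </think> comes first: switch to the content phase.
                    reasoning += rest[:te]
                    rest = rest[te + len(THINK_CLOSE):]
                    self.phase = "content"
                elif tc != -1:
                    # <tool_call> detected: stop emitting as reasoning.
                    reasoning += rest[:tc]
                    rest = rest[tc:]
                    self.phase = "buffering"
                else:
                    reasoning, rest = reasoning + rest, ""
            else:  # buffering
                close, te = rest.find(TOOL_CLOSE), rest.find(THINK_CLOSE)
                if close != -1 and (te == -1 or close < te):
                    # </tool_call> detected: emit buffer as content, resume.
                    end = close + len(TOOL_CLOSE)
                    content += self.buffer + rest[:end]
                    self.buffer, rest = "", rest[end:]
                    self.phase = "thinking"
                elif te != -1:
                    # </think> while buffering: flush and move to content.
                    content += self.buffer + rest[:te]
                    self.buffer = ""
                    rest = rest[te + len(THINK_CLOSE):]
                    self.phase = "content"
                else:
                    self.buffer, rest = self.buffer + rest, ""
        return reasoning, content

    def finalize(self):
        """Stream ended: whatever is still buffered is flushed as content."""
        flushed, self.buffer = self.buffer, ""
        return flushed


def run_stream(deltas):
    """Feed all deltas and return the concatenated (reasoning, content)."""
    parser, reasoning, content = ThinkingStreamSketch(), "", ""
    for delta in deltas:
        r, c = parser.feed(delta)
        reasoning, content = reasoning + r, content + c
    return reasoning, content + parser.finalize()
```

Feeding ["R", "<tool_call>{", "</think>", '"name": "f"}</tool_call>'] through run_stream yields reasoning "R" and a complete tool call in content, which is the "</think> while buffering" case from the list.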

Test plan

11 non-streaming tests (closed/unclosed/boundary/prose/JSON/multiple/warning)
7 streaming tests (full-text, large-chunk, think-ends-while-buffering, finalize, multiple)
2 composition tests (preserves trailing content, tool_choice=required)
1 composition test (reasoning parser + tool parser end-to-end, skipped without transformers)
121 existing parser tests: 0 regressions

When a thinking model emits <tool_call> XML inside <think> blocks, the sequential parser pipeline loses the tool call: the reasoning parser classifies it as reasoning, and the tool parser only sees content. This is Failure Mode B, confirmed on Qwen 3.5/3.6.

Non-streaming: _promote_tool_calls scans reasoning for closed <tool_call>...</tool_call> blocks (appended to content) and unclosed blocks with structural content (prepended to reassemble with content continuation). Structural guard prevents prose false positives.

Streaming: tool-call buffering in the thinking phase detects <tool_call> start, accumulates tokens until </tool_call>, then emits as content. Handles </think> arriving mid-buffer and stream-end flush. _transition_to_content applies non-streaming promotion as a catch-all for single-delta cases.

20 tests covering non-streaming, streaming, and composition paths. 121 existing parser tests pass with 0 regressions.

janhilgard

Review: PR #433 — Tool call promotion from reasoning to content

Clean design. The two-path approach (regex for non-streaming, state machine for streaming) is the right call, and the \s*[\{<] structural guard on the unclosed regex is a smart way to avoid false positives on prose.

Should fix

1. Ambiguous ternary at line 514 — add explicit parentheses

final_content = promoted + (
    content_msg.content or "" if content_msg else ""
)

Python's grammar makes this (content_msg.content or "") if content_msg else "" (safe), but the expression reads as if it could be content_msg.content or ("" if content_msg else "") (AttributeError when content_msg is None). Explicit parentheses would remove all doubt:

final_content = promoted + (
    (content_msg.content or "") if content_msg else ""
)

Same pattern at line 552 but there the + makes operator precedence unambiguous.
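The precedence claim can be checked directly with the standard ast module. The snippet below is a small illustration, with Msg as a hypothetical stand-in for the message object: it compares parse trees and evaluates each reading with content_msg set to None.

```python
import ast

ambiguous = 'content_msg.content or "" if content_msg else ""'
explicit = '(content_msg.content or "") if content_msg else ""'
misread = 'content_msg.content or ("" if content_msg else "")'


def tree(src):
    """Dump the parse tree; parentheses leave no trace in the AST."""
    return ast.dump(ast.parse(src, mode="eval"))


def evaluate(src, content_msg):
    """Evaluate one expression with the given content_msg binding."""
    return eval(src, {"content_msg": content_msg})


class Msg:
    """Hypothetical stand-in for the delta message object."""

    def __init__(self, content):
        self.content = content


# The conditional expression binds more loosely than `or`, so the
# unparenthesized form parses exactly like the explicit one.
same_parse = tree(ambiguous) == tree(explicit)
differs_from_misread = tree(ambiguous) != tree(misread)

safe_result = evaluate(ambiguous, None)  # condition is checked first

try:
    evaluate(misread, None)
    misread_raises = False
except AttributeError:
    # The misreading touches .content before checking content_msg.
    misread_raises = True
```

Both forms behave the same at runtime, so the added parentheses are purely a readability fix, which matches the reviewer's description.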

2. Missing test: `</tool_call></think>` with no trailing content

The _thinking_tool_call path where </tool_call> and </think> are in the same delta AND after_think is empty has no test coverage. This exercises the content_msg = ... if after_think else None branch:

def test_stream_tool_call_closed_immediately_before_think_end(self, parser):
    text = (
        "<think>R\n"
        "<tool_call>\n"
        "<function=f><parameter=x>1</parameter></function>\n"
        "</tool_call></think>"
    )
    reasoning, content = self._stream(parser, text)
    assert content is not None
    assert "<tool_call>" in content

3. Missing test: `_transition_to_content` defensive promotion

The _transition_to_content path also calls _promote_tool_calls as a catch-all. No test exercises this path specifically (where a single large delta contains <think>...<tool_call>...</tool_call>...</think> and hits _transition_to_content directly without going through the streaming tool call state machine).

Minor / nice to have

finalize_stream() is clean — if stream ends mid-tool-call, buffer is flushed as content. Good edge case handling.
Closed blocks appended after existing content, unclosed prepended — correct for the reassembly scenario. The test_tool_call_spanning_think_boundary test validates this nicely.
Warning logging on promotion is useful for operators debugging model misbehavior.
Regex _TOOL_CALL_CLOSED_RE uses non-greedy .*? — correct for multiple tool calls.
The test_prose_mention_not_promoted test verifies the structural guard works.

Overall

This is solid work — the state machine is well-structured, the regex guards prevent false positives, and 21 tests cover the important paths. CI 9/9 pass. The parentheses fix (#1) and the two test gap additions (#2, #3) are straightforward.

1. Add explicit parentheses on ambiguous ternary at line 353
2. Add test for </tool_call></think> with no trailing content
3. Add test for single-delta _transition_to_content catch-all path

Thump604 · 2026-04-25T17:26:58Z

All three items addressed in ff62be4:

Explicit parentheses on the ambiguous ternary at line 353
Added test_stream_tool_call_closed_immediately_before_think_end -- exercises the after_think empty branch
Added test_stream_single_delta_promotion_via_transition -- exercises the _transition_to_content catch-all path directly

142 passed, 1 skipped (transformers).

janhilgard · 2026-04-25T17:30:34Z

@Thump604 Looks good, all three addressed. Thanks for the quick turnaround.

janhilgard

All three items addressed in ff62be4. LGTM.

Thump604 · 2026-04-25T18:10:52Z

@waybarrios This has been reviewed and approved by @janhilgard. When you have a chance, could you take a look and merge if it looks good?

Thump604 · 2026-04-26T18:07:33Z

Approved by Jan, CI green. Wayner, review and merge when you have a moment?

Thump604 requested a review from janhilgard April 25, 2026 17:09

janhilgard reviewed Apr 25, 2026


Address review: explicit ternary parens, two missing tests

ff62be4


janhilgard approved these changes Apr 25, 2026


Thump604 requested a review from waybarrios April 25, 2026 18:10

Thump604 mentioned this pull request Apr 28, 2026

Add --max-kv-size CLI flag for per-sequence KV cache cap #455

Merged

4 tasks

janhilgard merged commit 9a6253a into main Apr 28, 2026
9 checks passed

janhilgard deleted the feat/tool-call-aware-reasoning-parser branch April 28, 2026 13:21

janhilgard merged 2 commits into main from feat/tool-call-aware-reasoning-parser

Thump604 commented Apr 25, 2026

Uh oh!

janhilgard left a comment

Uh oh!

Thump604 commented Apr 25, 2026

Uh oh!

janhilgard commented Apr 25, 2026

Uh oh!

janhilgard left a comment

Uh oh!

Thump604 commented Apr 25, 2026

Uh oh!

Thump604 commented Apr 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Thump604 commented Apr 25, 2026

Summary

Non-streaming

Streaming

Test plan

Uh oh!

janhilgard left a comment

Choose a reason for hiding this comment

Review: PR #433 — Tool call promotion from reasoning to content

Should fix

1. Ambiguous ternary at line 514 — add explicit parentheses

2. Missing test: </tool_call></think> with no trailing content

3. Missing test: _transition_to_content defensive promotion

Minor / nice to have

Overall

Uh oh!

Thump604 commented Apr 25, 2026

Uh oh!

janhilgard commented Apr 25, 2026

Uh oh!

janhilgard left a comment

Choose a reason for hiding this comment

Uh oh!

Thump604 commented Apr 25, 2026

Uh oh!

Thump604 commented Apr 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

2. Missing test: `</tool_call></think>` with no trailing content

3. Missing test: `_transition_to_content` defensive promotion