Promote tool_call blocks from reasoning to content #433

Merged: janhilgard merged 2 commits into main from feat/tool-call-aware-reasoning-parser on Apr 28, 2026

@Thump604
Collaborator

Summary

  • Teach BaseThinkingReasoningParser to detect <tool_call> blocks inside reasoning and reclassify them as content so the downstream tool parser can find them
  • Fixes Failure Mode B (tool calls inside <think> blocks lost by parser pipeline), confirmed on Qwen 3.5 27B/35B/122B/397B and Qwen 3.6 30B/35B
  • Both non-streaming and streaming paths
  • Related: vllm-project/vllm#39055, "Fix Qwen3 reasoning tool calls embedded inside think" (non-streaming only, known bugs)
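
To make Failure Mode B concrete, here is a minimal sketch of a sequential pipeline losing a tool call. The helpers `split_reasoning` and `find_tool_calls` are hypothetical stand-ins written for this illustration, not the project's actual parsers.

```python
import re

# Hypothetical, simplified stand-ins for a reasoning parser and a tool parser.
THINK_RE = re.compile(r"<think>(.*?)</think>(.*)", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def split_reasoning(text):
    """Naive reasoning parser: everything inside <think> counts as reasoning."""
    match = THINK_RE.search(text)
    if match is None:
        return None, text
    return match.group(1), (match.group(2) or None)


def find_tool_calls(content):
    """Naive tool parser: it is only ever handed the content part."""
    return TOOL_CALL_RE.findall(content or "")


# The model emits its tool call inside the think block.
output = (
    "<think>I should look this up.\n"
    '<tool_call>{"name": "search"}</tool_call>\n'
    "</think>"
)
reasoning, content = split_reasoning(output)
tool_calls = find_tool_calls(content)

# The tool call ended up in reasoning, so the tool parser finds nothing.
print(repr(content), tool_calls)
```

In this sketch the tool call is still present in `reasoning`, which is the text the promotion step described in this PR scans.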

Non-streaming

_promote_tool_calls(reasoning, content) scans reasoning after _extract_complete_reasoning splits it from content:

  • Closed blocks (<tool_call>...</tool_call>): appended to content
  • Unclosed blocks (<tool_call>\s*[\{<]... spanning </think> boundary): prepended to content to reassemble with continuation
  • Structural \s*[\{<] guard on unclosed regex prevents false positives on prose that mentions <tool_call>
  • content = (content or "") handles the content=None case
  • Logs warning when promotion fires (model misbehavior recovery path)
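
The bullets above can be sketched roughly as follows. This is an illustrative approximation, not the merged code: the regex bodies, the function name `promote_tool_calls_sketch`, and the choice to strip the remaining reasoning are all assumptions, since the PR description names `_TOOL_CALL_CLOSED_RE` and the `\s*[\{<]` guard but does not show the implementation.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Assumed pattern bodies; only the names/guard are mentioned in the PR text.
_TOOL_CALL_CLOSED_RE = re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL)
# Unclosed block: "<tool_call>" followed by "{" or "<" (structural guard),
# running to the end of the reasoning text.
_TOOL_CALL_UNCLOSED_RE = re.compile(r"<tool_call>\s*[\{<].*\Z", re.DOTALL)


def promote_tool_calls_sketch(reasoning, content):
    """Move tool-call blocks found in reasoning over to content."""
    if not reasoning or "<tool_call>" not in reasoning:
        return reasoning, content

    closed = _TOOL_CALL_CLOSED_RE.findall(reasoning)
    remaining = _TOOL_CALL_CLOSED_RE.sub("", reasoning)

    unclosed = ""
    match = _TOOL_CALL_UNCLOSED_RE.search(remaining)
    if match:
        unclosed = match.group(0)
        remaining = remaining[: match.start()]

    if not closed and not unclosed:
        return reasoning, content  # e.g. prose that merely mentions the tag

    logger.warning("Promoting tool call(s) from reasoning to content")
    content = content or ""  # handles content=None
    # Unclosed block goes first so it rejoins its continuation in content;
    # closed blocks are appended after any existing content.
    content = unclosed + content + "".join(closed)
    return (remaining.strip() or None), content


# Closed block: appended to content.
r1, c1 = promote_tool_calls_sketch(
    'plan\n<tool_call>{"name": "f"}</tool_call>\nmore', "Answer."
)
# Unclosed block spanning the </think> boundary: prepended to content.
r2, c2 = promote_tool_calls_sketch(
    'plan\n<tool_call>\n{"name": "f", ', '"arguments": {}}\n</tool_call>'
)
# Prose mention: left alone by the structural guard.
r3, c3 = promote_tool_calls_sketch("I could emit <tool_call> here.", "Hi")
```

One limitation of this simplified version: a prose mention of `<tool_call>` followed later by a real closed block would be swallowed by the closed-block regex, so it should not be read as covering every case the real tests exercise.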

Streaming

Tool-call buffering state machine in the thinking phase:

  • <tool_call> detected: start buffering, stop emitting as reasoning
  • </tool_call> detected: emit buffer as content, resume reasoning
  • </think> while buffering: flush buffer as content, transition to content phase
  • Stream ends while buffering: finalize_stream() flushes as content
  • _transition_to_content applies non-streaming promotion as catch-all for single-delta cases
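
A rough sketch of the buffering state machine described above. The class `ThinkingStreamSketch` and its methods are hypothetical; it assumes the stream starts after the opening `<think>` tag and that each tag arrives whole within a single delta, which the real parser may not be able to assume.

```python
class ThinkingStreamSketch:
    """Illustrative only: routes deltas to reasoning or content."""

    OPEN, CLOSE, THINK_END = "<tool_call>", "</tool_call>", "</think>"

    def __init__(self):
        self.phase = "thinking"  # becomes "content" after </think>
        self.buffering = False   # True while inside a <tool_call> block
        self.buffer = ""

    def feed(self, delta):
        """Return (reasoning_delta, content_delta) for one streamed delta."""
        reasoning, content, rest = "", "", delta
        while rest:
            if self.phase == "content":
                content += rest
                break
            think_end = rest.find(self.THINK_END)
            if self.buffering:
                close = rest.find(self.CLOSE)
                if close != -1 and (think_end == -1 or close < think_end):
                    # </tool_call>: emit buffer as content, resume reasoning.
                    end = close + len(self.CLOSE)
                    content += self.buffer + rest[:end]
                    self.buffer, self.buffering = "", False
                    rest = rest[end:]
                elif think_end != -1:
                    # </think> while buffering: flush, switch to content phase.
                    content += self.buffer + rest[:think_end]
                    self.buffer, self.buffering = "", False
                    self.phase = "content"
                    rest = rest[think_end + len(self.THINK_END):]
                else:
                    self.buffer += rest
                    break
            else:
                open_ = rest.find(self.OPEN)
                if open_ != -1 and (think_end == -1 or open_ < think_end):
                    # <tool_call>: stop emitting reasoning, start buffering.
                    reasoning += rest[:open_]
                    self.buffering = True
                    rest = rest[open_:]
                elif think_end != -1:
                    reasoning += rest[:think_end]
                    self.phase = "content"
                    rest = rest[think_end + len(self.THINK_END):]
                else:
                    reasoning += rest
                    break
        return reasoning, content

    def finalize(self):
        """Stream ended: whatever is still buffered is flushed as content."""
        flushed, self.buffer, self.buffering = self.buffer, "", False
        return flushed


def run(deltas):
    """Feed all deltas and join the routed pieces (helper for the demo)."""
    parser = ThinkingStreamSketch()
    pieces = [parser.feed(d) for d in deltas]
    reasoning = "".join(r for r, _ in pieces)
    content = "".join(c for _, c in pieces) + parser.finalize()
    return reasoning, content


normal = run(["Check. ", "<tool_call>", '{"name": "f"}', "</tool_call>",
              "Done.", "</think>", "Final answer."])
mid_buffer = run(["R ", "<tool_call>{", "</think>", '"name": "f"}</tool_call>'])
cut_off = run(["R ", "<tool_call>{"])
```

Unlike the non-streaming path, this sketch applies no structural guard when it sees `<tool_call>`; whether the real streaming path does is not stated in the description.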

Test plan

  • 11 non-streaming tests (closed/unclosed/boundary/prose/JSON/multiple/warning)
  • 7 streaming tests (full-text, large-chunk, think-ends-while-buffering, finalize, multiple)
  • 2 composition tests (preserves trailing content, tool_choice=required)
  • 1 composition test (reasoning parser + tool parser end-to-end, skipped without transformers)
  • 121 existing parser tests: 0 regressions

When a thinking model emits <tool_call> XML inside <think> blocks,
the sequential parser pipeline loses the tool call: the reasoning
parser classifies it as reasoning, and the tool parser only sees
content. This is Failure Mode B, confirmed on Qwen 3.5/3.6.

Non-streaming: _promote_tool_calls scans reasoning for closed
<tool_call>...</tool_call> blocks (appended to content) and unclosed
blocks with structural content (prepended to reassemble with content
continuation). Structural guard prevents prose false positives.

Streaming: tool-call buffering in the thinking phase detects
<tool_call> start, accumulates tokens until </tool_call>, then emits
as content. Handles </think> arriving mid-buffer and stream-end
flush. _transition_to_content applies non-streaming promotion as a
catch-all for single-delta cases.

20 tests covering non-streaming, streaming, and composition paths.
121 existing parser tests pass with 0 regressions.
@Thump604 requested a review from janhilgard on April 25, 2026 17:09
Collaborator

@janhilgard left a comment

Review: PR #433 — Tool call promotion from reasoning to content

Clean design. The two-path approach (regex for non-streaming, state machine for streaming) is the right call, and the \s*[\{<] structural guard on the unclosed regex is a smart way to avoid false positives on prose.

Should fix

1. Ambiguous ternary at line 514 — add explicit parentheses

final_content = promoted + (
    content_msg.content or "" if content_msg else ""
)

Python's grammar makes this (content_msg.content or "") if content_msg else "" (safe), but the expression reads as if it could be content_msg.content or ("" if content_msg else "") (AttributeError when content_msg is None). Explicit parentheses would remove all doubt:

final_content = promoted + (
    (content_msg.content or "") if content_msg else ""
)

Same pattern at line 552 but there the + makes operator precedence unambiguous.
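
The precedence claim in item 1 can be checked with the standard `ast` module. The snippet below is only a verification sketch; `content_msg` is used as a bare name, not the real object from the PR.

```python
import ast

# Parse the three spellings as expressions and compare their syntax trees.
implicit = ast.parse('content_msg.content or "" if content_msg else ""', mode="eval")
safe = ast.parse('(content_msg.content or "") if content_msg else ""', mode="eval")
unsafe = ast.parse('content_msg.content or ("" if content_msg else "")', mode="eval")

# ast.dump omits line/column attributes by default, so redundant
# parentheses do not affect the comparison.
same_as_safe = ast.dump(implicit) == ast.dump(safe)
same_as_unsafe = ast.dump(implicit) == ast.dump(unsafe)

# With content_msg = None the unparenthesized form does not raise, because
# the condition is evaluated before the "or" expression.
content_msg = None
value = content_msg.content or "" if content_msg else ""
```

The conditional expression binds more loosely than `or`, which is why the unparenthesized form is safe; the added parentheses change readability, not behavior.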

2. Missing test: </tool_call></think> with no trailing content

The _thinking_tool_call path where </tool_call> and </think> are in the same delta AND after_think is empty has no test coverage. This exercises the content_msg = ... if after_think else None branch:

def test_stream_tool_call_closed_immediately_before_think_end(self, parser):
    text = (
        "<think>R\n"
        "<tool_call>\n"
        "<function=f><parameter=x>1</parameter></function>\n"
        "</tool_call></think>"
    )
    reasoning, content = self._stream(parser, text)
    assert content is not None
    assert "<tool_call>" in content

3. Missing test: _transition_to_content defensive promotion

The _transition_to_content path also calls _promote_tool_calls as a catch-all. No test exercises this path specifically (where a single large delta contains <think>...<tool_call>...</tool_call>...</think> and hits _transition_to_content directly without going through the streaming tool call state machine).

Minor / nice to have

  • finalize_stream() is clean — if stream ends mid-tool-call, buffer is flushed as content. Good edge case handling.
  • Closed blocks appended after existing content, unclosed prepended — correct for the reassembly scenario. The test_tool_call_spanning_think_boundary test validates this nicely.
  • Warning logging on promotion is useful for operators debugging model misbehavior.
  • Regex _TOOL_CALL_CLOSED_RE uses non-greedy .*? — correct for multiple tool calls.
  • The test_prose_mention_not_promoted test verifies the structural guard works.
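
To illustrate the non-greedy point above, here is a small comparison. The pattern is an assumed approximation of `_TOOL_CALL_CLOSED_RE`, whose exact body is not shown in this conversation.

```python
import re

text = (
    '<tool_call>{"name": "a"}</tool_call> then '
    '<tool_call>{"name": "b"}</tool_call>'
)

# Non-greedy: each block stops at its own closing tag.
non_greedy = re.findall(r"<tool_call>.*?</tool_call>", text, re.DOTALL)

# Greedy: one match running from the first opening tag to the last closing tag.
greedy = re.findall(r"<tool_call>.*</tool_call>", text, re.DOTALL)
```

With the greedy form, the prose between the two blocks would be promoted along with them as a single block.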

Overall

This is solid work — the state machine is well-structured, the regex guards prevent false positives, and 21 tests cover the important paths. CI 9/9 pass. The parentheses fix (#1) and the two test gap additions (#2, #3) are straightforward.

1. Add explicit parentheses on ambiguous ternary at line 353
2. Add test for </tool_call></think> with no trailing content
3. Add test for single-delta _transition_to_content catch-all path
@Thump604
Collaborator Author

All three items addressed in ff62be4:

  1. Explicit parentheses on the ambiguous ternary at line 353
  2. Added test_stream_tool_call_closed_immediately_before_think_end -- exercises the after_think empty branch
  3. Added test_stream_single_delta_promotion_via_transition -- exercises the _transition_to_content catch-all path directly

142 passed, 1 skipped (transformers).

@janhilgard
Collaborator

@Thump604 Looks good, all three addressed. Thanks for the quick turnaround.

Collaborator

@janhilgard left a comment

All three items addressed in ff62be4. LGTM.

@Thump604 requested a review from waybarrios on April 25, 2026 18:10
@Thump604
Collaborator Author

@waybarrios This has been reviewed and approved by @janhilgard. When you have a chance, could you take a look and merge if it looks good?

@Thump604
Collaborator Author

Approved by Jan, CI green. Wayner, review and merge when you have a moment?

@janhilgard merged commit 9a6253a into main on Apr 28, 2026 (9 checks passed)
@janhilgard deleted the feat/tool-call-aware-reasoning-parser branch on April 28, 2026 13:21