Promote tool_call blocks from reasoning to content#433
Conversation
When a thinking model emits <tool_call> XML inside <think> blocks, the sequential parser pipeline loses the tool call: the reasoning parser classifies it as reasoning, and the tool parser only sees content. This is Failure Mode B, confirmed on Qwen 3.5/3.6. Non-streaming: _promote_tool_calls scans reasoning for closed <tool_call>...</tool_call> blocks (appended to content) and unclosed blocks with structural content (prepended to reassemble with content continuation). Structural guard prevents prose false positives. Streaming: tool-call buffering in the thinking phase detects <tool_call> start, accumulates tokens until </tool_call>, then emits as content. Handles </think> arriving mid-buffer and stream-end flush. _transition_to_content applies non-streaming promotion as a catch-all for single-delta cases. 20 tests covering non-streaming, streaming, and composition paths. 121 existing parser tests pass with 0 regressions.
janhilgard
left a comment
There was a problem hiding this comment.
Review: PR #433 — Tool call promotion from reasoning to content
Clean design. The two-path approach (regex for non-streaming, state machine for streaming) is the right call, and the \s*[\{<] structural guard on the unclosed regex is a smart way to avoid false positives on prose.
Should fix
1. Ambiguous ternary at line 514 — add explicit parentheses
final_content = promoted + (
content_msg.content or "" if content_msg else ""
)Python's grammar makes this (content_msg.content or "") if content_msg else "" (safe), but the expression reads as if it could be content_msg.content or ("" if content_msg else "") (AttributeError when content_msg is None). Explicit parentheses would remove all doubt:
final_content = promoted + (
(content_msg.content or "") if content_msg else ""
)Same pattern at line 552 but there the + makes operator precedence unambiguous.
2. Missing test: </tool_call></think> with no trailing content
The _thinking_tool_call path where </tool_call> and </think> are in the same delta AND after_think is empty has no test coverage. This exercises the content_msg = ... if after_think else None branch:
def test_stream_tool_call_closed_immediately_before_think_end(self, parser):
text = (
"<think>R\n"
"<tool_call>\n"
"<function=f><parameter=x>1</parameter></function>\n"
"</tool_call></think>"
)
reasoning, content = self._stream(parser, text)
assert content is not None
assert "<tool_call>" in content3. Missing test: _transition_to_content defensive promotion
The _transition_to_content path also calls _promote_tool_calls as a catch-all. No test exercises this path specifically (where a single large delta contains <think>...<tool_call>...</tool_call>...</think> and hits _transition_to_content directly without going through the streaming tool call state machine).
Minor / nice to have
finalize_stream()is clean — if stream ends mid-tool-call, buffer is flushed as content. Good edge case handling.- Closed blocks appended after existing content, unclosed prepended — correct for the reassembly scenario. The
test_tool_call_spanning_think_boundarytest validates this nicely. - Warning logging on promotion is useful for operators debugging model misbehavior.
- Regex
_TOOL_CALL_CLOSED_REuses non-greedy.*?— correct for multiple tool calls. - The
test_prose_mention_not_promotedtest verifies the structural guard works.
Overall
This is solid work — the state machine is well-structured, the regex guards prevent false positives, and 21 tests cover the important paths. CI 9/9 pass. The parentheses fix (#1) and the two test gap additions (#2, #3) are straightforward.
1. Add explicit parentheses on ambiguous ternary at line 353 2. Add test for </tool_call></think> with no trailing content 3. Add test for single-delta _transition_to_content catch-all path
|
All three items addressed in ff62be4:
142 passed, 1 skipped (transformers). |
|
@Thump604 Looks good, all three addressed. Thanks for the quick turnaround. |
janhilgard
left a comment
There was a problem hiding this comment.
All three items addressed in ff62be4. LGTM.
|
@waybarrios This has been reviewed and approved by @janhilgard. When you have a chance, could you take a look and merge if it looks good? |
|
Approved by Jan, CI green. Wayner, review and merge when you have a moment? |
Summary
BaseThinkingReasoningParserto detect<tool_call>blocks inside reasoning and reclassify them as content so the downstream tool parser can find them<think>blocks lost by parser pipeline), confirmed on Qwen 3.5 27B/35B/122B/397B and Qwen 3.6 30B/35BNon-streaming
_promote_tool_calls(reasoning, content)scans reasoning after_extract_complete_reasoningsplits it from content:<tool_call>...</tool_call>): appended to content<tool_call>\s*[\{<]...spanning</think>boundary): prepended to content to reassemble with continuation\s*[\{<]guard on unclosed regex prevents false positives on prose that mentions<tool_call>content = (content or "")handles thecontent=NonecaseStreaming
Tool-call buffering state machine in the
thinkingphase:<tool_call>detected: start buffering, stop emitting as reasoning</tool_call>detected: emit buffer as content, resume reasoning</think>while buffering: flush buffer as content, transition to content phasefinalize_stream()flushes as content_transition_to_contentapplies non-streaming promotion as catch-all for single-delta casesTest plan