[Bugfix]: Prevent reasoning_content leak #32997

Open
RohanDisa wants to merge 3 commits into vllm-project:main from RohanDisa:fix/reasoning-content-leak-in-tool-calls-streaming

RohanDisa commented Jan 24, 2026

Purpose

Fix a bug where reasoning_content was incorrectly flushed into content in the final streamed chunk when finish_reason='tool_calls'.

Root Cause & Solution

Stream finalization did not clear content/reasoning buffers when finish_reason='tool_calls', allowing leftover reasoning_content (especially from speculative decoding) to leak.
Added guards to clear content/reasoning fields before final chunk creation and after tool call extraction.
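The guard described above could look roughly like the sketch below. It is only an illustration: `DeltaMessage`, `clear_content_and_reasoning`, and `finalize_delta` are hypothetical stand-ins, not the actual vLLM classes or functions touched by this PR.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeltaMessage:
    """Hypothetical stand-in for a streamed delta (not vLLM's real class)."""
    content: Optional[str] = None
    reasoning: Optional[str] = None
    reasoning_content: Optional[str] = None
    tool_calls: list = field(default_factory=list)


def clear_content_and_reasoning(delta: DeltaMessage) -> DeltaMessage:
    """Drop any text fields from the delta, leaving tool_calls untouched."""
    delta.content = None
    delta.reasoning = None
    delta.reasoning_content = None
    return delta


def finalize_delta(delta: DeltaMessage, finish_reason: Optional[str]) -> DeltaMessage:
    """Apply the guard only when the stream ends with tool calls."""
    if finish_reason == "tool_calls":
        return clear_content_and_reasoning(delta)
    return delta
```

With this shape, a final delta carrying leftover reasoning text and a tool call comes out with only the tool call, while a delta that finishes with `stop` keeps its content.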

Test Plan

  • Added regression test test_no_content_leak_when_finish_reason_tool_calls in tests/entrypoints/openai/test_chat_with_tool_reasoning.py.
  • The test simulates streaming with reasoning + tool_choice="auto" and asserts that the final chunk has no content/reasoning leaks while earlier reasoning is preserved.
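As a rough illustration of what such a check could assert, the sketch below works on plain dictionaries shaped like Chat Completion chunks. The helper names `find_leaks` and `collect_reasoning` are hypothetical and are not taken from the test file in this PR.

```python
def find_leaks(chunks):
    """Return names of text fields set on any choice whose finish_reason is 'tool_calls'."""
    leaked = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            if choice.get("finish_reason") != "tool_calls":
                continue
            delta = choice.get("delta") or {}
            for name in ("content", "reasoning", "reasoning_content"):
                if delta.get(name):
                    leaked.append(name)
    return leaked


def collect_reasoning(chunks):
    """Join the reasoning text streamed across all chunks."""
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            text = delta.get("reasoning_content") or delta.get("reasoning")
            if text:
                parts.append(text)
    return "".join(parts)
```

A test along these lines would expect `find_leaks(chunks)` to be empty and `collect_reasoning(chunks)` to still contain the reasoning streamed before the final chunk.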

Test Result

  • Regression test passes ✅
  • Final chunk with finish_reason="tool_calls" has content=null and no reasoning/reasoning_content

Fixes: #32921

…ool_calls

This fixes a bug where reasoning_content was incorrectly flushed into
the content field in the final streamed chunk when finish_reason='tool_calls'.

The issue occurred when:
- stream=true
- OpenAI tool-call parser enabled
- tool_choice='auto'
- reasoning fields enabled (reasoning, reasoning_content)
- speculative decoding enabled

Per OpenAI's schema contract, when finish_reason='tool_calls', the
response must only contain tool_calls and finish_reason, never content.

Changes:
1. Add guard before final chunk creation to clear content/reasoning
   when finish_reason='tool_calls'
2. Add guards after tool call extraction in all paths (auto, required,
   named, harmony) to prevent content leakage during streaming
3. Ensure reasoning_content is never flushed into content when
   tool calls are present
4. Add test to verify no content leak when finish_reason=tool_calls

Signed-off-by: RohanDisa <105740583+RohanDisa@users.noreply.github.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively addresses a bug where reasoning_content could leak into the content field during streaming with tool calls. The solution, which involves clearing content and reasoning fields at various points, is sound and is well-supported by a new regression test. My review includes a suggestion to ensure consistency in the fix across different code paths and a recommendation to refactor duplicated code to improve long-term maintainability.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

RohanDisa and others added 2 commits January 24, 2026 16:48
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Rohan Disa <105740583+RohanDisa@users.noreply.github.com>
Signed-off-by: RohanDisa <105740583+RohanDisa@users.noreply.github.com>
@RohanDisa RohanDisa changed the title Bugfix: Prevent reasoning_content leak [Bugfix]: Prevent reasoning_content leak Jan 25, 2026
@chaunceyjiang chaunceyjiang self-assigned this Jan 25, 2026
Collaborator

@chaunceyjiang chaunceyjiang left a comment

"Per OpenAI spec, tool call deltas must not contain content or reasoning"

Could you share the link? I couldn’t find any similar documentation or explanation.

temperature=0.0,
stream=True,
tool_choice="auto",
include_reasoning=True,
Collaborator

got an unexpected keyword argument 'include_reasoning'
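One possible cause, assuming the test goes through the OpenAI Python client: `chat.completions.create()` only accepts its declared keyword arguments, and fields outside the standard request schema are normally passed through `extra_body`. The sketch below only builds the keyword arguments and does not call any server. `build_create_kwargs` is a hypothetical helper, and whether the server honors `include_reasoning` as a request field is an assumption.

```python
def build_create_kwargs(model, messages, tools):
    """Build kwargs for client.chat.completions.create() (hypothetical helper)."""
    return {
        "model": model,
        "messages": messages,
        "tools": tools,
        "temperature": 0.0,
        "stream": True,
        "tool_choice": "auto",
        # Non-standard fields go in extra_body instead of top-level kwargs.
        # Assumption: the server accepts include_reasoning in the request body.
        "extra_body": {"include_reasoning": True},
    }
```

The result would then be used as `client.chat.completions.create(**build_create_kwargs(...))`.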

)

# Verify tool_calls are present (the expected behavior)
assert delta.tool_calls is not None and len(delta.tool_calls) > 0, (
Collaborator

AssertionError: Final chunk with finish_reason='tool_calls' must have tool_calls

I ran this test on your branch and got the error shown above.

"""
Clear content and reasoning fields from a delta message.

Per OpenAI spec, tool call deltas must not contain content or reasoning
Collaborator

"Per OpenAI spec, tool call deltas must not contain content or reasoning"

Could you share the link? I couldn’t find any similar documentation or explanation.

@bbrowning
Contributor

I also haven't run across any place that states that reasoning or content should not be in the Chat Completion chunks when tool calls are involved. The spec at https://github.com/openai/openai-openapi/blob/498c71ddf6f1c45b983f972ccabca795da211a3e/openapi.yaml#L18416 doesn't show anything like this, for example. Do you have an example where this was causing problems?

@think-in-universe

think-in-universe commented Mar 6, 2026

Hey guys, any idea when this PR can be merged to fix this issue: #32921?

Suffering a lot from this issue recently.

Labels

bug, frontend, tool-calling

Development

Successfully merging this pull request may close these issues.

[Bug]: gpt-oss-20b streaming last reasoning content part into content

4 participants