[Bugfix]: Prevent reasoning_content leak #32997
[Bugfix]: Prevent reasoning_content leak #32997RohanDisa wants to merge 3 commits intovllm-project:mainfrom
Conversation
…ool_calls This fixes a bug where reasoning_content was incorrectly flushed into the content field in the final streamed chunk when finish_reason='tool_calls'. The issue occurred when: - stream=true - OpenAI tool-call parser enabled - tool_choice='auto' - reasoning fields enabled (reasoning, reasoning_content) - speculative decoding enabled Per OpenAI's schema contract, when finish_reason='tool_calls', the response must only contain tool_calls and finish_reason, never content. Changes: 1. Add guard before final chunk creation to clear content/reasoning when finish_reason='tool_calls' 2. Add guards after tool call extraction in all paths (auto, required, named, harmony) to prevent content leakage during streaming 3. Ensure reasoning_content is never flushed into content when tool calls are present 4. Add test to verify no content leak when finish_reason=tool_calls Signed-off-by: RohanDisa <105740583+RohanDisa@users.noreply.github.com>
There was a problem hiding this comment.
Code Review
This pull request effectively addresses a bug where reasoning_content could leak into the content field during streaming with tool calls. The solution, which involves clearing content and reasoning fields at various points, is sound and is well-supported by a new regression test. My review includes a suggestion to ensure consistency in the fix across different code paths and a recommendation to refactor duplicated code to improve long-term maintainability.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
Comment @cursor review or bugbot run to trigger another review on this PR
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Rohan Disa <105740583+RohanDisa@users.noreply.github.com>
Signed-off-by: RohanDisa <105740583+RohanDisa@users.noreply.github.com>
chaunceyjiang
left a comment
There was a problem hiding this comment.
Per OpenAI spec, tool call deltas must not contain content or reasoning
Could you share the link? I couldn’t find any similar documentation or explanation.
| temperature=0.0, | ||
| stream=True, | ||
| tool_choice="auto", | ||
| include_reasoning=True, |
There was a problem hiding this comment.
got an unexpected keyword argument 'include_reasoning'
| ) | ||
|
|
||
| # Verify tool_calls are present (the expected behavior) | ||
| assert delta.tool_calls is not None and len(delta.tool_calls) > 0, ( |
There was a problem hiding this comment.
AssertionError: Final chunk with finish_reason='tool_calls' must have tool_calls
I ran this test on your branch and got the error shown above.
| """ | ||
| Clear content and reasoning fields from a delta message. | ||
|
|
||
| Per OpenAI spec, tool call deltas must not contain content or reasoning |
There was a problem hiding this comment.
Per OpenAI spec, tool call deltas must not contain content or reasoning
Could you share the link? I couldn’t find any similar documentation or explanation.
|
I also haven't run across any place that states that reasoning or content should not be in the Chat Completion chunks when tool calls are involved. The spec at https://github.com/openai/openai-openapi/blob/498c71ddf6f1c45b983f972ccabca795da211a3e/openapi.yaml#L18416 doesn't show anything like this, for example. Do you have an example where this was causing problems? |
|
Hey guys, any ideas when this PR and can be merged and fix this issue: #32921 Suffering a lot from this issue recently. |
Purpose
Fix a bug where reasoning_content was incorrectly flushed into content in the final streamed chunk when finish_reason='tool_calls'.
Root Cause & Solution
Stream finalization did not clear content/reasoning buffers when finish_reason='tool_calls', allowing leftover reasoning_content (especially from speculative decoding) to leak.
Added guards to clear content/reasoning fields before final chunk creation and after tool call extraction.
Test Plan
test_no_content_leak_when_finish_reason_tool_callsintests/entrypoints/openai/test_chat_with_tool_reasoning.py.Test Result
Fixes: #32921