[Harmony] Fix analysis-channel tool calls and preserve reasoning across turns#35884
Closed
will-deines wants to merge 1 commit into vllm-project:main
Conversation
Contributor
Code Review
This pull request introduces two important fixes for GPT-OSS Harmony models. First, it correctly handles tool calls made on the analysis channel by updating the completed-message parser to be consistent with the streaming parser, preventing silent misrouting. Second, it preserves the model's reasoning context across multi-turn tool-calling conversations by disabling the openai_harmony encoder's analysis message dropping, thus avoiding a double-filtering issue with vLLM's own logic. The changes are well-justified, clearly implemented, and accompanied by thorough tests that verify the fixes.
…ss turns

Two fixes for GPT-OSS Harmony model behavior:

1. Accept function calls on the analysis channel in `harmony_to_response_output()` to match the streaming/in-progress parsers that already handle both channels.
2. Disable the `openai_harmony` encoder's `auto_drop_analysis` to prevent double-filtering with vLLM's `auto_drop_analysis_messages()`, preserving reasoning context between tool-calling turns.
b7bec35 to 3c7c674
This was referenced Mar 3, 2026
Motivation
GPT-OSS Harmony models exhibit two behaviors that stock vLLM handles incorrectly, causing silent misrouting of tool calls and loss of reasoning context in multi-turn tool-calling conversations.
Bug 1: Tool calls on the analysis channel are silently misrouted
GPT-OSS models sometimes emit function calls on the `analysis` channel instead of `commentary`. The completed-message parser (`harmony_to_response_output`) only accepted function calls on `commentary`, so analysis-channel function calls fell through to `_parse_mcp_call()`, producing incorrect MCP call output items instead of function tool calls.

This was an inconsistency: `parser_state_to_response_output()` (streaming) and the in-progress parser already accepted function calls on both channels. Only the completed-message path was missing the check.

Bug 2: Reasoning lost between tool-calling turns
The `openai_harmony` library defaults to `auto_drop_analysis=True` when rendering conversations for completion, stripping all analysis messages. vLLM already has its own `auto_drop_analysis_messages()` that selectively drops prior-turn analysis while preserving current-turn reasoning. The encoder's blanket drop on top of vLLM's selective drop caused double-filtering that destroyed the model's reasoning context between tool-calling turns.

Changes
Fix 1: Accept function calls on analysis channel (`harmony.py`)

Widened the channel check in `harmony_to_response_output()` from `== "commentary"` to `in ("commentary", "analysis")`, making the completed-message parser consistent with the streaming and in-progress paths.

Fix 2: Disable encoder-side analysis dropping (`harmony_utils.py`)

Pass `RenderConversationConfig(auto_drop_analysis=False)` to the `openai_harmony` encoder in `render_for_completion()`. This prevents the encoder from double-dropping analysis messages that vLLM already selectively filters via `auto_drop_analysis_messages()`.

Tests
- `test_analysis_with_function_recipient_creates_function_call` — verifies analysis-channel function calls produce `ResponseFunctionToolCall`, not `McpCall`
- `test_preserves_analysis` — verifies `render_for_completion` doesn't strip analysis messages
- `test_preserves_reasoning_across_tool_turns` — verifies reasoning before a tool call survives rendering through a tool turn

Related Issues / PRs
- #35826 — fixes the `auto_drop_analysis_messages()` algorithm; complementary to our encoder-side fix

Design Decisions
Why widen the channel check instead of fixing the model? The model's behavior of emitting tool calls on `analysis` is valid per the Harmony protocol — the streaming and in-progress parsers already handle it. The completed-message parser was the only inconsistent path.

Why disable `auto_drop_analysis` at the encoder instead of removing `auto_drop_analysis_messages()`? vLLM's `auto_drop_analysis_messages()` implements the correct selective-dropping policy (drop only prior-turn analysis before a `final` message). The encoder's blanket `auto_drop_analysis=True` is redundant and destructive. Disabling it at the encoder preserves vLLM's intentional filtering while preventing double-drops.

Complementary to "fix: preserve prior-turn analysis messages in Harmony multi-turn conversations" (#35826): That PR fixes the `auto_drop_analysis_messages()` algorithm itself; this PR fixes the encoder-side double-drop. Both are needed for correct behavior — they address different layers of the same problem.

Test Plan
- `pytest tests/entrypoints/openai/parser/test_harmony_utils.py -v` — 59 passed
- `pytest tests/entrypoints/openai/responses/test_harmony_utils.py -v` — 22 passed
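To make the double-filtering concrete, here is a minimal, self-contained model of the two drop passes described above. The function names mirror the PR's terminology, but the code is a hypothetical reduction, not the real `openai_harmony` or vLLM implementation:

```python
# Hypothetical model of the double-drop. Each message is (channel, turn);
# a "final" message marks a turn whose user-visible answer is complete.

def vllm_selective_drop(msgs):
    """Mimic vLLM's auto_drop_analysis_messages(): drop analysis only
    from turns that already ended in a final message (prior turns)."""
    closed_turns = {turn for ch, turn in msgs if ch == "final"}
    return [(ch, t) for ch, t in msgs
            if not (ch == "analysis" and t in closed_turns)]

def encoder_blanket_drop(msgs):
    """Mimic the encoder's auto_drop_analysis=True: drop ALL analysis."""
    return [(ch, t) for ch, t in msgs if ch != "analysis"]

conversation = [
    ("analysis", 1), ("final", 1),  # prior turn: safe to drop its analysis
    ("analysis", 2),                # current tool-calling turn: must survive
]

after_vllm = vllm_selective_drop(conversation)
# vLLM keeps the current-turn reasoning; the encoder's blanket pass on top
# would then strip it too -- the double-drop this PR disables.
double_dropped = encoder_blanket_drop(after_vllm)
```

Under this toy model, the selective pass alone keeps `("analysis", 2)`, while running the blanket pass afterward removes it — which is why the fix disables the encoder-side drop rather than vLLM's selective one.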