
[Harmony] Fix analysis-channel tool calls and preserve reasoning across turns#35884

Closed
will-deines wants to merge 1 commit into vllm-project:main from will-deines:harmony-analysis-tool-calls

Conversation

@will-deines

Motivation

GPT-OSS Harmony models exhibit two behaviors that stock vLLM handles incorrectly, causing silent misrouting of tool calls and loss of reasoning context in multi-turn tool-calling conversations.

Bug 1: Tool calls on the analysis channel are silently misrouted

GPT-OSS models sometimes emit function calls on the analysis channel instead of commentary. The completed-message parser (harmony_to_response_output) only accepted function calls on commentary, so analysis-channel function calls fell through to _parse_mcp_call(), producing incorrect MCP call output items instead of function tool calls.

This was an inconsistency: parser_state_to_response_output() (streaming) and the in-progress parser already accepted function calls on both channels. Only the completed-message path was missing.

Bug 2: Reasoning lost between tool-calling turns

The openai_harmony library defaults to auto_drop_analysis=True when rendering conversations for completion, stripping all analysis messages. vLLM already has its own auto_drop_analysis_messages() that selectively drops prior-turn analysis while preserving current-turn reasoning. The encoder's blanket drop on top of vLLM's selective drop caused double-filtering that destroyed the model's reasoning context between tool-calling turns.
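The double-filtering can be sketched with plain dicts standing in for Harmony messages (the real code works on typed `Message` objects, and `auto_drop_analysis_messages()` is more involved than this; the sketch only shows the layering of the two drops):

```python
# Standalone sketch (simplified message dicts, not vLLM's real types) of
# the double-filtering: vLLM first drops only prior-turn analysis, then
# the encoder's blanket auto_drop_analysis=True removes the rest.

def vllm_selective_drop(messages):
    """Drop analysis messages that precede the last 'final' message,
    keeping current-turn reasoning (mimics auto_drop_analysis_messages)."""
    last_final = max(
        (i for i, m in enumerate(messages) if m["channel"] == "final"),
        default=-1,
    )
    return [
        m for i, m in enumerate(messages)
        if m["channel"] != "analysis" or i > last_final
    ]

def encoder_blanket_drop(messages):
    """Mimics the openai_harmony default of auto_drop_analysis=True."""
    return [m for m in messages if m["channel"] != "analysis"]

convo = [
    {"channel": "analysis", "text": "turn-1 reasoning"},    # should drop
    {"channel": "final", "text": "turn-1 answer"},
    {"channel": "analysis", "text": "turn-2 reasoning"},    # should survive
    {"channel": "commentary", "text": "turn-2 tool call"},
]
after_vllm = vllm_selective_drop(convo)            # turn-2 reasoning kept
after_encoder = encoder_blanket_drop(after_vllm)   # ...then destroyed
assert any(m["channel"] == "analysis" for m in after_vllm)
assert not any(m["channel"] == "analysis" for m in after_encoder)
```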

Changes

Fix 1: Accept function calls on analysis channel (harmony.py)

Widened the channel check in harmony_to_response_output() from == "commentary" to in ("commentary", "analysis"), making the completed-message parser consistent with the streaming and in-progress paths.

Fix 2: Disable encoder-side analysis dropping (harmony_utils.py)

Pass RenderConversationConfig(auto_drop_analysis=False) to the openai_harmony encoder in render_for_completion(). This prevents the encoder from double-dropping analysis messages that vLLM already selectively filters via auto_drop_analysis_messages().
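The shape of the change is roughly as follows. `StubEncoding` here is a test double standing in for the openai_harmony encoding object, whose real render method signature may differ; the point is the explicit `auto_drop_analysis=False` config:

```python
# Hedged sketch of the fix; RenderConversationConfig and StubEncoding
# are local stand-ins, not the real openai_harmony types.

class RenderConversationConfig:  # stand-in for openai_harmony's config
    def __init__(self, auto_drop_analysis: bool = True):
        self.auto_drop_analysis = auto_drop_analysis

class StubEncoding:
    """Test double mimicking the encoder's drop-on-render behavior."""
    def render_conversation_for_completion(self, convo, role, config=None):
        drop = config is None or config.auto_drop_analysis
        return [m for m in convo if not (drop and m["channel"] == "analysis")]

def render_for_completion(encoding, conversation):
    # Explicitly disable the encoder-side blanket drop: vLLM's
    # auto_drop_analysis_messages() has already filtered selectively.
    config = RenderConversationConfig(auto_drop_analysis=False)
    return encoding.render_conversation_for_completion(
        conversation, "assistant", config=config
    )

convo = [{"channel": "analysis", "text": "reasoning"},
         {"channel": "commentary", "text": "tool call"}]
assert any(m["channel"] == "analysis"
           for m in render_for_completion(StubEncoding(), convo))
```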

Tests

  • test_analysis_with_function_recipient_creates_function_call — verifies analysis-channel function calls produce ResponseFunctionToolCall, not McpCall
  • test_preserves_analysis — verifies render_for_completion doesn't strip analysis messages
  • test_preserves_reasoning_across_tool_turns — verifies reasoning before a tool call survives rendering through a tool turn

Related Issues / PRs

| # | Title | Relation |
|---|---|---|
| #35779 | Harmony models incorrectly drops prior-turn analysis channel | Primary bug for reasoning loss; this PR fixes the encoder-side half |
| #35826 | Fix: preserve prior-turn analysis messages | Fixes vLLM-side `auto_drop_analysis_messages()` algorithm; complementary to this PR's encoder-side fix |
| #27653 | [RFC] Include past-reasoning for harmony formatting | Proposes preserving reasoning in multi-turn; this PR implements the rendering half |
| #28262 | Responses API incorrect input/output handling | Channel metadata loss in round-trips |
| #35540 | Fix empty channel/recipient in harmony for /v1/responses | Fixes channel/recipient preservation on input; complementary |
| #32713 | [RFC] Unified Parser for reasoning, tool calling | Long-term architecture; these fixes are consistent with it |

Design Decisions

  1. Why widen the channel check instead of fixing the model? The model's behavior of emitting tool calls on analysis is valid per the Harmony protocol — the streaming and in-progress parsers already handle it. The completed-message parser was the only inconsistent path.

  2. Why disable auto_drop_analysis at the encoder instead of removing auto_drop_analysis_messages()? vLLM's auto_drop_analysis_messages() implements the correct selective dropping policy (only prior-turn analysis before a final message). The encoder's blanket auto_drop_analysis=True is redundant and destructive. Disabling it at the encoder preserves vLLM's intentional filtering while preventing double-drops.

  3. Complementary to fix: preserve prior-turn analysis messages in Harmony multi-turn conversations #35826: That PR fixes the auto_drop_analysis_messages() algorithm itself. This PR fixes the encoder-side double-drop. Both are needed for correct behavior — they address different layers of the same problem.

Test Plan

  • pytest tests/entrypoints/openai/parser/test_harmony_utils.py -v — 59 passed
  • pytest tests/entrypoints/openai/responses/test_harmony_utils.py -v — 22 passed
  • All 81 tests pass with no regressions
  • Pre-commit checks pass


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces two important fixes for GPT-OSS Harmony models. First, it correctly handles tool calls made on the analysis channel by updating the completed-message parser to be consistent with the streaming parser, preventing silent misrouting. Second, it preserves the model's reasoning context across multi-turn tool-calling conversations by disabling the openai_harmony encoder's analysis message dropping, thus avoiding a double-filtering issue with vLLM's own logic. The changes are well-justified, clearly implemented, and accompanied by thorough tests that verify the fixes.

…ss turns

Two fixes for GPT-OSS Harmony model behavior:

1. Accept function calls on analysis channel in harmony_to_response_output()
   to match the streaming/in-progress parsers that already handle both channels.

2. Disable openai_harmony encoder's auto_drop_analysis to prevent
   double-filtering with vLLM's auto_drop_analysis_messages(), preserving
   reasoning context between tool-calling turns.

Labels

frontend, gpt-oss (Related to GPT-OSS models)
