
[Harmony] Fix analysis-channel tool calls and preserve reasoning across turns#35884

Closed
will-deines wants to merge 1 commit into vllm-project:main from will-deines:harmony-analysis-tool-calls

Conversation

@will-deines

Motivation

GPT-OSS Harmony models exhibit two behaviors that stock vLLM handles incorrectly, causing silent misrouting of tool calls and loss of reasoning context in multi-turn tool-calling conversations.

Bug 1: Tool calls on the analysis channel are silently misrouted

GPT-OSS models sometimes emit function calls on the analysis channel instead of commentary. The completed-message parser (harmony_to_response_output) only accepted function calls on commentary, so analysis-channel function calls fell through to _parse_mcp_call(), producing incorrect MCP call output items instead of function tool calls.

This was an inconsistency: parser_state_to_response_output() (streaming) and the in-progress parser already accepted function calls on both channels. Only the completed-message path was missing.

Bug 2: Reasoning lost between tool-calling turns

The openai_harmony library defaults to auto_drop_analysis=True when rendering conversations for completion, stripping all analysis messages. vLLM already has its own auto_drop_analysis_messages() that selectively drops prior-turn analysis while preserving current-turn reasoning. The encoder's blanket drop on top of vLLM's selective drop caused double-filtering that destroyed the model's reasoning context between tool-calling turns.
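The double-filtering can be sketched with plain dicts standing in for Harmony messages (the real code works on typed `Message` objects, and `auto_drop_analysis_messages()` is more involved than this; the sketch only shows the layering of the two drops):

```python
# Standalone sketch (simplified message dicts, not vLLM's real types) of
# the double-filtering: vLLM first drops only prior-turn analysis, then
# the encoder's blanket auto_drop_analysis=True removes the rest.

def vllm_selective_drop(messages):
    """Drop analysis messages that precede the last 'final' message,
    keeping current-turn reasoning (mimics auto_drop_analysis_messages)."""
    last_final = max(
        (i for i, m in enumerate(messages) if m["channel"] == "final"),
        default=-1,
    )
    return [
        m for i, m in enumerate(messages)
        if m["channel"] != "analysis" or i > last_final
    ]

def encoder_blanket_drop(messages):
    """Mimics the openai_harmony default of auto_drop_analysis=True."""
    return [m for m in messages if m["channel"] != "analysis"]

convo = [
    {"channel": "analysis", "text": "turn-1 reasoning"},    # should drop
    {"channel": "final", "text": "turn-1 answer"},
    {"channel": "analysis", "text": "turn-2 reasoning"},    # should survive
    {"channel": "commentary", "text": "turn-2 tool call"},
]
after_vllm = vllm_selective_drop(convo)            # turn-2 reasoning kept
after_encoder = encoder_blanket_drop(after_vllm)   # ...then destroyed
assert any(m["channel"] == "analysis" for m in after_vllm)
assert not any(m["channel"] == "analysis" for m in after_encoder)
```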

Changes

Fix 1: Accept function calls on analysis channel (harmony.py)

Widened the channel check in harmony_to_response_output() from == "commentary" to in ("commentary", "analysis"), making the completed-message parser consistent with the streaming and in-progress paths.

Fix 2: Disable encoder-side analysis dropping (harmony_utils.py)

Pass RenderConversationConfig(auto_drop_analysis=False) to the openai_harmony encoder in render_for_completion(). This prevents the encoder from double-dropping analysis messages that vLLM already selectively filters via auto_drop_analysis_messages().
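The shape of the change is roughly as follows. `StubEncoding` here is a test double standing in for the openai_harmony encoding object, whose real render method signature may differ; the point is the explicit `auto_drop_analysis=False` config:

```python
# Hedged sketch of the fix; RenderConversationConfig and StubEncoding
# are local stand-ins, not the real openai_harmony types.

class RenderConversationConfig:  # stand-in for openai_harmony's config
    def __init__(self, auto_drop_analysis: bool = True):
        self.auto_drop_analysis = auto_drop_analysis

class StubEncoding:
    """Test double mimicking the encoder's drop-on-render behavior."""
    def render_conversation_for_completion(self, convo, role, config=None):
        drop = config is None or config.auto_drop_analysis
        return [m for m in convo if not (drop and m["channel"] == "analysis")]

def render_for_completion(encoding, conversation):
    # Explicitly disable the encoder-side blanket drop: vLLM's
    # auto_drop_analysis_messages() has already filtered selectively.
    config = RenderConversationConfig(auto_drop_analysis=False)
    return encoding.render_conversation_for_completion(
        conversation, "assistant", config=config
    )

convo = [{"channel": "analysis", "text": "reasoning"},
         {"channel": "commentary", "text": "tool call"}]
assert any(m["channel"] == "analysis"
           for m in render_for_completion(StubEncoding(), convo))
```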

Tests

  • test_analysis_with_function_recipient_creates_function_call — verifies analysis-channel function calls produce ResponseFunctionToolCall, not McpCall
  • test_preserves_analysis — verifies render_for_completion doesn't strip analysis messages
  • test_preserves_reasoning_across_tool_turns — verifies reasoning before a tool call survives rendering through a tool turn

Related Issues / PRs

| # | Title | Relation |
|---|---|---|
| #35779 | Harmony models incorrectly drops prior-turn analysis channel | Primary bug for reasoning loss; this PR fixes the encoder-side half |
| #35826 | Fix: preserve prior-turn analysis messages | Fixes vLLM-side `auto_drop_analysis_messages()` algorithm; complementary to this PR's encoder-side fix |
| #27653 | [RFC] Include past-reasoning for harmony formatting | Proposes preserving reasoning in multi-turn; this PR implements the rendering half |
| #28262 | Responses API incorrect input/output handling | Channel metadata loss in round-trips |
| #35540 | Fix empty channel/recipient in harmony for /v1/responses | Fixes channel/recipient preservation on input; complementary |
| #32713 | [RFC] Unified Parser for reasoning, tool calling | Long-term architecture; these fixes are consistent with it |

Design Decisions

  1. Why widen the channel check instead of fixing the model? The model's behavior of emitting tool calls on analysis is valid per the Harmony protocol — the streaming and in-progress parsers already handle it. The completed-message parser was the only inconsistent path.

  2. Why disable auto_drop_analysis at the encoder instead of removing auto_drop_analysis_messages()? vLLM's auto_drop_analysis_messages() implements the correct selective dropping policy (only prior-turn analysis before a final message). The encoder's blanket auto_drop_analysis=True is redundant and destructive. Disabling it at the encoder preserves vLLM's intentional filtering while preventing double-drops.

  3. Complementary to fix: preserve prior-turn analysis messages in Harmony multi-turn conversations #35826: That PR fixes the auto_drop_analysis_messages() algorithm itself. This PR fixes the encoder-side double-drop. Both are needed for correct behavior — they address different layers of the same problem.

Test Plan

  • pytest tests/entrypoints/openai/parser/test_harmony_utils.py -v — 59 passed
  • pytest tests/entrypoints/openai/responses/test_harmony_utils.py -v — 22 passed
  • All 81 tests pass with no regressions
  • Pre-commit checks pass


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces two important fixes for GPT-OSS Harmony models. First, it correctly handles tool calls made on the analysis channel by updating the completed-message parser to be consistent with the streaming parser, preventing silent misrouting. Second, it preserves the model's reasoning context across multi-turn tool-calling conversations by disabling the openai_harmony encoder's analysis message dropping, thus avoiding a double-filtering issue with vLLM's own logic. The changes are well-justified, clearly implemented, and accompanied by thorough tests that verify the fixes.

…ss turns

Two fixes for GPT-OSS Harmony model behavior:

1. Accept function calls on analysis channel in harmony_to_response_output()
   to match the streaming/in-progress parsers that already handle both channels.

2. Disable openai_harmony encoder's auto_drop_analysis to prevent
   double-filtering with vLLM's auto_drop_analysis_messages(), preserving
   reasoning context between tool-calling turns.

Labels

frontend, gpt-oss (Related to GPT-OSS models)
