[Bugfix] Fix Harmony preamble visibility in Responses API#32114
vllm-bot merged 8 commits into vllm-project:main
Conversation
Code Review
This pull request correctly addresses a bug in the handling of Harmony format preambles in the Responses API. Previously, preambles (commentary messages with no recipient) were incorrectly treated as hidden reasoning content. The changes ensure they are now parsed as visible ResponseOutputMessage items, aligning with the intended behavior of showing them to end-users.
The fix is implemented consistently for both complete messages in parse_output_message and for streaming scenarios in parse_remaining_state. The logic is sound, and the accompanying test updates in test_harmony_utils.py are thorough, including new assertions for the corrected output type and a new test case for streaming preambles. The changes appear correct and well-tested.
/cc @qandrew PTAL.
@thepushkarp can you add in the PR description what an example request / response would look like, before / after this change? That would make it easier to review and understand.
Updated the description and added a few more fixes, @qandrew! Sorry for the delay (๑•́ㅿ•̀๑)ᔆᵒʳʳᵞ
qandrew left a comment:
lgtm, thanks for the updates! It would be nice to merge chatCompletion / responsesAPI code paths to simplify our logic too
cc @DarkLight1337 @chaunceyjiang to merge
Thanks~ @thepushkarp, I'll take a look at this today.
@thepushkarp The CI failed — could you take a look?
Hey @chaunceyjiang, the previous error was fixed. There are still some build failures from tests unrelated to the changes in this PR. Can you please check and let me know what to do about them?
This pull request has merge conflicts that must be resolved before it can be merged.
Per the OpenAI Harmony specification, preambles (commentary channel messages with `recipient=None`) are intended to be shown to end-users, not hidden as reasoning content. This fix changes how commentary-channel preambles are parsed:

- From: `ResponseReasoningItem` (hidden)
- To: `ResponseOutputMessage` (visible)

Affects both batch (`parse_output_message`) and streaming (`parse_remaining_state`) code paths.

Migration note: preambles now appear in `content` instead of `reasoning_content`. This is a breaking change for clients that relied on the previous (incorrect) behavior.

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
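The From/To change in this commit can be illustrated with plain dicts shaped like the Responses API output items. These helpers are hypothetical; the real code constructs `ResponseReasoningItem` / `ResponseOutputMessage` objects:

```python
def parse_preamble_before(text: str) -> dict:
    # Old behavior: the preamble is buried in a hidden reasoning item.
    return {"type": "reasoning",
            "content": [{"type": "reasoning_text", "text": text}]}

def parse_preamble_after(text: str) -> dict:
    # New behavior: the preamble becomes a visible output message.
    return {"type": "message",
            "content": [{"type": "output_text", "text": text}]}

item = parse_preamble_after("I'll search for the weather in SF.")
assert item["type"] == "message"
assert item["content"][0]["type"] == "output_text"
```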
Update `parse_chat_output()` to explicitly handle preambles:

- analysis channel → reasoning (hidden)
- commentary without recipient (preambles) → content (visible)
- final channel → content (visible)
- commentary with recipient (tool calls) → excluded (handled by tool parser)

This makes the Chat Completions API consistent with the Responses API fix for preamble visibility per the Harmony specification.

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
…, and channel config

Extend preamble fix to remaining code paths not covered by prior commits:

- `emit_content_delta_events()`: emit `output_text.delta` for preambles
- `emit_previous_item_done_events()`: emit text done events for preambles
- `_update_num_reasoning_tokens()`: exclude preambles from reasoning count
- `get_system_message()`: stop stripping commentary from valid channels when `with_custom_tools=False` (the spec always declares all 3 channels)

Add tests covering all new preamble paths: streaming events (5 tests), token counting (2 tests), parse_chat_output edge cases (3 tests), get_system_message channel config (3 tests), and an integration test update.

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
…put_message()

Signed-off-by: pupa <pupa@users.noreply.github.com>
Signed-off-by: Pushkar Patel <git@thepushkarp.com>
The removal of the commentary-channel-stripping hack from `get_system_message()` means python tool calls now correctly use the "commentary" channel per the Harmony spec.

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
Force-pushed ce23610 to e931976
- Remove the `if not with_custom_tools` block in `get_system_message()` that stripped the commentary channel from valid channels. This was preventing the model from ever generating preambles, making all downstream preamble visibility fixes inert.
- Add an `and group.recipient is None` guard to `extract_harmony_streaming_delta()` so only true preambles (commentary with no recipient) are treated as user-visible content. Commentary targeted at `browser.*`/`container` recipients was incorrectly leaking into `combined_content`.
- Fix stale import: rename `HarmonyStreamingState` → `StreamingState` in `test_serving_responses.py` to match the actual class name.

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
Move `browser.search` from the preamble test to the invalid-inputs test, matching the tightened guard that only treats commentary with no recipient as user-visible content. Also rename the test to clarify intent.

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
The remaining tests are passing, except for two in the …
Force-merging.
I can work on merging the overlapping Harmony parsing code paths in chatCompletion & responsesAPI; there is a lot of duplicated routing logic.
…ct#32114) Signed-off-by: Pushkar Patel <git@thepushkarp.com> Signed-off-by: pupa <pupa@users.noreply.github.com>
Summary
Per the Harmony spec, preambles (commentary channel messages with no recipient) are "intended to be shown to end-users". vLLM was incorrectly treating them as hidden reasoning across multiple code paths.
This PR fixes preamble visibility in 6 code paths across 3 files:
Parser (`harmony_utils.py`):

- `parse_output_message()`: route preambles to `ResponseOutputMessage` instead of `ResponseReasoningItem`
- `parse_remaining_state()`: return a visible `ResponseOutputMessage` (streaming) instead of hidden reasoning
- `parse_chat_output()`: include preambles in `final_texts` (Chat Completions API); exclude tool call JSON that was leaking into visible content

Streaming events (`streaming_events.py`):

- `emit_content_delta_events()`: emit `response.output_text.delta` for preamble tokens (they were silently dropped)
- `emit_previous_item_done_events()`: emit text done events for completed preambles

Token counting (`context.py`):

- `_update_num_reasoning_tokens()`: exclude preamble tokens from the `reasoning_tokens` count

Channel config (`harmony_utils.py`):

- `get_system_message()`: stop stripping `commentary` from valid channels when `with_custom_tools=False`. The spec always declares all 3 channels; without `commentary` the model cannot generate preambles in built-in-tool-only sessions (web_search, code_interpreter).

Before/After
The model generates a preamble to preview an upcoming tool call:
Responses API — before:

```json
{"type": "reasoning", "content": [{"type": "reasoning_text", "text": "I'll search for the weather in SF."}]}
```

Preamble hidden as reasoning — the user never sees it.

Responses API — after:

```json
{"type": "message", "content": [{"type": "output_text", "text": "I'll search for the weather in SF."}]}
```

Preamble visible as a message.
Streaming — before: preamble tokens emitted no SSE events (silently dropped).

Streaming — after: preamble tokens emit `response.output_text.delta` events, same as final channel text.

Chat Completions — before: `content` included tool call JSON (`{"query": "weather in SF"}`) because the filter was `channel != "analysis"`.

Chat Completions — after: `content` only includes preambles and final text; tool call payloads are excluded.

Token counting — before: preamble tokens counted as `reasoning_tokens`.

Token counting — after: preamble tokens counted as regular output tokens.
Channel config — before: `commentary` stripped from valid channels when there are no custom tools — the model can't generate preambles at all.

Channel config — after: all 3 channels always present per spec.
Note

Makes Harmony preambles user-visible and adjusts parser behavior accordingly.

- Parses `commentary` with no `recipient` as `ResponseOutputMessage` (visible `content`) rather than `ResponseReasoningItem` (hidden `reasoning_content`)
- Updates `parse_remaining_state` to emit `message` (status `incomplete`) for commentary preambles; keeps `analysis` as reasoning
- Built-in tool recipients (`python`, `browser`, `container`) return reasoning; non-builtin recipients become MCP calls; functions remain `function_call`
- Tests: `ResponseOutputMessage` for preambles, single message with multiple contents, and streaming status; added coverage for parser edge cases

Written by Cursor Bugbot for commit 50cc9b1. This will update automatically on new commits.
Note

Makes Harmony preambles visible to users and aligns commentary parsing with the Harmony spec.

- Commentary with no `recipient` now emits a `ResponseOutputMessage` (visible `content`) rather than `ResponseReasoningItem`; analysis remains reasoning
- Streaming/partial output emits `message` with `status="incomplete"`; built-in tools (`python`, `browser`, `container`) explicitly return reasoning; MCP parsing unchanged

Written by Cursor Bugbot for commit 8711d3d. This will update automatically on new commits.