[Bugfix] Fix Harmony preamble visibility in Responses API#32114

Merged
vllm-bot merged 8 commits into vllm-project:main from thepushkarp:fix/harmony-preamble-visibility on Feb 25, 2026

Conversation

@thepushkarp
Contributor

@thepushkarp thepushkarp commented Jan 11, 2026

Summary

Per the Harmony spec, preambles (commentary channel messages with no recipient) are "intended to be shown to end-users". vLLM was incorrectly treating them as hidden reasoning across multiple code paths.

This PR fixes preamble visibility in 6 code paths across 3 files:

Parser (harmony_utils.py):

  • parse_output_message(): route preambles to ResponseOutputMessage instead of ResponseReasoningItem
  • parse_remaining_state(): return visible ResponseOutputMessage (streaming) instead of hidden reasoning
  • parse_chat_output(): include preambles in final_texts (Chat Completions API), exclude tool call JSON that was leaking into visible content

Streaming events (streaming_events.py):

  • emit_content_delta_events(): emit response.output_text.delta for preamble tokens (were silently dropped)
  • emit_previous_item_done_events(): emit text done events for completed preambles

Token counting (context.py):

  • _update_num_reasoning_tokens(): exclude preamble tokens from reasoning_tokens count

Channel config (harmony_utils.py):

  • get_system_message(): stop stripping commentary from valid channels when with_custom_tools=False. The spec always declares all 3 channels; without commentary the model cannot generate preambles in built-in-tool-only sessions (web_search, code_interpreter).
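A minimal sketch of the channel-declaration change (the helper name and string format are assumptions, not vLLM's actual get_system_message()):

```python
def valid_channels_line(with_custom_tools: bool) -> str:
    """Build the system message's channel declaration."""
    # Before the fix, "commentary" was dropped when with_custom_tools was
    # False, so the model could never open a commentary-channel preamble.
    # After the fix, the spec's full channel list is always declared.
    channels = ["analysis", "commentary", "final"]
    return "Valid channels: " + ", ".join(channels) + "."
```

With the full list always present, built-in-tool-only sessions can still produce preambles.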

Before/After

The model generates a preamble to preview an upcoming tool call:

<|channel|>commentary
<|message|>I'll search for the weather in SF.<|end|>
<|start|>assistant to=functions.web_search
<|channel|>commentary
<|message|>{"query": "weather in SF"}<|end|>

Responses API — before:

{"type": "reasoning", "content": [{"type": "reasoning_text", "text": "I'll search for the weather in SF."}]}

Preamble hidden as reasoning — user never sees it.

Responses API — after:

{"type": "message", "content": [{"type": "output_text", "text": "I'll search for the weather in SF."}]}

Preamble visible as a message.

Streaming — before: Preamble tokens emitted no SSE events (silently dropped).
Streaming — after: Preamble tokens emit response.output_text.delta events, same as final channel text.

Chat Completions — before: content included tool call JSON ({"query": "weather in SF"}) because the filter was channel != "analysis".
Chat Completions — after: content only includes preambles and final text; tool call payloads excluded.

Token counting — before: Preamble tokens counted as reasoning_tokens.
Token counting — after: Preamble tokens counted as regular output tokens.

Channel config — before: commentary stripped from valid channels when no custom tools — model can't generate preambles at all.
Channel config — after: All 3 channels always present per spec.


Note

Makes Harmony preambles user-visible and adjusts parser behavior accordingly.

  • Parse commentary with no recipient as ResponseOutputMessage (visible content) rather than ResponseReasoningItem (hidden reasoning_content)
  • Update parse_remaining_state to emit message (status incomplete) for commentary preambles; keep analysis as reasoning
  • Clarify built-in tools (python, browser, container) return reasoning; non-builtin recipients become MCP calls; functions remain function_call
  • Tests updated to expect ResponseOutputMessage for preambles, single message with multiple contents, and streaming status; added coverage for parser edge cases

Written by Cursor Bugbot for commit 50cc9b1.


Note

Makes Harmony preambles visible to users and aligns commentary parsing with the Harmony spec.

  • Commentary with no recipient now emits a ResponseOutputMessage (visible content) rather than ResponseReasoningItem; analysis remains reasoning
  • Streaming parser updated: commentary preambles return a message with status="incomplete"; built-in tools (python, browser, container) explicitly return reasoning; MCP parsing unchanged
  • Tests updated/added to cover preamble visibility, multi-content aggregation into a single message, and streaming states

Written by Cursor Bugbot for commit 8711d3d.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly addresses a bug in the handling of Harmony format preambles in the Responses API. Previously, preambles (commentary messages with no recipient) were incorrectly treated as hidden reasoning content. The changes ensure they are now parsed as visible ResponseOutputMessage items, aligning with the intended behavior of showing them to end-users.

The fix is implemented consistently for both complete messages in parse_output_message and for streaming scenarios in parse_remaining_state. The logic is sound, and the accompanying test updates in test_harmony_utils.py are thorough, including new assertions for the corrected output type and a new test case for streaming preambles. The changes appear correct and well-tested.

@chaunceyjiang
Collaborator

/cc @qandrew PTAL.

@qandrew
Contributor

qandrew commented Jan 12, 2026

@thepushkarp can you add in the PR description what an example request / response would look like, before / after this change? Would make it easier to review and understand

@mergify mergify bot added the bug Something isn't working label Jan 13, 2026
@thepushkarp
Contributor Author

Updated the description and added a few more fixes @qandrew!

sorry for the delay (๑•́ㅿ•̀๑)ᔆᵒʳʳᵞ

Contributor

@qandrew qandrew left a comment


lgtm, thanks for the updates! It would be nice to merge chatCompletion / responsesAPI code paths to simplify our logic too

cc @DarkLight1337 @chaunceyjiang to merge

@chaunceyjiang chaunceyjiang self-assigned this Feb 24, 2026
@chaunceyjiang
Collaborator

Thanks~ @thepushkarp I’ll take a look at this today.

@chaunceyjiang chaunceyjiang added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 24, 2026
@chaunceyjiang
Collaborator

@thepushkarp The CI failed — could you take a look?

@thepushkarp
Contributor Author

Hey @chaunceyjiang, the previous error is fixed. There are still some build failures from tests unrelated to this PR's changes. Can you please check and let me know what to do about them?

@mergify

mergify bot commented Feb 25, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @thepushkarp.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 25, 2026
Per the OpenAI Harmony specification, preambles (commentary channel
messages with recipient=None) are intended to be shown to end-users,
not hidden as reasoning content.

This fix changes how commentary-channel preambles are parsed:
- From: ResponseReasoningItem (hidden)
- To: ResponseOutputMessage (visible)

Affects both batch (parse_output_message) and streaming
(parse_remaining_state) code paths.

Migration note: Preambles now appear in `content` instead of
`reasoning_content`. This is a breaking change for clients that
relied on the previous (incorrect) behavior.

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
Update parse_chat_output() to explicitly handle preambles:
- analysis channel → reasoning (hidden)
- commentary without recipient (preambles) → content (visible)
- final channel → content (visible)
- commentary with recipient (tool calls) → excluded (handled by tool parser)

This makes the Chat Completions API consistent with the Responses API
fix for preamble visibility per the Harmony specification.

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
…, and channel config

Extend preamble fix to remaining code paths not covered by prior commits:
- emit_content_delta_events(): emit output_text.delta for preambles
- emit_previous_item_done_events(): emit text done events for preambles
- _update_num_reasoning_tokens(): exclude preambles from reasoning count
- get_system_message(): stop stripping commentary from valid channels
  when with_custom_tools=False (spec always declares all 3 channels)

Add tests covering all new preamble paths: streaming events (5 tests),
token counting (2 tests), parse_chat_output edge cases (3 tests),
get_system_message channel config (3 tests), and integration test update.

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
…put_message()

Signed-off-by: pupa <pupa@users.noreply.github.com>
Signed-off-by: Pushkar Patel <git@thepushkarp.com>
The removal of the commentary-channel-stripping hack from
get_system_message() means python tool calls now correctly
use the "commentary" channel per Harmony spec.

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
@thepushkarp thepushkarp force-pushed the fix/harmony-preamble-visibility branch from ce23610 to e931976 Compare February 25, 2026 04:26
@mergify mergify bot removed the needs-rebase label Feb 25, 2026
- Remove the `if not with_custom_tools` block in `get_system_message()`
  that stripped the commentary channel from valid channels. This was
  preventing the model from ever generating preambles, making all
  downstream preamble visibility fixes inert.

- Add `and group.recipient is None` guard to
  `extract_harmony_streaming_delta()` so only true preambles (commentary
  with no recipient) are treated as user-visible content. Commentary
  targeted at browser.*/container recipients was incorrectly leaking
  into combined_content.

- Fix stale import: rename HarmonyStreamingState → StreamingState in
  test_serving_responses.py to match the actual class name.

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
Move browser.search from preamble test to invalid-inputs test, matching
the tightened guard that only treats commentary with no recipient as
user-visible content. Also rename test to clarify intent.

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
@thepushkarp
Contributor Author

Hi @chaunceyjiang

The remaining tests are passing, except for two in the v1/e2e/test_spec_decode.py module. Please advise on what to do here.

@vllm-bot vllm-bot merged commit 5d18bf8 into vllm-project:main Feb 25, 2026
48 of 51 checks passed
@DarkLight1337
Member

Force-merging

@github-project-automation github-project-automation bot moved this from To Triage to Done in gpt-oss Issues & Enhancements Feb 25, 2026
@sfeng33
Contributor

sfeng33 commented Feb 25, 2026

lgtm, thanks for the updates! It would be nice to merge chatCompletion / responsesAPI code paths to simplify our logic too

cc @DarkLight1337 @chaunceyjiang to merge

I can work on merging the overlapping Harmony parsing between chatCompletion & responsesAPI; there is a lot of duplicated routing logic.

haanjack pushed a commit to haanjack/vllm that referenced this pull request Feb 26, 2026
…ct#32114)

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
Signed-off-by: pupa <pupa@users.noreply.github.com>
tom-zju pushed a commit to tom-zju/vllm that referenced this pull request Feb 26, 2026
…ct#32114)

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
Signed-off-by: pupa <pupa@users.noreply.github.com>
llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026
…ct#32114)

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
Signed-off-by: pupa <pupa@users.noreply.github.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026
…ct#32114)

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
Signed-off-by: pupa <pupa@users.noreply.github.com>
askliar pushed a commit to askliar/vllm that referenced this pull request Mar 9, 2026
…ct#32114)

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
Signed-off-by: pupa <pupa@users.noreply.github.com>
Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Copilot AI pushed a commit to machov/vllm that referenced this pull request Mar 10, 2026
…ct#32114)

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
Signed-off-by: pupa <pupa@users.noreply.github.com>

Labels

bug — Something isn't working
frontend
gpt-oss — Related to GPT-OSS models
ready — ONLY add when PR is ready to merge/full CI is needed

Projects

Status: Done

6 participants