[Bugfix] Fix Harmony preamble visibility in Responses API#32114
vllm-bot merged 8 commits into vllm-project:main
Conversation
Code Review
This pull request correctly addresses a bug in the handling of Harmony format preambles in the Responses API. Previously, preambles (commentary messages with no recipient) were incorrectly treated as hidden reasoning content. The changes ensure they are now parsed as visible ResponseOutputMessage items, aligning with the intended behavior of showing them to end-users.
The fix is implemented consistently for both complete messages in parse_output_message and for streaming scenarios in parse_remaining_state. The logic is sound, and the accompanying test updates in test_harmony_utils.py are thorough, including new assertions for the corrected output type and a new test case for streaming preambles. The changes appear correct and well-tested.
/cc @qandrew PTAL.
@thepushkarp can you add in the PR description what an example request / response would look like, before / after this change? That would make it easier to review and understand.
Updated the description and added a few more fixes, @qandrew! Sorry for the delay (๑•́ㅿ•̀๑)ᔆᵒʳʳᵞ
qandrew left a comment:
lgtm, thanks for the updates! It would be nice to merge chatCompletion / responsesAPI code paths to simplify our logic too
cc @DarkLight1337 @chaunceyjiang to merge
Thanks~ @thepushkarp, I'll take a look at this today.
@thepushkarp The CI failed — could you take a look?
Hey @chaunceyjiang, the previous error was fixed. There are still some build failures from tests unrelated to the changes in this PR. Can you please check and let me know what to do about them?
This pull request has merge conflicts that must be resolved before it can be merged.
Per the OpenAI Harmony specification, preambles (commentary channel messages with `recipient=None`) are intended to be shown to end-users, not hidden as reasoning content. This fix changes how commentary-channel preambles are parsed:

- From: `ResponseReasoningItem` (hidden)
- To: `ResponseOutputMessage` (visible)

Affects both batch (`parse_output_message`) and streaming (`parse_remaining_state`) code paths.

Migration note: preambles now appear in `content` instead of `reasoning_content`. This is a breaking change for clients that relied on the previous (incorrect) behavior.

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
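The From/To change in this commit can be illustrated with plain dicts shaped like the Responses API output items. These helpers are hypothetical; the real code constructs `ResponseReasoningItem` / `ResponseOutputMessage` objects:

```python
def parse_preamble_before(text: str) -> dict:
    # Old behavior: the preamble is buried in a hidden reasoning item.
    return {"type": "reasoning",
            "content": [{"type": "reasoning_text", "text": text}]}

def parse_preamble_after(text: str) -> dict:
    # New behavior: the preamble becomes a visible output message.
    return {"type": "message",
            "content": [{"type": "output_text", "text": text}]}

item = parse_preamble_after("I'll search for the weather in SF.")
assert item["type"] == "message"
assert item["content"][0]["type"] == "output_text"
```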
Update `parse_chat_output()` to explicitly handle preambles:

- analysis channel → reasoning (hidden)
- commentary without recipient (preambles) → content (visible)
- final channel → content (visible)
- commentary with recipient (tool calls) → excluded (handled by tool parser)

This makes the Chat Completions API consistent with the Responses API fix for preamble visibility per the Harmony specification.

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
…, and channel config

Extend preamble fix to remaining code paths not covered by prior commits:

- `emit_content_delta_events()`: emit `output_text.delta` for preambles
- `emit_previous_item_done_events()`: emit text done events for preambles
- `_update_num_reasoning_tokens()`: exclude preambles from reasoning count
- `get_system_message()`: stop stripping commentary from valid channels when `with_custom_tools=False` (the spec always declares all 3 channels)

Add tests covering all new preamble paths: streaming events (5 tests), token counting (2 tests), parse_chat_output edge cases (3 tests), get_system_message channel config (3 tests), and an integration test update.

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
…put_message()

Signed-off-by: pupa <pupa@users.noreply.github.com>
Signed-off-by: Pushkar Patel <git@thepushkarp.com>
The removal of the commentary-channel-stripping hack from `get_system_message()` means python tool calls now correctly use the "commentary" channel per the Harmony spec.

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
Force-pushed ce23610 to e931976
- Remove the `if not with_custom_tools` block in `get_system_message()` that stripped the commentary channel from valid channels. This was preventing the model from ever generating preambles, making all downstream preamble visibility fixes inert.
- Add an `and group.recipient is None` guard to `extract_harmony_streaming_delta()` so only true preambles (commentary with no recipient) are treated as user-visible content. Commentary targeted at `browser.*`/`container` recipients was incorrectly leaking into `combined_content`.
- Fix stale import: rename `HarmonyStreamingState` → `StreamingState` in `test_serving_responses.py` to match the actual class name.

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
Move `browser.search` from the preamble test to the invalid-inputs test, matching the tightened guard that only treats commentary with no recipient as user-visible content. Also rename the test to clarify intent.

Signed-off-by: Pushkar Patel <git@thepushkarp.com>
The remaining tests are passing, except for two in the …
Force-merging.
I can work on merging the overlapping Harmony parsing code paths in chatCompletion & responsesAPI; there is a lot of duplicated routing logic.
…ct#32114) Signed-off-by: Pushkar Patel <git@thepushkarp.com> Signed-off-by: pupa <pupa@users.noreply.github.com>
Summary
Per the Harmony spec, preambles (commentary channel messages with no recipient) are "intended to be shown to end-users". vLLM was incorrectly treating them as hidden reasoning across multiple code paths.
This PR fixes preamble visibility in 6 code paths across 3 files:
Parser (`harmony_utils.py`):

- `parse_output_message()`: route preambles to `ResponseOutputMessage` instead of `ResponseReasoningItem`
- `parse_remaining_state()`: return a visible `ResponseOutputMessage` (streaming) instead of hidden reasoning
- `parse_chat_output()`: include preambles in `final_texts` (Chat Completions API); exclude tool call JSON that was leaking into visible content

Streaming events (`streaming_events.py`):

- `emit_content_delta_events()`: emit `response.output_text.delta` for preamble tokens (they were silently dropped)
- `emit_previous_item_done_events()`: emit text done events for completed preambles

Token counting (`context.py`):

- `_update_num_reasoning_tokens()`: exclude preamble tokens from the `reasoning_tokens` count

Channel config (`harmony_utils.py`):

- `get_system_message()`: stop stripping `commentary` from valid channels when `with_custom_tools=False`. The spec always declares all 3 channels; without `commentary` the model cannot generate preambles in built-in-tool-only sessions (web_search, code_interpreter).

Before/After
The model generates a preamble to preview an upcoming tool call:
Responses API — before:

```json
{"type": "reasoning", "content": [{"type": "reasoning_text", "text": "I'll search for the weather in SF."}]}
```

Preamble hidden as reasoning — the user never sees it.

Responses API — after:

```json
{"type": "message", "content": [{"type": "output_text", "text": "I'll search for the weather in SF."}]}
```

Preamble visible as a message.
Streaming — before: preamble tokens emitted no SSE events (silently dropped).

Streaming — after: preamble tokens emit `response.output_text.delta` events, same as final channel text.

Chat Completions — before: `content` included tool call JSON (`{"query": "weather in SF"}`) because the filter was `channel != "analysis"`.

Chat Completions — after: `content` only includes preambles and final text; tool call payloads are excluded.

Token counting — before: preamble tokens counted as `reasoning_tokens`.

Token counting — after: preamble tokens counted as regular output tokens.
Channel config — before: `commentary` stripped from valid channels when there are no custom tools — the model can't generate preambles at all.

Channel config — after: all 3 channels always present per spec.
Note

Makes Harmony preambles user-visible and adjusts parser behavior accordingly.

- Parses `commentary` with no `recipient` as `ResponseOutputMessage` (visible `content`) rather than `ResponseReasoningItem` (hidden `reasoning_content`)
- Updates `parse_remaining_state` to emit `message` (status `incomplete`) for commentary preambles; keeps `analysis` as reasoning
- Built-in tool recipients (`python`, `browser`, `container`) return reasoning; non-builtin recipients become MCP calls; functions remain `function_call`
- Tests: `ResponseOutputMessage` for preambles, single message with multiple contents, and streaming status; added coverage for parser edge cases

Written by Cursor Bugbot for commit 50cc9b1. This will update automatically on new commits.
Note

Makes Harmony preambles visible to users and aligns commentary parsing with the Harmony spec.

- Commentary with no `recipient` now emits a `ResponseOutputMessage` (visible `content`) rather than `ResponseReasoningItem`; analysis remains reasoning
- Streaming/partial output emits `message` with `status="incomplete"`; built-in tools (`python`, `browser`, `container`) explicitly return reasoning; MCP parsing unchanged

Written by Cursor Bugbot for commit 8711d3d. This will update automatically on new commits.