
[Bugfix] Actually enable serialize_messages for harmony Responses (related to #26185)#27377

Open
jacobthebanana wants to merge 3 commits into vllm-project:main from VectorInstitute:response-harmony-stopgap

Conversation

@jacobthebanana
Contributor

@jacobthebanana jacobthebanana commented Oct 23, 2025

Purpose

For the OpenAI-compatible v1/responses route, enable raw messages to be sent when enable_response_messages is set to True in extra_body.

Previously, the returned messages were empty because of an issue in openai/harmony (openai/harmony#78).

#26185 implements most of the fix, but the new serializers are never actually invoked, at least not when serving the model through vllm serve. The reason is that that PR registers them with when_used="json", so they are skipped by the plain model_dump() call in vllm/entrypoints/openai/api_server.py#L527-L529.

The fix is to trigger the serializers by setting mode="json" when invoking model_dump.
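The underlying Pydantic behavior can be reproduced with a minimal sketch (a hypothetical Message class, not vLLM's actual harmony types): a field serializer registered with when_used="json" only fires in JSON mode, so a plain model_dump() ignores it while model_dump(mode="json") triggers it.

```python
from pydantic import BaseModel, field_serializer


class Message(BaseModel):
    content: str

    # Only applied when serializing in JSON mode, analogous to the
    # serializers added in #26185 with when_used="json".
    @field_serializer("content", when_used="json")
    def serialize_content(self, content: str) -> dict:
        return {"type": "text", "text": content}


msg = Message(content="hello")

# Python mode: the when_used="json" serializer does not run.
print(msg.model_dump())             # {'content': 'hello'}

# JSON mode: the serializer fires, matching what the API server now emits.
print(msg.model_dump(mode="json"))  # {'content': {'type': 'text', 'text': 'hello'}}
```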

Test Plan

Start the vLLM server: vllm serve openai/gpt-oss-20b

Send a Response request with enable_response_messages set to True in extra_body

from openai import OpenAI

# Assumes a local vLLM server on the default port; adjust base_url as needed.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
model = "openai/gpt-oss-20b"
prompt = "Write a haiku about autumn leaves."

resp = client.responses.create(
    model=model,
    input=prompt,
    extra_body={"enable_response_messages": True},
)
print(resp.model_dump_json(indent=2))

Repeat the above for the streaming case.

Test Result

Original:

Details
```
"input_messages": [
  {
    "author": { "role": "system", "name": null },
    "content": [ {} ],
    "channel": null,
    "recipient": null,
    "content_type": null
  },
  ...
],
"output_messages": [
  ...
  {
    "author": { "role": "assistant", "name": null },
    "content": [ {} ],
    "channel": "final",
    "recipient": null,
    "content_type": null
  }
]
}
```

After adding mode="json":

Details
```
"input_messages": [
  {
    "role": "system",
    "name": null,
    "content": [
      {
        "model_identity": "You are ChatGPT, a large language model trained by OpenAI.",
        "reasoning_effort": "Medium",
        "conversation_start_date": "2025-10-22",
        "knowledge_cutoff": "2024-06",
        "channel_config": {
          "valid_channels": [ "analysis", "final" ],
          "channel_required": true
        },
        "type": "system_content"
      }
    ]
  },
  {
    "role": "user",
    "name": null,
    "content": [
      { "type": "text", "text": "Write a haiku about autumn leaves." }
    ]
  }
],
"output_messages": [
  {
    "role": "assistant",
    "name": null,
    "content": [
      { "type": "text", "text": "User wants a haiku about autumn leaves. Simple. Use 5-7-5 syllable structure. Let's produce one. Ensure it's about autumn leaves. Provide in one paragraph." }
    ],
    "channel": "analysis"
  },
  {
    "role": "assistant",
    "name": null,
    "content": [
      { "type": "text", "text": "Leaves whisper, fall— \ncrimson and amber drift down, \nautumn sighs in wind." }
    ],
    "channel": "final"
  }
]
}
```

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added frontend gpt-oss Related to GPT-OSS models labels Oct 23, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly enables message serialization for Harmony Responses by calling model_dump(mode="json"). The change is a necessary fix for an upstream issue in openai/harmony, and the problem is well-described in the pull request. The code modification is simple, targeted, and correctly applied in both the create_responses and retrieve_responses functions. The inclusion of a TODO comment with a link to the upstream issue is good practice for maintainability. The change appears correct and complete, and I have no further suggestions.

jacobthebanana referenced this pull request Oct 23, 2025
@jacobthebanana jacobthebanana force-pushed the response-harmony-stopgap branch from 89af97b to 21d9a79 Compare October 23, 2025 14:10
…_response_messages is set.

Signed-off-by: Jacob-Junqi Tian <jacob@banana.abay.cf>
@jacobthebanana jacobthebanana force-pushed the response-harmony-stopgap branch from 21d9a79 to 26d9bdc Compare October 23, 2025 14:11
@jacobthebanana
Contributor Author

(force-pushing to add sign-off)

@jacobthebanana jacobthebanana changed the title Actually enables serialize_messages for harmony Responses (related to #26185) [Bugfix] Actually enable serialize_messages for harmony Responses (related to #26185) Oct 26, 2025
@mergify

mergify bot commented Jan 14, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jacobthebanana.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 14, 2026

Labels

bug Something isn't working frontend gpt-oss Related to GPT-OSS models needs-rebase

Projects

Status: To Triage
