
Conversation

@alecsolder (Contributor) commented Sep 9, 2025

Purpose

For the Responses API to retain full functionality when the Responses Store and Messages Store are not in use, users must be able to store messages client-side.

To support this, this PR adds Harmony messages as additional output from the Responses API and accepts Harmony messages as input to it.

Passing Harmony messages back as input replicates the conversation-continuation behavior that the Messages Store provides.
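
To make the shape of the change concrete, here is a rough sketch of a two-turn exchange under this PR. Only `previous_response_harmony_messages` comes from the diff; the output field name (`harmony_messages`), the model name, and the server-side gating are assumptions for illustration.

```python
import requests

BASE = "http://localhost:8000/v1"  # assumes a locally served vLLM instance

# Turn 1: no Responses Store, so the client must keep the history itself.
first = requests.post(f"{BASE}/responses", json={
    "model": "openai/gpt-oss-20b",
    "input": "What is the capital of France?",
    "store": False,
}).json()

# Hypothetical output field: the full Harmony message history, emitted
# when the server enables it via the env flag this PR adds.
harmony_history = first.get("harmony_messages", [])

# Turn 2: replay the stored history instead of using previous_response_id.
second = requests.post(f"{BASE}/responses", json={
    "model": "openai/gpt-oss-20b",
    "input": "And what is its population?",
    "previous_response_harmony_messages": harmony_history,
    "store": False,
}).json()
print(second["output"])
```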

Test Plan

Tested with unit tests + serving locally

Test Result

Passed

@gemini-code-assist bot left a comment

Code Review

This pull request successfully integrates OpenAI Harmony messages into the Responses API. It adds the ability to pass conversation history via previous_response_harmony_messages and to receive the full Harmony message history in the response, controlled by an environment variable. The changes are well-tested. My main concern is a potential race condition when continuing conversations using previous_response_id, which could lead to data corruption in a concurrent environment.

@gemini-code-assist bot flagged an inline comment as critical:

The logic inside this if block for continuing a conversation modifies prev_msgs in-place after retrieving it from self.msg_store. If multiple requests continue from the same previous_response_id concurrently, one request can be modifying the message history while another is reading it, resulting in corrupted data and unpredictable behavior.

To prevent this, operate on a copy of the message list, for example by changing line 728 to:

prev_msgs = self.msg_store[prev_response.id].copy()
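
A minimal sketch of the fix in isolation, with a plain dict standing in for self.msg_store (everything here is illustrative, not the PR's actual code):

```python
import threading

# Toy stand-in for self.msg_store.
msg_store = {"resp-1": ["system", "user: hi", "assistant: hello"]}

def continue_conversation(prev_id: str, new_msgs: list[str]) -> list[str]:
    # Copy first: each request builds its own history, so the stored
    # list for prev_id is never mutated after retrieval. Dropping the
    # .copy() reintroduces the race, since concurrent requests would
    # then extend the same shared list and interleave their histories.
    prev_msgs = msg_store[prev_id].copy()
    prev_msgs.extend(new_msgs)
    return prev_msgs

threads = [
    threading.Thread(target=continue_conversation,
                     args=("resp-1", [f"user: turn {i}"]))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The stored history is untouched regardless of how the threads raced.
assert msg_store["resp-1"] == ["system", "user: hi", "assistant: hello"]
```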

Alec Solder added 2 commits September 8, 2025 18:52
- Allow harmony messages to be used as input as a replacement for the responses store
- Add an env flag to output harmony messages as well
- Some tests

Signed-off-by: Alec Solder <[email protected]>
Signed-off-by: Alec Solder <[email protected]>
@alecsolder force-pushed the responses-api-harmony-messages branch from 8431315 to 2000c07 on September 9, 2025 at 01:52
@lacora (Contributor) commented Sep 9, 2025

@alecsolder could you give an example input / output change for this PR? also are we including the author / channel output?

@mergify bot commented Sep 9, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @alecsolder.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label Sep 9, 2025
@yeqcharlotte (Collaborator) left a comment

see comments, i don't think it's a good idea to mix harmony responses and OpenAI-compatible responses in the protocol

@alecsolder changed the title from "Add OpenAI Harmony integration for responses API" to "[Frontend] Add OpenAI Harmony integration for responses API" on Sep 9, 2025
    messages.extend(prev_msgs)
elif request.previous_response_harmony_messages is not None:
    messages.extend(request.previous_response_harmony_messages)
else:
Collaborator comment on the diff above:

you don't need to change the if order here?
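
For context on the branch ordering under discussion, a runnable sketch of how the chain could read; only the two request fields appear in the diff, and the store and helper names are made up:

```python
from dataclasses import dataclass, field

# Toy stand-ins for the serving layer's message store and request object.
msg_store = {"resp-1": ["harmony msg A", "harmony msg B"]}

@dataclass
class Request:
    previous_response_id: str | None = None
    previous_response_harmony_messages: list | None = None
    input: list = field(default_factory=list)

def build_messages(request: Request) -> list:
    messages = []
    if request.previous_response_id is not None:
        # Server-side continuation via the Messages Store; copy to avoid
        # the shared-mutation issue flagged in the review above.
        messages.extend(msg_store[request.previous_response_id].copy())
    elif request.previous_response_harmony_messages is not None:
        # Client-side continuation: the caller replays the Harmony history.
        messages.extend(request.previous_response_harmony_messages)
    # else: fresh conversation, nothing to prepend.
    messages.extend(request.input)
    return messages

print(build_messages(Request(
    previous_response_harmony_messages=["harmony msg A", "harmony msg B"],
    input=["user: next turn"],
)))
```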

@yeqcharlotte (Collaborator) commented

i think it's the right thing to give power users access to the harmony message format directly, which is helpful for debugging and lets users orchestrate tool calls externally if they choose to.

however, please improve the overall code structure as commented, and add a small RFC as well as examples to show users how to use them.

cc: @heheda12345

@bbrowning (Contributor) commented

How does this approach align with things like https://cookbook.openai.com/articles/gpt-oss/handle-raw-cot#chat-completions-api where OpenAI outlines their expectations for how things like the raw chain of thought content is handled when using Chat Completions APIs in front of Harmony models?

More generically, do we actually need the raw harmony text passed back and forth to enable a client to provide the full history of previous responses? Or do we just need the parsed harmony content, including the raw chain of thought (as linked to above) to enable that?

I propose that the existing OpenAI-compatible APIs plus the raw chain of thought, which OpenAI details how to provide as an extension to the OpenAI APIs above, provides the entirety of what's needed to allow a client to pass the full history of the conversation back in, including the raw chain of thought pieces needed to maximize tool use accuracy. Are there places I'm missing where we need more than this?
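
For reference, a sketch of the Chat Completions flow this proposal implies: the raw chain of thought comes back as an extension field and is echoed verbatim on the next turn. The reasoning_content field name follows vLLM's convention and may differ from the cookbook's; treat it as an assumption.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
messages = [{"role": "user", "content": "What is 17 * 23?"}]

resp = client.chat.completions.create(
    model="openai/gpt-oss-20b", messages=messages)
msg = resp.choices[0].message

# Pass the raw chain of thought back on the next turn so the model keeps
# its reasoning context (the extension-field approach the cookbook
# article describes).
messages.append({
    "role": "assistant",
    "content": msg.content,
    "reasoning_content": getattr(msg, "reasoning_content", None),
})
messages.append({"role": "user", "content": "Now add 100 to that."})
resp2 = client.chat.completions.create(
    model="openai/gpt-oss-20b", messages=messages)
print(resp2.choices[0].message.content)
```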

@alecsolder (Contributor, Author) commented

> How does this approach align with things like https://cookbook.openai.com/articles/gpt-oss/handle-raw-cot#chat-completions-api where OpenAI outlines their expectations for how things like the raw chain of thought content is handled when using Chat Completions APIs in front of Harmony models?
>
> More generically, do we actually need the raw harmony text passed back and forth to enable a client to provide the full history of previous responses? Or do we just need the parsed harmony content, including the raw chain of thought (as linked to above) to enable that?
>
> I propose that the existing OpenAI-compatible APIs plus the raw chain of thought, which OpenAI details how to provide as an extension to the OpenAI APIs above, provides the entirety of what's needed to allow a client to pass the full history of the conversation back in, including the raw chain of thought pieces needed to maximize tool use accuracy. Are there places I'm missing where we need more than this?

IMO what they define in that blog doesn't cover all of the edge cases. For example, they don't have a channel field, and reasoning blocks from the commentary channel, which matter for function-tool performance, need to be kept even after a message to the final channel appears. You can see this in their code here: if I'm reading it right, no reasoning message to the commentary channel is ever removed for the next generation.

I had tried to implement this using just the previous harmony messages, but that is not enough either: you need to match call_ids from the previous Responses API request to the new one, and since a function call's output carries neither a call_id field nor a tool name, you can't produce the right harmony message without the previous Responses API request's output as well.

We don't need the raw text version of the tokens, though: Harmony's Messages object carries enough metadata to regenerate the text tokens (though not in the right order, which is a separate thing they mention very briefly here).

Also, tools are not serializable in harmony messages, so they are dropped when converted to JSON. That is OK, because Responses API requests should get the set of tools + instructions fresh on every request. IIRC there is some logic in the completions path that needs fixing, related to adding tools back in when they are on the request and a system message already exists, for example.

So there is a whole lot going on here :)
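
To make the call_id problem concrete, a dict-based sketch: a Harmony tool-result message needs the tool's name as its author, but a function_call_output input item carries only a call_id, so the name has to be looked up in the previous response's output. The item shapes loosely follow Responses API output items and serialized Harmony messages, but every field name below is illustrative.

```python
# Output items from the *previous* Responses API response.
prev_response_output = [
    {"type": "function_call", "call_id": "call_1",
     "name": "get_weather", "arguments": '{"city": "Paris"}'},
]
# Input items on the *new* request: no tool name, only a call_id.
new_request_input = [
    {"type": "function_call_output", "call_id": "call_1", "output": "18C"},
]

# Recover each tool's name from the previous response's output.
call_id_to_name = {
    item["call_id"]: item["name"]
    for item in prev_response_output
    if item["type"] == "function_call"
}

# Build Harmony-style tool messages: the tool's name is the author, and
# tool traffic flows on the commentary channel.
harmony_tool_messages = [
    {
        "author": {"role": "tool",
                   "name": f"functions.{call_id_to_name[item['call_id']]}"},
        "channel": "commentary",
        "recipient": "assistant",
        "content": item["output"],
    }
    for item in new_request_input
    if item["type"] == "function_call_output"
]
print(harmony_tool_messages)
```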

@bbrowning (Contributor) commented Sep 16, 2025

@alecsolder I wrote a fair bit of the Responses API implementation in Llama Stack, including how we save and restore responses to re-hydrate requests when we see a previous_response_id. All of that was written with the expectation of translating Responses API requests into Chat Completions in the actual backend inference server (because at the time neither vLLM nor any other inference server natively had a Responses API). And it pre-dates the gpt-oss models and the Harmony format. So I freely admit my information may be a bit dated as to how this applies to the internals of the Harmony format.

With that said, you're saying that the Response object described at https://platform.openai.com/docs/api-reference/responses/object is lossy and doesn't contain all the information needed to supply as context when manually managing state client-side? Or does that API surface cover all the necessary items, and it's just that vLLM isn't returning all the details through this API needed to do this?

@qandrew (Contributor) commented Oct 17, 2025

Let's close this PR. I've implemented the remaining parts of this PR in #26962.


Labels: frontend, gpt-oss (Related to GPT-OSS models), needs-rebase
