
Conversation

@alecsolder (Contributor) commented Sep 9, 2025

Purpose

For the Responses API to retain full functionality when the Responses Store and Messages Store are not in use, users must be able to store messages client-side.

To support this, this PR adds Harmony messages as additional output from the Responses API and accepts Harmony messages as input to it.

Passing Harmony messages back as input replicates the conversation-continuation behavior that the Messages Store provides.
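
To make the shape of the change concrete, here is a rough sketch of a two-turn exchange under this PR. Only `previous_response_harmony_messages` comes from the diff; the output field name (`harmony_messages`), the model name, and the server-side gating are assumptions for illustration.

```python
import requests

BASE = "http://localhost:8000/v1"  # assumes a locally served vLLM instance

# Turn 1: no Responses Store, so the client must keep the history itself.
first = requests.post(f"{BASE}/responses", json={
    "model": "openai/gpt-oss-20b",
    "input": "What is the capital of France?",
    "store": False,
}).json()

# Hypothetical output field: the full Harmony message history, emitted
# when the server enables it via the env flag this PR adds.
harmony_history = first.get("harmony_messages", [])

# Turn 2: replay the stored history instead of using previous_response_id.
second = requests.post(f"{BASE}/responses", json={
    "model": "openai/gpt-oss-20b",
    "input": "And what is its population?",
    "previous_response_harmony_messages": harmony_history,
    "store": False,
}).json()
print(second["output"])
```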

Test Plan

Tested with unit tests + serving locally

Test Result

Passed

@gemini-code-assist bot left a comment

Code Review

This pull request successfully integrates OpenAI Harmony messages into the Responses API. It adds the ability to pass conversation history via previous_response_harmony_messages and to receive the full Harmony message history in the response, controlled by an environment variable. The changes are well-tested. My main concern is a potential race condition when continuing conversations using previous_response_id, which could lead to data corruption in a concurrent environment.

@gemini-code-assist bot flagged an inline comment as critical:

The logic inside this if block for continuing a conversation modifies prev_msgs in-place after retrieving it from self.msg_store. If multiple requests continue from the same previous_response_id concurrently, one request can be modifying the message history while another is reading it, resulting in corrupted data and unpredictable behavior.

To prevent this, operate on a copy of the message list, for example by changing line 728 to:

prev_msgs = self.msg_store[prev_response.id].copy()
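
A minimal sketch of the fix in isolation, with a plain dict standing in for self.msg_store (everything here is illustrative, not the PR's actual code):

```python
import threading

# Toy stand-in for self.msg_store.
msg_store = {"resp-1": ["system", "user: hi", "assistant: hello"]}

def continue_conversation(prev_id: str, new_msgs: list[str]) -> list[str]:
    # Copy first: each request builds its own history, so the stored
    # list for prev_id is never mutated after retrieval. Dropping the
    # .copy() reintroduces the race, since concurrent requests would
    # then extend the same shared list and interleave their histories.
    prev_msgs = msg_store[prev_id].copy()
    prev_msgs.extend(new_msgs)
    return prev_msgs

threads = [
    threading.Thread(target=continue_conversation,
                     args=("resp-1", [f"user: turn {i}"]))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The stored history is untouched regardless of how the threads raced.
assert msg_store["resp-1"] == ["system", "user: hi", "assistant: hello"]
```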

Alec Solder added 2 commits September 8, 2025 18:52
- Allow harmony messages to be used as input as a replacement for the responses store
- Add an env flag to output harmony messages as well
- Some tests

Signed-off-by: Alec Solder <[email protected]>
Signed-off-by: Alec Solder <[email protected]>
@alecsolder force-pushed the responses-api-harmony-messages branch from 8431315 to 2000c07 on September 9, 2025 at 01:52
@lacora (Contributor) commented Sep 9, 2025

@alecsolder could you give an example input / output change for this PR? also are we including the author / channel output?

@mergify bot commented Sep 9, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @alecsolder.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label Sep 9, 2025
@yeqcharlotte (Collaborator) left a comment

see comments, i don't think it's a good idea to mix harmony responses and OpenAI-compatible responses in the protocol

@alecsolder changed the title from "Add OpenAI Harmony integration for responses API" to "[Frontend] Add OpenAI Harmony integration for responses API" on Sep 9, 2025
    messages.extend(prev_msgs)
elif request.previous_response_harmony_messages is not None:
    messages.extend(request.previous_response_harmony_messages)
else:
Collaborator comment on the diff above:

you don't need to change the if order here?
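
For context on the branch ordering under discussion, a runnable sketch of how the chain could read; only the two request fields appear in the diff, and the store and helper names are made up:

```python
from dataclasses import dataclass, field

# Toy stand-ins for the serving layer's message store and request object.
msg_store = {"resp-1": ["harmony msg A", "harmony msg B"]}

@dataclass
class Request:
    previous_response_id: str | None = None
    previous_response_harmony_messages: list | None = None
    input: list = field(default_factory=list)

def build_messages(request: Request) -> list:
    messages = []
    if request.previous_response_id is not None:
        # Server-side continuation via the Messages Store; copy to avoid
        # the shared-mutation issue flagged in the review above.
        messages.extend(msg_store[request.previous_response_id].copy())
    elif request.previous_response_harmony_messages is not None:
        # Client-side continuation: the caller replays the Harmony history.
        messages.extend(request.previous_response_harmony_messages)
    # else: fresh conversation, nothing to prepend.
    messages.extend(request.input)
    return messages

print(build_messages(Request(
    previous_response_harmony_messages=["harmony msg A", "harmony msg B"],
    input=["user: next turn"],
)))
```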

@yeqcharlotte (Collaborator) commented

i think it's the right thing to give power users access to the harmony message format directly, which is helpful for debugging and lets users orchestrate tool calls externally if they choose to.

however, please improve the overall code structure as commented, and add a small RFC as well as examples to show users how to use them.

cc: @heheda12345

@bbrowning (Contributor) commented

How does this approach align with things like https://cookbook.openai.com/articles/gpt-oss/handle-raw-cot#chat-completions-api where OpenAI outlines their expectations for how things like the raw chain of thought content is handled when using Chat Completions APIs in front of Harmony models?

More generically, do we actually need the raw harmony text passed back and forth to enable a client to provide the full history of previous responses? Or do we just need the parsed harmony content, including the raw chain of thought (as linked to above) to enable that?

I propose that the existing OpenAI-compatible APIs plus the raw chain of thought, which OpenAI details how to provide as an extension to the OpenAI APIs above, provides the entirety of what's needed to allow a client to pass the full history of the conversation back in, including the raw chain of thought pieces needed to maximize tool use accuracy. Are there places I'm missing where we need more than this?
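
For reference, a sketch of the Chat Completions flow this proposal implies: the raw chain of thought comes back as an extension field and is echoed verbatim on the next turn. The reasoning_content field name follows vLLM's convention and may differ from the cookbook's; treat it as an assumption.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
messages = [{"role": "user", "content": "What is 17 * 23?"}]

resp = client.chat.completions.create(
    model="openai/gpt-oss-20b", messages=messages)
msg = resp.choices[0].message

# Pass the raw chain of thought back on the next turn so the model keeps
# its reasoning context (the extension-field approach the cookbook
# article describes).
messages.append({
    "role": "assistant",
    "content": msg.content,
    "reasoning_content": getattr(msg, "reasoning_content", None),
})
messages.append({"role": "user", "content": "Now add 100 to that."})
resp2 = client.chat.completions.create(
    model="openai/gpt-oss-20b", messages=messages)
print(resp2.choices[0].message.content)
```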

@alecsolder (Contributor, Author) commented

> How does this approach align with things like https://cookbook.openai.com/articles/gpt-oss/handle-raw-cot#chat-completions-api where OpenAI outlines their expectations for how things like the raw chain of thought content is handled when using Chat Completions APIs in front of Harmony models?
>
> More generically, do we actually need the raw harmony text passed back and forth to enable a client to provide the full history of previous responses? Or do we just need the parsed harmony content, including the raw chain of thought (as linked to above) to enable that?
>
> I propose that the existing OpenAI-compatible APIs plus the raw chain of thought, which OpenAI details how to provide as an extension to the OpenAI APIs above, provides the entirety of what's needed to allow a client to pass the full history of the conversation back in, including the raw chain of thought pieces needed to maximize tool use accuracy. Are there places I'm missing where we need more than this?

IMO what they define in that blog doesn't cover all of the edge cases. For example, they don't have a channel field, and reasoning blocks from the commentary channel, which matter for function-tool performance, need to be kept even after a message to the final channel appears. You can see this in their code here: if I'm reading it right, no reasoning message to the commentary channel is ever removed for the next generation.

I had tried to implement this using just the previous harmony messages, but that is not enough either: you need to match call_ids from the previous Responses API request to the new one, and since a function call's output carries neither a call_id field nor a tool name, you can't produce the right harmony message without the previous Responses API request's output as well.

We don't need the raw text version of the tokens, though: Harmony's Messages object carries enough metadata to regenerate the text tokens (though not in the right order, which is a separate thing they mention very briefly here).

Also, tools are not serializable in harmony messages, so they are dropped when converted to JSON. That is OK, because Responses API requests should get the set of tools + instructions fresh on every request. IIRC there is some logic in the completions path that needs fixing, related to adding tools back in when they are on the request and a system message already exists, for example.

So there is a whole lot going on here :)
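
To make the call_id problem concrete, a dict-based sketch: a Harmony tool-result message needs the tool's name as its author, but a function_call_output input item carries only a call_id, so the name has to be looked up in the previous response's output. The item shapes loosely follow Responses API output items and serialized Harmony messages, but every field name below is illustrative.

```python
# Output items from the *previous* Responses API response.
prev_response_output = [
    {"type": "function_call", "call_id": "call_1",
     "name": "get_weather", "arguments": '{"city": "Paris"}'},
]
# Input items on the *new* request: no tool name, only a call_id.
new_request_input = [
    {"type": "function_call_output", "call_id": "call_1", "output": "18C"},
]

# Recover each tool's name from the previous response's output.
call_id_to_name = {
    item["call_id"]: item["name"]
    for item in prev_response_output
    if item["type"] == "function_call"
}

# Build Harmony-style tool messages: the tool's name is the author, and
# tool traffic flows on the commentary channel.
harmony_tool_messages = [
    {
        "author": {"role": "tool",
                   "name": f"functions.{call_id_to_name[item['call_id']]}"},
        "channel": "commentary",
        "recipient": "assistant",
        "content": item["output"],
    }
    for item in new_request_input
    if item["type"] == "function_call_output"
]
print(harmony_tool_messages)
```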

@bbrowning (Contributor) commented Sep 16, 2025

@alecsolder I wrote a fair bit of the Responses API implementation in Llama Stack, including how we save and restore responses to re-hydrate requests when we see a previous_response_id. All of that was written with the expectation of translating Responses API requests into Chat Completions in the actual backend inference server (because at the time neither vLLM nor any other inference server natively had a Responses API). And it pre-dates the gpt-oss models and the Harmony format. So I freely admit my information may be a bit dated as to how this applies to the internals of the Harmony format.

With that said, you're saying that the Response object described at https://platform.openai.com/docs/api-reference/responses/object is lossy and doesn't contain all the information needed to supply as context when manually managing state client-side? Or does that API surface cover all the necessary items, and it's just that vLLM isn't returning all the details through this API needed to do this?

@qandrew (Contributor) commented Oct 17, 2025

Let's close this PR. I've implemented the remaining parts of this PR in #26962.


Labels: frontend, gpt-oss (Related to GPT-OSS models), needs-rebase
