[Frontend] Add OpenAI Harmony integration for responses API #24472
Conversation
Code Review
This pull request successfully integrates OpenAI Harmony messages into the Responses API. It adds the ability to pass conversation history via previous_response_harmony_messages and to receive the full Harmony message history in the response, controlled by an environment variable. The changes are well-tested. My main concern is a potential race condition when continuing conversations using previous_response_id, which could lead to data corruption in a concurrent environment.
The logic inside this if block for continuing a conversation modifies prev_msgs in-place after retrieving it from self.msg_store. This can lead to a race condition if multiple requests try to continue from the same previous_response_id concurrently, as one request could be modifying the message history while another is reading it. This can result in corrupted data and unpredictable behavior.
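To make the hazard concrete, here is a minimal sketch with hypothetical names (a plain `msg_store` dict standing in for `self.msg_store`) showing how two concurrent continuations of the same response can interleave:

```python
import asyncio

# Hypothetical stand-in for self.msg_store: response ID -> message list.
msg_store: dict[str, list[str]] = {"resp_1": ["system", "user", "assistant"]}

async def continue_conversation(new_msg: str) -> list[str]:
    prev_msgs = msg_store["resp_1"]   # shared reference -- no copy taken
    prev_msgs.append(new_msg)         # in-place edit is visible to every reader
    await asyncio.sleep(0)            # yield point where another request can run
    return prev_msgs

async def main() -> None:
    await asyncio.gather(
        continue_conversation("turn from request A"),
        continue_conversation("turn from request B"),
    )
    # The stored history now contains both requests' turns, corrupting it
    # for any later continuation from resp_1.
    print(msg_store["resp_1"])

asyncio.run(main())
```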
To prevent this, you should operate on a copy of the message list, for example by changing line 728 to:

```python
prev_msgs = self.msg_store[prev_response.id].copy()
```

Commit message (force-pushed from 8431315 to 2000c07):

- Allow harmony messages to be used as input as a replacement for the responses store
- Add an env flag to output harmony messages as well
- Some tests

Signed-off-by: Alec Solder <[email protected]>
@alecsolder Could you give an example input/output change for this PR? Also, are we including the author/channel output?
This pull request has merge conflicts that must be resolved before it can be merged.
yeqcharlotte left a comment
See comments; I don't think it's a good idea to mix Harmony responses and OpenAI-compatible responses in the protocol.
```python
    messages.extend(prev_msgs)
elif request.previous_response_harmony_messages is not None:
    messages.extend(request.previous_response_harmony_messages)
else:
```
You don't need to change the if order here?
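For context, here is a hedged reconstruction of the branch order being discussed. Everything outside the four diff lines above (the request fields in the first arm, the store lookup, the helper name `build_input_messages`) is an assumption drawn from the surrounding conversation, not the PR's actual code:

```python
# Hedged sketch: server-side store takes precedence, then client-supplied
# Harmony history (this PR's addition), then a fresh start.
def build_input_messages(request, msg_store, fresh_msgs):
    messages = []
    if getattr(request, "previous_response_id", None) is not None:
        # A previous response ID means the server-side Messages Store wins.
        prev_msgs = msg_store[request.previous_response_id].copy()
        messages.extend(prev_msgs)
    elif request.previous_response_harmony_messages is not None:
        # Otherwise fall back to the client-supplied Harmony history.
        messages.extend(request.previous_response_harmony_messages)
    else:
        # No prior context: build from the fresh request input.
        messages.extend(fresh_msgs)
    return messages
```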
I think it's the right thing to give power users access to the harmony message format directly; it's helpful for debugging and lets users orchestrate tool calls externally if they choose to. However, please improve the overall code structure as commented, and add a small RFC as well as examples showing users how to use this. cc: @heheda12345
How does this approach align with things like https://cookbook.openai.com/articles/gpt-oss/handle-raw-cot#chat-completions-api, where OpenAI outlines their expectations for how the raw chain-of-thought content is handled when using Chat Completions APIs in front of Harmony models? More generally, do we actually need the raw harmony text passed back and forth to enable a client to provide the full history of previous responses? Or do we just need the parsed harmony content, including the raw chain of thought (as linked above), to enable that?

I propose that the existing OpenAI-compatible APIs, plus the raw chain of thought that OpenAI details how to provide as an extension to those APIs, cover everything needed to let a client pass the full conversation history back in, including the raw chain-of-thought pieces needed to maximize tool-use accuracy. Are there places I'm missing where we need more than this?
IMO what they define in that blog doesn't cover all of the edge cases. For example, they don't have a channel field, and reasoning blocks from the commentary channel, which matter for function-tool performance, do need to be kept even after a message to the final channel is seen (loosely sketched below). You can see this in their code here: if I'm reading it right, no reasoning message to the commentary channel is ever removed for the next generation.

I had tried to implement this using just the previous harmony messages, but that is also not enough, because you need to match call_ids from the previous Responses API request to the new one. There is no call_id field on the output of a function call, nor is there a tool name, so you can't produce the right harmony message without the last Responses API request's output as well. We don't need the raw text version of the tokens, though; Harmony has enough metadata in the Messages object to regenerate the text tokens (not in the right order, which is a different thing they mention very briefly here).

Also, tools are not serializable in harmony messages, so they are dropped when turned into JSON, but that is OK because Responses API requests should get the set of tools + instructions fresh for every API request. IIRC there is some logic in the completions path that needs to be fixed, related to adding tools back in when they are on the request and there is already a system message, for example. So there is a whole lot going on here :)
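To illustrate the retention rule described above, here is a loose sketch. The dict-shaped messages and the `prune_history` helper are hypothetical (the real code path works on openai_harmony Message objects), and per-turn boundaries are ignored for brevity:

```python
# Hedged sketch: once a 'final'-channel message exists, drop 'analysis'-channel
# reasoning, but keep 'commentary'-channel messages, since function-tool calls
# depend on them in later generations.
def prune_history(messages: list[dict]) -> list[dict]:
    if not any(m.get("channel") == "final" for m in messages):
        return list(messages)
    return [m for m in messages if m.get("channel") != "analysis"]
```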
@alecsolder I wrote a fair bit of the Responses API implementation in Llama Stack, including how we save and restore the responses to re-hydrate the requests when we see a previous_response_id.

With that said, you're saying that the Response object described at https://platform.openai.com/docs/api-reference/responses/object is lossy and doesn't contain all the information needed to supply as context when manually managing state client-side? Or does that API surface cover all the necessary items, and it's just that vLLM isn't returning all the details via this API needed to do this?
Let's close this PR. I've implemented the remaining parts of it in #26962.
Purpose
For the Responses API to have full functionality when not using the Responses Store and Messages Store, we must allow users to store the messages client-side.
To support this, this PR adds Harmony messages as additional output from the Responses API and accepts Harmony messages as input to it.
Using Harmony messages as input replicates the conversation-continuation behavior that the Messages Store provides, as sketched below.
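To make the intended flow concrete, here is a hedged sketch of a client-side conversation continuation. The endpoint path, model name, and the output field name `harmony_messages` are illustrative assumptions; only `previous_response_harmony_messages` appears in this PR's diff.

```python
import requests

BASE_URL = "http://localhost:8000/v1"  # assumed local vLLM server

# Turn 1: with the env flag enabled, the response carries Harmony messages.
first = requests.post(f"{BASE_URL}/responses", json={
    "model": "gpt-oss-20b",                     # placeholder model name
    "input": "What is the capital of France?",
}).json()
history = first.get("harmony_messages", [])     # output field name assumed

# Turn 2: the client supplies the history itself, via the field this PR
# adds, instead of relying on the server-side Messages Store.
second = requests.post(f"{BASE_URL}/responses", json={
    "model": "gpt-oss-20b",
    "input": "And what is its population?",
    "previous_response_harmony_messages": history,
}).json()
```

The design point is that the client, not the server-side Messages Store, carries conversation state between turns.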
Test Plan
Tested with unit tests and by serving locally.
Test Result
Passed