Document diverged OpenAI API #3147

Open

freddyheppell opened this issue Mar 13, 2024 · 3 comments

Comments

@freddyheppell

The diverged OpenAI API appears to be undocumented at the moment, but it has some useful and requested features (e.g. repetition_penalty, #1914).

As far as I can tell, these endpoints can be used with the OpenAI client, with additional parameters passed via the extra_body argument, e.g.:

# ...
openai.base_url = "http://localhost:8000/api/v1/"  # point the client at the diverged API
# ...
completion = openai.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Hello! What is your name?"}],
    extra_body={
        # forwarded in the request body even though the client has no such keyword argument
        "repetition_penalty": 1.10
    },
)

(This is also useful for top_k, which is not a supported argument of the OpenAI client but is supported by both APIs in FastChat; see the sketch below.)
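
For example, here is a minimal sketch of passing top_k the same way. The model name and api_key value are placeholders, and it assumes the server from above is running and not enforcing API keys:

import openai

openai.base_url = "http://localhost:8000/api/v1/"  # the diverged API, as above
openai.api_key = "EMPTY"  # placeholder; assumes the server is not checking keys

completion = openai.chat.completions.create(
    model="vicuna-7b-v1.5",  # placeholder model name; use whatever the worker is serving
    messages=[{"role": "user", "content": "Hello! What is your name?"}],
    extra_body={
        "top_k": 40  # not a named argument of the client, so it is sent via extra_body
    },
)
print(completion.choices[0].message.content)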

It would be good to at least document that this API exists and confirm whether the divergences actually impact using the OpenAI client.

@digisomni
Contributor

Can you link to the lines where it diverges? I know of one part of the diverging code, as I added it close to a year ago. However, I just want to be sure we are thinking about the same thing.

@freddyheppell
Author

I said "diverged" because that was the wording of the original PR (#1536) that introduced this second protocol and API.

I've noticed these differences between the OpenAI client library and the two chat completion endpoints in FastChat's OpenAI server:

There are also two slight code differences, which appear to just be oversights when updating one endpoint:

I think documenting the API differences would be helpful; without digging through the code and PR history, it's not immediately clear why these two chat completion endpoints exist, whether the diverged one can still be used with the OpenAI library, or how to actually use the additional parameters.

Perhaps the top_k argument should also be deprecated in the standard API, as it's not actually part of the OpenAI API protocol. Users who need it can then be encouraged to use the diverged/extended API, as with repetition_penalty.

@digisomni
Contributor

Okay, I gotcha. The reasoning behind it is that OpenAI will only support things that, well, OpenAI supports. Being locked into their API is rough; we should maintain compatibility with it, but not require all our use cases to fit into it.

There are tons of different models being released all the time, and we all use FastChat to interface with them. The idea here is to have a derivative API that lets us add nice "production-ready" features (e.g. token checking, among other things) to make everything from testing through to deploying apps with FastChat more pleasant.

As for documentation, I'll have to return to this, see what needs updating/adding, and probably do it in another nicely packaged PR.
