Document diverged OpenAI API #3147

Open

freddyheppell opened this issue Mar 13, 2024 · 3 comments

Comments

@freddyheppell

The diverged OpenAI API appears to be undocumented at the moment, but it has some useful and requested features (e.g. repetition_penalty, #1914).

As far as I can tell, these endpoints can be used with the OpenAI client, with additional parameters passed via the extra_body argument, e.g.:

# ...
openai.base_url = "http://localhost:8000/api/v1/"  # point the client at the diverged API
# ...
completion = openai.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Hello! What is your name?"}],
    extra_body={
        # forwarded in the request body even though the client has no such keyword argument
        "repetition_penalty": 1.10
    },
)

(This is also useful for top_k, which is not a supported argument of the OpenAI client but is supported by both APIs in FastChat; see the sketch below.)
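
For example, here is a minimal sketch of passing top_k the same way. The model name and api_key value are placeholders, and it assumes the server from above is running and not enforcing API keys:

import openai

openai.base_url = "http://localhost:8000/api/v1/"  # the diverged API, as above
openai.api_key = "EMPTY"  # placeholder; assumes the server is not checking keys

completion = openai.chat.completions.create(
    model="vicuna-7b-v1.5",  # placeholder model name; use whatever the worker is serving
    messages=[{"role": "user", "content": "Hello! What is your name?"}],
    extra_body={
        "top_k": 40  # not a named argument of the client, so it is sent via extra_body
    },
)
print(completion.choices[0].message.content)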

It would be good to at least document that this API exists and confirm whether the divergences actually impact using the OpenAI client.

@digisomni
Contributor

Can you link to the lines where it diverges? I know of one part of the diverging code, as I added it close to a year ago. However, I just want to be sure we are thinking about the same thing.

@freddyheppell
Author

I said "diverged" because that was the wording of the original PR (#1536) that introduced this second protocol and API.

I've noticed these differences between the OpenAI client library and the two chat completion endpoints in FastChat's OpenAI server:

There are also two slight code differences, which appear to just be oversights when updating one endpoint:

I think documenting the API differences would be helpful; without digging through the code and PR history, it's not immediately clear why these two chat completion endpoints exist, whether the diverged one can still be used with the OpenAI library, or how to actually use the additional parameters.

Perhaps the top_k argument should also be deprecated in the standard API, as it's not actually part of the OpenAI API protocol. Users who need it can then be encouraged to use the diverged/extended API, as with repetition_penalty.

@digisomni
Contributor

Okay, I gotcha. The reasoning behind it is that OpenAI will only support things that, well, OpenAI supports. Being locked into their API is rough; we should maintain compatibility with it, but not require all our use cases to fit into it.

There are tons of different models being released all the time, and we all use FastChat to interface with them. The idea here is to have a derivative API that lets us add nice "production-ready" features (e.g. token checking, among other things) to make everything from testing through to deploying apps with FastChat more pleasant.

As for documentation, I'll have to return to this, see what needs updating/adding, and probably do it in another nicely packaged PR.
