Conversation
|
I think this is awesome, I don't have the time to contribute right now, but this is great! One additional idea I had was to also allow virtual model aliases to be configured via a UI for the API. It could allow mapping of things like "gpt-3.5-turbo" to a full model config (generation params, model file, loading params, etc.) - and allow switching models on the fly easily, like if an API call came in with "code-davinci-002" you could switch to a code completion model. |
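As a sketch of what such an alias layer might look like (the schema, field names, and model files below are invented for illustration, not an actual config format in the project):

```python
# Hypothetical alias table: OpenAI model names mapped to full local configs.
# All fields and file names are made up for illustration.
MODEL_ALIASES = {
    "gpt-3.5-turbo": {
        "model_file": "mistral-7b-v0.1.Q6_K.gguf",
        "loader": "llama.cpp",
        "generation": {"temperature": 0.7, "max_tokens": 512},
    },
    "code-davinci-002": {
        "model_file": "codellama-7b-instruct.Q5_K_M.gguf",
        "loader": "llama.cpp",
        "generation": {"temperature": 0.2, "max_tokens": 1024},
    },
}

def resolve_model(requested: str) -> dict:
    """Look up the local config for an incoming API model name."""
    if requested not in MODEL_ALIASES:
        raise KeyError(f"no alias configured for {requested!r}")
    return MODEL_ALIASES[requested]
```

An incoming API call for "code-davinci-002" would then trigger loading the mapped model (if downloaded) before generation.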
|
That's an interesting idea @matatonic, I'll see if I can come up with something in this direction -- the only caveat is that the user will have to have downloaded an appropriate model. This PR will probably take forever to complete, but I think it's very important. Eventually I'll get there. |
This reverts commit 5d0a6f4.
|
So this is going to sound bananas as someone who almost exclusively uses the OpenAI API with textgen, but I think your core approach of having a core “textgen” API with addons or extensions for other APIs feels like the right one. On the other hand, yes, I do think it would be great for the openai API extension to be treated as a “first class citizen” of sorts for now and get a fresh coat of paint.

The reason I would not change everything over to JUST be the OpenAI API is that we are still in 2023; ChatGPT just launched and made waves months ago, so of course, as the current market leader, all current “cool” stuff is being made to utilize it. But who is to say that will still be the case in 2024? Meta is in the process of launching Llama on so many services for free and already has the server infrastructure to transmit text to servers at high speed all around the world. What if 2024 sees Meta launch free API access to Llama2 or Llama2.5 or whatever, and they have their own API format? Suddenly we may see services created mostly for that platform, as people will not have to pay to use it. Same for Microsoft, Google, Apple.

And it’s not like the OpenAI API is the most user friendly, with all those dictionaries within dictionaries within arrays. As they continue to push toward multi-modal, they could even change their own API again the way they did from completions to chat completions. So I just worry about you basing the core of your API software on OpenAI and then it getting deprecated or outdated.

Maybe I am misunderstanding the intent here, though: maybe by “one API” you just mean people wouldn’t have to turn different endpoints on and off; they would just be available, you could add on to them, and all would have access to the core program features and changes made to the GUI, which would be great. Anyway, I’m just one voice and don’t know very much, so take this with a grain of salt.
I do think you’ve accomplished something amazing here, which is democratizing AI and making it as easy to use as possible. |
|
@teddybear082 even if another API becomes more popular, using FastAPI, type hints, and SSE instead of websockets, and having API documentation, will still be a benefit. It will be a matter of changing the endpoints and parameters while retaining the overall structure. It will also be possible for extensions to add endpoints very easily by simply importing |
|
I think that the goal of this PR has been achieved -- moving the features of the old API to the OpenAI API. It is passing all my tests. The docs need some more polishing, but that can be done later. I'll merge the PR so that more people can test the updated API in the dev branch. |
|
Just for completeness here: I did pull down the dev branch today, and it also cleared my openai-api tests. |
|
How do I abort the completion? Closing the connection doesn't make the completion stop, and |
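For context, server-side generation typically has to cooperate with cancellation: the generation loop checks a stop flag that the server sets when it detects a client disconnect (with FastAPI/Starlette, for example, via `request.is_disconnected()`). A minimal stdlib sketch of that pattern, with invented token names:

```python
import threading

def generate_tokens(stop_event: threading.Event, n: int = 100):
    """Yield fake tokens until finished or until stop_event is set."""
    for i in range(n):
        if stop_event.is_set():
            break  # abort mid-generation
        yield f"token-{i}"

stop = threading.Event()
received = []
for token in generate_tokens(stop):
    received.append(token)
    if len(received) == 3:
        stop.set()  # simulate the client disconnecting

print(received)  # ['token-0', 'token-1', 'token-2']
```

Without such a check, the model keeps generating after the connection closes, which matches the behavior described above.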
|
@oobabooga Yes, it's working as expected. Thank you! However, I found another potential problem with this API: the first generated token never comes with a leading space, and an empty token appears before it.

{
"id": "conv-1699323154472440320",
"object": "text_completion.chunk",
"created": 1699323154,
"model": "mistral-7b-v0.1.Q6_K.gguf",
"choices": [
{
"index": 0,
"finish_reason": null,
"text": "",
"logprobs": {
"top_logprobs": [
{}
]
}
}
]
}
{
"id": "conv-1699323154472440320",
"object": "text_completion.chunk",
"created": 1699323154,
"model": "mistral-7b-v0.1.Q6_K.gguf",
"choices": [
{
"index": 0,
"finish_reason": null,
"text": "upon",
"logprobs": {
"top_logprobs": [
{}
]
}
}
]
}

This is llama.cpp's OpenAI proxy for reference:

{
"id": "cmpl",
"object": "text_completion.chunk",
"created": 1699322948,
"model": "LLaMA_CPP",
"choices": [
{
"finish_reason": null,
"index": 0,
"text": " Upon"
}
]
} |
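To make the reported difference concrete, here is a small sketch that joins the `text` fields of streamed chunks the way a client would; with the empty first chunk and the missing leading space, the client reconstructs "upon" where llama.cpp's proxy would yield " Upon":

```python
import json

# Abbreviated versions of the two chunks from the report above.
chunks = [
    '{"choices": [{"index": 0, "text": ""}]}',
    '{"choices": [{"index": 0, "text": "upon"}]}',
]

# Clients reconstruct the completion by concatenating chunk texts.
text = "".join(json.loads(c)["choices"][0]["text"] for c in chunks)
print(repr(text))  # 'upon'  (the leading space is lost)
```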
|
That is indeed a bug. It should be fixed after 97c21e5. If you notice anything else weird, please let me know! |
|
I don't know if this has any impact on anything, and you may have seen it, but on the OpenAI playground, the scripts now recommend a slightly different format for calling the OpenAI API in Python code. Perhaps this doesn't impact the server side of trying to emulate the OpenAI server at all, only the requestor, but I figured I would pass it along just in case. There's also a new "response_format" field that can be passed; maybe something like jsonformer can be used, or grammar implemented, when someone passes that field, not sure: https://platform.openai.com/playground?mode=chat&model=gpt-4-1106-preview https://platform.openai.com/docs/quickstart?context=python https://platform.openai.com/docs/api-reference/chat/streaming |
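As a sketch of that idea (the function name and return values are hypothetical, not anything the project implements):

```python
def pick_output_constraint(body: dict):
    """Hypothetical dispatch: decide whether to constrain generation
    based on an OpenAI-style response_format field in the request body."""
    fmt = body.get("response_format") or {}
    if fmt.get("type") == "json_object":
        # A server could plug in jsonformer or a GBNF grammar here
        # to force syntactically valid JSON output.
        return "json-grammar"
    return None  # unconstrained generation

# A request asking for JSON mode:
print(pick_output_constraint({"response_format": {"type": "json_object"}}))  # json-grammar
```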
The most popular LLM API in the world is the OpenAI API, so I think that it makes sense to emulate it in this project when the --api flag is provided. This is already the case for vLLM and FastChat.

The goal is to start from the current openai extension and make the following changes before merging: replace the http.server implementation with FastAPI, such that the API docs can be accessed at 127.0.0.1:5001/docs.

Status

I have converted everything to FastAPI, and the completion endpoints seemingly work (both with and without streaming).
OpenAI API reference
https://platform.openai.com/docs/api-reference
cc @matatonic