
Add "/chat/completions" as alias for "/v1/chat/completions" #5722

Merged: 4 commits into ggerganov:master on Feb 28, 2024

Conversation

jorgealias (Contributor)

From server : improvements and maintenance #4216

Regarding:

Use multiple mount points for the OAI API

// TODO: add mount point without "/v1" prefix -- how?
svr.Post("/v1/chat/completions", [&llama](const httplib::Request &req, httplib::Response &res)

This change reuses the same "completions" lambda for the mount point without the "/v1" prefix.
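In essence (a minimal sketch, assuming cpp-httplib's routing API; handle_chat_completions is an illustrative name and the handler body is elided):

    // Name the lambda once, then register it on both routes.
    const auto handle_chat_completions = [&llama](const httplib::Request &req, httplib::Response &res) {
        // { OAI-compatible completion handling }
    };
    svr.Post("/v1/chat/completions", handle_chat_completions);
    svr.Post("/chat/completions", handle_chat_completions);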

Added a new "concurrent OAI completions requests no v1" test step to validate that the new mount point works.

The change only moves the original lambda two indent levels left, to match the surrounding formatting. The diff is easier to read when whitespace differences are ignored (GitHub's "hide whitespace" view).

If similar changes become necessary for more mount points, some work can be done to keep them concise.

@phymbert (Collaborator) left a review comment:

Thanks for the tests. I would instead suggest introducing an --api-prefix option, which would allow prefixing all endpoints according to what the client expects. This fix does not address /v1/embeddings.
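A rough sketch of what such an option could look like; the --api-prefix flag and these handler names are hypothetical, not existing code:

    // Hypothetical: value parsed from a --api-prefix command-line flag.
    std::string api_prefix = "/v1";
    svr.Post(api_prefix + "/chat/completions", handle_chat_completions);
    svr.Post(api_prefix + "/embeddings",       handle_embeddings);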

@phymbert (Collaborator) commented Feb 26, 2024

@ngxson What is your point of view on the proposed approach?

@ngxson (Collaborator) commented Feb 26, 2024

> Thanks for the tests. I would instead suggest introducing an --api-prefix option, which would allow prefixing all endpoints according to what the client expects. This fix does not address /v1/embeddings.

Yeah, maybe a good idea, but I'm not sure what the possible values for --api-prefix would be (it's obvious that we can use whatever prefix we want, but as a normal user, what are the possible use cases?).

If we default --api-prefix to /v1, what will happen when OpenAI releases a new /v2? In that case, changing the default value of --api-prefix may introduce breaking changes.

@phymbert (Collaborator) commented Feb 26, 2024

>> Thanks for the tests. I would instead suggest introducing an --api-prefix option, which would allow prefixing all endpoints according to what the client expects. This fix does not address /v1/embeddings.
>
> Yeah, maybe a good idea, but I'm not sure what the possible values for --api-prefix would be (it's obvious that we can use whatever prefix we want, but as a normal user, what are the possible use cases?).
>
> If we default --api-prefix to /v1, what will happen when OpenAI releases a new /v2? In that case, changing the default value of --api-prefix may introduce breaking changes.

Good point. What about --api-prefix-v1? :/ IMHO this should not be done at the server level but with some proxy rewrite rules.

@jorgealias Which clients expect /chat/completions? All the clients I have seen can have a base_url.

For example, you can set openai.base_url='http://localhost:8080/v1/chat'.

@ggerganov What was the motivation to have different prefixes? URI versioning is the common REST standard.

@ngxson (Collaborator) commented Feb 26, 2024

Instead of --api-prefix, I would propose --openai-version "v1,v2" (defaulting to v1), where / points to the highest version in the list.

Still, it's a future-proofing measure, and I'm not even sure whether it's a good idea or not. Maybe we should wait until OAI releases a v2 API?
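A hypothetical sketch of that proposal; the --openai-version flag, its parsing, and the handler name are assumptions, not existing code:

    // Hypothetical: versions parsed from --openai-version "v1,v2".
    std::vector<std::string> versions = { "v1" };
    for (const auto &v : versions) {
        svr.Post("/" + v + "/chat/completions", handle_chat_completions);
    }
    // The unprefixed route points to the highest listed version.
    svr.Post("/chat/completions", handle_chat_completions);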

@jorgealias (Contributor, Author)

> @jorgealias Which clients expect /chat/completions?
>
> All the clients I have seen can have a base_url.
> For example, you can set openai.base_url='http://localhost:8080/v1/chat'.

The change only provides a simple implementation for the comment at line 2683:

// TODO: add mount point without "/v1" prefix -- how?

It attempts to answer the question of how.

Using openai.base_url='http://localhost:8080/v1/' could indeed be a valid option when all APIs have the same version.

More can be done to remap/reuse existing handlers and to support multiple or different versions, if such reuse is doable and necessary for various clients. That could also apply to other handlers, like /v1/embeddings. I am not aware of other such clients, but my knowledge is limited.

Changing the version usually implies more than just a simple reuse. It might not be safe to make assumptions about the future and to introduce options for handling such changes, just to match other APIs. Indeed, it is better to wait.

Regarding this specific change, it might have been faster to post a comment showing this implementation, or to ask why it would be necessary, but here we are. If there is no immediate need, please press the close button :)

Why this? Just trying to help, starting somewhere. It is one way to say thank you for the great work and effort put into building this project.

@phymbert (Collaborator)

Thanks, @jorgealias: contributions are very welcome. Maybe we can switch this one to draft for the moment, to give us time to sort out a good approach to supporting versioning.

Since you provided a scenario for the test framework, it would be nice if you could also test missing features in the server. We are planning to refactor the code, and having good coverage will definitely help. Typically, everything around multimodal.

@ggerganov (Owner)

> @ggerganov What was the motivation to have different prefixes? URI versioning is the common REST standard.

It was a feature request in one of the discussions: #4160 (comment)

@ngxson (Collaborator) commented Feb 27, 2024

@jorgealias You are right about the point of fixing the TODO (add mount point without "/v1" prefix -- how?). In fact, I think what's being asked in the TODO is "how to mount one handler to multiple routes?", and your PR does address the issue by reusing the lambda function.

So I think it's good to merge this one.

@phymbert Sorry, I misunderstood your initial question. In fact, the /v1 that we have is not there for versioning, but because most code that you can find on the internet that uses the OpenAI API points to /v1/chat/completions. This endpoint is actually shown in the OpenAI docs:

[screenshot of the OpenAI docs showing the /v1/chat/completions endpoint]

So that's why I mentioned that we should not try to be future-proof: the /v1 part in llama.cpp exists solely to be compatible with code already using the OpenAI API, and no one knows when they will release a v2 of their API.

For the same purpose, the other endpoints supported by OpenAI, like /embeddings, /completions, or /models, should be accessible both with and without the /v1 part. However, since these changes require some modifications to the code that parses JSON, we can do that in another PR.

For now, this PR seems OK to me to be merged.

@phymbert (Collaborator) left a review comment:

OK, thanks for the explanation.

@jorgealias (Contributor, Author)

I am pushing now, but @phymbert did ask:

> This fix does not address /v1/embeddings

I am not sure what needs to be addressed, but if it is about an extra non-/v1 alias, I did mention that things can be done a bit differently to support more similar changes easily.

Adding a helper function like:

// Pointer-to-member type matching the httplib::Server route-registration
// methods (Post, Get, ...) that take a pattern and a handler.
typedef httplib::Server& (httplib::Server::*ServerMemberFn)
    (const std::string &, const httplib::Server::Handler);

// Register the same handler under every pattern in the list, e.g. both
// "/chat/completions" and "/v1/chat/completions".
static void server_handler(httplib::Server &svr, ServerMemberFn function,
    const std::vector<std::string> &patterns, const httplib::Server::Handler handler
) {
    for (const auto& pattern : patterns)
    {
        (svr.*function)(pattern, handler);
    }
}

will help eliminate the need for extra named lambdas.

If there is only one path for a POST, then use the standard form:

    svr.Post("/infill", [&llama, &validate_api_key](const httplib::Request &req, httplib::Response &res)
         // { lambda block here }
    );

For multiple paths for a POST sharing the same lambda, use the new helper:

    server_handler(svr, &httplib::Server::Post, { "/chat/completions", "/v1/chat/completions" },
        [&llama, &validate_api_key, &sparams](const httplib::Request &req, httplib::Response &res)
         // { lambda block here }
    );

or

    server_handler(svr, &httplib::Server::Post, { "/embeddings", "/v1/embeddings" },
        [&llama](const httplib::Request &req, httplib::Response &res)
         // { lambda block here }
    );

As in, only replacing:

    svr.Post("/embeddings"

with:

    server_handler(svr, &httplib::Server::Post, { "/embeddings", "/v1/embeddings" }

And that should also work for svr.Get(...).
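For instance, a GET endpoint like /models (mentioned above) could be registered the same way; this is a sketch with an assumed route list, not code from the PR:

    server_handler(svr, &httplib::Server::Get, { "/models", "/v1/models" },
        [&llama](const httplib::Request &req, httplib::Response &res)
         // { lambda block here }
    );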

@ggerganov (Owner)

> Adding a helper function like:

Yup, good idea

@ggerganov merged commit efc7225 into ggerganov:master on Feb 28, 2024
57 of 58 checks passed
@ngxson mentioned this pull request on Feb 28, 2024
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* Add "/chat/completions" as alias for "/v1/chat/completions"

* merge to upstream master

* minor : fix trailing whitespace

---------

Co-authored-by: Georgi Gerganov <[email protected]>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* Add "/chat/completions" as alias for "/v1/chat/completions"

* merge to upstream master

* minor : fix trailing whitespace

---------

Co-authored-by: Georgi Gerganov <[email protected]>