server : allow to get default generation settings for completion (gge…
z80maniac authored and hodlen committed Apr 1, 2024
1 parent 7bfe882 commit 35b93a6
Showing 2 changed files with 21 additions and 2 deletions.
examples/server/README.md (15 additions, 1 deletion)
@@ -264,7 +264,21 @@ Notice that each `probs` is an array of length `n_probs`.

It also accepts all the options of `/completion` except `stream` and `prompt`.

-- **GET** `/props`: Return the required assistant name and anti-prompt to generate the prompt in case you have specified a system prompt for all slots.
+- **GET** `/props`: Return current server settings.
+
+### Result JSON
+
+```json
+{
+  "assistant_name": "",
+  "user_name": "",
+  "default_generation_settings": { ... }
+}
+```
+- `assistant_name` - the required assistant name to generate the prompt in case you have specified a system prompt for all slots.
+- `user_name` - the required anti-prompt to generate the prompt in case you have specified a system prompt for all slots.
+- `default_generation_settings` - the default generation settings for the `/completion` endpoint; it has the same fields as the `generation_settings` response object from the `/completion` endpoint.
- **POST** `/v1/chat/completions`: OpenAI-compatible Chat Completions API. Given a ChatML-formatted JSON description in `messages`, it returns the predicted completion. Both synchronous and streaming modes are supported, so scripted and interactive applications work fine. While no strong claims of compatibility with the OpenAI API spec are being made, in our experience it suffices to support many apps. Only ChatML-tuned models, such as Dolphin, OpenOrca, OpenHermes, OpenChat-3.5, etc., can be used with this endpoint. Compared to `api_like_OAI.py`, this API implementation does not require a wrapper to be served.
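As a quick illustration of the new field, a client can read the defaults straight from `/props` before issuing any completion requests. The following is a minimal sketch, not part of this commit, assuming the server listens on `localhost:8080`; it uses cpp-httplib and nlohmann/json, which the server example already depends on:

```cpp
// Minimal client sketch (not from this commit): fetch /props and print the
// default generation settings. Assumes the server listens on localhost:8080
// and that httplib.h and json.hpp (already used by examples/server) are on
// the include path.
#include <cstdio>
#include "httplib.h"
#include "json.hpp"

using json = nlohmann::json;

int main() {
    httplib::Client cli("localhost", 8080);

    auto res = cli.Get("/props");
    if (!res || res->status != 200) {
        fprintf(stderr, "GET /props failed\n");
        return 1;
    }

    json props    = json::parse(res->body);
    // Same fields as the generation_settings object returned by /completion.
    json defaults = props["default_generation_settings"];
    printf("%s\n", defaults.dump(2).c_str());
    return 0;
}
```

Because the defaults mirror `generation_settings`, a UI can use them to pre-populate its sampling controls before sending the first request.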
examples/server/server.cpp (6 additions, 1 deletion)
@@ -334,6 +334,7 @@ struct llama_server_context

// slots / clients
std::vector<llama_client_slot> slots;
+json default_generation_settings_for_props;

llama_server_queue queue_tasks;
llama_server_response queue_results;
@@ -430,6 +431,9 @@ struct llama_server_context
slots.push_back(slot);
}

+default_generation_settings_for_props = get_formated_generation(slots.front());
+default_generation_settings_for_props["seed"] = -1;
+
batch = llama_batch_init(n_ctx, 0, params.n_parallel);

// empty system prompt
@@ -2614,7 +2618,8 @@ int main(int argc, char **argv)
res.set_header("Access-Control-Allow-Origin", req.get_header_value("Origin"));
json data = {
{ "user_name", llama.name_user.c_str() },
{ "assistant_name", llama.name_assistant.c_str() }
{ "assistant_name", llama.name_assistant.c_str() },
{ "default_generation_settings", llama.default_generation_settings_for_props }
};
res.set_content(data.dump(), "application/json; charset=utf-8");
});
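Building on the sketch above, the advertised defaults can also serve as the base of a `/completion` request, so a client only has to specify the fields it wants to override. The helper below is illustrative rather than part of the commit: the function name and the choice of overrides are assumptions, while `prompt`, `n_predict`, and the `content` response field are existing `/completion` parameters.

```cpp
// Illustrative helper (name and overrides are assumptions): start a
// /completion request from the server's advertised defaults and override
// only what the caller cares about. Reuses the httplib::Client and json
// alias from the sketch above.
static json complete_with_defaults(httplib::Client & cli, const json & defaults,
                                   const std::string & prompt) {
    json body = defaults;           // begin with default_generation_settings
    body["prompt"]    = prompt;
    body["n_predict"] = 64;         // cap the number of generated tokens

    auto res = cli.Post("/completion", body.dump(), "application/json");
    if (!res || res->status != 200) {
        return json{};              // empty object on failure
    }
    return json::parse(res->body);  // generated text is in the "content" field
}
```

This keeps client requests in sync with whatever command-line options the server was started with, since those are what the defaults are derived from.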
