server: allow to get default generation settings for completion #5307

Merged

Conversation

z80maniac
Contributor

What does it do?

This PR adds a new field to the /props response: default_generation_settings. This object contains the default server params that will be used to generate a response. Its contents are exactly the same as the generation_settings object returned by the /completion endpoint.

What does it solve?

This PR mainly addresses one of my points in #4216 (comment):

  1. There should be a way to get some common server params, like the context size, without doing a /completion request. Since an API client can't set the context size, it needs to get it from the server. Yes, the context size is returned from the /completion endpoint, but there should be a way to get it before sending the first inference request, because the API client might need to use the context size to process the input or something else. I think some common "static" server params like n_ctx and model can be returned from the /props.

Now API clients can get the context size without any trickery. Before this change I used a {"n_predict": 0} request, but it recently stopped working (#5246), and it was a hack anyway.

Also, this PR allows API clients to get the default inference params before doing any inference, for example to populate a GUI with the default values.
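As a usage sketch (assuming the server is running on the default 127.0.0.1:8080), a client can now read the context size before sending any completion request:

curl -Ss http://127.0.0.1:8080/props | jq '.default_generation_settings.n_ctx'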

Implementation

  1. For the new default_generation_settings object, I take the first available slot. The code assumes that all slots have identical default params; I'm not sure whether that is always true.

  2. The JSON is captured right after the slots are created. We can't read it from a slot later, because by then the slot may already be polluted by request params that a user has sent to the server (a stand-alone sketch of this flow follows the list).

  3. This line is kind of awkward:

    default_generation_settings_for_props["seed"] = -1;

    But I'm not sure how to do it "properly".
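To make the three points above concrete, here is a stand-alone sketch of the idea (not the code from the PR; the slot_params struct and slot_to_json helper are made-up stand-ins, and only a few of the params are shown):

#include <cstdint>
#include <cstdio>
#include <vector>

#include "json.hpp" // nlohmann::json, which the server example already uses

using json = nlohmann::json;

// Made-up stand-in for a slot's default sampling params; the field names
// mirror the /props example below.
struct slot_params {
    int      n_ctx       = 2048;
    int      n_predict   = -1;
    float    temperature = 0.8f;
    uint32_t seed        = 4294967295u; // randomized per request internally
};

// Serialize one slot's params, similar in spirit to how the server builds
// the generation_settings object for /completion.
static json slot_to_json(const slot_params & p) {
    return json{
        {"n_ctx",       p.n_ctx},
        {"n_predict",   p.n_predict},
        {"temperature", p.temperature},
        {"seed",        p.seed},
    };
}

int main() {
    // Point 2: the snapshot is taken right after the slots are created,
    // before any request params can overwrite the slot contents.
    std::vector<slot_params> slots(4);

    // Point 1: assume all slots share the same defaults and take the first one.
    json default_generation_settings_for_props = slot_to_json(slots.front());

    // Point 3: the internal seed is an unsigned "random" sentinel, so report -1
    // ("pick a random seed") to API clients instead.
    default_generation_settings_for_props["seed"] = -1;

    printf("%s\n", default_generation_settings_for_props.dump(2).c_str());
    return 0;
}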

Example

An example of the resulting /props response:

{
  "assistant_name": "",
  "default_generation_settings": {
    "frequency_penalty": 0,
    "grammar": "",
    "ignore_eos": false,
    "logit_bias": [],
    "min_p": 0.05000000074505806,
    "mirostat": 0,
    "mirostat_eta": 0.10000000149011612,
    "mirostat_tau": 5,
    "model": "/opt/models/text/llama-2-13b-chat.Q4_K_M.gguf",
    "n_ctx": 2048,
    "n_keep": 0,
    "n_predict": -1,
    "n_probs": 0,
    "penalize_nl": true,
    "penalty_prompt_tokens": [],
    "presence_penalty": 0,
    "repeat_last_n": 64,
    "repeat_penalty": 1.100000023841858,
    "seed": -1,
    "stop": [],
    "stream": true,
    "temperature": 0.800000011920929,
    "tfs_z": 1,
    "top_k": 40,
    "top_p": 0.949999988079071,
    "typical_p": 1,
    "use_penalty_prompt_tokens": false
  },
  "user_name": ""
}

ggerganov merged commit a2d60c9 into ggerganov:master on Feb 5, 2024
53 checks passed
@z80maniac
Contributor Author

Damn, I was a little bit too late... Several minutes ago I got a message from someone who told me that there's already an endpoint, /model.json, that returns this info. I checked it, and it indeed returns all the model settings, including the context size. But it's not documented anywhere, so I had no idea it even existed. What is that endpoint anyway? Some sort of intentionally undocumented hack for developers?

curl -Ss http://127.0.0.1:8080/model.json | jq

{
  "frequency_penalty": 0,
  "grammar": "",
  "ignore_eos": false,
  "logit_bias": [],
  "min_p": 0.05000000074505806,
  "mirostat": 0,
  "mirostat_eta": 0.10000000149011612,
  "mirostat_tau": 5,
  "model": "/opt/models/text/llama-2-13b-chat.Q4_K_M.gguf",
  "n_ctx": 512,
  "n_keep": 0,
  "n_predict": -1,
  "n_probs": 0,
  "penalize_nl": true,
  "penalty_prompt_tokens": [],
  "presence_penalty": 0,
  "repeat_last_n": 64,
  "repeat_penalty": 1.100000023841858,
  "seed": 4294967295,
  "stop": [],
  "stream": true,
  "temperature": 0.800000011920929,
  "tfs_z": 1,
  "top_k": 40,
  "top_p": 0.949999988079071,
  "typical_p": 1,
  "use_penalty_prompt_tokens": false
}

I still think that /props is the more appropriate endpoint for this; why have two separate endpoints for server info?

@ggerganov
Owner

ggerganov commented Feb 5, 2024

FWIW, I also didn't know (or forgot) there is a model.json endpoint. We should leave just one of those - maybe keep the original one and document it in the readme?

@z80maniac
Contributor Author

Either way works for me, but I personally think there's no need for a separate endpoint when everything can be in /props. But to not break compatibility, the original endpoint needs to stay and this PR needs to be reverted. Though, I'm not sure whether such undocumented stuff should be protected by compatibility guarantees.

@ggerganov
Owner

Ok, let's remove model.json

@z80maniac
Contributor Author

z80maniac commented Feb 5, 2024

It seems like /model.json is used in examples/server/public/completion.js, so that would need to be reworked too, I guess. It's embedded into the server itself, so there should be no compatibility issues in this regard. I can try to replace it with the new endpoint.

EDIT: it seems like the /model.json call in completion.js is dead code and is not actually called from anywhere, unless I'm missing something.
