server: allow to get default generation settings for completion #5307
Conversation
Damn, I was a little bit too late... So, several minutes ago I got an email from someone who told me that there's already an endpoint:

```json
{
"frequency_penalty": 0,
"grammar": "",
"ignore_eos": false,
"logit_bias": [],
"min_p": 0.05000000074505806,
"mirostat": 0,
"mirostat_eta": 0.10000000149011612,
"mirostat_tau": 5,
"model": "/opt/models/text/llama-2-13b-chat.Q4_K_M.gguf",
"n_ctx": 512,
"n_keep": 0,
"n_predict": -1,
"n_probs": 0,
"penalize_nl": true,
"penalty_prompt_tokens": [],
"presence_penalty": 0,
"repeat_last_n": 64,
"repeat_penalty": 1.100000023841858,
"seed": 4294967295,
"stop": [],
"stream": true,
"temperature": 0.800000011920929,
"tfs_z": 1,
"top_k": 40,
"top_p": 0.949999988079071,
"typical_p": 1,
"use_penalty_prompt_tokens": false
}
```

I still think that …
FWIW, I also didn't know (or forgot) there is a …
Either way works for me, but I personally think that there's no need for a separate endpoint when everything can be in …
Ok, let's remove …
It seems like … EDIT: seems like …
What does it do?

The PR adds a new field into the `/props` response - `default_generation_settings`. This object contains the default server params that will be used to generate the response. Its contents are exactly the same as in the `generation_settings` object from the `/completion` endpoint.

What does it solve?
This PR mainly addresses one of my points in #4216 (comment).

Now API clients can get the context size without any trickery. Before this, I used a `{"n_predict": 0}` request, but recently it stopped working (#5246), and it was a hack anyway.

Also, this PR will allow API clients to get the default inference params before doing any inference, for example if the API client decides to populate its GUI with these default values.
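
For illustration, here is a minimal client-side sketch of how this could be used. The host/port and the response keys read below are assumptions (the keys are taken from the settings dump shown earlier in this thread), not something defined by the PR itself:

```python
# Minimal sketch: read the new default_generation_settings field from /props.
# The server address is an assumption for this example.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8080/props") as resp:
    props = json.load(resp)

defaults = props["default_generation_settings"]

# The context size is now available without a dummy {"n_predict": 0} request.
print("n_ctx:", defaults["n_ctx"])

# A GUI could pre-populate its sampling controls from these defaults.
print("temperature:", defaults["temperature"])
print("top_k:", defaults["top_k"])
print("top_p:", defaults["top_p"])
```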
Implementation

For the new `default_generation_settings` object, I take the first slot available. Here the code assumes that all slots have identical default params; I'm not sure if this is true or not.

The JSON is stored right after creating the slot. We can't get this info from the slot itself later, because it will/may be polluted by API params that the user has already sent to the server.
This line is kind of awkward:

But I'm not sure how to do it "properly".
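
Separately, as an illustration of the snapshot-at-creation idea described under Implementation: this is not the actual server code (which is C++); the names are hypothetical and the default values are borrowed from the settings dump earlier in the thread.

```python
# Hypothetical sketch of "snapshot the defaults right after creating the slot";
# not the actual llama.cpp server code.
import copy
import json

DEFAULT_PARAMS = {"n_ctx": 512, "n_predict": -1, "temperature": 0.8, "top_k": 40, "top_p": 0.95}

class Slot:
    def __init__(self, params):
        # Each slot starts from an identical copy of the server defaults.
        self.params = copy.deepcopy(params)

slots = [Slot(DEFAULT_PARAMS) for _ in range(4)]

# Capture the defaults from the first slot immediately, before any request
# overrides can pollute it.
default_generation_settings = json.loads(json.dumps(slots[0].params))

# Later, a request may override per-slot params...
slots[0].params["temperature"] = 0.2

# ...but the captured snapshot still reflects the server defaults.
print(default_generation_settings["temperature"])  # 0.8
```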
Example

The resulting `/props` response example: