## Current Behavior

After passing an empty prompt to the `server`, it stops processing any further requests.

## Environment and Context

- Commit: d3bac7d
- OS: Kubuntu 23.10

## Steps to Reproduce

I used this model:
https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/blob/main/llama-2-13b-chat.Q4_K_M.gguf

The server is built with just `make`, no other params.

Start the server (startup log omitted). Call the API with an empty `prompt`: it completes OK. The `server` output:

```
slot 0 is processing [task id: 0]
print_timings: prompt eval time = 0.00 ms / 0 tokens ( -nan ms per token, -nan tokens per second)
print_timings: eval time = 0.00 ms / 0 runs ( -nan ms per token, -nan tokens per second)
print_timings: total time = 0.00 ms
{"timestamp":1706720709,"level":"INFO","function":"log_server_request","line":2368,"message":"request","remote_addr":"127.0.0.1","remote_port":37752,"status":200,"method":"POST","path":"/completion","params":{"{\"n_predict\": 0}":""}}
```

Call the API again, with or without a `prompt`: the server does not respond and no logs are produced.
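The two requests above can be sketched in a few lines of Python. Only the `/completion` path and the `prompt` / `n_predict` fields come from the report; the server address (assumed default `127.0.0.1:8080`), the payload values, and the `post_completion` helper are illustrative assumptions:

```python
import json
import urllib.request

# Assumed default address of the server; adjust to your setup.
SERVER_URL = "http://127.0.0.1:8080/completion"

# 1) The empty-prompt request from the report: zero tokens to predict.
probe = {"prompt": "", "n_predict": 0}

# 2) Any ordinary follow-up request; on the offending commit this hangs.
follow_up = {"prompt": "Hello", "n_predict": 16}

def post_completion(payload, timeout=10.0):
    """POST a JSON payload to /completion and decode the JSON response.

    Requires a running server; raises on connection errors and times
    out if the server has stopped responding (the bug described here).
    """
    req = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

With a server built at the offending commit, `post_completion(probe)` succeeds once; a second call with either payload then blocks until the timeout, matching the behaviour described above.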
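As an aside, the `-nan` per-token figures in the log above are a 0/0 artifact: with 0 tokens processed in 0.00 ms, both per-token ratios are zero divided by zero, which is NaN in IEEE floating point (glibc's `printf` typically renders the resulting NaN as `-nan`). A minimal illustration; since Python raises `ZeroDivisionError` rather than producing NaN, the helper returns NaN explicitly for the empty case:

```python
import math

def per_token_stats(total_ms, n_tokens):
    """Recompute the per-token columns of a print_timings line.

    The server's C++ code divides directly, so 0 tokens in 0.00 ms
    yields an IEEE 0/0 NaN; Python raises on division by zero, so the
    zero case returns NaN explicitly to mirror that behaviour.
    """
    if n_tokens == 0 or total_ms == 0.0:
        return (float("nan"), float("nan"))
    return (total_ms / n_tokens, 1e3 * n_tokens / total_ms)

# The empty-prompt request: 0 tokens, 0.00 ms -> NaN in both columns.
ms_per_token, tokens_per_second = per_token_stats(0.0, 0)
assert math.isnan(ms_per_token) and math.isnan(tokens_per_second)
```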
## Additional info

`git bisect` showed that the offending commit is 48c857a.

I used the `{"n_predict": 0}` trick to get the current context size from the server without clearing the current cache. Ideally, there should be an API endpoint to return this info (`/props`, maybe).

The docs don't say anything about an empty prompt, but I guess with `n_predict: 0` it should be allowed (and the server does handle it correctly for the first request). At least it shouldn't block the entire server forever.