Sometimes the suffix leaks into the FIM (fill-in-the-middle) inference response for some reason, which leads to unwanted duplication in the suggestion.
Input code (`[CARET]` marks the cursor position):

Not [CARET]qual|`≠`

Expected suggestion:

Not equal|`≠`

Actual suggestion (the suffix is duplicated):

Not equal|`≠`qual|`≠`
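If it helps, the same split should be reproducible without the editor by posting the prefix/suffix directly to the `/infill` endpoint. A minimal sketch (field names are my understanding of the `llama-server` `/infill` API; adjust if they differ in your build):

```sh
# Reproduce the example above against a llama-server running on port 8012
# (see the server command below). The prefix/suffix split corresponds to the
# caret position in the input code.
curl -s http://127.0.0.1:8012/infill -d '{
  "input_prefix": "Not ",
  "input_suffix": "qual|`≠`\n",
  "n_predict": 16
}' | jq -r .content
# Expected completion: "e"                       -> line becomes: Not equal|`≠`
# Observed completion: something like "equal|`≠`" -> line becomes: Not equal|`≠`qual|`≠`
```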
Server command:

llama-server -hf ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 --ctx-size 0 --cache-reuse 256
Server log:

slot launch_slot_: id 0 | task 1130 | processing task
slot update_slots: id 0 | task 1130 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 1225
slot update_slots: id 0 | task 1130 | kv cache rm [1107, end)
slot update_slots: id 0 | task 1130 | prompt processing progress, n_past = 1225, n_tokens = 118, progress = 0.096327
slot update_slots: id 0 | task 1130 | prompt done, n_past = 1225, n_tokens = 118
slot release: id 0 | task 1130 | stop processing: n_past = 1226, truncated = 0
slot print_timing: id 0 | task 1130 |
    prompt eval time = 55.02 ms / 118 tokens ( 0.47 ms per token, 2144.56 tokens per second)
    eval time = 19.55 ms / 2 tokens ( 9.78 ms per token, 102.30 tokens per second)
    total time = 74.57 ms / 120 tokens
srv update_slots: all slots are idle
request: POST /infill 127.0.0.1 200
slot launch_slot_: id 0 | task 1133 | processing task
slot update_slots: id 0 | task 1133 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 1225
slot update_slots: id 0 | task 1133 | need to evaluate at least 1 token to generate logits, n_past = 1225, n_prompt_tokens = 1225
slot update_slots: id 0 | task 1133 | kv cache rm [1224, end)
slot update_slots: id 0 | task 1133 | prompt processing progress, n_past = 1225, n_tokens = 1, progress = 0.000816
slot update_slots: id 0 | task 1133 | prompt done, n_past = 1225, n_tokens = 1
slot release: id 0 | task 1133 | stop processing: n_past = 1226, truncated = 0
slot print_timing: id 0 | task 1133 |
    prompt eval time = 20.36 ms / 1 tokens ( 20.36 ms per token, 49.10 tokens per second)
    eval time = 18.47 ms / 2 tokens ( 9.24 ms per token, 108.27 tokens per second)
    total time = 38.84 ms / 3 tokens
srv update_slots: all slots are idle
request: POST /infill 127.0.0.1 200
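The log shows two back-to-back `/infill` requests for the same prompt, and the second one reuses almost the entire KV cache (`kv cache rm [1224, end)`, `n_tokens = 1`). In case that turns out to matter, here is a small sketch (same `/infill` field assumptions as the curl example above) that sends an identical request twice and compares the two responses:

```sh
# Send the same infill request twice in a row and diff the responses, to see
# whether the duplicated suffix only appears on the cache-reused second pass.
req='{"input_prefix": "Not ", "input_suffix": "qual|`≠`\n", "n_predict": 16}'
curl -s http://127.0.0.1:8012/infill -d "$req" | jq -r .content > /tmp/fim_first.txt
curl -s http://127.0.0.1:8012/infill -d "$req" | jq -r .content > /tmp/fim_second.txt
diff /tmp/fim_first.txt /tmp/fim_second.txt && echo "responses identical" || echo "responses differ"
```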