FIM result sometimes includes parts of the suffix #13

Open
Jaid opened this issue Jan 27, 2025 · 0 comments

Jaid commented Jan 27, 2025

Sometimes part of the suffix leaks into the FIM (fill-in-the-middle) completion returned by the server, for an unclear reason, so text that already follows the cursor is duplicated.

Input code

Not [CARET]qual|``

Expected suggestion

Not equal|``

Actual suggestion

Not equal|``qual|``
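Until the root cause is found, a client could guard against this by trimming any tail of the completion that repeats the start of the suffix. The sketch below is a hypothetical client-side workaround, not the plugin's actual code; trim_suffix_overlap and its min_overlap threshold are illustrative names. In the example above the prefix is "Not " and the suffix is "qual|``", so the generated middle was presumably "equal|``", which overlaps the suffix by seven characters and trims down to just "e".

```python
def trim_suffix_overlap(completion: str, suffix: str, min_overlap: int = 3) -> str:
    """Drop the tail of `completion` that repeats the start of `suffix`.

    Hypothetical workaround: finds the longest k such that
    completion[-k:] == suffix[:k] and, if k >= min_overlap, returns the
    completion without those k characters. Otherwise returns it unchanged.
    """
    max_k = min(len(completion), len(suffix))
    lower = max(min_overlap, 1)  # never treat an empty overlap as a match
    # Try the longest possible overlap first, then shrink.
    for k in range(max_k, lower - 1, -1):
        if completion[-k:] == suffix[:k]:
            return completion[:-k]
    return completion
```

The min_overlap threshold is a guess meant to avoid trimming one- or two-character coincidences (for example a completion that legitimately ends with the same character the suffix starts with); the right value would need testing against real completions.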

Server command

llama-server -hf ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 --ctx-size 0 --cache-reuse 256
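To check whether the duplication comes from the raw server output rather than from the editor, the same prefix and suffix could be sent straight to the POST /infill endpoint seen in the log. A minimal sketch, assuming the input_prefix / input_suffix / n_predict fields documented for llama-server's /infill endpoint and the port 8012 from the command above; the exact fields the plugin sends are not shown in this issue, and build_infill_request and the n_predict value are hypothetical.

```python
import json
import urllib.request


def build_infill_request(prefix: str, suffix: str, n_predict: int = 16,
                         base_url: str = "http://127.0.0.1:8012") -> urllib.request.Request:
    """Build (but do not send) a POST /infill request for llama-server.

    Field names follow the llama-server /infill documentation; the
    n_predict default of 16 is an arbitrary, hypothetical token cap.
    """
    payload = {
        "input_prefix": prefix,  # text before the cursor
        "input_suffix": suffix,  # text after the cursor
        "n_predict": n_predict,  # maximum number of tokens to generate
    }
    return urllib.request.Request(
        base_url + "/infill",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the server running, sending build_infill_request("Not ", "qual|``") via urllib.request.urlopen and reading the "content" field of the JSON response should show whether the model itself emits the "qual|``" tail.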

Server log

slot launch_slot_: id  0 | task 1130 | processing task
slot update_slots: id  0 | task 1130 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 1225
slot update_slots: id  0 | task 1130 | kv cache rm [1107, end)
slot update_slots: id  0 | task 1130 | prompt processing progress, n_past = 1225, n_tokens = 118, progress = 0.096327
slot update_slots: id  0 | task 1130 | prompt done, n_past = 1225, n_tokens = 118
slot      release: id  0 | task 1130 | stop processing: n_past = 1226, truncated = 0
slot print_timing: id  0 | task 1130 |
prompt eval time =      55.02 ms /   118 tokens (    0.47 ms per token,  2144.56 tokens per second)
       eval time =      19.55 ms /     2 tokens (    9.78 ms per token,   102.30 tokens per second)
      total time =      74.57 ms /   120 tokens
srv  update_slots: all slots are idle
request: POST /infill 127.0.0.1 200
slot launch_slot_: id  0 | task 1133 | processing task
slot update_slots: id  0 | task 1133 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 1225
slot update_slots: id  0 | task 1133 | need to evaluate at least 1 token to generate logits, n_past = 1225, n_prompt_tokens = 1225
slot update_slots: id  0 | task 1133 | kv cache rm [1224, end)
slot update_slots: id  0 | task 1133 | prompt processing progress, n_past = 1225, n_tokens = 1, progress = 0.000816
slot update_slots: id  0 | task 1133 | prompt done, n_past = 1225, n_tokens = 1
slot      release: id  0 | task 1133 | stop processing: n_past = 1226, truncated = 0
slot print_timing: id  0 | task 1133 |
prompt eval time =      20.36 ms /     1 tokens (   20.36 ms per token,    49.10 tokens per second)
       eval time =      18.47 ms /     2 tokens (    9.24 ms per token,   108.27 tokens per second)
      total time =      38.84 ms /     3 tokens
srv  update_slots: all slots are idle
request: POST /infill 127.0.0.1 200