FIM result sometimes includes parts of the suffix #13

Jaid · 2025-01-27T07:24:46Z

Sometimes part of the suffix leaks into the FIM inference response, so accepting the suggestion duplicates text that already follows the caret.

Input code

Not [CARET]qual|`≠`

Expected suggestion

Not equal|`≠`

Actual suggestion

Not equal|`≠`qual|`≠`
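As a possible client-side stopgap (not a fix for the underlying cause), the duplicated tail could be trimmed before the suggestion is shown. The sketch below assumes that `[CARET]` splits the line into the prefix `Not ` and the suffix ``qual|`≠` ``, and that the returned completion was ``equal|`≠` ``, which is inferred from the actual suggestion above. `trim_suffix_overlap` is a hypothetical helper, not part of any existing client.

```python
def trim_suffix_overlap(completion: str, suffix: str) -> str:
    """Drop the longest tail of `completion` that is also a head of `suffix`."""
    max_k = min(len(completion), len(suffix))
    # Try the longest possible overlap first, then shrink.
    for k in range(max_k, 0, -1):
        if completion.endswith(suffix[:k]):
            return completion[:-k]
    return completion


# Values assumed from the example in this report.
prefix = "Not "
suffix = "qual|`≠`"
completion = "equal|`≠`"  # inferred, not taken from a server response

result = prefix + trim_suffix_overlap(completion, suffix) + suffix
print(result)
```

Note that this heuristic can over-trim when a correct completion legitimately ends with the same characters the suffix starts with, so it would probably need a minimum overlap length in practice.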

Server command

llama-server -hf ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 --ctx-size 0 --cache-reuse 256
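For anyone trying to reproduce this outside the editor, a minimal request against the server started above might look like the sketch below. It assumes the `/infill` endpoint accepts `input_prefix`, `input_suffix` and `n_predict` and returns the generated text in a `content` field, as described in the llama.cpp server documentation. The real request in the log had 1225 prompt tokens of context, so this one-line input may not trigger the problem. `build_infill_payload` and `request_infill` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: host and port match the server command (--port 8012).
INFILL_URL = "http://127.0.0.1:8012/infill"


def build_infill_payload(prefix: str, suffix: str, n_predict: int = 16) -> dict:
    """Build the JSON body for an /infill request."""
    return {
        "input_prefix": prefix,
        "input_suffix": suffix,
        "n_predict": n_predict,
    }


def request_infill(prefix: str, suffix: str) -> str:
    """POST the payload and return the generated middle part."""
    body = json.dumps(build_infill_payload(prefix, suffix)).encode("utf-8")
    req = urllib.request.Request(
        INFILL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        # Assumption: the response JSON carries the text under "content".
        return json.loads(resp.read().decode("utf-8"))["content"]
```

Calling ``request_infill("Not ", "qual|`≠`")`` with the server running would return the raw completion, which can then be compared against the suffix to see whether it leaks.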

Server log

slot launch_slot_: id  0 | task 1130 | processing task
slot update_slots: id  0 | task 1130 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 1225
slot update_slots: id  0 | task 1130 | kv cache rm [1107, end)
slot update_slots: id  0 | task 1130 | prompt processing progress, n_past = 1225, n_tokens = 118, progress = 0.096327
slot update_slots: id  0 | task 1130 | prompt done, n_past = 1225, n_tokens = 118
slot      release: id  0 | task 1130 | stop processing: n_past = 1226, truncated = 0
slot print_timing: id  0 | task 1130 |
prompt eval time =      55.02 ms /   118 tokens (    0.47 ms per token,  2144.56 tokens per second)
       eval time =      19.55 ms /     2 tokens (    9.78 ms per token,   102.30 tokens per second)
      total time =      74.57 ms /   120 tokens
srv  update_slots: all slots are idle
request: POST /infill 127.0.0.1 200
slot launch_slot_: id  0 | task 1133 | processing task
slot update_slots: id  0 | task 1133 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 1225
slot update_slots: id  0 | task 1133 | need to evaluate at least 1 token to generate logits, n_past = 1225, n_prompt_tokens = 1225
slot update_slots: id  0 | task 1133 | kv cache rm [1224, end)
slot update_slots: id  0 | task 1133 | prompt processing progress, n_past = 1225, n_tokens = 1, progress = 0.000816
slot update_slots: id  0 | task 1133 | prompt done, n_past = 1225, n_tokens = 1
slot      release: id  0 | task 1133 | stop processing: n_past = 1226, truncated = 0
slot print_timing: id  0 | task 1133 |
prompt eval time =      20.36 ms /     1 tokens (   20.36 ms per token,    49.10 tokens per second)
       eval time =      18.47 ms /     2 tokens (    9.24 ms per token,   108.27 tokens per second)
      total time =      38.84 ms /     3 tokens
srv  update_slots: all slots are idle
request: POST /infill 127.0.0.1 200
