Conversation
Collaborator
@ggerganov Is the bug already fixed, so that we can merge the pull request?
Member
Author
Not yet, still working on a fix: ggml-org/llama.cpp#18789
rel #148
This change should improve the prefix cache utilization for FIM requests by forcing the main task on the same slot (id = 0) when multiple completions are used.
However, there is currently a bug in llama-server that has to be fixed first, so I am putting this into draft until we fix it (see ggml-org/llama.cpp#18663 (comment)).
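For illustration, here is a minimal sketch of what pinning a FIM request to slot 0 could look like on the client side. The `/infill` endpoint and its `input_prefix`, `input_suffix`, `n_predict`, and `id_slot` fields come from the llama-server API; `serverUrl` and `sendFimRequest` are hypothetical names, and this is not the plugin's actual code:

```typescript
// Minimal sketch (not the plugin's actual code): pin a FIM request to slot 0
// via llama-server's `id_slot` field so consecutive requests hit the same
// slot and reuse its prefix cache.

interface FimRequest {
  input_prefix: string; // text before the cursor
  input_suffix: string; // text after the cursor
  n_predict: number;    // maximum number of tokens to generate
  id_slot?: number;     // pin the request to a specific server slot
}

async function sendFimRequest(
  serverUrl: string,
  prefix: string,
  suffix: string,
): Promise<string> {
  const body: FimRequest = {
    input_prefix: prefix,
    input_suffix: suffix,
    n_predict: 64,
    id_slot: 0, // force the main task onto slot 0 for prefix cache reuse
  };

  const res = await fetch(`${serverUrl}/infill`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });

  if (!res.ok) {
    throw new Error(`llama-server returned HTTP ${res.status}`);
  }

  const data = await res.json();
  return data.content; // llama-server returns the generated text in `content`
}
```

Pinning all main FIM tasks to one slot trades some parallelism for cache hits: the slot's KV cache keeps the shared file prefix warm across consecutive requests instead of scattering them over slots with cold caches.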