
Use fixed slot id for FIM requests#155

Merged
igardev merged 1 commit into master from gg/use-slot-0 on Feb 8, 2026
Conversation

@ggerganov
Member

rel #148

This change should improve the prefix cache utilization for FIM requests by forcing the main task on the same slot (id = 0) when multiple completions are used.

However, there is currently a bug in llama-server that has to be fixed first, so I'm putting this into draft until we fix it (see ggml-org/llama.cpp#18663 (comment))
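For reference, a minimal sketch of what pinning the main task to a fixed slot looks like on the client side, assuming llama-server's /infill endpoint and its id_slot and cache_prompt request fields (the exact field names are assumptions and may vary across server versions):

```python
def build_fim_request(prefix: str, suffix: str) -> dict:
    """Build a FIM (fill-in-the-middle) payload for llama-server's /infill endpoint.

    Pinning the main task to slot 0 means consecutive FIM requests land on
    the same server slot, so they can reuse that slot's prompt (prefix) cache
    instead of re-evaluating the shared context each time.
    """
    return {
        "input_prefix": prefix,   # text before the cursor
        "input_suffix": suffix,   # text after the cursor
        "id_slot": 0,             # fixed slot id for the main completion task
        "cache_prompt": True,     # ask the server to reuse the cached prefix
    }

payload = build_fim_request("def add(a, b):\n    ", "\n")
print(payload["id_slot"])
```

Sending every main FIM request with the same id_slot keeps that slot's prompt cache warm; speculative or secondary requests can still go to other slots.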

@igardev
Collaborator

igardev commented Jan 13, 2026

@ggerganov Is the bug already fixed, so that we can merge the pull request?

@ggerganov
Member Author

Not yet, still working on a fix: ggml-org/llama.cpp#18789

@igardev igardev marked this pull request as ready for review February 8, 2026 21:30
@igardev igardev merged commit 0182f4f into master Feb 8, 2026
