Now that a cache mechanism is in place as of #15, there are times when the server is idle. This idle time could be used to speculatively send requests to the server and cache the responses ahead of time. If a prediction turns out to be wrong, the client can simply discard those results from the cache. Essentially, this would let us make predictions in advance and backtrack when they are incorrect, maximizing the utilization of the server.
Here is a scenario when this could be useful:
The user has the following code with a suggestion.
While waiting for the user to accept or reject the suggestion, we could make another request that assumes the user has already accepted the current suggestion, and cache that response. If the user then accepts, the next suggestion is already cached and shows up much more quickly.
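The flow above could be sketched roughly as follows. This is a minimal, synchronous illustration with made-up names (`SpeculativeCache`, `fetch`, `on_reject` are all hypothetical, and a real client would issue the speculative request asynchronously while the user is deciding):

```python
# Hypothetical sketch of speculative FIM caching; all names are illustrative.
class SpeculativeCache:
    def __init__(self, fetch):
        self.fetch = fetch   # (prefix, suffix) -> completion, i.e. the server call
        self.cache = {}      # (prefix, suffix) -> cached completion

    def suggest(self, prefix, suffix):
        key = (prefix, suffix)
        if key not in self.cache:
            self.cache[key] = self.fetch(prefix, suffix)
        suggestion = self.cache[key]
        # Speculate: assume the user accepts this suggestion and prefetch
        # the follow-up completion while the server would otherwise be idle.
        spec_key = (prefix + suggestion, suffix)
        if spec_key not in self.cache:
            self.cache[spec_key] = self.fetch(*spec_key)
        return suggestion

    def on_reject(self, prefix, suffix, suggestion):
        # Backtrack: the prediction was wrong, so drop the speculative entry.
        self.cache.pop((prefix + suggestion, suffix), None)
```

If the user accepts, the next call to `suggest` hits the prefetched cache entry; if they reject, `on_reject` evicts only the speculative result, so nothing else in the cache is disturbed.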
This is a great idea. Btw, it's better to call this feature with some different name from "speculative decoding" because this will conflict with the established meaning of this term. Maybe something like "speculative FIM" or "speculative suggestion/completion"?
VJHack changed the title from "cache: Speculative Decoding" to "cache: Speculative FIM" on Jan 2, 2025.