
cache: Speculative FIM #17

Open
VJHack opened this issue Jan 2, 2025 · 3 comments
Labels: enhancement (New feature or request)

Comments

VJHack (Member) commented Jan 2, 2025

Now that a cache mechanism is in place as of #15, there are times when the server is idle. This idle time could be used to speculatively send requests to the server and cache the responses ahead of time. If a prediction turns out to be wrong, the corresponding results can simply be discarded from the cache. Essentially, this lets us make predictions in advance and backtrack when they are incorrect, maximizing the utilization of the server.

Here is a scenario where this could be useful. The user has the following code with a pending suggestion:

[Screenshot: the current code with the suggestion displayed]

While waiting for the user to accept or reject the suggestion, we could make another request that assumes the current suggestion was already accepted, and cache that response. Since the next suggestion is then already cached, it shows up much more quickly.

[Screenshot: the follow-up suggestion that would be prefetched]
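A minimal sketch of how this could work, assuming a single keyed cache and a `request_fim(prefix, suffix)` callback that stands in for the actual server request. All names here are illustrative, not the plugin's real API:

```python
import hashlib

class SpeculativeFIMCache:
    """Illustrative cache for speculative FIM requests (hypothetical names)."""

    def __init__(self, request_fim):
        # request_fim(prefix, suffix) -> str stands in for the real server call.
        self.request_fim = request_fim
        self.cache = {}

    def _key(self, prefix, suffix):
        return hashlib.sha256((prefix + "\x00" + suffix).encode()).hexdigest()

    def get_or_request(self, prefix, suffix):
        # Normal path: serve from cache if present, otherwise ask the server.
        key = self._key(prefix, suffix)
        if key not in self.cache:
            self.cache[key] = self.request_fim(prefix, suffix)
        return self.cache[key]

    def prefetch_assuming_accept(self, prefix, suffix, pending_suggestion):
        # Speculative path: while a suggestion is pending, assume it gets
        # accepted and request the follow-up completion ahead of time.
        speculative_prefix = prefix + pending_suggestion
        key = self._key(speculative_prefix, suffix)
        if key not in self.cache:
            self.cache[key] = self.request_fim(speculative_prefix, suffix)

    def discard_speculation(self, prefix, suffix, pending_suggestion):
        # Backtrack: if the suggestion is rejected, drop the speculative entry.
        speculative_prefix = prefix + pending_suggestion
        self.cache.pop(self._key(speculative_prefix, suffix), None)
```

On accept, the next `get_or_request` call with the extended prefix hits the cache, so the follow-up suggestion appears immediately; on reject, discarding the speculative entry (or simply letting it age out) keeps wrong speculations from accumulating.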

ggerganov (Member) commented:
This is a great idea. Btw, it's better to give this feature a name different from "speculative decoding", because that would conflict with the established meaning of the term. Maybe something like "speculative FIM" or "speculative suggestion/completion"?

VJHack changed the title from "cache: Speculative Decoding" to "cache: Speculative FIM" on Jan 2, 2025
VJHack added the enhancement (New feature or request) label on Jan 5, 2025
ggerganov (Member) commented Jan 23, 2025

This feature is already implemented in the llama.vscode extension and it works very nicely. We should add it here too.

VJHack (Member, Author) commented Jan 23, 2025

> This feature is already implemented in the llama.vscode extension and it works very nicely. We should add it here too.

I'll work on adding this feature here today.
