llama.cpp / llama-cpp-python are apparently going to allow for batched inference [1] [2].
Is this something you have on your radar and are planning to reflect in lmql-server as well? So far, we have batch_size=1 for llama.cpp models, right? A minimal sketch of the current single-sequence path is below.
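For context, here is a rough sketch of what batch_size=1 amounts to today when going through llama-cpp-python directly: each prompt is decoded one after another rather than in a shared batch. The model path and prompts are placeholders, not anything from the LMQL codebase.

```python
from llama_cpp import Llama

# Hypothetical local GGUF model path; adjust to your setup.
llm = Llama(model_path="./models/llama-2-7b.Q4_K_M.gguf", n_ctx=2048)

prompts = [
    "Question: What is LMQL? Answer:",
    "Question: What is llama.cpp? Answer:",
]

# Today, prompts are processed strictly one at a time (batch_size=1);
# batched inference would let these sequences be decoded together.
for prompt in prompts:
    out = llm(prompt, max_tokens=32)
    print(out["choices"][0]["text"])
```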
Thanks again for creating and maintaining this great project!
Gregor
Thanks for raising this; we will keep it on our radar. It should be simple to add support for this once it is upstreamed in llama.cpp/llama-cpp-python.