
Feature request: Batched inference for llama.cpp models #261

Open
ggbetz opened this issue Nov 1, 2023 · 5 comments
Labels: enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@ggbetz commented Nov 1, 2023

Hi Luca,

llama.cpp / llama-cpp-python are apparently going to allow for batched inference [1] [2].

Is this something you have on your radar, and are you planning to reflect it in the lmql-server as well? So far, we have batch_size=1 for llama.cpp models, right?
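For reference, each llama-cpp-python call currently handles a single prompt, roughly like this (a minimal sketch; the model path and prompt are just placeholders):

```python
# Current situation: one prompt per call, i.e. effectively batch_size=1.
# Model path and prompt are placeholders.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-2-7b.Q4_K_M.gguf")

out = llm("Q: What is the capital of France? A:", max_tokens=16)
print(out["choices"][0]["text"])
```

Batched inference would allow several such prompts to be evaluated together in one forward pass.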

Thanks again for creating and maintaining this great project!

Gregor

lbeurerkellner added the enhancement label on Nov 2, 2023
@lbeurerkellner (Collaborator)

Thanks for raising this, we will keep it on our radar. It should be simple to add support for this, once it is upstreamed in llama.cpp/llama-cpp-python.

@reuank (Contributor) commented Nov 28, 2023

Just a quick update on this topic. It looks like llama-cpp-python will add that feature very soon: abetlen/llama-cpp-python#951.

@lbeurerkellner (Collaborator)

Marking this as a good first issue for backend work. The llama.cpp backend lives in https://github.com/eth-sri/lmql/blob/main/src/lmql/models/lmtp/backends/llama_cpp_model.py and is currently limited to max_batch_size of 1.
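For anyone picking this up: the core change is to collect pending generation requests and run them through the model together instead of one at a time. Below is a rough sketch of the idea only (hypothetical helper names, not the actual LMTP backend interface):

```python
import asyncio

class MicroBatcher:
    """Groups pending prompts and flushes them to the model in batches of up
    to max_batch_size. Hypothetical helper for illustration, not LMQL code."""

    def __init__(self, batched_generate, max_batch_size=8, max_wait_ms=5):
        # batched_generate: callable taking a list of prompts and returning
        # a list of completions (one batched forward pass)
        self.batched_generate = batched_generate
        self.max_batch_size = max_batch_size
        self.max_wait_ms = max_wait_ms
        self.pending = []  # list of (prompt, future) pairs

    async def submit(self, prompt):
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        self.pending.append((prompt, fut))
        if len(self.pending) >= self.max_batch_size:
            self._flush()
        else:
            # give concurrent requests a short window to join the batch
            loop.call_later(self.max_wait_ms / 1000, self._flush)
        return await fut

    def _flush(self):
        if not self.pending:
            return
        batch = self.pending[: self.max_batch_size]
        self.pending = self.pending[self.max_batch_size :]
        results = self.batched_generate([p for p, _ in batch])
        for (_, fut), result in zip(batch, results):
            if not fut.done():
                fut.set_result(result)


# Usage sketch with a stand-in for the model:
async def main():
    batcher = MicroBatcher(lambda prompts: [p.upper() for p in prompts])
    print(await asyncio.gather(*(batcher.submit(p) for p in ["a", "b", "c"])))

asyncio.run(main())
```

In the real backend, batched_generate would presumably map onto llama.cpp's batched decoding once it is exposed through llama-cpp-python, and max_batch_size would become configurable instead of being fixed at 1.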

lbeurerkellner added the good first issue label on Feb 27, 2024
@Saibo-creator (Contributor)

It looks like the batch inference feature mentioned earlier is still in development. I'll hold off for now.

@balu54 commented Jul 23, 2024

Will parallel inference come with the batched inference feature?

> Marking this as a good first issue for backend work. The llama.cpp backend lives in https://github.com/eth-sri/lmql/blob/main/src/lmql/models/lmtp/backends/llama_cpp_model.py and is currently limited to max_batch_size of 1.

Is parallel inference included in batched inference?
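To make the question concrete: by parallel inference I mean several requests being in flight at the same time, roughly like this (a minimal sketch with a stand-in generate function, not the LMQL or llama.cpp API):

```python
import asyncio

async def generate(prompt: str) -> str:
    # stand-in for a request to the model server; real latency would come
    # from the model's forward pass, simulated here with a short sleep
    await asyncio.sleep(0.1)
    return f"completion for: {prompt!r}"

async def main():
    prompts = ["What is LMQL?", "What is llama.cpp?", "What is batching?"]
    # all requests are in flight at once (parallel from the client side)
    results = await asyncio.gather(*(generate(p) for p in prompts))
    for completion in results:
        print(completion)

asyncio.run(main())
```

A backend with batched inference could group such concurrent requests and run them through the model in a single forward pass.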
