Publish all wheels to PyPI #741
Comments
Hey @simonw! Big fan of your datasette project. I hear you, and I would like to make the setup process a little easier and less error-prone. Currently llama.cpp supports a number of optional accelerations, including several BLAS libraries, CUDA versions, OpenCL, and Metal. In theory I could build a pre-built wheel that just includes a version of llama.cpp with no real accelerations enabled, but I feel like that runs counter to the goal of providing users with the fastest local inference for their hardware. I'm open to suggestions though, and I'll try to think of some possible solutions.
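For context, accelerated builds of llama-cpp-python are normally produced at install time by passing CMake options through the `CMAKE_ARGS` environment variable; a sketch of the two variants mentioned above (treat the exact flag names as illustrative, since they vary across llama.cpp versions):

```shell
# Metal-accelerated build on macOS (illustrative flag name)
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python

# CUDA (cuBLAS) accelerated build on Linux (illustrative flag name)
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
```

This per-machine compile step is exactly what pre-built wheels would let users skip.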
Two approaches I can think of trying that might work are: […]

For that first option, one way that could work is to have a […]

How large are the different binaries? If all of them could be bundled in a single wheel that was less than 50 MB then that could be a neat solution, if you can write code that can detect which one to use. You could even distribute that as […] It's a tricky problem though! I bet there are good options I've missed here.
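The "code that can detect which one to use" idea could look roughly like the following. This is a minimal sketch: the variant names (`"metal"`, `"cuda"`, `"cpu"`) and the probing strategy are assumptions for illustration, not identifiers from any shipped package.

```python
import ctypes.util
import platform


def pick_llama_variant() -> str:
    """Choose which bundled llama.cpp build to load at import time.

    Hypothetical selector: variant names are illustrative only.
    """
    system = platform.system()
    machine = platform.machine().lower()
    if system == "Darwin" and machine in ("arm64", "aarch64"):
        return "metal"  # Apple Silicon: prefer the Metal build
    if system == "Linux" and ctypes.util.find_library("cuda"):
        return "cuda"   # libcuda found: prefer the CUDA build
    return "cpu"        # safe fallback: unaccelerated CPU build


print(pick_llama_variant())
```

A real implementation would then `ctypes`-load (or import) the matching shared library from inside the wheel, which is where the 50 MB size question above comes in.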
Hey @simonw, it took a while, but this is finally possible through a self-hosted PEP 503 repository on GitHub Pages (see #1247). You should now be able to run `pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu` on the CLI, or pass `--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu` for llama-cpp-python in a […] The PR also includes initial support for Metal and CUDA wheels, though I had to limit the number of supported Python and CUDA versions to avoid a combinatorial explosion in the number of builds.
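The extra index can also be pinned in a requirements file, so every `pip install -r` picks up the pre-built wheels automatically. A sketch, assuming the CPU index URL quoted in the comment above:

```text
# requirements.txt (illustrative)
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
llama-cpp-python
```

Note that `--extra-index-url` merges this index with PyPI rather than replacing it, so other dependencies still resolve normally.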
It looks like PyPI only has the source distribution for each release: https://pypi.org/project/llama-cpp-python/0.2.6/#files
But the GitHub release at https://github.com/abetlen/llama-cpp-python/releases/tag/v0.2.6 lists many more files than that:
Would it be possible to push those wheels to PyPI as well?
I'd love to be able to run `pip install llama-cpp-python` and get a compiled wheel for my platform.
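Mechanically, "a compiled wheel for my platform" means pip matches the compatibility tags embedded in each wheel's filename (PEP 427: `{dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl`) against the running interpreter and OS. A small sketch of that naming scheme; the example filename is illustrative, and real resolution lives in pip/packaging:

```python
def parse_wheel_filename(name: str) -> dict:
    """Split a wheel filename into its PEP 427 tags (minimal sketch:
    ignores build tags with embedded dashes and other edge cases)."""
    stem = name.removesuffix(".whl")
    dist, version, *rest = stem.split("-")
    python_tag, abi_tag, platform_tag = rest[-3], rest[-2], rest[-1]
    return {"dist": dist, "version": version, "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}


# Illustrative filename in the style of the v0.2.6 release assets
tags = parse_wheel_filename(
    "llama_cpp_python-0.2.6-cp311-cp311-macosx_11_0_arm64.whl")
print(tags)
```

Pip only installs a wheel whose python/abi/platform tags are compatible with the current environment, which is why one release needs the long list of per-platform files shown above.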