Binary wheels #1247
Conversation
I have been building llama-cpp-python wheels for each new release in my fork of jllllll's repository: https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/actions. It has some minor changes to avoid hitting rate-limit errors. It also builds for all Mac versions (I think jllllll removed macOS 11 due to an error that used to happen but no longer does).
@oobabooga yes, I saw; I really appreciate you carrying that! I'll probably port over those release workflows and modify them so they name each release based on the version and backend tag. It would be great if we could reduce the number of builds per release, though.
Any chance Intel Arc binaries could also be added? Thanks
With Intel's new IPEX-LLM release this is no longer necessary. A pointer to it, however, might be useful, as it isn't well advertised.
* Generate binary wheel index on release
* Add total release downloads badge
* Update download label
* Use official cibuildwheel action
* Add workflows to build CUDA and Metal wheels
* Update generate index workflow
* Update workflow name
* feat: add support for KV cache quantization options (abetlen#1307)
  * add KV cache quantization options abetlen#1220 abetlen#1305
  * Add ggml_type
  * Use ggml_type instead of string for quantization
  * Add server support
  * Co-authored-by: Andrei Betlen <[email protected]>
* fix: Changed local API doc references to hosted (abetlen#1317)
* chore: Bump version
* fix: last tokens passing to sample_repetition_penalties function (abetlen#1295)
  * Co-authored-by: ymikhaylov <[email protected]>
  * Co-authored-by: Andrei <[email protected]>
* feat: Update llama.cpp
* fix: segfault when logits_all=False. Closes abetlen#1319
* feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal (abetlen#1247)
  * Generate binary wheel index on release
  * Add total release downloads badge
  * Update download label
  * Use official cibuildwheel action
  * Add workflows to build CUDA and Metal wheels
  * Update generate index workflow
  * Update workflow name
* feat: Update llama.cpp
* chore: Bump version
* fix(ci): use correct script name
* docs: LLAMA_CUBLAS -> LLAMA_CUDA
* docs: Add docs explaining how to install pre-built wheels.
* docs: Rename cuBLAS section to CUDA
* fix(docs): incorrect tool_choice example (abetlen#1330)
* feat: Update llama.cpp
* fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes abetlen#1328 Closes abetlen#1314
* feat: Update llama.cpp
* fix: Always embed metal library. Closes abetlen#1332
* feat: Update llama.cpp
* chore: Bump version

Co-authored-by: Limour <[email protected]>
Co-authored-by: Andrei Betlen <[email protected]>
Co-authored-by: lawfordp2017 <[email protected]>
Co-authored-by: Yuri Mikhailov <[email protected]>
Co-authored-by: ymikhaylov <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Adds initial support for releasing binary wheels. Uses GitHub Pages to publish one index per backend (CPU, Metal, CUDA, etc.) as a static site.
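The per-backend index pages described above can follow the PEP 503 "simple repository" layout, which pip already understands. Here is a minimal sketch of generating such a page from a list of wheel filenames; the function name, wheel name, and base URL are illustrative assumptions, not the actual workflow script from this PR.

```python
# Hypothetical sketch: build a PEP 503-style "simple" index page that links
# each wheel back to its GitHub release asset. Not the PR's actual script.
def make_index(wheels, base_url):
    """Return an HTML index page with one anchor per wheel filename."""
    links = "\n".join(
        f'<a href="{base_url}/{name}">{name}</a><br/>' for name in wheels
    )
    return f"<!DOCTYPE html>\n<html><body>\n{links}\n</body></html>"

# Example with a made-up wheel name and release URL:
page = make_index(
    ["llama_cpp_python-0.2.0-cp311-cp311-manylinux_2_17_x86_64.whl"],
    "https://example.com/releases/download/v0.2.0",
)
print(page)
```

Pointing `pip install --extra-index-url` at a page like this is enough for pip to resolve the listed wheels.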
Planned release tags for this PR (because I can test these):
* `cpu`
* `cu121`
* `cu122`
* `cu123`
* `metal`
Usage (won't work until merged)
CPU
Metal
CUDA (12.1)
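The usage commands above reduce to one pattern: pass pip an `--extra-index-url` pointing at the per-backend sub-index named by the release tag. A small sketch of how those URLs and commands compose, assuming a GitHub Pages base URL of the form used by this repository (verify the exact URL against the published docs before relying on it):

```python
# Hypothetical sketch: derive the pip install command for a given backend tag
# (cpu, cu121, cu122, cu123, metal). The base URL is an assumption based on
# this PR's GitHub Pages setup, not a guaranteed endpoint.
BASE = "https://abetlen.github.io/llama-cpp-python/whl"

def index_url(backend: str) -> str:
    """Per-backend wheel index URL, one sub-index per release tag."""
    return f"{BASE}/{backend}"

def install_command(backend: str) -> str:
    """Full pip command a user would run for the given backend."""
    return f"pip install llama-cpp-python --extra-index-url {index_url(backend)}"

for tag in ("cpu", "cu121", "metal"):
    print(install_command(tag))
```

So the CPU, Metal, and CUDA (12.1) sections differ only in which tag is substituted into the index URL.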
TODO
Related: #1178, #243 and #741