
Binary wheels #1247

Merged (9 commits into main on Apr 3, 2024)
Conversation

@abetlen (Owner) commented Mar 3, 2024

Adds initial support for releasing binary wheels. Uses GitHub Pages to publish one index per backend (CPU, Metal, CUDA, etc.) as a static site.
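Since a PEP 503 "simple" index is just static HTML, it can be served straight from GitHub Pages. As a rough sketch of the idea only (the wheel filenames, tag names, and directory layout below are illustrative assumptions, not the actual workflow's output):

```python
# Minimal sketch: emit a PEP 503-style "simple" index page for one backend.
# Filenames, release tags, and paths here are illustrative assumptions.
from pathlib import Path

RELEASES = "https://github.com/abetlen/llama-cpp-python/releases/download"

def write_index(backend: str, tag: str, wheels: list[str]) -> None:
    # One index directory per backend, one page per (normalized) project name.
    out = Path("whl") / backend / "llama-cpp-python"
    out.mkdir(parents=True, exist_ok=True)
    links = "\n".join(
        f'<a href="{RELEASES}/{tag}/{w}">{w}</a><br/>' for w in wheels
    )
    (out / "index.html").write_text(f"<html><body>\n{links}\n</body></html>\n")

write_index(
    "cu121",
    "v0.2.55-cu121",
    ["llama_cpp_python-0.2.55-cp311-cp311-manylinux_2_17_x86_64.whl"],
)
```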

Planned release tags for this PR (limited to the ones I can test):

  • cpu
  • cu121
  • cu122
  • cu123
  • metal

Usage (won't work until merged)

CPU

```bash
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
```

Metal

```bash
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
```

CUDA (12.1)

```bash
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
```
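After installing from one of these indexes, a quick import check confirms the prebuilt wheel loaded without a local compile (`llama_cpp` does expose `__version__`; the version shown is illustrative):

```python
# Sanity check: the prebuilt wheel imports and reports its version.
import llama_cpp
print(llama_cpp.__version__)  # e.g. "0.2.55" (illustrative)
```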

TODO

Related: #1178, #243 and #741

@oobabooga (Contributor) commented

I have been building llama-cpp-python wheels for each new release in my fork of jllllll's repository:

https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/actions

It includes some minor changes to avoid hitting rate-limit errors. It also builds for all macOS versions (I think jllllll removed macOS 11 due to an error that used to happen but no longer does).

@abetlen (Owner, Author) commented Mar 4, 2024

@oobabooga yes, I saw; really appreciate you all carrying that! I'll probably port over those release workflows and modify them to name each release based on the version and backend tag, i.e. v0.2.55-cu121 or something like that.
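A tiny sketch of the tag scheme that implies, splitting a release tag into package version and backend (the parsing is my illustration, not code from the actual release workflows):

```python
# Illustrative only: split a release tag like "v0.2.55-cu121" into
# (version, backend). Not taken from the actual workflows.
def parse_release_tag(tag: str) -> tuple[str, str]:
    version, _, backend = tag.removeprefix("v").partition("-")
    return version, backend

assert parse_release_tag("v0.2.55-cu121") == ("0.2.55", "cu121")
assert parse_release_tag("v0.2.55-metal") == ("0.2.55", "metal")
```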

It would be great if we could reduce the number of builds per release, though.

@ElliottDyson commented

Any chance Intel Arc binaries could also be added? Thanks

@ElliottDyson commented

> Any chance Intel Arc binaries could also be added? Thanks

With Intel's new IPEX-LLM release this is no longer necessary. A pointer to it might still be useful, though, as it isn't well advertised.

abetlen marked this pull request as ready for review on Apr 3, 2024 at 19:31
abetlen merged commit 5a930ee into main on Apr 3, 2024
16 checks passed
xhedit pushed a commit to xhedit/llama-cpp-conv that referenced this pull request Apr 6, 2024
* Generate binary wheel index on release

* Add total release downloads badge

* Update download label

* Use official cibuildwheel action

* Add workflows to build CUDA and Metal wheels

* Update generate index workflow

* Update workflow name
xhedit added a commit to xhedit/llama-cpp-conv that referenced this pull request Apr 6, 2024
* feat: add support for KV cache quantization options (abetlen#1307)

* add KV cache quantization options

abetlen#1220
abetlen#1305

* Add ggml_type

* Use ggml_type instead of string for quantization

* Add server support

---------

Co-authored-by: Andrei Betlen <[email protected]>

* fix: Changed local API doc references to hosted (abetlen#1317)

* chore: Bump version

* fix: last tokens passing to sample_repetition_penalties function (abetlen#1295)

Co-authored-by: ymikhaylov <[email protected]>
Co-authored-by: Andrei <[email protected]>

* feat: Update llama.cpp

* fix: segfault when logits_all=False. Closes abetlen#1319

* feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal (abetlen#1247)

* Generate binary wheel index on release

* Add total release downloads badge

* Update download label

* Use official cibuildwheel action

* Add workflows to build CUDA and Metal wheels

* Update generate index workflow

* Update workflow name

* feat: Update llama.cpp

* chore: Bump version

* fix(ci): use correct script name

* docs: LLAMA_CUBLAS -> LLAMA_CUDA

* docs: Add docs explaining how to install pre-built wheels.

* docs: Rename cuBLAS section to CUDA

* fix(docs): incorrect tool_choice example (abetlen#1330)

* feat: Update llama.cpp

* fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes abetlen#1328 abetlen#1314

* fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes abetlen#1328 Closes abetlen#1314

* feat: Update llama.cpp

* fix: Always embed metal library. Closes abetlen#1332

* feat: Update llama.cpp

* chore: Bump version

---------

Co-authored-by: Limour <[email protected]>
Co-authored-by: Andrei Betlen <[email protected]>
Co-authored-by: lawfordp2017 <[email protected]>
Co-authored-by: Yuri Mikhailov <[email protected]>
Co-authored-by: ymikhaylov <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
xhedit pushed a commit to xhedit/llama-cpp-conv that referenced this pull request Apr 30, 2024
* Generate binary wheel index on release

* Add total release downloads badge

* Update download label

* Use official cibuildwheel action

* Add workflows to build CUDA and Metal wheels

* Update generate index workflow

* Update workflow name