Binary wheels #1247
Conversation
I have been building llama-cpp-python wheels for each new release in my fork of jllllll's repository: https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/actions. It has some minor changes to avoid hitting rate-limit errors. It also builds for all Mac versions (I think jllllll removed macOS 11 due to an error that used to happen but no longer does).
@oobabooga yes, I saw; I really appreciate you carrying that! I'll probably port over those release workflows and modify them so they name each release based on the version and backend tag. It would be great if we could reduce the number of builds per release, though.
Any chance Intel Arc binaries could also be added? Thanks
With Intel's new IPEX-LLM release this is no longer necessary. A pointer to it, however, might be useful, as it isn't well advertised.
* Generate binary wheel index on release
* Add total release downloads badge
* Update download label
* Use official cibuildwheel action
* Add workflows to build CUDA and Metal wheels
* Update generate index workflow
* Update workflow name
* feat: add support for KV cache quantization options (abetlen#1307)
  * add KV cache quantization options abetlen#1220 abetlen#1305
  * Add ggml_type
  * Use ggml_type instead of string for quantization
  * Add server support
  * Co-authored-by: Andrei Betlen <[email protected]>
* fix: Changed local API doc references to hosted (abetlen#1317)
* chore: Bump version
* fix: last tokens passing to sample_repetition_penalties function (abetlen#1295)
  * Co-authored-by: ymikhaylov <[email protected]>
  * Co-authored-by: Andrei <[email protected]>
* feat: Update llama.cpp
* fix: segfault when logits_all=False. Closes abetlen#1319
* feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal (abetlen#1247)
  * Generate binary wheel index on release
  * Add total release downloads badge
  * Update download label
  * Use official cibuildwheel action
  * Add workflows to build CUDA and Metal wheels
  * Update generate index workflow
  * Update workflow name
* feat: Update llama.cpp
* chore: Bump version
* fix(ci): use correct script name
* docs: LLAMA_CUBLAS -> LLAMA_CUDA
* docs: Add docs explaining how to install pre-built wheels.
* docs: Rename cuBLAS section to CUDA
* fix(docs): incorrect tool_choice example (abetlen#1330)
* feat: Update llama.cpp
* fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes abetlen#1328 Closes abetlen#1314
* feat: Update llama.cpp
* fix: Always embed metal library. Closes abetlen#1332
* feat: Update llama.cpp
* chore: Bump version

Co-authored-by: Limour <[email protected]>
Co-authored-by: Andrei Betlen <[email protected]>
Co-authored-by: lawfordp2017 <[email protected]>
Co-authored-by: Yuri Mikhailov <[email protected]>
Co-authored-by: ymikhaylov <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Adds initial support for releasing binary wheels. Uses GitHub Pages to publish one index per backend (CPU, Metal, CUDA, etc.) as a static site.
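The per-backend index pages described above can follow the PEP 503 "simple repository" layout, which pip already understands. Here is a minimal sketch of generating such a page from a list of wheel filenames; the function name, wheel name, and base URL are illustrative assumptions, not the actual workflow script from this PR.

```python
# Hypothetical sketch: build a PEP 503-style "simple" index page that links
# each wheel back to its GitHub release asset. Not the PR's actual script.
def make_index(wheels, base_url):
    """Return an HTML index page with one anchor per wheel filename."""
    links = "\n".join(
        f'<a href="{base_url}/{name}">{name}</a><br/>' for name in wheels
    )
    return f"<!DOCTYPE html>\n<html><body>\n{links}\n</body></html>"

# Example with a made-up wheel name and release URL:
page = make_index(
    ["llama_cpp_python-0.2.0-cp311-cp311-manylinux_2_17_x86_64.whl"],
    "https://example.com/releases/download/v0.2.0",
)
print(page)
```

Pointing `pip install --extra-index-url` at a page like this is enough for pip to resolve the listed wheels.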
Planned release tags for this PR (because I can test these):
* `cpu`
* `cu121`
* `cu122`
* `cu123`
* `metal`
Usage (won't work until merged)
CPU
Metal
CUDA (12.1)
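The usage commands above reduce to one pattern: pass pip an `--extra-index-url` pointing at the per-backend sub-index named by the release tag. A small sketch of how those URLs and commands compose, assuming a GitHub Pages base URL of the form used by this repository (verify the exact URL against the published docs before relying on it):

```python
# Hypothetical sketch: derive the pip install command for a given backend tag
# (cpu, cu121, cu122, cu123, metal). The base URL is an assumption based on
# this PR's GitHub Pages setup, not a guaranteed endpoint.
BASE = "https://abetlen.github.io/llama-cpp-python/whl"

def index_url(backend: str) -> str:
    """Per-backend wheel index URL, one sub-index per release tag."""
    return f"{BASE}/{backend}"

def install_command(backend: str) -> str:
    """Full pip command a user would run for the given backend."""
    return f"pip install llama-cpp-python --extra-index-url {index_url(backend)}"

for tag in ("cpu", "cu121", "metal"):
    print(install_command(tag))
```

So the CPU, Metal, and CUDA (12.1) sections differ only in which tag is substituted into the index URL.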
TODO
Related: #1178, #243 and #741