
Conversation

@AlphaAtlas (Contributor)

requires #2058

Splits models between GPU and CPU, see: ggml-org/llama.cpp#1412

@AlphaAtlas (Contributor, Author)

Also, this should probably be tested on GPU-less llama builds (I can do this tonight), and a note about installing the GPU version of llama.cpp should be added: https://pypi.org/project/llama-cpp-python/

@AlphaAtlas (Contributor, Author) commented May 14, 2023

Come to think of it, maybe this should be fully automated? Like if an Nvidia GPU and the build requirements are present, build llama.cpp with cuBLAS, otherwise build with CLBlast if possible, otherwise default to the regular pip binary.
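A rough sketch of that selection logic might look like the following (an illustration only, not part of this PR; the clinfo check is just a stand-in for "OpenCL is available", and the CMake flags should be confirmed against the llama-cpp-python README):

#!/bin/sh
# Hypothetical build-selection sketch for llama-cpp-python
if command -v nvidia-smi >/dev/null 2>&1 && command -v cmake >/dev/null 2>&1; then
    # NVIDIA GPU and build tools present: build with cuBLAS
    CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --force-reinstall --no-cache-dir llama-cpp-python
elif command -v clinfo >/dev/null 2>&1 && command -v cmake >/dev/null 2>&1; then
    # OpenCL available: build with CLBlast
    CMAKE_ARGS="-DLLAMA_CLBLAST=on" FORCE_CMAKE=1 pip install --force-reinstall --no-cache-dir llama-cpp-python
else
    # Otherwise fall back to the prebuilt CPU wheel from PyPI
    pip install llama-cpp-python
fi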

@Malrama commented May 14, 2023

> Come to think of it, maybe this should be fully automated? Like if an Nvidia GPU and the build requirements are present, build llama.cpp with cuBLAS, otherwise build with CLBlast if possible, otherwise default to the regular pip binary.

How do I do that on Windows? I have updated to 0.1.50 and everything is working fine, but it doesn't seem to have cuBLAS enabled. I tried running CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python, but I am not 100% sure how to use this command in PowerShell. Do you know? :)

@AlphaAtlas (Contributor, Author) commented May 14, 2023

> How do I do that on Windows? I have updated to 0.1.50 and everything is working fine, but it doesn't seem to have cuBLAS enabled. I tried running CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python, but I am not 100% sure how to use this command in PowerShell. Do you know? :)

Try

pip cache remove llama-cpp-python
pip uninstall llama-cpp-python
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python

llama-cpp-python won't rebuild if it's already in pip's cache.
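For PowerShell specifically (the VAR=value prefix above is Unix-shell syntax and won't be recognized there), the rough equivalent would presumably be to set the variables for the session first and then run pip, something like this untested sketch:

pip cache remove llama-cpp-python
pip uninstall llama-cpp-python
# PowerShell: set the build variables for the current session
$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
$env:FORCE_CMAKE = "1"
pip install llama-cpp-python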

@mayaeary (Contributor) left a review comment:

A few suggestions.

@Malrama commented May 14, 2023

Now it's working! Perfect. Omg. This is brilliant!

@james-s-tayler

This is miracle-tier.

@oobabooga (Owner) commented May 15, 2023

It works for me. Not as fast as the old CUDA branch of GPTQ-for-LLaMa yet, but several times faster than CPU-only.

Documentation on the additional installation steps that are required:

https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md#gpu-offloading
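As a rough illustration of what the linked page covers (the flag name and model filename here are from memory and a placeholder rather than from this PR, so check the docs for the exact spelling), offloading is enabled by passing a layer count when launching the web UI:

python server.py --model your-ggml-model.bin --n-gpu-layers 32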

@oobabooga merged commit 071f077 into oobabooga:main on May 15, 2023
@skatardude10 commented May 15, 2023

For anyone on the One-Click installer on Windows: I had to open a Visual Studio developer command prompt with the build tools installed (I typed cmake to make sure it was available), navigate to my Oobabooga directory, run micromamba-cmd.bat to enter the venv, and then typed

set CMAKE_ARGS="-DLLAMA_CUBLAS=on" 
set FORCE_CMAKE=1

to set those environment variables for the session. After that I was able to successfully run

pip install llama-cpp-python
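One way to sanity-check that the cuBLAS build actually took effect (a suggestion of the editor, not something from this thread) is to look at llama.cpp's startup output when a model loads: a cuBLAS build should report BLAS = 1 in the system info line. Assuming the low-level binding is exported by the package, it can also be printed directly from the same venv:

python -c "from llama_cpp import llama_cpp; print(llama_cpp.llama_print_system_info())"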

@LiliumSancta commented May 15, 2023

> It works for me. Not as fast as the old CUDA branch of GPTQ-for-LLaMa yet, but several times faster than CPU-only.
>
> Documentation on the additional installation steps that are required:
>
> https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md#gpu-offloading

llama.cpp is how I run 13B and larger models on my 8GB GPU, but there is a big difference in speed between running inside text-generation-webui and running llama.cpp natively. I believe there is something wrong with the speed of the llama-cpp-python API; I don't know whether the problem is in the API itself or in this implementation, but in some cases performance can be less than half of the original. See abetlen/llama-cpp-python#181

@MNPS8 commented May 15, 2023

> Try
>
> pip cache remove llama-cpp-python
> pip uninstall llama-cpp-python
> CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
>
> llama-cpp-python won't rebuild if it's already in pip's cache.

I'm sorry, but I still don't understand. I tried this command both in the command prompt and in Python, and it gives an error saying "CMAKE_ARGS" is not recognized.

Where or how is CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python supposed to be entered?

> For anyone on the One-Click installer on Windows: I had to open a Visual Studio developer command prompt with the build tools installed (I typed cmake to make sure it was available), navigate to my Oobabooga directory, run micromamba-cmd.bat to enter the venv, and then typed
>
> set CMAKE_ARGS="-DLLAMA_CUBLAS=on"
> set FORCE_CMAKE=1
>
> to set those environment variables for the session. After that I was able to successfully run
>
> pip install llama-cpp-python

I also tried this solution, but I can't find micromamba-cmd.bat anywhere in the folder.

I tried everything: both llama-cpp-python and LLAMA_CUBLAS were installed successfully, yet I can't offload anything to the GPU; I pass the command-line flag but the program seems to ignore it. Any help or hint would be appreciated, thank you.

@Thireus (Contributor) commented May 17, 2023

Does anyone have some performance benchmarks? I'm getting 5 tokens/s with q5_1 Vicuna 13B.

@AlphaAtlas (Contributor, Author)

@Thireus See: #2088

I haven't actually gotten around to following up on this yet :P

@mlbrnm commented Jun 15, 2023

> I tried everything: both llama-cpp-python and LLAMA_CUBLAS were installed successfully, yet I can't offload anything to the GPU; I pass the command-line flag but the program seems to ignore it. Any help or hint would be appreciated, thank you.

Did you ever find a solution to this? I am in the exact same position.

@shikage commented Dec 21, 2023

> Did you ever find a solution to this? I am in the exact same position.

You may be on Windows, in which case try the approach from this earlier comment: #2060 (comment)
