Add llama.cpp GPU offload option #2060
Conversation
Also, this should probably be tested on GPU-less llama builds (I can do this tonight), and a note about installing the GPU version of llama.cpp should be added: https://pypi.org/project/llama-cpp-python/
Come to think of it, maybe this should be fully automated? Like if an NVIDIA GPU and the build requirements are present, build llama.cpp with cuBLAS; otherwise build it with CLBlast if possible; otherwise default to the regular pip binary.
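A minimal sketch of what that detection could look like, assuming `nvidia-smi` is a good enough proxy for a usable NVIDIA GPU and `cmake` for the build requirements (the `LLAMA_CUBLAS`/`LLAMA_CLBLAST` CMake flags are the ones llama.cpp exposes; the extra pip options just force a rebuild past the cache):

```bash
# Hedged sketch: pick a llama-cpp-python build based on what is available.
if command -v nvidia-smi >/dev/null 2>&1 && command -v cmake >/dev/null 2>&1; then
    # NVIDIA GPU + build tools present: build with cuBLAS
    CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
        pip install llama-cpp-python --force-reinstall --no-cache-dir
elif command -v clinfo >/dev/null 2>&1 && command -v cmake >/dev/null 2>&1; then
    # OpenCL available: build with CLBlast
    CMAKE_ARGS="-DLLAMA_CLBLAST=on" FORCE_CMAKE=1 \
        pip install llama-cpp-python --force-reinstall --no-cache-dir
else
    # Fall back to the prebuilt CPU-only wheel
    pip install llama-cpp-python
fi
```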
How do I do that on Windows? I have updated to 0.1.50 and everything is working fine, but it doesn't seem to have cuBLAS enabled. I tried using `CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python`, but I am not 100% sure how to use this command in PowerShell. Do you know? :)
Try forcing a reinstall with pip's cache disabled; llama-cpp-python won't rebuild if it's in pip's cache.
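For the PowerShell question above, a hedged sketch: PowerShell doesn't accept the `VAR=value command` prefix syntax, so the variables have to be set as environment variables first. The CMake flags are the ones quoted above; `--force-reinstall --no-cache-dir` are standard pip options used here to make sure the wheel is actually rebuilt.

```powershell
# Set the build flags for the current PowerShell session
$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
$env:FORCE_CMAKE = "1"

# Reinstall, bypassing pip's cache so the package is rebuilt with cuBLAS
pip install llama-cpp-python --force-reinstall --no-cache-dir
```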
mayaeary left a comment:
A few suggestions.
Co-authored-by: Maya <[email protected]>
Now it's working! Perfect. Omg. This is brilliant!
This is miracle-tier.
It works for me. Not as fast as the old CUDA branch of GPTQ-for-LLaMa yet, but several times faster than CPU-only. Documentation on the additional installation steps that are required:
For anyone on the One-Click installer on Windows: I had to open a Visual Studio developer command prompt with the build tools installed, type the commands to set those environment variables for the session, and then I was able to run the install successfully.
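For anyone else going that route, a hedged sketch of what those session variables might look like in a Visual Studio developer command prompt (cmd syntax; the flag values are the ones from the earlier comments, and the extra pip options simply force a rebuild past the cache):

```bat
rem Set the build flags for this command prompt session
set CMAKE_ARGS=-DLLAMA_CUBLAS=on
set FORCE_CMAKE=1

rem Rebuild llama-cpp-python with cuBLAS support
pip install llama-cpp-python --force-reinstall --no-cache-dir
```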
llama.cpp is my way to run 13B and larger models on my 8GB GPU, but there is a big difference in speed between running within text-generation-webui and running llama.cpp natively. I believe there is something wrong with the speed when using the llama-cpp-python API; I don't know if it's something in the API itself or in this implementation, but in some cases the performance can be less than half of the original. See abetlen/llama-cpp-python#181
I'm sorry, but I still didn't understand. I tried this command both in the command prompt and in Python, and it gives an error with "CMAKE_ARGS" as unrecognized. Where or how is the command `CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python` supposed to be run?
I also tried this solution, but I can't find a micromamba-cmd.bat anywhere in the folder. I tried everything; both llama-cpp-python and LLAMA_CUBLAS were successfully installed, yet I can't offload anything to the GPU: I pass the command-line option but the program seems to ignore it. Any help or hint would be appreciated, thank you.
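For reference, a hedged sketch of where that command is meant to go: it is a shell command, not Python, and it needs to run inside the same Python environment the webui uses (the environment name below is an assumption; one-click installs typically provide their own terminal/cmd script for this):

```bash
# Activate the environment the webui runs in (the name "textgen" is an assumption)
conda activate textgen

# Then run the install command from the earlier comments
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
    pip install llama-cpp-python --force-reinstall --no-cache-dir
```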
Does anyone have some performance benchmarks? I'm getting 5 tokens/s with q5_1 Vicuna 13B.
Did you ever find a solution to this? I am in the exact same position. |
You may be on Windows, in which case try the approach from this earlier comment:
requires #2058
Splits models between GPU and CPU, see: ggml-org/llama.cpp#1412
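As a usage illustration (hedged: the exact flag name and values are assumptions for illustration rather than quoted from this thread), offloading part of a GGML model to the GPU when launching the webui would look something like:

```bash
# Offload 32 of the model's layers to the GPU; the remaining layers stay on the CPU.
# Adjust the layer count to fit your VRAM.
python server.py --model <your-ggml-model> --n-gpu-layers 32
```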