
llama cpp #17457

Open · wants to merge 1 commit into base: master

Conversation

@Freed-Wu (Contributor) commented Jul 19, 2023

new package: llama-cpp

Close #17453

packages/llama-cpp/build.sh: review comment (outdated, resolved)
@Freed-Wu (Contributor, Author) commented:

Can we provide a subpackage to support OpenCL? Should it be named llama-cpp-opencl.subpackage.sh, or should we create a new llama-cpp-opencl/build.sh?
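
For illustration, the first option might look roughly like this (only a sketch; the TERMUX_SUBPKG_* variables follow the usual termux-packages subpackage convention, and the file list here is hypothetical):

# packages/llama-cpp/llama-cpp-opencl.subpackage.sh (hypothetical sketch)
TERMUX_SUBPKG_DESCRIPTION="llama.cpp with OpenCL (CLBlast) acceleration"
TERMUX_SUBPKG_DEPENDS="clblast, ocl-icd"
TERMUX_SUBPKG_INCLUDE="bin/llama-opencl"

As far as I understand, a subpackage only splits files out of the same build, so an OpenCL variant built with different flags would probably still need its own build.sh.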

@truboxl (Contributor) commented Jul 21, 2023

Preferably a single package that enables multiple features.

@licy183 dismissed their stale review July 21, 2023 10:49

Has been resolved.

@ghost commented Jul 22, 2023

Can we provide a subpackage to support OpenCL? Should it be named llama-cpp-opencl.subpackage.sh, or should we create a new llama-cpp-opencl/build.sh?

The packages needed to enable OpenCL for llama.cpp are: ocl-icd, opencl-headers and opencl-clhpp.
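
They can be installed with pkg (clinfo is optional, just for checking that the OpenCL platform is visible):

pkg install ocl-icd opencl-headers opencl-clhpp clinfo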

CLBlast is required:

cd $HOME
git clone https://github.com/CNugteren/CLBlast

Build and install CLBlast:

cd CLBlast
cmake -B build \
  -DBUILD_SHARED_LIBS=OFF \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INSTALL_PREFIX=/data/data/com.termux/files/usr
cd build
make -j8
make install

I don't know if you needed this information, but it would be nice if Termux simply handled the whole process.

Thank you. Edit: In case it's needed, here are the build instructions:

CPU:

cd $HOME
cd llama.cpp
cmake -B build
cd build
cmake --build . --config Release

GPU(OpenCL):

cd $HOME
cd llama.cpp
cmake -B build -DLLAMA_CLBLAST=ON
cd build
cmake --build . --config Release
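
After either build, the binaries end up under build/bin. A quick smoke test from the build directory (the model path is just an example):

./bin/main -m ~/ggml-model-q4_0.bin -p "Hello" -n 32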

It's notable that a model loaded from the ~/storage/downloads folder is significantly slower compared to loading it from the $HOME path.

@termux deleted a comment from TGSMLM Jul 22, 2023
@Freed-Wu (Contributor, Author) commented Jul 24, 2023

but it would be nice if Termux simply handled the whole process.

Related PRs: #17482, #17468

It's notable that a model loaded from the ~/storage/downloads folder is significantly slower compared to loading it from the $HOME path.

This is expected behaviour, not a bug. /sdcard/Downloads (~/storage/downloads) is on a different partition from /data/data/com.termux/files/usr, so loading a model from there is expected to be slower.
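
The simple workaround is to copy the model into the home directory once and load it from there (the file name is just a placeholder):

cp ~/storage/downloads/ggml-model-q4_0.bin ~/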

This version is not valid

Related issues: ggerganov/llama.cpp#2292

Compiled deb files:

llama-cpp-opencl_0.0.0-r854-fff0e0e-0_aarch64.deb.zip
clblast_1.6.1_aarch64.deb.zip
llama-cpp_0.0.0-r854-fff0e0e-0_aarch64.deb.zip

On my phone it does not work. I guess some libraries in /system/vendor/lib64 affect the program?

$ LD_LIBRARY_PATH="/system/vendor/lib64" clinfo -l
Platform #0: QUALCOMM Snapdragon(TM)
`-- Device #0: QUALCOMM Adreno(TM)
$ LD_LIBRARY_PATH="/system/vendor/lib64" llama -i -ins --color -t $(nproc) --prompt-cache $PREFIX/tmp/prompt-cache -c 2048 --numa -m ~/ggml-model-q4_0.bin -ngl 1
main: build = 854 (fff0e0e)
main: seed  = 1690178858
ggml_opencl: selecting platform: 'QUALCOMM Snapdragon(TM)'
ggml_opencl: selecting device: 'QUALCOMM Adreno(TM)'
ggml_opencl: device FP16 support: true
llama.cpp: loading model from /data/data/com.termux/files/home/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 49954
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: using OpenCL for GPU acceleration
llama_model_load_internal: mem required  = 5258.03 MB (+ 1026.00 MB per state)
llama_model_load_internal: offloading 1 repeating layers to GPU
llama_model_load_internal: offloaded 1/33 layers to GPU
llama_model_load_internal: total VRAM used: 109 MB
llama_new_context_with_model: kv self size  = 1024.00 MB

system_info: n_threads = 8 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
main: attempting to load saved session from '/data/data/com.termux/files/usr/tmp/prompt-cache'
main: session file does not exist, will create
main: interactive mode on.
Reverse prompt: '### Instruction:

'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = -1, n_keep = 2


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 CLBlast: OpenCL error: clEnqueueNDRangeKernel: -54
GGML_ASSERT: /data/data/com.termux/files/home/.termux-build/llama-cpp-opencl/src/ggml-opencl.cpp:1747: false
zsh: abort      LD_LIBRARY_PATH="/system/vendor/lib64" llama -i -ins --color -t $(nproc)   -c

@Freed-Wu (Contributor, Author) commented Jul 24, 2023

On my phone it does not work

This bug is similar to ggerganov/llama.cpp#2341:

GGML_ASSERT: /build/wnslnw6pk8d4c8k0b8w4w4qz45wgy9hw-source/ggml-opencl.cpp:1524: false

I have reported it there.

@licy183 (Member) commented Aug 28, 2023

Hi @Freed-Wu, can you test whether this package works fine when you are free? Thanks!

@ghost commented Aug 28, 2023

I figured this must be merged before it's available in Termux. Is there some simple way to try this?

@TomJo2000 (Member) commented Aug 28, 2023

I figured this must be merged before it's available in Termux. Is there some simple way to try this?

You can download the .deb corresponding to your CPU architecture from the CI run and install it locally using apt install /path/to/file.deb.

The CI artifacts are packed into a .tar.zip compressed archive by GitHub Actions, so you will need to unzip and untar it first.
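
Roughly like this (the archive and .deb file names are only placeholders; use whatever the CI run actually produced):

unzip artifact.zip
tar xf debs.tar
apt install ./llama-cpp_0.0.0-r854-fff0e0e-0_aarch64.deb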

@ghost commented Aug 28, 2023

It's functioning for sure. If it were up to me, I'd suggest changing the way llama.cpp is built, or maybe adding a variation on the build method, for example:

The package builds llama.cpp with OpenBLAS, which is fine and works, but I've noticed that building without it gives higher performance. There is a case where OpenBLAS is fastest, namely when ingesting large amounts of text, but I think it's an edge case. On my device, a Samsung S10+, a plain make build is currently the fastest, compared to OpenBLAS or even CLBlast. Maybe there are cmake optimizations I'm missing; either way, OpenBLAS lowers performance.
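
Roughly, the variants I am comparing are these (a sketch only, assuming the LLAMA_BLAS, LLAMA_BLAS_VENDOR and LLAMA_CLBLAST CMake options llama.cpp currently exposes, not the exact flags the package uses):

# plain CPU build
cmake -B build && cmake --build build --config Release
# OpenBLAS build
cmake -B build -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS && cmake --build build --config Release
# CLBlast (OpenCL) build
cmake -B build -DLLAMA_CLBLAST=ON && cmake --build build --config Release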

It appears that llama behaves like main in llama.cpp, which is neat. It worked in my test, and other tools are available: perplexity, quantize, etc. But how do I use server?

Thank you.

@truboxl self-requested a review August 28, 2023 16:17
@truboxl (Contributor) commented Aug 28, 2023

~ $ llama
main: build = 0 (unknown)
main: seed  = 1693239998
libc: Fatal signal 4 (SIGILL), code 2 (ILL_ILLOPN), fault addr 0x7f21f77763d3 in tid 89 (llama), pid 89 (llama)
Illegal instruction
~ $ llama-bench
libc: Fatal signal 4 (SIGILL), code 2 (ILL_ILLOPN), fault addr 0x7fa8e40563d3 in tid 115 (llama-bench), pid 115 (llama-bench)
Illegal instruction
~ $ llama-server
libc: Fatal signal 4 (SIGILL), code 2 (ILL_ILLOPN), fault addr 0x7f775e04e3d3 in tid 118 (llama-server), pid 118 (llama-server)
Illegal instruction

termux-docker x86_64

@ghost commented Aug 28, 2023

Still working for me. The speed decrease with OpenBLAS is significant: almost a full token per second. Perhaps it should be a separate build, like CLBlast.

Server example run: llama-server -m ~/Vicuna-7b.Q4_0.gguf -c 2048 -t 3 -b 7
(screenshot: Screenshot_20230828_182227)
How can he 7 sentence?! (kidding)
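
Once it is running you can query it from another session, e.g. (assuming the default 127.0.0.1:8080 address and the /completion endpoint of the server example):

curl http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Building llama.cpp on Termux is", "n_predict": 32}'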

@truboxl (Contributor) commented Sep 22, 2023

Please rebase this PR to the latest version
