Build question (newbie) #345

@gilbrotheraway

Hello, I just found this repo and I'm getting incredible performance on my Rock 5B SBC.

I saw some build flags floating around, like:
- -DGGML_NATIVE=1
- OpenMP
- flax-vectors-something
- tinyBLAS

I'm wondering what they do, and whether I'm missing any other flags that could squeeze out even more performance.
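For reference, here is a minimal configure-and-build sketch using the two flags from the list above that map to CMake options. It assumes ik_llama.cpp keeps mainline llama.cpp's option names: `GGML_NATIVE` (tune for the host CPU) and `GGML_OPENMP` (OpenMP threading). `flax-vectors-something` and tinyBLAS are left out because their exact option names aren't confirmed here.

```shell
# Sketch: assumes ik_llama.cpp uses the same CMake options as mainline llama.cpp.
# GGML_NATIVE: tune the build for the CPU it is compiled on (here, the Rock 5B).
# GGML_OPENMP: use OpenMP for the CPU backend's worker threads.
cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=ON -DGGML_OPENMP=ON
cmake --build build --config Release -j "$(nproc)"
```

Because `GGML_NATIVE` targets the build machine's CPU, binaries built this way may not run on a different CPU.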

Here are some quick numbers I got:

user@rock-5b:/srv/dev-disk-by-uuid-0444eaaf-0405-4373-ad45-74f5ca64d1df/fast/github/ik_llama.cpp$ ./build/bin/llama-bench -m models/bitnet1582b4t-iq2_bn.gguf  -m models/bitnet1582b4t-iq2_bn_r4.gguf -m models/deepcogito_cogito-v1-preview-llama-3B-IQ4_NL.gguf -m models/deepcogito_cogito-v1-preview-llama-3B-Q4_0.gguf -m models/deepcogito_cogito-v1-preview-llama-3B-Q4_K_M.gguf -m models/deepcogito_cogito-v1-preview-llama-3B-Q4_K_S.gguf -p 64,128,256,512,1024 -n 64,128,256,512,1024 -t 4 -rtr 1
| model                          |       size |     params | backend    | threads | rtr |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --: | ------------: | ---------------: |
============ Repacked 211 tensors
| bitnet-25 2B IQ2_BN - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |          pp64 |    318.86 ± 6.89 |
| bitnet-25 2B IQ2_BN - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |         pp128 |    238.43 ± 0.36 |
| bitnet-25 2B IQ2_BN - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |         pp256 |    158.87 ± 0.16 |
| bitnet-25 2B IQ2_BN - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |         pp512 |     98.19 ± 0.11 |
| bitnet-25 2B IQ2_BN - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |        pp1024 |     70.59 ± 0.04 |
| bitnet-25 2B IQ2_BN - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |          tg64 |    161.93 ± 0.04 |
| bitnet-25 2B IQ2_BN - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |         tg128 |    150.32 ± 0.47 |
| bitnet-25 2B IQ2_BN - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |         tg256 |    131.80 ± 0.06 |
| bitnet-25 2B IQ2_BN - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |         tg512 |    106.54 ± 0.03 |
| bitnet-25 2B IQ2_BN - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |        tg1024 |     74.70 ± 0.08 |
============ Repacked 1 tensors
| bitnet-25 2B IQ2_BN_R4 - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |          pp64 |    318.16 ± 0.97 |
| bitnet-25 2B IQ2_BN_R4 - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |         pp128 |    236.25 ± 1.11 |
| bitnet-25 2B IQ2_BN_R4 - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |         pp256 |    157.40 ± 0.17 |
| bitnet-25 2B IQ2_BN_R4 - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |         pp512 |     97.44 ± 0.10 |
| bitnet-25 2B IQ2_BN_R4 - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |        pp1024 |     70.36 ± 0.04 |
| bitnet-25 2B IQ2_BN_R4 - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |          tg64 |    162.03 ± 0.04 |
| bitnet-25 2B IQ2_BN_R4 - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |         tg128 |    150.46 ± 0.04 |
| bitnet-25 2B IQ2_BN_R4 - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |         tg256 |    131.58 ± 1.27 |
| bitnet-25 2B IQ2_BN_R4 - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |         tg512 |    106.38 ± 0.22 |
| bitnet-25 2B IQ2_BN_R4 - 2.00 bpw Bitnet | 934.16 MiB |     2.74 B | CPU        |       4 |   1 |        tg1024 |     74.93 ± 0.03 |
============ Repacked 197 tensors
| llama ?B IQ4_NL - 4.5 bpw      |   1.98 GiB |     3.61 B | CPU        |       4 |   1 |          pp64 |    312.00 ± 0.70 |
| llama ?B IQ4_NL - 4.5 bpw      |   1.98 GiB |     3.61 B | CPU        |       4 |   1 |         pp128 |    228.23 ± 0.85 |
| llama ?B IQ4_NL - 4.5 bpw      |   1.98 GiB |     3.61 B | CPU        |       4 |   1 |         pp256 |    150.19 ± 0.27 |
| llama ?B IQ4_NL - 4.5 bpw      |   1.98 GiB |     3.61 B | CPU        |       4 |   1 |         pp512 |     90.48 ± 0.15 |
| llama ?B IQ4_NL - 4.5 bpw      |   1.98 GiB |     3.61 B | CPU        |       4 |   1 |        pp1024 |     64.53 ± 0.04 |
| llama ?B IQ4_NL - 4.5 bpw      |   1.98 GiB |     3.61 B | CPU        |       4 |   1 |          tg64 |    170.81 ± 0.05 |
| llama ?B IQ4_NL - 4.5 bpw      |   1.98 GiB |     3.61 B | CPU        |       4 |   1 |         tg128 |    155.30 ± 0.03 |
| llama ?B IQ4_NL - 4.5 bpw      |   1.98 GiB |     3.61 B | CPU        |       4 |   1 |         tg256 |    130.97 ± 0.09 |
| llama ?B IQ4_NL - 4.5 bpw      |   1.98 GiB |     3.61 B | CPU        |       4 |   1 |         tg512 |     96.60 ± 0.17 |
| llama ?B IQ4_NL - 4.5 bpw      |   1.98 GiB |     3.61 B | CPU        |       4 |   1 |        tg1024 |     59.32 ± 0.03 |
============ Repacked 194 tensors
| llama ?B Q4_0                  |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |          pp64 |    142.40 ± 0.18 |
| llama ?B Q4_0                  |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |         pp128 |    122.02 ± 0.12 |
| llama ?B Q4_0                  |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |         pp256 |     95.33 ± 0.11 |
| llama ?B Q4_0                  |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |         pp512 |     67.30 ± 0.08 |
| llama ?B Q4_0                  |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |        pp1024 |     51.75 ± 0.03 |
| llama ?B Q4_0                  |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |          tg64 |    101.11 ± 0.05 |
| llama ?B Q4_0                  |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |         tg128 |     95.60 ± 0.01 |
| llama ?B Q4_0                  |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |         tg256 |     84.97 ± 0.02 |
| llama ?B Q4_0                  |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |         tg512 |     69.57 ± 0.06 |
| llama ?B Q4_0                  |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |        tg1024 |     48.06 ± 0.03 |
============ Repacked 197 tensors
| llama ?B Q4_K - Medium         |   2.08 GiB |     3.61 B | CPU        |       4 |   1 |          pp64 |    309.64 ± 0.78 |
| llama ?B Q4_K - Medium         |   2.08 GiB |     3.61 B | CPU        |       4 |   1 |         pp128 |    227.22 ± 1.13 |
| llama ?B Q4_K - Medium         |   2.08 GiB |     3.61 B | CPU        |       4 |   1 |         pp256 |    149.46 ± 0.34 |
| llama ?B Q4_K - Medium         |   2.08 GiB |     3.61 B | CPU        |       4 |   1 |         pp512 |     90.10 ± 0.12 |
| llama ?B Q4_K - Medium         |   2.08 GiB |     3.61 B | CPU        |       4 |   1 |        pp1024 |     64.23 ± 0.05 |
| llama ?B Q4_K - Medium         |   2.08 GiB |     3.61 B | CPU        |       4 |   1 |          tg64 |    164.21 ± 0.07 |
| llama ?B Q4_K - Medium         |   2.08 GiB |     3.61 B | CPU        |       4 |   1 |         tg128 |    149.79 ± 0.07 |
| llama ?B Q4_K - Medium         |   2.08 GiB |     3.61 B | CPU        |       4 |   1 |         tg256 |    125.76 ± 0.06 |
| llama ?B Q4_K - Medium         |   2.08 GiB |     3.61 B | CPU        |       4 |   1 |         tg512 |     94.72 ± 0.08 |
| llama ?B Q4_K - Medium         |   2.08 GiB |     3.61 B | CPU        |       4 |   1 |        tg1024 |     58.99 ± 0.07 |
============ Repacked 197 tensors
| llama ?B Q4_K - Small          |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |          pp64 |    310.07 ± 1.15 |
| llama ?B Q4_K - Small          |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |         pp128 |    226.93 ± 0.88 |
| llama ?B Q4_K - Small          |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |         pp256 |    149.10 ± 0.58 |
| llama ?B Q4_K - Small          |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |         pp512 |     90.04 ± 0.12 |
| llama ?B Q4_K - Small          |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |        pp1024 |     64.23 ± 0.05 |
| llama ?B Q4_K - Small          |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |          tg64 |    164.18 ± 0.04 |
| llama ?B Q4_K - Small          |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |         tg128 |    150.28 ± 0.07 |
| llama ?B Q4_K - Small          |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |         tg256 |    125.84 ± 0.04 |
| llama ?B Q4_K - Small          |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |         tg512 |     94.57 ± 0.12 |
| llama ?B Q4_K - Small          |   1.99 GiB |     3.61 B | CPU        |       4 |   1 |        tg1024 |     58.67 ± 0.05 |
build: c9eec172 (3644)

Results for an 8B model:

build/bin/llama-bench -m models/deepcogito_cogito-v1-preview-llama-8B-IQ4_NL.gguf -p 64,128,256,512 -n 64,128,256,512 -t 4 -rtr 1
| model                          |       size |     params | backend    | threads | rtr |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --: | ------------: | ---------------: |
============ Repacked 225 tensors
| llama 8B IQ4_NL - 4.5 bpw      |   4.35 GiB |     8.03 B | CPU        |       4 |   1 |          pp64 |    183.79 ± 3.47 |
| llama 8B IQ4_NL - 4.5 bpw      |   4.35 GiB |     8.03 B | CPU        |       4 |   1 |         pp128 |    139.43 ± 0.79 |
| llama 8B IQ4_NL - 4.5 bpw      |   4.35 GiB |     8.03 B | CPU        |       4 |   1 |         pp256 |     94.39 ± 0.20 |
| llama 8B IQ4_NL - 4.5 bpw      |   4.35 GiB |     8.03 B | CPU        |       4 |   1 |         pp512 |     57.99 ± 0.04 |
| llama 8B IQ4_NL - 4.5 bpw      |   4.35 GiB |     8.03 B | CPU        |       4 |   1 |          tg64 |    110.81 ± 0.03 |
| llama 8B IQ4_NL - 4.5 bpw      |   4.35 GiB |     8.03 B | CPU        |       4 |   1 |         tg128 |    100.95 ± 0.03 |
| llama 8B IQ4_NL - 4.5 bpw      |   4.35 GiB |     8.03 B | CPU        |       4 |   1 |         tg256 |     85.88 ± 0.10 |
| llama 8B IQ4_NL - 4.5 bpw      |   4.35 GiB |     8.03 B | CPU        |       4 |   1 |         tg512 |     65.49 ± 0.03 |

This is something like a 2000% improvement.

Thank you very much 🙏
