Add C++ gguf reader into cmake, add python gguf reader example #5216

Merged 6 commits into master on Feb 13, 2024

Conversation

cmp-nct (Contributor) commented Jan 30, 2024

The existing gguf example was not present in the cmake examples; now it can be compiled. It appears to work, though the data print does not work.

In addition, I added a reader.py that prints information about a gguf file. I'm sure it is far from perfect or complete, but it provides a base to start with.

Example output:

python reader.py  ggml-model-q4k.gguf
Key-Value Pairs:
GGUF.version                               : [3]
GGUF.tensor_count                          : [611]
GGUF.kv_count                              : [19]
general.architecture                       : [105 110 116 101 114 110 108 109  50]
general.name                               : [ 73 110 116 101 114 110  76  77  50]
internlm2.context_length                   : [32768]
internlm2.block_count                      : [32]
internlm2.embedding_length                 : [4096]
internlm2.feed_forward_length              : [14336]
internlm2.rope.freq_base                   : [1000000.]
internlm2.attention.head_count             : [32]
internlm2.attention.layer_norm_rms_epsilon : [1.e-05]
internlm2.attention.head_count_kv          : [8]
tokenizer.ggml.model                       : [108 108  97 109  97]
tokenizer.ggml.tokens                      : [ 60 117 110 107  62]
tokenizer.ggml.scores                      : [0.]
tokenizer.ggml.token_type                  : [2]
tokenizer.ggml.bos_token_id                : [1]
tokenizer.ggml.eos_token_id                : [2]
tokenizer.ggml.padding_token_id            : [2]
general.quantization_version               : [2]
general.file_type                          : [15]
----
Tensors:
Tensor Name                    | Shape: Shape           | Size: Size         | Quantization: Quantization
--------------------------------------------------------------------------------
token_embd.weight              | Shape: 4096x92544      | Size: 379060224    | Quantization: Q4_K
blk.0.attn_q.weight            | Shape: 4096x4096       | Size: 16777216     | Quantization: Q4_K
blk.0.attn_k.weight            | Shape: 4096x1024       | Size: 4194304      | Quantization: Q4_K
blk.0.attn_v.weight            | Shape: 4096x1024       | Size: 4194304      | Quantization: Q6_K
blk.0.attn_qkv_lora_a.weight   | Shape: 4096x256        | Size: 1048576      | Quantization: Q4_K
blk.0.attn_qkv_lora_b.weight   | Shape: 256x6144        | Size: 1572864      | Quantization: Q4_K
blk.0.attn_output.weight       | Shape: 4096x4096       | Size: 16777216     | Quantization: Q4_K
blk.0.attn_out_lora_a.weight   | Shape: 4096x256        | Size: 1048576      | Quantization: Q4_K
blk.0.attn_out_lora_b.weight   | Shape: 256x4096        | Size: 1048576      | Quantization: Q4_K
blk.0.ffn_gate.weight          | Shape: 4096x14336      | Size: 58720256     | Quantization: Q4_K
blk.0.ffn_gate_lora_a.weight   | Shape: 4096x256        | Size: 1048576      | Quantization: Q4_K
blk.0.ffn_gate_lora_b.weight   | Shape: 256x14336       | Size: 3670016      | Quantization: Q4_K
blk.0.ffn_up.weight            | Shape: 4096x14336      | Size: 58720256     | Quantization: Q4_K
blk.0.ffn_up_lora_a.weight     | Shape: 4096x256        | Size: 1048576      | Quantization: Q4_K
blk.0.ffn_up_lora_b.weight     | Shape: 256x14336       | Size: 3670016      | Quantization: Q4_K
blk.0.ffn_down.weight          | Shape: 14336x4096      | Size: 58720256     | Quantization: Q6_K
blk.0.ffn_down_lora_a.weight   | Shape: 14336x256       | Size: 3670016      | Quantization: Q6_K
blk.0.ffn_down_lora_b.weight   | Shape: 256x4096        | Size: 1048576      | Quantization: Q6_K
blk.0.attn_norm.weight         | Shape: 4096            | Size: 4096         | Quantization: F32
...
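
For orientation, below is a minimal sketch of what such a reader can look like, built on the gguf Python package's GGUFReader. This is a hedged approximation, not the PR's actual reader.py; the field and tensor attribute names are assumptions based on my reading of gguf-py.

# Minimal GGUF inspector sketch, assuming the gguf-py package is installed
# (pip install gguf). Attribute names follow gguf.GGUFReader as I understand
# it; the PR's reader.py may differ.
import sys

from gguf import GGUFReader

reader = GGUFReader(sys.argv[1])

print("Key-Value Pairs:")
for name, field in reader.fields.items():
    # field.parts holds the raw numpy arrays backing each value, so string
    # values print as uint8 arrays, e.g. [105 110 116 101 114 110 108 109 50]
    # decodes to "internlm2"
    print(f"{name:42} : {field.parts[-1]}")

print("----")
print("Tensors:")
for t in reader.tensors:
    shape = "x".join(str(d) for d in t.shape)
    print(f"{t.name:30} | Shape: {shape:15} | Size: {t.n_elements:12} | Quantization: {t.tensor_type.name}")

Run it the same way as the example above, e.g. python reader.py ggml-model-q4k.gguf.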

cmp-nct marked this pull request as ready for review January 30, 2024 14:20

cmp-nct (Contributor, Author) commented Feb 13, 2024

@ggerganov let's merge it in? It's useful.
I'm not sure what the procedure for merges is on GitHub.

ggerganov (Owner) commented:

It does not pass the Lint check

cmp-nct (Contributor, Author) commented Feb 13, 2024

It does not pass the Lint check

Solved :)

ggerganov merged commit 6c00a06 into ggerganov:master Feb 13, 2024
50 of 54 checks passed
0cc4m (Collaborator) commented Feb 13, 2024

This PR breaks the Vulkan build with a number of linker errors.

» cmake --build build_vulkan -- -j32
-- Vulkan found
-- ccache found, compilation results will be cached. Disable with LLAMA_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done (0.1s)
-- Generating done (0.1s)
-- Build files have been written to: /media/veryhighspeed/koboldai/llama.cpp/build_vulkan
[  0%] Generating build details from Git
-- Found Git: /usr/bin/git (found version "2.43.0")
[  1%] Built target ggml-vulkan
[  4%] Built target ggml
[  5%] Built target ggml_static
[  6%] Linking CXX executable ../../bin/gguf
[  8%] Built target llama
[  9%] Building CXX object common/CMakeFiles/build_info.dir/build-info.cpp.o
[ 10%] Built target test-c
[ 11%] Built target llava
[ 11%] Built target build_info
[ 12%] Built target llava_static
[ 13%] Linking CXX executable ../../bin/quantize-stats
[ 13%] Linking CXX executable ../../bin/benchmark
[ 14%] Linking CXX executable ../../bin/quantize
[ 15%] Linking CXX static library libcommon.a
/usr/bin/ld: ../../CMakeFiles/ggml.dir/ggml.c.o: in function `ggml_init':
ggml.c:(.text+0x20042): undefined reference to `ggml_vk_init_cpu_assist'
/usr/bin/ld: ../../CMakeFiles/ggml.dir/ggml.c.o: in function `ggml_graph_compute_thread':
ggml.c:(.text+0x37ad9): undefined reference to `ggml_vk_compute_forward_cpu_assist'
/usr/bin/ld: ggml.c:(.text+0x37c19): undefined reference to `ggml_vk_compute_forward_cpu_assist'
/usr/bin/ld: ggml.c:(.text+0x37c49): undefined reference to `ggml_vk_compute_forward_cpu_assist'
/usr/bin/ld: ggml.c:(.text+0x37d04): undefined reference to `ggml_vk_compute_forward_cpu_assist'
/usr/bin/ld: ggml.c:(.text+0x37d87): undefined reference to `ggml_vk_compute_forward_cpu_assist'
/usr/bin/ld: ../../CMakeFiles/ggml.dir/ggml.c.o:ggml.c:(.text+0x37dbe): more undefined references to `ggml_vk_compute_forward_cpu_assist' follow
/usr/bin/ld: ../../CMakeFiles/ggml.dir/ggml.c.o: in function `ggml_graph_compute':
ggml.c:(.text+0x3c2cc): undefined reference to `ggml_vk_preallocate_buffers_graph_cpu_assist'
/usr/bin/ld: ggml.c:(.text+0x3c2d7): undefined reference to `ggml_vk_preallocate_buffers_cpu_assist'
/usr/bin/ld: ggml.c:(.text+0x3c307): undefined reference to `ggml_vk_build_graph_cpu_assist'
/usr/bin/ld: ggml.c:(.text+0x3c4ec): undefined reference to `ggml_vk_graph_cleanup_cpu_assist'
/usr/bin/ld: ../../CMakeFiles/ggml.dir/ggml-backend.c.o: in function `ggml_backend_reg_get_count':
ggml-backend.c:(.text+0x2556): undefined reference to `ggml_backend_vk_reg_devices'
/usr/bin/ld: ../../CMakeFiles/ggml.dir/ggml-backend.c.o: in function `ggml_backend_reg_find_by_name':
ggml-backend.c:(.text+0x2662): undefined reference to `ggml_backend_vk_reg_devices'
/usr/bin/ld: ../../CMakeFiles/ggml.dir/ggml-backend.c.o: in function `ggml_backend_reg_init_backend_from_str':
ggml-backend.c:(.text+0x281f): undefined reference to `ggml_backend_vk_reg_devices'
/usr/bin/ld: ggml-backend.c:(.text+0x28b6): undefined reference to `ggml_backend_vk_reg_devices'
/usr/bin/ld: ggml-backend.c:(.text+0x294d): undefined reference to `ggml_backend_vk_reg_devices'
/usr/bin/ld: ../../CMakeFiles/ggml.dir/ggml-backend.c.o:ggml-backend.c:(.text+0x2a86): more undefined references to `ggml_backend_vk_reg_devices' follow
collect2: error: ld returned 1 exit status
make[2]: *** [examples/gguf/CMakeFiles/gguf.dir/build.make:106: bin/gguf] Error 1
make[1]: *** [CMakeFiles/Makefile2:2819: examples/gguf/CMakeFiles/gguf.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 19%] Built target common
[ 20%] Built target quantize
[ 21%] Built target benchmark
[ 22%] Built target quantize-stats
make: *** [Makefile:146: all] Error 2

cmp-nct (Contributor, Author) commented Feb 13, 2024

Maybe that's why the gguf example lay dormant, or it was dormant for too long. Can we just exclude it from the Vulkan build with a cmake flag?

Adding the gguf example to cmake is a super minor thing; I'm sorry if that broke something unexpected. If the solution isn't straightforward, we should either revert the cmake addition or exclude it on incompatible platforms/libraries.

0cc4m (Collaborator) commented Feb 13, 2024

Maybe that's why the gguf example lay dormant, or it was dormant for too long. Can we just exclude it from the Vulkan build with a cmake flag?

Not sure; this might be an issue on the Vulkan side, since it doesn't seem to happen with other backends. The Vulkan CMake code is a little different from the other backends', and that's likely the cause of this issue.

@netrunnereve If I revert #5230, this issue is resolved for me. But that was problematic on Windows. Any ideas?

ggerganov (Owner) commented:

Maybe we can change the Vulkan objects to be part of the ggml lib, similar to other backends.

Instead of having a separate ggml-vulkan target:

add_library(ggml-vulkan OBJECT ggml-vulkan.cpp ggml-vulkan.h)

Add it here (llama.cpp/CMakeLists.txt, lines 1024 to 1042 at 03bf161):

add_library(ggml OBJECT
            ggml.c
            ggml.h
            ggml-alloc.c
            ggml-alloc.h
            ggml-backend.c
            ggml-backend.h
            ggml-quants.c
            ggml-quants.h
            ${GGML_SOURCES_CUDA}    ${GGML_HEADERS_CUDA}
            ${GGML_SOURCES_OPENCL}  ${GGML_HEADERS_OPENCL}
            ${GGML_SOURCES_METAL}   ${GGML_HEADERS_METAL}
            ${GGML_SOURCES_MPI}     ${GGML_HEADERS_MPI}
            ${GGML_SOURCES_EXTRA}   ${GGML_HEADERS_EXTRA}
            ${GGML_SOURCES_SYCL}    ${GGML_HEADERS_SYCL}
            ${GGML_SOURCES_KOMPUTE} ${GGML_HEADERS_KOMPUTE}
            )

sorasoras commented Feb 14, 2024

This also breaks the ROCm build on Windows.

 cmake --build . --config Release
[9/47] Linking CXX executable bin\gguf.exe
FAILED: bin/gguf.exe
cmd.exe /C "cd . && C:\PROGRA~1\AMD\ROCm\5.7\bin\CLANG_~1.EXE -fuse-ld=lld-link -nostartfiles -nostdlib -O3 -DNDEBUG -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -Xlinker /subsystem:console CMakeFiles/ggml.dir/ggml.c.obj CMakeFiles/ggml.dir/ggml-alloc.c.obj CMakeFiles/ggml.dir/ggml-backend.c.obj CMakeFiles/ggml.dir/ggml-quants.c.obj examples/gguf/CMakeFiles/gguf.dir/gguf.cpp.obj -o bin\gguf.exe -Xlinker /MANIFEST:EMBED -Xlinker /implib:examples\gguf\gguf.lib -Xlinker /pdb:bin\gguf.pdb -Xlinker /version:0.0   --hip-link  --offload-arch=gfx1100  --offload-arch=gfx1031  "C:/Program Files/AMD/ROCm/5.7/lib/hipblas.lib"  "C:/Program Files/AMD/ROCm/5.7/lib/rocblas.lib"  "C:/Program Files/AMD/ROCm/5.7/lib/clang/17.0.0/lib/windows/clang_rt.builtins-x86_64.lib"  "C:/Program Files/AMD/ROCm/5.7/lib/amdhip64.lib"  -lkernel32 -luser32 -lgdi32 -lwinspool -lshell32 -lole32 -loleaut32 -luuid -lcomdlg32 -ladvapi32 -loldnames  && cd ."
lld-link: error: undefined symbol: ggml_init_cublas
>>> referenced by CMakeFiles/ggml.dir/ggml.c.obj:(ggml_init)

lld-link: error: undefined symbol: ggml_cuda_compute_forward
>>> referenced by CMakeFiles/ggml.dir/ggml.c.obj:(ggml_compute_forward)

lld-link: error: undefined symbol: ggml_backend_cuda_reg_devices
>>> referenced by CMakeFiles/ggml.dir/ggml-backend.c.obj:(ggml_backend_reg_get_count)
>>> referenced by CMakeFiles/ggml.dir/ggml-backend.c.obj:(ggml_backend_reg_find_by_name)
>>> referenced by CMakeFiles/ggml.dir/ggml-backend.c.obj:(ggml_backend_reg_init_backend_from_str)
>>> referenced 5 more times
CLANG_~1: error: linker command failed with exit code 1 (use -v to see invocation)
[42/47] Linking CXX executable bin\train-text-from-scratch.exe
ninja: build stopped: subcommand failed.

@cmp-nct

kurnevsky (Contributor) commented:

This also breaks the ROCm build on Windows.

Not only Windows: the Nix build is also broken.

netrunnereve (Collaborator) commented:

Maybe that's why the gguf example lay dormant, or it was dormant for too long. Can we just exclude it from the Vulkan build with a cmake flag?

Not sure; this might be an issue on the Vulkan side, since it doesn't seem to happen with other backends. The Vulkan CMake code is a little different from the other backends', and that's likely the cause of this issue.

@netrunnereve If I revert #5230, this issue is resolved for me. But that was problematic on Windows. Any ideas?

I think we may have to do what @ggerganov suggested and move everything into the ggml lib rather than creating a separate ggml-vulkan object (you can test his patch at #5525). As @sorasoras mentioned, the ROCm build is also failing, and ROCm is built using a separate object just like Vulkan.

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* Update CMakeLists.txt
* Create reader.py
* Update reader.py
* Update reader.py (another whitespace :|)
* Update reader.py
* lintlintlint
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024