Add C++ gguf reader into cmake, add python gguf reader example #5216

Merged 6 commits into master on Feb 13, 2024

Conversation

cmp-nct (Contributor) commented Jan 30, 2024

The existing gguf example was not present in the cmake examples; now it can be compiled. It appears to work, though the data print does not work.

In addition, I added a reader.py that prints information about a gguf file. I'm sure it is far from perfect or complete, but it provides a base to start with.

Example output:

python reader.py  ggml-model-q4k.gguf
Key-Value Pairs:
GGUF.version                               : [3]
GGUF.tensor_count                          : [611]
GGUF.kv_count                              : [19]
general.architecture                       : [105 110 116 101 114 110 108 109  50]
general.name                               : [ 73 110 116 101 114 110  76  77  50]
internlm2.context_length                   : [32768]
internlm2.block_count                      : [32]
internlm2.embedding_length                 : [4096]
internlm2.feed_forward_length              : [14336]
internlm2.rope.freq_base                   : [1000000.]
internlm2.attention.head_count             : [32]
internlm2.attention.layer_norm_rms_epsilon : [1.e-05]
internlm2.attention.head_count_kv          : [8]
tokenizer.ggml.model                       : [108 108  97 109  97]
tokenizer.ggml.tokens                      : [ 60 117 110 107  62]
tokenizer.ggml.scores                      : [0.]
tokenizer.ggml.token_type                  : [2]
tokenizer.ggml.bos_token_id                : [1]
tokenizer.ggml.eos_token_id                : [2]
tokenizer.ggml.padding_token_id            : [2]
general.quantization_version               : [2]
general.file_type                          : [15]
----
Tensors:
Tensor Name                    | Shape: Shape           | Size: Size         | Quantization: Quantization
--------------------------------------------------------------------------------
token_embd.weight              | Shape: 4096x92544      | Size: 379060224    | Quantization: Q4_K
blk.0.attn_q.weight            | Shape: 4096x4096       | Size: 16777216     | Quantization: Q4_K
blk.0.attn_k.weight            | Shape: 4096x1024       | Size: 4194304      | Quantization: Q4_K
blk.0.attn_v.weight            | Shape: 4096x1024       | Size: 4194304      | Quantization: Q6_K
blk.0.attn_qkv_lora_a.weight   | Shape: 4096x256        | Size: 1048576      | Quantization: Q4_K
blk.0.attn_qkv_lora_b.weight   | Shape: 256x6144        | Size: 1572864      | Quantization: Q4_K
blk.0.attn_output.weight       | Shape: 4096x4096       | Size: 16777216     | Quantization: Q4_K
blk.0.attn_out_lora_a.weight   | Shape: 4096x256        | Size: 1048576      | Quantization: Q4_K
blk.0.attn_out_lora_b.weight   | Shape: 256x4096        | Size: 1048576      | Quantization: Q4_K
blk.0.ffn_gate.weight          | Shape: 4096x14336      | Size: 58720256     | Quantization: Q4_K
blk.0.ffn_gate_lora_a.weight   | Shape: 4096x256        | Size: 1048576      | Quantization: Q4_K
blk.0.ffn_gate_lora_b.weight   | Shape: 256x14336       | Size: 3670016      | Quantization: Q4_K
blk.0.ffn_up.weight            | Shape: 4096x14336      | Size: 58720256     | Quantization: Q4_K
blk.0.ffn_up_lora_a.weight     | Shape: 4096x256        | Size: 1048576      | Quantization: Q4_K
blk.0.ffn_up_lora_b.weight     | Shape: 256x14336       | Size: 3670016      | Quantization: Q4_K
blk.0.ffn_down.weight          | Shape: 14336x4096      | Size: 58720256     | Quantization: Q6_K
blk.0.ffn_down_lora_a.weight   | Shape: 14336x256       | Size: 3670016      | Quantization: Q6_K
blk.0.ffn_down_lora_b.weight   | Shape: 256x4096        | Size: 1048576      | Quantization: Q6_K
blk.0.attn_norm.weight         | Shape: 4096            | Size: 4096         | Quantization: F32
...
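
For orientation, below is a minimal sketch of what such a reader can look like, built on the gguf Python package's GGUFReader. This is a hedged approximation, not the PR's actual reader.py; the field and tensor attribute names are assumptions based on my reading of gguf-py.

# Minimal GGUF inspector sketch, assuming the gguf-py package is installed
# (pip install gguf). Attribute names follow gguf.GGUFReader as I understand
# it; the PR's reader.py may differ.
import sys

from gguf import GGUFReader

reader = GGUFReader(sys.argv[1])

print("Key-Value Pairs:")
for name, field in reader.fields.items():
    # field.parts holds the raw numpy arrays backing each value, so string
    # values print as uint8 arrays, e.g. [105 110 116 101 114 110 108 109 50]
    # decodes to "internlm2"
    print(f"{name:42} : {field.parts[-1]}")

print("----")
print("Tensors:")
for t in reader.tensors:
    shape = "x".join(str(d) for d in t.shape)
    print(f"{t.name:30} | Shape: {shape:15} | Size: {t.n_elements:12} | Quantization: {t.tensor_type.name}")

Run it the same way as the example above, e.g. python reader.py ggml-model-q4k.gguf.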

cmp-nct marked this pull request as ready for review January 30, 2024 14:20

cmp-nct (Contributor, Author) commented Feb 13, 2024

@ggerganov let's merge it in? It's useful.
I'm not sure what the procedure for merges is on GitHub.

ggerganov (Owner) commented:

It does not pass the Lint check

cmp-nct (Contributor, Author) commented Feb 13, 2024

It does not pass the Lint check

Solved :)

ggerganov merged commit 6c00a06 into ggerganov:master Feb 13, 2024
50 of 54 checks passed
0cc4m (Collaborator) commented Feb 13, 2024

This PR breaks the Vulkan build with a number of linker errors.

» cmake --build build_vulkan -- -j32
-- Vulkan found
-- ccache found, compilation results will be cached. Disable with LLAMA_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done (0.1s)
-- Generating done (0.1s)
-- Build files have been written to: /media/veryhighspeed/koboldai/llama.cpp/build_vulkan
[  0%] Generating build details from Git
-- Found Git: /usr/bin/git (found version "2.43.0")
[  1%] Built target ggml-vulkan
[  4%] Built target ggml
[  5%] Built target ggml_static
[  6%] Linking CXX executable ../../bin/gguf
[  8%] Built target llama
[  9%] Building CXX object common/CMakeFiles/build_info.dir/build-info.cpp.o
[ 10%] Built target test-c
[ 11%] Built target llava
[ 11%] Built target build_info
[ 12%] Built target llava_static
[ 13%] Linking CXX executable ../../bin/quantize-stats
[ 13%] Linking CXX executable ../../bin/benchmark
[ 14%] Linking CXX executable ../../bin/quantize
[ 15%] Linking CXX static library libcommon.a
/usr/bin/ld: ../../CMakeFiles/ggml.dir/ggml.c.o: in function `ggml_init':
ggml.c:(.text+0x20042): undefined reference to `ggml_vk_init_cpu_assist'
/usr/bin/ld: ../../CMakeFiles/ggml.dir/ggml.c.o: in function `ggml_graph_compute_thread':
ggml.c:(.text+0x37ad9): undefined reference to `ggml_vk_compute_forward_cpu_assist'
/usr/bin/ld: ggml.c:(.text+0x37c19): undefined reference to `ggml_vk_compute_forward_cpu_assist'
/usr/bin/ld: ggml.c:(.text+0x37c49): undefined reference to `ggml_vk_compute_forward_cpu_assist'
/usr/bin/ld: ggml.c:(.text+0x37d04): undefined reference to `ggml_vk_compute_forward_cpu_assist'
/usr/bin/ld: ggml.c:(.text+0x37d87): undefined reference to `ggml_vk_compute_forward_cpu_assist'
/usr/bin/ld: ../../CMakeFiles/ggml.dir/ggml.c.o:ggml.c:(.text+0x37dbe): more undefined references to `ggml_vk_compute_forward_cpu_assist' follow
/usr/bin/ld: ../../CMakeFiles/ggml.dir/ggml.c.o: in function `ggml_graph_compute':
ggml.c:(.text+0x3c2cc): undefined reference to `ggml_vk_preallocate_buffers_graph_cpu_assist'
/usr/bin/ld: ggml.c:(.text+0x3c2d7): undefined reference to `ggml_vk_preallocate_buffers_cpu_assist'
/usr/bin/ld: ggml.c:(.text+0x3c307): undefined reference to `ggml_vk_build_graph_cpu_assist'
/usr/bin/ld: ggml.c:(.text+0x3c4ec): undefined reference to `ggml_vk_graph_cleanup_cpu_assist'
/usr/bin/ld: ../../CMakeFiles/ggml.dir/ggml-backend.c.o: in function `ggml_backend_reg_get_count':
ggml-backend.c:(.text+0x2556): undefined reference to `ggml_backend_vk_reg_devices'
/usr/bin/ld: ../../CMakeFiles/ggml.dir/ggml-backend.c.o: in function `ggml_backend_reg_find_by_name':
ggml-backend.c:(.text+0x2662): undefined reference to `ggml_backend_vk_reg_devices'
/usr/bin/ld: ../../CMakeFiles/ggml.dir/ggml-backend.c.o: in function `ggml_backend_reg_init_backend_from_str':
ggml-backend.c:(.text+0x281f): undefined reference to `ggml_backend_vk_reg_devices'
/usr/bin/ld: ggml-backend.c:(.text+0x28b6): undefined reference to `ggml_backend_vk_reg_devices'
/usr/bin/ld: ggml-backend.c:(.text+0x294d): undefined reference to `ggml_backend_vk_reg_devices'
/usr/bin/ld: ../../CMakeFiles/ggml.dir/ggml-backend.c.o:ggml-backend.c:(.text+0x2a86): more undefined references to `ggml_backend_vk_reg_devices' follow
collect2: error: ld returned 1 exit status
make[2]: *** [examples/gguf/CMakeFiles/gguf.dir/build.make:106: bin/gguf] Error 1
make[1]: *** [CMakeFiles/Makefile2:2819: examples/gguf/CMakeFiles/gguf.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 19%] Built target common
[ 20%] Built target quantize
[ 21%] Built target benchmark
[ 22%] Built target quantize-stats
make: *** [Makefile:146: all] Error 2

cmp-nct (Contributor, Author) commented Feb 13, 2024

Maybe that's why the gguf example lay dormant, or it was dormant for too long. Can we just exclude it from the Vulkan build with a cmake flag?

Adding the gguf example to cmake is a super minor thing; I'm sorry if that broke something unexpected. If the solution isn't straightforward, we should either revert the cmake addition or exclude it on incompatible platforms/libraries.

0cc4m (Collaborator) commented Feb 13, 2024

Maybe that's why the gguf example lay dormant, or it was dormant for too long. Can we just exclude it from the Vulkan build with a cmake flag?

Not sure; this might be an issue on the Vulkan side, since it doesn't seem to happen with other backends. The Vulkan CMake code is a little different from the other backends', and that's likely the cause of this issue.

@netrunnereve If I revert #5230, this issue is resolved for me. But that was problematic on Windows. Any ideas?

ggerganov (Owner) commented:

Maybe we can change the Vulkan objects to be part of the ggml lib, similar to other backends.

Instead of having a separate ggml-vulkan target:

add_library(ggml-vulkan OBJECT ggml-vulkan.cpp ggml-vulkan.h)

Add it here (llama.cpp/CMakeLists.txt, lines 1024 to 1042 at 03bf161):

add_library(ggml OBJECT
            ggml.c
            ggml.h
            ggml-alloc.c
            ggml-alloc.h
            ggml-backend.c
            ggml-backend.h
            ggml-quants.c
            ggml-quants.h
            ${GGML_SOURCES_CUDA}    ${GGML_HEADERS_CUDA}
            ${GGML_SOURCES_OPENCL}  ${GGML_HEADERS_OPENCL}
            ${GGML_SOURCES_METAL}   ${GGML_HEADERS_METAL}
            ${GGML_SOURCES_MPI}     ${GGML_HEADERS_MPI}
            ${GGML_SOURCES_EXTRA}   ${GGML_HEADERS_EXTRA}
            ${GGML_SOURCES_SYCL}    ${GGML_HEADERS_SYCL}
            ${GGML_SOURCES_KOMPUTE} ${GGML_HEADERS_KOMPUTE}
            )

sorasoras commented Feb 14, 2024

This also breaks the ROCm build on Windows.

 cmake --build . --config Release
[9/47] Linking CXX executable bin\gguf.exe
FAILED: bin/gguf.exe
cmd.exe /C "cd . && C:\PROGRA~1\AMD\ROCm\5.7\bin\CLANG_~1.EXE -fuse-ld=lld-link -nostartfiles -nostdlib -O3 -DNDEBUG -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -Xlinker /subsystem:console CMakeFiles/ggml.dir/ggml.c.obj CMakeFiles/ggml.dir/ggml-alloc.c.obj CMakeFiles/ggml.dir/ggml-backend.c.obj CMakeFiles/ggml.dir/ggml-quants.c.obj examples/gguf/CMakeFiles/gguf.dir/gguf.cpp.obj -o bin\gguf.exe -Xlinker /MANIFEST:EMBED -Xlinker /implib:examples\gguf\gguf.lib -Xlinker /pdb:bin\gguf.pdb -Xlinker /version:0.0   --hip-link  --offload-arch=gfx1100  --offload-arch=gfx1031  "C:/Program Files/AMD/ROCm/5.7/lib/hipblas.lib"  "C:/Program Files/AMD/ROCm/5.7/lib/rocblas.lib"  "C:/Program Files/AMD/ROCm/5.7/lib/clang/17.0.0/lib/windows/clang_rt.builtins-x86_64.lib"  "C:/Program Files/AMD/ROCm/5.7/lib/amdhip64.lib"  -lkernel32 -luser32 -lgdi32 -lwinspool -lshell32 -lole32 -loleaut32 -luuid -lcomdlg32 -ladvapi32 -loldnames  && cd ."
lld-link: error: undefined symbol: ggml_init_cublas
>>> referenced by CMakeFiles/ggml.dir/ggml.c.obj:(ggml_init)

lld-link: error: undefined symbol: ggml_cuda_compute_forward
>>> referenced by CMakeFiles/ggml.dir/ggml.c.obj:(ggml_compute_forward)

lld-link: error: undefined symbol: ggml_backend_cuda_reg_devices
>>> referenced by CMakeFiles/ggml.dir/ggml-backend.c.obj:(ggml_backend_reg_get_count)
>>> referenced by CMakeFiles/ggml.dir/ggml-backend.c.obj:(ggml_backend_reg_find_by_name)
>>> referenced by CMakeFiles/ggml.dir/ggml-backend.c.obj:(ggml_backend_reg_init_backend_from_str)
>>> referenced 5 more times
CLANG_~1: error: linker command failed with exit code 1 (use -v to see invocation)
[42/47] Linking CXX executable bin\train-text-from-scratch.exe
ninja: build stopped: subcommand failed.

@cmp-nct

kurnevsky (Contributor) commented:

This also breaks the ROCm build on Windows.

Not only Windows: the Nix build is also broken.

netrunnereve (Collaborator) commented:

Maybe that's why the gguf example lay dormant, or it was dormant for too long. Can we just exclude it from the Vulkan build with a cmake flag?

Not sure; this might be an issue on the Vulkan side, since it doesn't seem to happen with other backends. The Vulkan CMake code is a little different from the other backends', and that's likely the cause of this issue.

@netrunnereve If I revert #5230, this issue is resolved for me. But that was problematic on Windows. Any ideas?

I think we may have to do what @ggerganov suggested and move everything into the ggml lib rather than creating a separate ggml-vulkan object (you can test his patch at #5525). As @sorasoras mentioned, the ROCm build is also failing, and ROCm is built using a separate object just like Vulkan.

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* Update CMakeLists.txt
* Create reader.py
* Update reader.py
* Update reader.py (another whitespace :|)
* Update reader.py
* lintlintlint
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024