Closed
Description
[river@drfxi bin]$ ./benchmark
main: build = 2252 (525213d2)
main: built with clang version 17.0.6 (Fedora 17.0.6-6.fc40) for x86_64-redhat-linux-gnu
Starting Test
Allocating Memory of size 800194560 bytes, 763 MB
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
Device 0: AMD Radeon RX 7800 XT, compute capability 11.0, VMM: no
Creating new tensors
------ Test 1 - Matrix Mult via F32 code
n_threads=1
m11: type = 0 ( f32) ne = 11008 x 4096 x 1, nb = ( 4, 44032, 180355072) - Sum of tensor m11 is 45088768.00
m2: type = 0 ( f32) ne = 11008 x 128 x 1, nb = ( 4, 44032, 5636096) - Sum of tensor m2 is 2818048.00
gf->nodes[0]: type = 0 ( f32) ne = 4096 x 128 x 1, nb = ( 4, 16384, 2097152) - Sum of tensor gf->nodes[0] is 11542724608.00
------ Test 2 - Matrix Mult via q4_1 code
n_threads=1
Matrix Multiplication of (11008,4096,1) x (11008,128,1) - about 11.54 gFLOPS
Iteration;NThreads; SizeX; SizeY; SizeZ; Required_FLOPS; Elapsed_u_Seconds; gigaFLOPS
=====================================================================================
0; 1; 11008; 4096; 128; 11542724608; 6695; 1724.08
ABORT - ERROR in Matrix Multiplication result - expected 11542724608.00, got 4294967296.00 (delta 7247757312.00 > allowed_delta 11542.72)
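
For anyone triaging this, the figures in the log can be re-derived by hand. The sketch below is not taken from the benchmark source. It assumes that m11 is filled with 1.0 and m2 with 2.0 (inferred from the reported tensor sums) and that the tolerance is the expected sum divided by 1e6 (inferred from the printed `allowed_delta`). Under those assumptions the F32 reference sum, the FLOP count, and the throughput figure all match the log, and the value the q4_1 path returned turns out to be exactly 2^32.

```python
# Re-derive the numbers printed in the benchmark log.
# ASSUMPTION: m11 is filled with 1.0 and m2 with 2.0; this is inferred from
# the reported sums (45088768 = 11008*4096, 2818048 = 2*11008*128).
# ASSUMPTION: allowed_delta = expected_sum / 1e6, inferred from 11542.72.

K, M, N = 11008, 4096, 128      # shared dimension, rows of m11, rows of m2
M11_FILL, M2_FILL = 1.0, 2.0    # assumed fill values (see above)

sum_m11 = M11_FILL * K * M
sum_m2 = M2_FILL * K * N

# Each of the M*N output elements is a dot product over K terms of 1.0 * 2.0.
expected_sum = (M11_FILL * M2_FILL * K) * (M * N)

# One multiply and one add per term of every dot product.
flops = 2 * K * M * N

elapsed_us = 6695                       # Elapsed_u_Seconds from the log
gflops = flops / elapsed_us / 1000.0    # FLOP/us is MFLOP/s; /1000 gives GFLOP/s

allowed_delta = expected_sum / 1e6
got = 4294967296.0                      # value reported in the ABORT line
delta = abs(expected_sum - got)

print(f"sum(m11)      = {sum_m11:.2f}")
print(f"sum(m2)       = {sum_m2:.2f}")
print(f"expected sum  = {expected_sum:.2f}")
print(f"FLOPs         = {flops}")
print(f"gigaFLOPS     = {gflops:.2f}")
print(f"allowed delta = {allowed_delta:.2f}")
print(f"delta         = {delta:.2f}")
print(f"got == 2**32  : {got == 2**32}")
print(f"test fails    : {delta > allowed_delta}")
```

The check confirms that the Test 1 reference value is self-consistent, so the discrepancy is on the q4_1 side. The result being a clean power of two may be a useful lead, but the log alone does not show why.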