ggml-cpu: optimize avx2 q6_k #22345

Merged: ggerganov merged 1 commit into ggml-org:master from netrunnereve:q6_k on Apr 26, 2026
Conversation

@netrunnereve (Collaborator)

Basically I took the optimizations I did for AVX a while back and brought them over to AVX2.

PR:

| model         | size       | params | backend | threads | test  | t/s          |
| ------------- | ---------- | ------ | ------- | ------- | ----- | ------------ |
| llama 1B Q6_K | 860.86 MiB | 1.10 B | CPU     | 4       | pp512 | 63.15 ± 0.34 |
| llama 1B Q6_K | 860.86 MiB | 1.10 B | CPU     | 4       | tg128 | 15.63 ± 0.08 |
  MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                  345 runs -  3408.30 us/run - 117.44 MFLOP/run -  34.46 GFLOPS
  MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                  315 runs -  3435.15 us/run - 234.88 MFLOP/run -  68.38 GFLOPS
  MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                  207 runs -  4928.92 us/run - 352.32 MFLOP/run -  71.48 GFLOPS
  MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                  252 runs -  4204.82 us/run - 469.76 MFLOP/run - 111.72 GFLOPS
  MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                  210 runs -  5041.41 us/run - 587.20 MFLOP/run - 116.48 GFLOPS
  MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                  144 runs -  7309.56 us/run - 939.52 MFLOP/run - 128.53 GFLOPS
  MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                  3 runs - 417778.33 us/run -  60.13 GFLOP/run - 143.93 GFLOPS

Master:

| model         | size       | params | backend | threads | test  | t/s          |
| ------------- | ---------- | ------ | ------- | ------- | ----- | ------------ |
| llama 1B Q6_K | 860.86 MiB | 1.10 B | CPU     | 4       | pp512 | 49.75 ± 0.40 |
| llama 1B Q6_K | 860.86 MiB | 1.10 B | CPU     | 4       | tg128 | 15.53 ± 0.19 |
  MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                  276 runs -  4364.08 us/run - 117.44 MFLOP/run -  26.91 GFLOPS
  MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                  280 runs -  3598.24 us/run - 234.88 MFLOP/run -  65.28 GFLOPS
  MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                  207 runs -  5086.99 us/run - 352.32 MFLOP/run -  69.26 GFLOPS
  MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                  180 runs -  5805.74 us/run - 469.76 MFLOP/run -  80.91 GFLOPS
  MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                  168 runs -  6138.51 us/run - 587.20 MFLOP/run -  95.66 GFLOPS
  MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                  108 runs -  9657.79 us/run - 939.52 MFLOP/run -  97.28 GFLOPS
  MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                  2 runs - 594931.50 us/run -  60.13 GFLOP/run - 101.07 GFLOPS

@netrunnereve netrunnereve requested a review from ggerganov as a code owner April 25, 2026 02:25
@github-actions github-actions Bot added the ggml changes relating to the ggml tensor library for machine learning label Apr 25, 2026
@ggerganov ggerganov added the merge ready A maintainer can use this label to indicate that they consider the changes final and ready to merge. label Apr 25, 2026
@ggerganov ggerganov merged commit 2dd8416 into ggml-org:master Apr 26, 2026
43 of 46 checks passed
@netrunnereve netrunnereve deleted the q6_k branch April 28, 2026 20:41
IntelNav pushed a commit to IntelNav/llama.cpp that referenced this pull request Apr 29, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
samuraieng pushed a commit to samuraieng/llama.cpp that referenced this pull request May 6, 2026
ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026
meh pushed a commit to meh/llama.cpp that referenced this pull request May 10, 2026