IQ1_S: much faster CPU prompt processing #517

ikawrakow · 2025-06-11T12:00:36Z

This PR is a follow-up to #515 and #516 and applies the same technique to `IQ1_S`. We see a nearly 2X increase in prompt processing speed compared to both `IQ1_S` and `IQ1_S_R4` on the main branch.

Sweep-bench for `IQ1_S` quantization of Llama-3.1-8B on a Ryzen-7950X CPU:

IQ1_S, main branch

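| PP | TG | N_KV | T_PP (s) | S_PP (t/s) | T_TG (s) | S_TG (t/s) |
|---:|---:|-----:|---------:|-----------:|---------:|-----------:|
| 512 | 128 | 0 | 3.272 | 156.47 | 4.605 | 27.79 |
| 512 | 128 | 512 | 3.351 | 152.77 | 5.092 | 25.14 |
| 512 | 128 | 1024 | 3.402 | 150.52 | 5.084 | 25.18 |
| 512 | 128 | 1536 | 3.677 | 139.25 | 5.201 | 24.61 |
| 512 | 128 | 2048 | 3.586 | 142.79 | 5.515 | 23.21 |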

IQ1_S_R4, main branch

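| PP | TG | N_KV | T_PP (s) | S_PP (t/s) | T_TG (s) | S_TG (t/s) |
|---:|---:|-----:|---------:|-----------:|---------:|-----------:|
| 512 | 128 | 0 | 3.101 | 165.10 | 4.543 | 28.18 |
| 512 | 128 | 512 | 3.166 | 161.74 | 4.836 | 26.47 |
| 512 | 128 | 1024 | 3.309 | 154.75 | 5.282 | 24.23 |
| 512 | 128 | 1536 | 3.348 | 152.92 | 5.093 | 25.13 |
| 512 | 128 | 2048 | 3.447 | 148.55 | 5.265 | 24.31 |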

IQ1_S, PR

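| PP | TG | N_KV | T_PP (s) | S_PP (t/s) | T_TG (s) | S_TG (t/s) |
|---:|---:|-----:|---------:|-----------:|---------:|-----------:|
| 512 | 128 | 0 | 1.855 | 275.94 | 4.643 | 27.57 |
| 512 | 128 | 512 | 1.940 | 263.87 | 5.056 | 25.32 |
| 512 | 128 | 1024 | 2.188 | 234.05 | 5.099 | 25.10 |
| 512 | 128 | 1536 | 2.097 | 244.20 | 5.112 | 25.04 |
| 512 | 128 | 2048 | 2.184 | 234.42 | 5.368 | 23.85 |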

TG is slightly faster too - 24.4 vs 23.1 t/s on the Ryzen-5975WX
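
To put a number on the prompt-processing gain, the short script below divides the PR's S_PP values by the main-branch values at each N_KV. The figures are copied from the three tables above; the dictionary keys and the `speedups` helper are names made up for this illustration and are not part of sweep-bench.

```python
# S_PP (prompt processing, tokens/s) copied from the three tables above,
# one entry per N_KV value.
N_KV = [0, 512, 1024, 1536, 2048]
S_PP = {
    "IQ1_S, main": [156.47, 152.77, 150.52, 139.25, 142.79],
    "IQ1_S_R4, main": [165.10, 161.74, 154.75, 152.92, 148.55],
    "IQ1_S, PR": [275.94, 263.87, 234.05, 244.20, 234.42],
}


def speedups(baseline, candidate="IQ1_S, PR"):
    """Return the candidate/baseline S_PP ratio for every N_KV value."""
    return [c / b for b, c in zip(S_PP[baseline], S_PP[candidate])]


if __name__ == "__main__":
    for name in ("IQ1_S, main", "IQ1_S_R4, main"):
        ratios = ", ".join(
            f"N_KV={n}: {r:.2f}x" for n, r in zip(N_KV, speedups(name))
        )
        print(f"PR vs {name}: {ratios}")
```

With these numbers the per-row ratios fall between roughly 1.5X and 1.8X, with the largest gain at N_KV = 0.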

Faster iq1_s GEMM via repacking to Q8_0_R8

3d56720
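
The commit title refers to repacking to Q8_0_R8. As a rough illustration of the general idea only, the sketch below quantizes rows to 8-bit blocks with one scale per block, interleaves 8 rows so that one pass over an activation block feeds 8 output rows, and then multiplies by a batch of activation columns. Everything here is an assumption made for illustration: the block size of 32, the group of 8 rows, the function names, starting from float weights instead of decoding `IQ1_S`, and keeping activations in float. It is not the kernel or the memory layout used in ik_llama.cpp.

```python
import random

BLOCK = 32  # weights per block (assumed for this sketch)
NROWS = 8   # rows interleaved per group (assumed meaning of "R8")


def quantize_row(row):
    """Quantize one float row into a list of (scale, [int8 values]) blocks."""
    blocks = []
    for i in range(0, len(row), BLOCK):
        chunk = row[i:i + BLOCK]
        amax = max(abs(x) for x in chunk)
        scale = amax / 127.0 if amax > 0 else 1.0
        blocks.append((scale, [round(x / scale) for x in chunk]))
    return blocks


def repack_r8(rows):
    """Interleave 8 quantized rows: block b of all 8 rows is stored together."""
    assert len(rows) == NROWS
    qrows = [quantize_row(r) for r in rows]
    nblocks = len(qrows[0])
    return [
        ([qrows[r][b][0] for r in range(NROWS)],  # 8 scales
         [qrows[r][b][1] for r in range(NROWS)])  # 8 x BLOCK quants
        for b in range(nblocks)
    ]


def gemm_r8(packed, activations):
    """Multiply one packed 8-row group by each activation column."""
    out = []
    for x in activations:
        acc = [0.0] * NROWS
        for b, (scales, quants) in enumerate(packed):
            xb = x[b * BLOCK:(b + 1) * BLOCK]  # loaded once, used by 8 rows
            for r in range(NROWS):
                acc[r] += scales[r] * sum(q * v for q, v in zip(quants[r], xb))
        out.append(acc)
    return out


if __name__ == "__main__":
    random.seed(0)
    weights = [[random.uniform(-1, 1) for _ in range(64)] for _ in range(NROWS)]
    batch = [[random.uniform(-1, 1) for _ in range(64)] for _ in range(4)]
    result = gemm_r8(repack_r8(weights), batch)
    print(len(result), "columns x", len(result[0]), "rows")
```

The repacking cost is paid once per group of weight rows and amortized over every column in the batch, which is why this kind of approach would be expected to help prompt processing (many tokens at once) much more than token generation.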

ikawrakow merged commit 3f54b49 into main Jun 11, 2025

ikawrakow mentioned this pull request Jun 11, 2025

IQ3_S: much faster CPU prompt processing #518

Merged

ciprianveg mentioned this pull request Jun 12, 2025

Bug: tg speed drop after https://github.com/ikawrakow/ik_llama.cpp/pull/518 #523

Closed

This was referenced Jun 12, 2025

Faster CPU prompt processing for Q4_K and Q5_K #525

Merged

Much faster CPU prompt processing (part 1) #531

Merged
