IQ5_K_R4 #149

ikawrakow · 2024-12-18T12:29:07Z

Adding IQ5_K with 4 interleaved rows.
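For readers unfamiliar with the `_R4` idea, here is a toy sketch of what interleaving 4 rows can look like. It is only an illustration: the function name `interleave_rows`, the block size, and the use of plain Python lists are assumptions made for the example, and the actual packed layout of IQ5_K_R4 (scales, high bits, and so on) is not described in this PR text and will differ.

```python
def interleave_rows(rows, n_rows=4, block=32):
    """Pack groups of `n_rows` consecutive rows so that block b of every row
    in the group is stored contiguously, followed by block b+1, and so on.

    Hypothetical helper for illustration only, not the ik_llama.cpp layout.
    """
    if len(rows) % n_rows != 0:
        raise ValueError("number of rows must be a multiple of n_rows")
    packed = []
    for start in range(0, len(rows), n_rows):
        group = rows[start:start + n_rows]
        n_cols = len(group[0])
        if n_cols % block != 0 or any(len(r) != n_cols for r in group):
            raise ValueError("rows must have equal length divisible by block")
        out = []
        for b in range(0, n_cols, block):
            # Same block position from each of the n_rows rows, back to back.
            for r in group:
                out.extend(r[b:b + block])
        packed.append(out)
    return packed


# Tiny example: 4 rows of 4 values, block size 2.
rows = [[10 * r + c for c in range(4)] for r in range(4)]
print(interleave_rows(rows, n_rows=4, block=2)[0])
```

With this kind of layout a matrix multiplication kernel can load one chunk of activations and apply it to 4 weight rows in a row, which is the general motivation for interleaved formats.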

We get very significant performance gains on ARM_NEON and more modest gains on AVX2/Zen4.

Here are the PP-512 results for LLaMA-3.1-8B on Zen4 (Ryzen-7950X), ARM_NEON (M2-Max) and AVX2 (Ryzen-5975WX):

Platform	Threads	IQ5_K	IQ5_K_R4	Speedup
ARM_NEON	8	53.80 ± 1.08	93.33 ± 2.02	1.735
Zen4	16	168.09 ± 0.58	230.23 ± 0.23	1.370
AVX2	32	177.16 ± 0.31	231.50 ± 0.43	1.307
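The Speedup column is consistent with the ratio of the mean IQ5_K_R4 throughput to the mean IQ5_K throughput. A minimal sketch that recomputes it from the table values (the `speedup` helper and the `pp512` dictionary are names introduced here for illustration; the ± spreads are ignored):

```python
def speedup(baseline, interleaved):
    """Ratio of interleaved throughput to baseline throughput."""
    return interleaved / baseline


# Mean PP-512 values copied from the table above: (IQ5_K, IQ5_K_R4).
pp512 = {
    "ARM_NEON": (53.80, 93.33),
    "Zen4": (168.09, 230.23),
    "AVX2": (177.16, 231.50),
}

for platform, (base, r4) in pp512.items():
    print(f"{platform:8s} {speedup(base, r4):.3f}")
```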

TG does not look good on AVX2/Zen4. On ARM_NEON we get a decent performance gain.
Here are the TG-128 results for LLaMA-3.1-8B on ARM_NEON with different numbers of threads:

Platform	Threads	IQ5_K	IQ5_K_R4	Speedup
ARM_NEON	2	5.92 ± 0.07	6.98 ± 0.00	1.179
ARM_NEON	4	11.53 ± 0.01	13.35 ± 0.01	1.158
ARM_NEON	8	20.29 ± 0.46	21.17 ± 0.18	1.043

saood06 · 2025-03-27T06:53:47Z

> TG does not look good on AVX2/Zen4

Does this mean a regression compared to the non-interleaved IQ5_K, or just no benefit?

ikawrakow · 2025-03-27T07:08:50Z

I don't remember. But the "Better Zen4" commit in the PR says "But TG is still slower than iq5_k".

Iwan Kawrakow added 7 commits December 18, 2024 09:17

iq5_k_r4: Zen4

5eac4ed

Much slower than the others.

iq5_k_r5: WIP

27e987d

Minor

f1b37f6

iq5_k_r4: fix AVX2 nrc_y = 1 case

a8b0f81

iq5_k_r4: better Zen4

ece6d0a

But TG is still slower than iq5_k

iq5_k_r4: slightly better AVX2

f9d182f

iq5_k_r4: NEON

6c935be

ikawrakow merged commit 59d742b into main Dec 18, 2024
