IQ4_K_R4 #138

ikawrakow · 2024-12-12T14:12:05Z

On to the R4 implementation of the new iqk quants, starting with IQ4_K.

We get very significant performance gains on ARM_NEON and more modest gains on AVX2/Zen4. I suspect my AVX2/Zen4 implementation is not optimal, but I did not see a better way for now.

Here is PP-512 (in t/s) for LLaMA-3.1-8B on Zen4 (Ryzen-7950X), ARM_NEON (M2-Max) and AVX2 (Ryzen-5975WX):

Platform	Threads	IQ4_K	IQ4_K_R4	Speedup
ARM_NEON	8	58.20 ± 1.03	108.02 ± 1.10	1.856
Zen4	16	182.20 ± 0.38	232.63 ± 0.39	1.277
AVX2	32	206.43 ± 0.49	227.60 ± 0.46	1.103
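The Speedup column is the ratio of the IQ4_K_R4 throughput to the IQ4_K throughput. A minimal sketch that reproduces it from the mean values in the table (the dictionary and function names are illustrative only and are not part of the repository):

```python
# PP-512 throughput in t/s, mean values copied from the table above:
# platform -> (IQ4_K baseline, IQ4_K_R4)
pp512 = {
    "ARM_NEON": (58.20, 108.02),
    "Zen4": (182.20, 232.63),
    "AVX2": (206.43, 227.60),
}


def speedup(baseline: float, interleaved: float) -> float:
    """Ratio of the R4 throughput to the baseline throughput."""
    return interleaved / baseline


# Round to three decimals, matching the precision used in the table.
speedups = {name: round(speedup(*tps), 3) for name, tps in pp512.items()}

for name, value in speedups.items():
    print(f"{name}: {value:.3f}")
```

The same ratio applies to the TG-128 results, taken per platform and thread count.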

We get decent performance gains for TG as well.
Here are the results for TG-128 (in t/s) on LLaMA-3.1-8B with different numbers of threads:

Platform	Threads	IQ4_K	IQ4_K_R4	Speedup
ARM_NEON	2	8.44 ± 0.02	10.56 ± 0.01	1.251
	4	15.90 ± 0.05	19.32 ± 0.14	1.215
	8	24.54 ± 0.15	25.16 ± 0.03	1.025
Zen4	1	5.26 ± 0.00	6.73 ± 0.00	1.279
	2	9.71 ± 0.01	12.43 ± 0.00	1.280
	4	13.48 ± 0.06	14.00 ± 0.03	1.039
AVX2	2	4.02 ± 0.00	6.91 ± 0.00	1.719
	4	8.03 ± 0.00	11.13 ± 0.00	1.386
	8	11.81 ± 0.00	12.75 ± 0.00	1.079

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

Iwan Kawrakow added 4 commits December 12, 2024 11:00

iq4_k_r4: WIP

fb79167

iq4_k_r4: Zen4 and hopefully AVX2

ba9a9a1

On Zen4 we get PP-512(LLaMA-3.1-8B) = 232.6 t/s, up from 182.2 t/s for iq4_k. Applying the extra shift costs a ~6% performance penalty.

iq4_k_r4: AVX2

6b505ec

PP-512 = 227.60 t/s. The shifts are really costly.

iq4_k_r4: NEON

78c8453

We get PP-512(LLaMA-3.1-8B) = 108 t/s, up from 58.2 t/s for iq4_k.

ikawrakow merged commit 2700d3a into main Dec 12, 2024
