Improved IQ1_M quantization #327

ikawrakow · 2025-04-13T05:36:00Z

I was experimenting with LLaMA-4-Scout quantization and was bothered by the extremely long quantization time of IQ1_M, so I looked into speeding things up.

This PR makes IQ1_M quantization roughly 2.5X to 3X faster. There is also a minor improvement in quantization accuracy: PPL is lower for every model tested.

The table shows perplexity (PPL) comparisons between the main branch and this PR for LLaMA-v1-7B¹ (L1-7B in the table), LLaMA-v2-7B¹ (L2-7B), Mistral-7B¹ (M-7B), LLaMA-3.1-8B-Instruct (L3-8B), and DeepSeek-V2-Lite (DSL). Context is always 512 tokens. Also given are the quantization times (Q-time for short in the table) in seconds on a Ryzen-7950X CPU. Unlike earlier quantization improvement PRs, which used "pure" quantization (the --pure command line option in llama-quantize), the results here are for the default IQ1_M quantization mix.

Model	Quantization	PPL (main)	PPL (this PR)	Q-time in s (main)	Q-time in s (this PR)
L1-7B	IQ1_M	10.9274	10.8046	N/A²	N/A²
L2-7B	IQ1_M	10.7642	10.6809	129.4	52.8
M-7B	IQ1_M	9.6336	9.6236	146.1	58.4
L3-8B	IQ1_M	22.7422	21.9715	148.1	60.0
DSL	IQ1_M	9.2758	9.1137	267.4	109.2

The speedup for the default IQ1_M quantization mix is about 2.5X for all four models with timings. When quantizing pure IQ1_M, the speedup is about 3X.
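The speedup and PPL figures can be rechecked directly from the table. The following is only an illustrative sketch: the numbers are copied from the table above, while the names `results`, `speedup`, `ppl_reduction_percent`, and `summary` are made up for this example and are not part of the PR.

```python
# Values copied from the table above:
# (PPL main, PPL this PR, Q-time main [s], Q-time this PR [s]).
# L1-7B has no usable timing (N/A in the table), so its times are None.
results = {
    "L1-7B": (10.9274, 10.8046, None, None),
    "L2-7B": (10.7642, 10.6809, 129.4, 52.8),
    "M-7B": (9.6336, 9.6236, 146.1, 58.4),
    "L3-8B": (22.7422, 21.9715, 148.1, 60.0),
    "DSL": (9.2758, 9.1137, 267.4, 109.2),
}


def speedup(t_main, t_pr):
    """Return t_main / t_pr, or None if either timing is missing."""
    if t_main is None or t_pr is None:
        return None
    return t_main / t_pr


def ppl_reduction_percent(ppl_main, ppl_pr):
    """Relative PPL reduction of the PR versus main, in percent."""
    return 100.0 * (ppl_main - ppl_pr) / ppl_main


# Per-model summary: (speedup or None, PPL reduction in percent).
summary = {
    model: (speedup(t_main, t_pr), ppl_reduction_percent(ppl_main, ppl_pr))
    for model, (ppl_main, ppl_pr, t_main, t_pr) in results.items()
}

for model, (s, d) in summary.items():
    s_text = "n/a" if s is None else f"{s:.2f}x"
    print(f"{model:6s} speedup {s_text:>6s}  PPL reduction {d:.2f}%")
```

Note that this only covers the default quantization mix; the table contains no timings for pure IQ1_M, so the 3X figure cannot be recomputed from it.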

¹ Why use such ancient models? The LLaMA-v1 models were the basis for k-quants development. I-quants were developed using LLaMA-v1, LLaMA-v2 and Mistral-7B. In my experience, if a quantization technique does well on all 3 of these, it is (almost) guaranteed to do well on any other model out there.

² I have this model on an old HDD, so in this case quantization time is dominated by the time needed to read the data from the HDD. I could have copied the model to the SSD, but I think the timing for the other models gives enough indication of the relative performance.

Iwan Kawrakow added 3 commits April 12, 2025 18:55

Much faster and it looks like better iq1_m quantization

514637e

Cleanup

2977369

Minor

4291d7e

ikawrakow merged commit d210661 into main Apr 13, 2025
