Add importance matrix support for legacy quants? #4932

Closed
ikawrakow opened this issue Jan 14, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@ikawrakow
Contributor

I have the implementation ready, but I'm not sure if this is what we want. Using an importance matrix does improve perplexity for all models I have tried. On the other hand, the "legacy" ggml quants Q4_0 and Q5_0 are never very good, but they are also never really bad (Q4_1 and Q5_1 behave more erratically, being better than Q4_0/Q5_0 for some models and worse for others). Hence, one may want to preserve them the way they are, as a kind of reference.

Opinions?
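
For readers unfamiliar with the technique under discussion, here is a minimal sketch of what importance-weighted quantization of a Q4_0-style block (32 values, one scale) could look like. This illustrates the general idea only, not llama.cpp's actual implementation; the function name, the initial scale guess, and the three refit iterations are all assumptions made for the sketch.

```c
#include <math.h>
#include <stdint.h>

// Sketch only: choose a scale d for one 32-value block that minimizes the
// importance-weighted squared error  sum_i w[i]*(x[i] - d*q[i])^2  with
// q[i] restricted to [-8, 7] (Q4_0-style). w[i] is the per-weight importance
// accumulated from calibration activations (the "imatrix").
static float quantize_block_weighted(const float *x, const float *w,
                                     int n, int8_t *q) {
    float amax = 0.0f;
    for (int i = 0; i < n; ++i) {
        if (fabsf(x[i]) > amax) amax = fabsf(x[i]);
    }
    if (amax == 0.0f) {
        for (int i = 0; i < n; ++i) q[i] = 0;
        return 0.0f;
    }
    float d = amax / 7.0f;                  // crude initial guess
    for (int iter = 0; iter < 3; ++iter) {  // alternate rounding and refitting
        float num = 0.0f, den = 0.0f;
        for (int i = 0; i < n; ++i) {
            int qi = (int)lroundf(x[i] / d);
            if (qi < -8) qi = -8;
            if (qi >  7) qi =  7;
            q[i] = (int8_t)qi;
            num += w[i] * x[i] * (float)qi;      // weighted least-squares refit:
            den += w[i] * (float)qi * (float)qi; // d = sum(w*x*q) / sum(w*q*q)
        }
        if (den > 0.0f) d = num / den;
    }
    return d;
}
```

Without an importance matrix, w[i] is effectively constant and this reduces to ordinary round-to-nearest with a refitted scale; the imatrix simply biases the fit toward the weights that matter most for the model's activations.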

ikawrakow added the enhancement label on Jan 14, 2024
@abc-nix

abc-nix commented Jan 14, 2024

Hi, ikawrakow.
First, thank you very much for the new improved quants. They have yet to be widely tested, but the initial PPL results you have shared are very good, and I really want to test the q2/q3 k-quants with very dense models like goliath-120B and the newer community MoEs when I have time.

For your question, my opinion (as only a simple user) is that the "legacy" quants should remain the same, at least for now. The new imatrix technique you have invented seems to work very well with English initially (when using an English text, as you have showcased), but I believe a longer period of testing by the community will tell us how well it works across different uses and languages. Until then, I believe keeping the "legacy" quants as a fallback is the most desirable option.

A lot of people don't use the quantization tool themselves and rely on the quants released by users like TheBloke. If an issue has been overlooked and the new quants (created using an English text for the imatrix generation) perform worse in some areas than the previous versions, having the legacy quants as an alternative would, in my opinion, be the best option.

@ikawrakow
Contributor Author

@abc-nix Does your opinion remain the same after learning that one will still be able to quantize both k-quants and legacy quants without using an importance matrix? So that, in case there are issues, one can always fall back to the existing quantization?

@abc-nix

abc-nix commented Jan 14, 2024

@ikawrakow My opinion amounts to less than a grain of sand, so don't consider it a general opinion but my own.

I did take into account that the current k-quants can optionally use the new and improved imatrix method (forced only for q2-k quants, I think), which is a great benefit. With this new method we will find in the wild as many quants as there are datasets used to compute the imatrix, and this will also bring many quants that may compete to be the best of their size. It may also improve performance in certain areas depending on the dataset used, making the new k-quants much better than generalized quants for those uses. But a normal user, from the outside, will not be able to distinguish one from another just by looking at the final file.

I think there should still be a reproducible format, the legacy format, that can be expected to perform the same whether I create it myself or download it from a Hugging Face repo. Keeping the "legacy" quants as they are (even if using the imatrix method is optional and could improve the user experience) should also make it easier for people to help resolve issues some users may experience (for example, someone complaining that something is wrong with llama.cpp, but after testing a legacy quant realizing the issue is with the specific dataset used for their k-quant, or with imatrix itself, rather than with the program in general). Sometimes more options can also lead to more chaos. Having a reference quant that is the same for all users (without the risk of "mistakenly" using a bad dataset) would make it easier to troubleshoot.

As I said, this is only my opinion. Discard it as you would a grain of sand.

@JohannesGaessler
Collaborator

I only really care about the legacy quants for development; it is much easier to prototype features for q4_0 or q8_0 than any of the k-quants due to the much simpler data structure. I don't particularly care whether the legacy quants have slightly better/worse perplexity because I usually only need to check whether it changes.
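
For reference, the legacy block layouts referred to here are roughly the following (paraphrased from ggml's headers; the field names match ggml, but take this as a sketch rather than the authoritative definition):

```c
#include <stdint.h>

typedef uint16_t ggml_fp16_t;   // fp16 stored as raw bits in ggml

// Q4_0: one fp16 scale plus 32 values packed as 4-bit nibbles.
#define QK4_0 32
typedef struct {
    ggml_fp16_t d;              // scale (delta)
    uint8_t     qs[QK4_0 / 2];  // 32 quants, two per byte
} block_q4_0;

// Q8_0: one fp16 scale plus 32 signed 8-bit quants.
#define QK8_0 32
typedef struct {
    ggml_fp16_t d;              // scale (delta)
    int8_t      qs[QK8_0];      // quants
} block_q8_0;
```

Compare this with the k-quant formats, which pack 256 values per super-block together with multiple packed sub-block scales and minimums; hence the appeal of the legacy formats for prototyping.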

@sorasoras

> @abc-nix Does your opinion remain the same after learning that one will still be able to quantize both k-quants and legacy quants without using an importance matrix? So that, in case there are issues, one can always fall back to the existing quantization?

I think supporting legacy quants is needed.
Not all tensors in Qwen-14B support k-quants:
llama_model_loader: - type f32: 121 tensors
llama_model_loader: - type q5_0: 20 tensors
llama_model_loader: - type q8_0: 20 tensors
llama_model_loader: - type q4_K: 121 tensors
llama_model_loader: - type q5_K: 40 tensors
llama_model_loader: - type q6_K: 1 tensors
Some tensors have to fall back to Q5_0/Q8_0, so importance matrix support for legacy quants would indeed improve overall perplexity.
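
The reason for such fallbacks is that k-quants operate on super-blocks of 256 values, so tensors whose row size is not a multiple of 256 cannot use them. A hedged sketch of the selection logic follows (`pick_fallback` is a hypothetical name, not llama.cpp's actual function):

```c
#include <stdint.h>

#define QK_K 256  // k-quant super-block size in ggml

enum quant_kind { USE_K_QUANT, USE_LEGACY };

// Hypothetical helper: a row whose length is not a multiple of QK_K cannot
// be stored in a k-quant format and must fall back to a 32-value-block
// legacy format such as Q5_0 or Q8_0, as in the loader output above.
static enum quant_kind pick_fallback(int64_t row_size) {
    return (row_size % QK_K == 0) ? USE_K_QUANT : USE_LEGACY;
}
```

This is why imatrix support for the legacy formats matters even to users who otherwise only produce k-quants.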

@ggerganov
Owner

Optional importance matrix support for legacy quants similar to the one in #4930 would be useful.

@ikawrakow
Contributor Author

Closed via #4969
