I have the implementation ready, but I'm not sure this is what we want. Using an importance matrix does improve perplexity for all models I have tried. On the other hand, the "legacy" ggml quants Q4_0 and Q5_0 are never very good, but they are also never really bad (Q4_1 and Q5_1 behave more erratically: for some models they are better than Q4_0/Q5_0, for others worse). Hence, one may want to preserve them as they are, as a kind of reference.
Opinions?