Replies: 5 comments 1 reply
-
We really need this! More info about QTIP at: https://arxiv.org/pdf/2406.11235
-
This was already asked in the topic about SOTA quants before you posted this, but I guess the extra visibility can't hurt. I'm curious to see whether this is truly easy to integrate, and whether it can surpass the current quants without any downsides.
-
What is the architecture for quantization in llama.cpp? I need to find documentation :D I am curious whether we could bring this in and still apply (rather than ignore) importance matrices.
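On the importance-matrix point: in llama.cpp, the imatrix supplies per-weight importance values (derived from activation statistics) that weight the quantization error, so the quantizer spends its precision on the weights that matter most. Below is a toy sketch of that idea only, not llama.cpp's actual implementation; the function name, the scale grid, and all parameters are made up for illustration.

```python
import numpy as np

def quantize_weighted(x, w, nmax=7):
    """Symmetric integer quantization of one row of weights x, choosing the
    scale s that minimizes the importance-weighted squared error
    sum(w * (x - s*q)**2). Toy sketch of the role an importance matrix
    plays; NOT llama.cpp's real code."""
    amax = float(np.abs(x).max())
    if amax == 0.0:
        return 0.0, np.zeros_like(x, dtype=np.int8)
    best_err, best_s, best_q = float("inf"), amax / nmax, None
    # grid-search candidate scales around the naive choice amax/nmax
    for f in np.linspace(0.7, 1.3, 61):
        s = f * amax / nmax
        q = np.clip(np.round(x / s), -nmax, nmax)
        err = float(np.sum(w * (x - s * q) ** 2))
        if err < best_err:
            best_err, best_s, best_q = err, s, q.astype(np.int8)
    return best_s, best_q
```

With uniform weights `w` this reduces to plain least-squares scale search; with imatrix-style weights, columns that see large activations pull the scale toward themselves.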
-
The first QTIP-quantized models are now hitting HF, see: https://www.reddit.com/r/LocalLLaMA/comments/1hbng5l/qtip_2_3_and_4_bit_llama_33_70b_instruct_now_on_hf/ It would be awesome if this could be supported in llama.cpp.
-
I agree, it would be really nice to see this here. For the benefits it provides, I think it would be well worth it.
-
QTIP is a new LLM quantization algorithm that uses trellis coded quantization and incoherence processing to achieve a state-of-the-art combination of speed and quantization quality.
https://www.reddit.com/r/LocalLLaMA/comments/1ggwrx6/new_quantization_method_qtip_quantization_with/
Quote
" It should be pretty easy to integrate QTIP into llama.cpp. QTIP replaces the vector quantizer in QuIP# with a trellis quantizer. Llama.cpp's vector quantizer is based off of QuIP#'s E8P vector quantizer, so it should be straightforward to swap QTIP's trellis quantizer in instead."
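To make the "trellis quantizer" in that quote concrete, here is a minimal sketch of generic trellis-coded quantization: a Viterbi search over a small state machine, where each state offers two branch reproduction values and the chosen bit determines the next state. This is NOT QTIP's actual trellis code or anything from llama.cpp; the encoder/decoder pair below is a textbook-style illustration, and the trellis used in the test is a made-up 4-state example.

```python
import numpy as np

def tcq_encode(x, next_state, branch_vals):
    """Viterbi search: pick one bit per sample so the sequence of branch
    reproduction values minimizes squared error to x.
    next_state[s, b] -> state after emitting bit b from state s;
    branch_vals[s, b] -> reproduction value for that branch."""
    S, T = next_state.shape[0], len(x)
    INF = float("inf")
    cost = np.full(S, INF)
    cost[0] = 0.0                            # start in state 0
    back = np.zeros((T, S), dtype=np.int8)   # bit taken arriving at (t, state)
    prev = np.zeros((T, S), dtype=np.int32)  # predecessor state
    for t in range(T):
        new_cost = np.full(S, INF)
        for s in range(S):
            if cost[s] == INF:
                continue
            for b in (0, 1):
                ns = next_state[s, b]
                c = cost[s] + (x[t] - branch_vals[s, b]) ** 2
                if c < new_cost[ns]:
                    new_cost[ns] = c
                    back[t, ns] = b
                    prev[t, ns] = s
        cost = new_cost
    s = int(np.argmin(cost))                 # trace back the best path
    bits = np.zeros(T, dtype=np.int8)
    for t in range(T - 1, -1, -1):
        bits[t] = back[t, s]
        s = prev[t, s]
    return bits

def tcq_decode(bits, next_state, branch_vals):
    """Walk the trellis from state 0, emitting one value per bit."""
    s, out = 0, []
    for b in bits:
        out.append(branch_vals[s, b])
        s = next_state[s, b]
    return np.array(out)
```

The key contrast with llama.cpp's current vector quantizers is that each quantized value here depends on the path taken so far, so the decoder only needs the bit stream plus a tiny state machine rather than a large codebook lookup.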