Replies: 5 comments 1 reply
-
We really need this! More info about QTIP at: https://arxiv.org/pdf/2406.11235
-
This was already asked in the topic about SOTA quants before you posted this, but I guess the extra visibility can't hurt. I'm curious to see whether this is truly easy to integrate, and whether it can surpass the current quants without any downsides.
-
What is the architecture for quantization in llama.cpp? I need to find documentation :D I am curious whether we could bring this in and still apply (rather than ignore) importance matrices.
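On the importance-matrix point: in llama.cpp, the imatrix supplies per-weight importance values (derived from activation statistics) that weight the quantization error, so the quantizer spends its precision on the weights that matter most. Below is a toy sketch of that idea only, not llama.cpp's actual implementation; the function name, the scale grid, and all parameters are made up for illustration.

```python
import numpy as np

def quantize_weighted(x, w, nmax=7):
    """Symmetric integer quantization of one row of weights x, choosing the
    scale s that minimizes the importance-weighted squared error
    sum(w * (x - s*q)**2). Toy sketch of the role an importance matrix
    plays; NOT llama.cpp's real code."""
    amax = float(np.abs(x).max())
    if amax == 0.0:
        return 0.0, np.zeros_like(x, dtype=np.int8)
    best_err, best_s, best_q = float("inf"), amax / nmax, None
    # grid-search candidate scales around the naive choice amax/nmax
    for f in np.linspace(0.7, 1.3, 61):
        s = f * amax / nmax
        q = np.clip(np.round(x / s), -nmax, nmax)
        err = float(np.sum(w * (x - s * q) ** 2))
        if err < best_err:
            best_err, best_s, best_q = err, s, q.astype(np.int8)
    return best_s, best_q
```

With uniform weights `w` this reduces to plain least-squares scale search; with imatrix-style weights, columns that see large activations pull the scale toward themselves.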
-
The first QTIP-quantized models are now hitting HF, see: https://www.reddit.com/r/LocalLLaMA/comments/1hbng5l/qtip_2_3_and_4_bit_llama_33_70b_instruct_now_on_hf/ It would be awesome if this could be supported in llama.cpp.
-
I agree, it would be really nice to see this here. For the benefits it provides, I think it would be well worth it.
-
QTIP is a new LLM quantization algorithm that uses trellis coded quantization and incoherence processing to achieve a state-of-the-art combination of speed and quantization quality.
https://www.reddit.com/r/LocalLLaMA/comments/1ggwrx6/new_quantization_method_qtip_quantization_with/
Quote
" It should be pretty easy to integrate QTIP into llama.cpp. QTIP replaces the vector quantizer in QuIP# with a trellis quantizer. Llama.cpp's vector quantizer is based off of QuIP#'s E8P vector quantizer, so it should be straightforward to swap QTIP's trellis quantizer in instead."
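To make the "trellis quantizer" in that quote concrete, here is a minimal sketch of generic trellis-coded quantization: a Viterbi search over a small state machine, where each state offers two branch reproduction values and the chosen bit determines the next state. This is NOT QTIP's actual trellis code or anything from llama.cpp; the encoder/decoder pair below is a textbook-style illustration, and the trellis used in the test is a made-up 4-state example.

```python
import numpy as np

def tcq_encode(x, next_state, branch_vals):
    """Viterbi search: pick one bit per sample so the sequence of branch
    reproduction values minimizes squared error to x.
    next_state[s, b] -> state after emitting bit b from state s;
    branch_vals[s, b] -> reproduction value for that branch."""
    S, T = next_state.shape[0], len(x)
    INF = float("inf")
    cost = np.full(S, INF)
    cost[0] = 0.0                            # start in state 0
    back = np.zeros((T, S), dtype=np.int8)   # bit taken arriving at (t, state)
    prev = np.zeros((T, S), dtype=np.int32)  # predecessor state
    for t in range(T):
        new_cost = np.full(S, INF)
        for s in range(S):
            if cost[s] == INF:
                continue
            for b in (0, 1):
                ns = next_state[s, b]
                c = cost[s] + (x[t] - branch_vals[s, b]) ** 2
                if c < new_cost[ns]:
                    new_cost[ns] = c
                    back[t, ns] = b
                    prev[t, ns] = s
        cost = new_cost
    s = int(np.argmin(cost))                 # trace back the best path
    bits = np.zeros(T, dtype=np.int8)
    for t in range(T - 1, -1, -1):
        bits[t] = back[t, s]
        s = prev[t, s]
    return bits

def tcq_decode(bits, next_state, branch_vals):
    """Walk the trellis from state 0, emitting one value per bit."""
    s, out = 0, []
    for b in bits:
        out.append(branch_vals[s, b])
        s = next_state[s, b]
    return np.array(out)
```

The key contrast with llama.cpp's current vector quantizers is that each quantized value here depends on the path taken so far, so the decoder only needs the bit stream plus a tiny state machine rather than a large codebook lookup.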