Remove exllamav1 loaders #5128
|
How does it compare on cards older than Ampere, though? And without flash attention? I'm not using it often either, but I'm also not really using QuIP/AWQ or HQQ at all, if we're going by that. I think exllama 1 was also compatible with the old flash attention that ran on cards below Ampere, but I only have Pascal and Ampere so I can't really confirm. During the holidays, nobody who used it is probably going to notice to complain. |
|
I tried to migrate from exllamav1 to exllamav2 with my AMD Instinct cards and got garbage output. |
|
My GPU performs much better on exllamav1 with 13b models. Disabling the 8-bit cache doesn't fix the performance issue. Performance is about 10x slower on exllamav2 compared to v1. Please bring back exllamav1 and exllamav1_hf. |
|
I tried exllamav2, but it doesn't feel all that perfect. It is certainly fast, but with the same parameters the replies are much shorter. |
|
Please bring back exllamav1 and exllamav1_hf! They let you load 10.7B models completely, while exllamav2 gives you an out-of-memory error on an 8 GB GPU. |
|
exllama2 sucks in some cases, and to get the previous one back you have to downgrade the version. Thanks for the new spokes in the wheels 🥰 |
|
@oobabooga pls revert |
|
If you have a performance problem with exllamav2 that was not present in exllamav1, you should open an issue in the exllamav2 repository. |
|
In my case it's not about performance: exllama and exllama2 give different results. We run 1 million rows daily, and this is critical and noticeable for us. |
|
Damn, you can't roll back to the previous version :? It just won't start :/ Try to roll back to that commit and start it from scratch yourself - you'll understand why there is such a revert request. |
|
To be fair, it reverted for me fine. Need to check how well it works. |
ExLlamav1 hasn't received a commit in 3 months and does not support Mixtral.
The downsides of ExLlamav2 relative to v1 are slightly higher VRAM usage and slightly higher perplexity for the same GPTQ model:
The perplexity difference is not significant and the VRAM usage can be reduced with --cache_8bit. So I see no point in keeping ExLlamav1.
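To give a rough sense of why an 8-bit cache lowers VRAM usage, here is a back-of-the-envelope sketch of key/value cache size. The helper `kv_cache_bytes` is hypothetical, and the model shape (40 layers, 40 heads, head dimension 128, 4096-token context) is an assumed Llama-2-13B-style configuration rather than anything measured in this thread. It ignores weights, activations, and any overhead specific to ExLlamav2, so treat the numbers as an estimate only.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value):
    """Approximate KV cache size.

    Each layer stores one key and one value vector (hence the factor 2)
    per KV head and per token position.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Assumed model shape (Llama-2-13B-style); not taken from the discussion.
shape = dict(n_layers=40, n_kv_heads=40, head_dim=128, seq_len=4096)

fp16_cache = kv_cache_bytes(**shape, bytes_per_value=2)  # 16-bit cache
int8_cache = kv_cache_bytes(**shape, bytes_per_value=1)  # 8-bit cache

GIB = 1024 ** 3
print(f"16-bit cache: {fp16_cache / GIB:.3f} GiB")
print(f" 8-bit cache: {int8_cache / GIB:.3f} GiB")
print(f"saved:        {(fp16_cache - int8_cache) / GIB:.3f} GiB")
```

Under these assumptions the cache halves when each stored value shrinks from two bytes to one, which matters most at long context lengths on small GPUs.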