Remove non-HF ExLlamaV2 loader #5431
|
Won't this cause problems for #5375? |
|
In this case we can't use native sampling of exllamav2 though. |
|
It's not true that there is zero speed difference. Non-HF loader is around 10% faster for goliath-120b. |
|
Quick bench using ooba from before this commit and exllamav2 master branch from 5 minutes ago on runpod A100 80GB. HF: Non-HF: |
Why is this important, if the builtin sampling in exllamav2 works fine? |
|
For some stuff I like HF samplers and for some stuff the native ones. I forgot about the extra 1 t/s. It happens in llama.cpp too, a tiny difference due to overhead from HF. Not to mention seeing the actual top speeds in .cpp. It also helps to troubleshoot issues with HF vs the original loader. There are like a million reasons to keep it. |
|
Thanks for restoring it! |
This reverts commit cde000d.
Since PR #4814, the speed difference between ExLlamav2 and ExLlamav2_HF is zero. So I see no point in keeping the non-HF version, which is redundant and which samples in a way not guaranteed to be consistent with HF transformers sampling.