GPTQ models support #31

Can it handle GPTQ models like the transformers library's AutoModelForCausalLM does?
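For context, a minimal sketch of the transformers behaviour the question refers to; the checkpoint name is illustrative, and auto-gptq (via optimum) is assumed to be installed:

```python
# With auto-gptq installed, transformers reads the quantization config stored
# in a GPTQ checkpoint and loads the quantized weights directly.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-7B-GPTQ"  # illustrative GPTQ checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```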
Comments
It's working without any problem, but why is the generation speed slow compared to non-quantized models?
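One way to pin down a slowdown like this is to measure tokens per second for the GPTQ and non-quantized checkpoints under identical generation settings. A rough sketch, assuming a CUDA device and a model/tokenizer loaded as above (the helper name is hypothetical):

```python
import time

import torch

def tokens_per_second(model, tokenizer, prompt, max_new_tokens=128):
    # Greedy decoding so both runs generate comparable token counts.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    torch.cuda.synchronize()
    start = time.perf_counter()
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
    return new_tokens / elapsed
```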
Hello! There shouldn't be any major changes in generation speed from attention_sinks itself, but Flash Attention is not implemented here yet.
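Not confirmed as the cause in this thread, but a common reason quantized checkpoints feel slow is falling back to the default eager attention path. Recent transformers releases accept an `attn_implementation` kwarg to request Flash Attention 2 (whether attention_sinks forwards it is an assumption):

```python
import torch
from transformers import AutoModelForCausalLM

# Requires the flash-attn package and an Ampere-or-newer GPU;
# the checkpoint name is illustrative.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)
```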
Thanks for the fast response. Do you plan to work on it someday? I can implement it if you can explain Flash Attention a little bit.
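For orientation, the flash-attn package exposes its kernel as `flash_attn_func`; this toy call is unrelated to attention_sinks internals and only shows the expected tensor layout:

```python
import torch
from flash_attn import flash_attn_func

# Inputs are (batch, seqlen, num_heads, head_dim) half-precision CUDA tensors.
batch, seqlen, n_heads, head_dim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, n_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Computes exact causal softmax attention without materializing the full
# seqlen x seqlen score matrix; output has the same shape as q.
out = flash_attn_func(q, k, v, causal=True)
```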
It's available at this branch: https://github.com/Minami-su/attention_sinks_autogptq @synacktraa
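Assuming the fork keeps the upstream attention_sinks interface, usage might look like the sketch below; the install command and keyword arguments are assumptions about the branch, not documented behavior:

```python
# Assumed install (unverified):
#   pip install git+https://github.com/Minami-su/attention_sinks_autogptq
from attention_sinks import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",        # illustrative GPTQ checkpoint
    device_map="auto",
    attention_sink_size=4,             # initial "sink" tokens kept in the KV cache
    attention_sink_window_size=1020,   # sliding window of recent tokens
)
```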
Thank you 🙏