support for quantization and qlora models? #906
Comments
Yes, this is quite needed. That would be amazing.
I am not a maintainer. This is a duplicate of #744.
I am using vLLM with QLoRA. I merge the adapters into the base weights (see the sketch below) and point vLLM to the merged directory. Given that the LoRA A & B matrices are added onto the original weight matrices, there is no change in dimension or architecture, so vLLM can consume it. I would still like a way to pass the model to vLLM directly after loading, without saving it to disk first.
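A minimal sketch of the typical merge flow, assuming the adapter was trained with PEFT; this is not the commenter's original snippet, and the model and adapter names are placeholders:

```python
# Sketch: fold a QLoRA/LoRA adapter into the base weights so vLLM can load
# the result as an ordinary Hugging Face checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model", torch_dtype="auto")  # placeholder name
model = PeftModel.from_pretrained(base, "qlora-adapter")  # placeholder adapter path

# Merge the LoRA A/B matrices into the base weights; shapes and architecture are unchanged.
merged = model.merge_and_unload()

# Save the merged checkpoint; vLLM can then be pointed at this directory.
merged.save_pretrained("merged-model")
AutoTokenizer.from_pretrained("base-model").save_pretrained("merged-model")
```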
Closing in favour of #3225 because quantization and LoRA are both supported. We just need QLoRA support.
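For reference, a minimal sketch of serving a quantized base model together with a LoRA adapter in vLLM, as the closing comment describes; the model name, adapter path, and quantization method ("awq") are placeholders and depend on your checkpoint and vLLM version:

```python
# Sketch: quantized base model plus a LoRA adapter at inference time in vLLM.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="base-model-awq", quantization="awq", enable_lora=True)  # placeholder model
params = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(
    ["Explain QLoRA in one sentence."],
    params,
    lora_request=LoRARequest("my-adapter", 1, "path/to/lora-adapter"),  # placeholder adapter
)
print(outputs[0].outputs[0].text)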
Would love to use this with quantization and adapters.