[Bug]: Unable to run meta-llama/Llama-Guard-3-8B-INT8 #6756
Your current environment

Latest Docker image, RTX 4090

🐛 Describe the bug

Comments
@thesues @chenqianfzh It looks like this is an 8-bit BNB model. Would it be easy to add support for these checkpoints as well?
It won't be difficult. I will work on it with higher priority.
It seems version 0.5.4+cu124 works with bnb 4-bit models, but it logs: `WARNING 08-06 06:27:07 config.py:254] bitsandbytes quantization is not fully optimized yet. The speed can be slower than non-quantized models.` Will that be an easy fix too?
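For reference, a minimal sketch of the in-flight 4-bit path the comment above describes, assuming vLLM >= 0.5.4 with bitsandbytes installed; the model name and prompt are placeholders, not from this thread:

```python
# Minimal sketch: in-flight bitsandbytes 4-bit quantization in vLLM.
# Assumes vLLM >= 0.5.4 with bitsandbytes installed; the model name
# and prompt below are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B",  # any unquantized HF checkpoint
    quantization="bitsandbytes",         # quantize weights with bnb at load time
    load_format="bitsandbytes",          # the bnb path requires this load format
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```

Loading a model this way is expected to emit the quoted warning, since bnb kernels are slower than vLLM's optimized unquantized path.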
With this PR, meta-llama/Llama-Guard-3-8B-INT8 is supported.
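Once a build containing that PR is installed, loading the pre-quantized INT8 checkpoint would presumably mirror the existing bnb path; the exact arguments below are an assumption, not confirmed by the thread:

```python
# Sketch: loading the pre-quantized bnb INT8 checkpoint directly.
# Assumes a vLLM build that includes the PR above; whether
# quantization/load_format must still be passed explicitly (rather than
# being detected from the checkpoint's quantization_config) is an
# assumption here.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-Guard-3-8B-INT8",
    quantization="bitsandbytes",
    load_format="bitsandbytes",
)
```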
Many of the quantization methods here are not in the optimized list yet. Optimizing speed is not our top priority right now, as we are working to support more bnb quantization features.