## Measurement/inference error (3): hidden_states #600
I appreciate you going through the steps to troubleshoot, but there are cases where models simply can't be quantized with FP16 precision because quantization requires running a reference forward pass. If that overflows because the model isn't properly normalized, there isn't much I can immediately do about it.

ExLlama isn't really written to accept an arbitrary set of weights that isn't produced by training/finetuning. It makes certain assumptions about what language models are and how they work, and the sad truth is it doesn't take very long to cut a model up and stick it back together randomly, breaking any of those assumptions in the process and producing something that would take me hours or days to account for. There's simply no way for me to keep up.
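To illustrate the overflow described above, here is a minimal sketch using only the Python standard library. It is not ExLlamaV2 code: `to_fp16` and `first_overflow_layer` are hypothetical helpers, and the "each layer multiplies the hidden-state magnitude by a constant factor" model is a deliberately crude assumption standing in for a model whose activations are not kept in range by normalization.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite value in IEEE 754 half precision


def to_fp16(x: float) -> float:
    """Round-trip a float through half precision; out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack finite values that do not fit in FP16
        return math.copysign(math.inf, x)


def first_overflow_layer(start: float, growth: float, num_layers: int):
    """Return the index of the first layer whose output is not finite in FP16, else None.

    Toy assumption: every layer scales the hidden-state magnitude by `growth`.
    """
    h = to_fp16(start)
    for layer in range(num_layers):
        h = to_fp16(h * growth)
        if not math.isfinite(h):
            return layer
    return None


if __name__ == "__main__":
    print(first_overflow_layer(1.0, 1.0, 200))  # stable magnitudes never overflow
    print(first_overflow_layer(1.0, 2.0, 200))  # doubling per layer overflows early
```

The point of the sketch is only that once a hidden state exceeds `FP16_MAX`, every later layer sees `inf` (and soon `NaN`), so the reference pass has nothing valid to measure against, regardless of the target bitrate.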
Thank you for taking the time to explain.
It's not that 8 bpw specifically is a problem. It's something that goes wrong during inference on the unquantized model, resulting in the reported error.

If a merge ever works, it works more or less by accident, not because the methodology is sound. The results are always going to be unpredictable. I can't say exactly what the problem is in this case, and I don't even have a place to start. It's like your car isn't starting after you doubled the number of cylinders by cutting the engine in half and sticking another engine in the middle. The mechanic is just going to shrug and say, yeah, that's not how you get a more powerful engine.

I guess, some things you can try:
I have made a fork to address this issue. I have opened a pull request in case my changes are helpful.
Attempting to create an EXL2 quantization of Twilight-Miqu-146B using oobabooga/text-generation-webui version 1.14 always fails with an error at layer 169.
I get the same error using the following two commands:
cd exllamav2-0.1.9
python convert.py -i ../models/Twilight-Miqu-146B -o CCCW -cf ../models/Twilight-Miqu-146B-EXL2-RPCAL -c PIPPA-cleaned/pippa_raw_fix.parquet -b 8 -hb 8 -nr

cd exllamav2-0.1.9
python convert.py -i ../models/Twilight-Miqu-146B -o CCCW -cf ../models/Twilight-Miqu-146B-EXL2 -b 8 -hb 8 -nr
I also tried updating exllamav2 to version 0.1.9 and re-downloading the original model files, but I still got the same error.
Environment: win10, AMD Ryzen Threadripper 7960X, 256GB RAM, 2x RTX 4090 & 3x RTX A6000, Python 3.11.9
I have only a superficial understanding of the code. Please help me. Thank you.