[Falcon] Attempting to run Falcon-180B Q5/6 gives "illegal character" #3484
Comments
Confirmed b1305 works, b1309 works.
Did you convert your model after #3525? That change is breaking for Falcon GGUF files converted before it.
I tried re-converting the model and it works.
So is there a way to fix the GGUF file? I don't have the bandwidth to download the FP16 model, and I'm not sure anyone has updated the released quants yet, have they? Even then it's almost another 100 GB download.
On Linux, when trying to convert the HF base model to an f16 GGUF, it wouldn't let me continue creating the file.
I should have enough space, though.
Is there a fixed GGUF up anywhere yet? I can't see how to download and convert it myself, or else I would try. Still, it seems like someone should be able to share a fixed one somewhere soon, hopefully, to spare others that step.
@groovybits First install torch, transformers, and the packages in requirements.txt, then run the conversion script. Here's a tool you can use to get the model: https://github.com/bodaay/HuggingFaceModelDownloader
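If you'd rather stick with plain git, a minimal sketch for fetching the model the same way the 40B script below does is shown here; the repo name tiiuae/falcon-180B-chat is my assumption for what you want, and you may need to accept the model license on Hugging Face and authenticate before the clone works:
# From the root of llama.cpp, with Git LFS installed
git lfs install
git clone https://huggingface.co/tiiuae/falcon-180B-chat models/falcon-180b-chat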
It's not just Falcon 180B either; all the other Falcon models are similarly broken.
Falcon 40B is working for me; here is a script that should do the trick. Make sure you have Git LFS installed. # From the root of llama.cpp
git clone https://huggingface.co/tiiuae/falcon-40b models/falcon-40b
pip3 install -r requirements.txt
pip3 install transformers torch
# convert to gguf
python3 convert-falcon-hf-to-gguf.py models/falcon-40b
# quantize
./quantize ./models/falcon-40b/ggml-model-f16.gguf ./models/falcon-40b/ggml-model-q4_0.gguf q4_0
# Profit
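Once the quantized file exists, a quick smoke test from the same tree (the prompt and flags here are just an example, adjust to taste) could be:
# run a short completion against the freshly quantized model
./main -m ./models/falcon-40b/ggml-model-q4_0.gguf -p "The capital of France is" -n 32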
Is there a way to convert a previous GGUF file to the current GGUF format?
There is no way to convert an old GGUF to the new one; you would need to start from the original model.
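As a quick sanity check on a downloaded .gguf (this only inspects the container header; it does not prove the tokenizer metadata matches what the current loader expects), you can dump the first few bytes. A valid file starts with the ASCII magic "GGUF" followed by a little-endian format version:
# print the first 8 bytes: 4-byte magic ("GGUF" = 47 47 55 46) + uint32 format version
xxd -l 8 path/to/model.gguf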
Yeah, if you reconvert from scratch it works. Problem is I can't download 400 GB to try it. Falcon 40B is only interesting to me as a way to see whether LoRA merging works for it before doing the same for the 180B model, to finally get good use out of it. Until someone converts it, I'm sunk.
@Ph0rk0z the 180B chat Falcon repository is updated now.
Yes, it's half downloaded; we're back. Still no Falcon 40B, so I guess I have to test LoRA merges on the big model only.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
I'm attempting to run llama.cpp, latest master, with TheBloke's Falcon 180B Q5/Q6 quantized GGUF models, but it errors out with "invalid character".
I'm unable to find any issues about this online anywhere.
Another system of mine shows the same problem, and a buddy's system does as well.
llama.cpp functions normally on other models, such as Llama2, WizardLM, etc.
The downloaded GGUF file works with "text-generation-webui", so it is functional and has been verified as a good copy by others in the community.
Current Behavior
Happy to provide longer output, but it was just the usual model shapes/sizes printed before the loader output and the error.
Environment and Context
Dell R740xd, 640 GB RAM, Skylake Xeon Silver 4112 CPUs @ 2.60 GHz, Ubuntu 20.04 (Focal)
Please let me know if this is already known (I can't seem to find it) and/or if I can help repro somehow. Thanks.