[Falcon] Attempting to run Falcon-180B Q5/6 gives "illegal character" #3484
Comments
Confirmed b1305 works, b1309 works.
Did you convert your model after #3525? That change is breaking for Falcon GGUF files converted before it.
I tried re-converting the model and it works.
So is there a way to fix the GGUF file? I don't have the bandwidth to download the FP16 model, and I'm not sure anyone has updated the released quants yet, have they? Even then it's almost another 100 GB download.
On Linux, when trying to convert the HF base model to an f16 GGUF, it wouldn't let me continue creating the file.
I should have enough space, though.
Is there a fixed GGUF up anywhere yet? I can't see how to download and convert it myself, or else I would try. Still, it seems like someone should be able to share a fixed one somewhere soon, hopefully, to spare others that step.
@groovybits First install torch, transformers, and the packages in requirements.txt, then run the conversion script. Here's a tool you can use to get the model: https://github.com/bodaay/HuggingFaceModelDownloader
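If you'd rather stick with plain git, a minimal sketch for fetching the model the same way the 40B script below does is shown here; the repo name tiiuae/falcon-180B-chat is my assumption for what you want, and you may need to accept the model license on Hugging Face and authenticate before the clone works:
# From the root of llama.cpp, with Git LFS installed
git lfs install
git clone https://huggingface.co/tiiuae/falcon-180B-chat models/falcon-180b-chat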
It's not just Falcon 180B either; all the other Falcon models are similarly broken.
Falcon 40B is working for me; here is a script that should do the trick. Make sure you have Git LFS installed. # From the root of llama.cpp
git clone https://huggingface.co/tiiuae/falcon-40b models/falcon-40b
pip3 install -r requirements.txt
pip3 install transformers torch
# convert to gguf
python3 convert-falcon-hf-to-gguf.py models/falcon-40b
# quantize
./quantize ./models/falcon-40b/ggml-model-f16.gguf ./models/falcon-40b/ggml-model-q4_0.gguf q4_0
# Profit
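Once the quantized file exists, a quick smoke test from the same tree (the prompt and flags here are just an example, adjust to taste) could be:
# run a short completion against the freshly quantized model
./main -m ./models/falcon-40b/ggml-model-q4_0.gguf -p "The capital of France is" -n 32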
Is there a way to convert a previous GGUF file to the current GGUF format?
There is no way to convert an old GGUF to the new one; you would need to start from the original model.
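As a quick sanity check on a downloaded .gguf (this only inspects the container header; it does not prove the tokenizer metadata matches what the current loader expects), you can dump the first few bytes. A valid file starts with the ASCII magic "GGUF" followed by a little-endian format version:
# print the first 8 bytes: 4-byte magic ("GGUF" = 47 47 55 46) + uint32 format version
xxd -l 8 path/to/model.gguf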
Yeah, if you reconvert from scratch it works. Problem is I can't download 400 GB to try it. Falcon 40B is only interesting to me as a way to see whether LoRA merging works for it before doing the same for the 180B model, to finally get good use out of it. Until someone converts it, I'm sunk.
@Ph0rk0z the 180B chat Falcon repository is updated now.
Yes, it's half downloaded; we're back. Still no Falcon 40B, so I guess I have to test LoRA merges on the big model only.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
I'm attempting to run llama.cpp, latest master, with TheBloke's Falcon 180B Q5/Q6 quantized GGUF models, but it errors out with "invalid character".
I'm unable to find any issues about this online anywhere.
Another system of mine shows the same problem, and a buddy's system does as well.
llama.cpp functions normally on other models, such as Llama2, WizardLM, etc.
The downloaded GGUF file works with "text-generation-webui", so it is functional and has been verified as a good copy by others in the community.
Current Behavior
Happy to provide longer output, but it was just the usual model shapes/sizes printed before the loader output and the error.
Environment and Context
Dell R740xd, 640 GB RAM, Skylake Xeon Silver 4112 CPUs @ 2.60 GHz, Ubuntu 20.04 (Focal)
Please let me know if this is already known (I can't seem to find it) and/or if I can help repro somehow. Thanks.