What happened?
llama-cli crashes with an std::out_of_range error for model https://huggingface.co/bartowski/codegeex4-all-9b-GGUF/tree/main
I have tested Q8_0 and Q4_K_M; it probably applies to other quantizations as well.
This happens both when building from source and when using release b3371, and probably with other releases too. Here are steps to reproduce it with Ubuntu 22.04 in Docker:
sudo apt update && sudo apt install wget
wget -nc https://huggingface.co/bartowski/codegeex4-all-9b-GGUF/resolve/main/codegeex4-all-9b-Q4_K_M.gguf?download=true -O codegeex4-all-9b-Q4_K_M.gguf
docker run --rm -ti --volume "$PWD":/data ubuntu:22.04
apt update && apt install wget unzip libgomp1 libcurl4 -yy
wget https://github.com/ggerganov/llama.cpp/releases/download/b3371/llama-b3371-bin-ubuntu-x64.zip
unzip llama-b3371-bin-ubuntu-x64.zip
./build/bin/llama-cli -m /data/codegeex4-all-9b-Q4_K_M.gguf
# crashes here

The faulty index 18446744073709551615 equals 0xffffffffffffffff, i.e. -1 reinterpreted as an unsigned value. My guess is that -1 was passed to a function expecting a size_t, which is typically an unsigned 64-bit integer. Note that the vector size in the error message (151552) matches n_vocab, so the out-of-range access presumably happens during a vocabulary lookup.
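For illustration, here is a minimal standalone C++ sketch of that failure mode (this is not llama.cpp code; the id_to_token vector is hypothetical, sized to match n_vocab in the log below):

#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

int main() {
    // Hypothetical vocabulary table, same size as n_vocab (151552).
    std::vector<std::string> id_to_token(151552);
    int32_t token = -1;            // e.g. a "not found" sentinel leaking through
    size_t idx = (size_t) token;   // wraps to 18446744073709551615 (0xffffffffffffffff)
    std::printf("idx = %zu\n", idx);
    // at() range-checks and throws std::out_of_range; with libstdc++ the
    // message is "vector::_M_range_check: __n (...) >= this->size() (...)".
    std::printf("%s\n", id_to_token.at(idx).c_str());
    return 0;
}

Compiled with g++ and run, this terminates with an uncaught std::out_of_range, matching the "terminate called after throwing an instance of 'std::out_of_range'" output in the log.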
Name and Version
version: 3371 (9a55ffe)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output
# ./build/bin/llama-cli -m /data/codegeex4-all-9b-Q4_K_M.gguf
Log start
main: build = 3371 (9a55ffe6)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed = 1720699310
llama_model_loader: loaded meta data with 27 key-value pairs and 283 tensors from /data/codegeex4-all-9b-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = chatglm
llama_model_loader: - kv 1: general.name str = codegeex4-all-9b
llama_model_loader: - kv 2: chatglm.context_length u32 = 131072
llama_model_loader: - kv 3: chatglm.embedding_length u32 = 4096
llama_model_loader: - kv 4: chatglm.feed_forward_length u32 = 13696
llama_model_loader: - kv 5: chatglm.block_count u32 = 40
llama_model_loader: - kv 6: chatglm.attention.head_count u32 = 32
llama_model_loader: - kv 7: chatglm.attention.head_count_kv u32 = 2
llama_model_loader: - kv 8: chatglm.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 9: general.file_type u32 = 15
llama_model_loader: - kv 10: chatglm.rope.dimension_count u32 = 64
llama_model_loader: - kv 11: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 12: chatglm.rope.freq_base f32 = 5000000.000000
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = chatglm-bpe
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,151552] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,151552] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,151073] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 18: tokenizer.ggml.padding_token_id u32 = 151329
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 151329
llama_model_loader: - kv 20: tokenizer.ggml.eot_token_id u32 = 151336
llama_model_loader: - kv 21: tokenizer.ggml.unknown_token_id u32 = 151329
llama_model_loader: - kv 22: general.quantization_version u32 = 2
llama_model_loader: - kv 23: quantize.imatrix.file str = /models/codegeex4-all-9b-GGUF/codegee...
llama_model_loader: - kv 24: quantize.imatrix.dataset str = /training_data/calibration_datav3.txt
llama_model_loader: - kv 25: quantize.imatrix.entries_count i32 = 160
llama_model_loader: - kv 26: quantize.imatrix.chunks_count i32 = 125
llama_model_loader: - type f32: 121 tensors
llama_model_loader: - type q5_0: 20 tensors
llama_model_loader: - type q8_0: 20 tensors
llama_model_loader: - type q4_K: 81 tensors
llama_model_loader: - type q5_K: 40 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens cache size = 223
llm_load_vocab: token to piece cache size = 0.9732 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = chatglm
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 151552
llm_load_print_meta: n_merges = 151073
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 131072
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_layer = 40
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 2
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 16
llm_load_print_meta: n_embd_k_gqa = 256
llm_load_print_meta: n_embd_v_gqa = 256
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 13696
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 5000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 131072
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 9B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 9.40 B
llm_load_print_meta: model size = 5.82 GiB (5.31 BPW)
llm_load_print_meta: general.name = codegeex4-all-9b
llm_load_print_meta: EOS token = 151329 '<|endoftext|>'
llm_load_print_meta: UNK token = 151329 '<|endoftext|>'
llm_load_print_meta: PAD token = 151329 '<|endoftext|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 151336 '<|user|>'
llm_load_print_meta: max token length = 1024
llm_load_tensors: ggml ctx size = 0.14 MiB
llm_load_tensors: CPU buffer size = 5955.59 MiB
........................................................................................
llama_new_context_with_model: n_ctx = 131072
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 5000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 5120.00 MiB
llama_new_context_with_model: KV self size = 5120.00 MiB, K (f16): 2560.00 MiB, V (f16): 2560.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.58 MiB
llama_new_context_with_model: CPU compute buffer size = 8481.01 MiB
llama_new_context_with_model: graph nodes = 1606
llama_new_context_with_model: graph splits = 1
system_info: n_threads = 28 / 56 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 0 |
terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check: __n (which is 18446744073709551615) >= this->size() (which is 151552)
Aborted (core dumped)