Name and Version
version: 5913 (225e7a1)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
libllama (core library)
Command line
llama-cli -m /storage/models/textgen/ibm-granite_granite-4.0-tiny-preview-bf16.gguf --no-mmap --jinja -sys 'You are a helpful assistant'
Problem description & steps to reproduce
It looks like the argument that passes unified as true was placed in the wrong position; the call should perhaps look like this instead:
mem_attn(new llama_kv_cache_unified(
        model,
        filter_attn == nullptr ?
            [&](int32_t il) { return !hparams.is_recurrent(il); }
            : filter_attn,
        type_k,
        type_v,
        v_trans,
        offload,
        /* unified */ 1,
        kv_size,
        n_seq_max,
        n_pad,
        n_swa,
        swa_type
    )),
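For context, here is a minimal standalone sketch of why the shifted argument compiles silently. The make_cache function and its values are hypothetical; only the bool-before-unsigned-int ordering mirrors the (..., offload, unified, kv_size, ...) tail of the llama_kv_cache_unified signature. The unsigned kv_size converts implicitly to bool for the unified parameter and the literal 1 becomes kv_size, which is exactly the unified=true, kv_size=1 combination in the backtrace below:

#include <cstdio>

// Hypothetical stand-in, not llama.cpp code: same bool-before-uint parameter
// order as the real constructor's (..., unified, kv_size, ...) arguments.
static void make_cache(bool unified, unsigned int kv_size) {
    std::printf("unified=%d kv_size=%u\n", unified, kv_size);
}

int main() {
    unsigned int kv_size = 4096;
    make_cache(kv_size, 1); // shifted order: prints unified=1 kv_size=1, no compiler error
    make_cache(1, kv_size); // intended order: prints unified=1 kv_size=4096
    return 0;
}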
First Bad Commit
The breakage appears to have come in with the high-throughput mode PR.
Relevant log output
GDB output from the error (note that frame #1 shows llama_memory_hybrid being called with kv_size=4096, while frame #0 shows llama_kv_cache_unified receiving unified=true and kv_size=1):
#0 llama_kv_cache_unified::llama_kv_cache_unified(llama_model const&, std::function<bool (int)>&&, ggml_type, ggml_type, bool, bool, bool, unsigned int, unsigned int, unsigned int, unsigned int, llama_swa_type) (this=0x5555585f7e60,
model=..., filter=..., type_k=GGML_TYPE_F16, type_v=GGML_TYPE_F16, v_trans=true, offload=true, unified=true, kv_size=1, n_seq_max=1, n_pad=32, n_swa=0, swa_type=LLAMA_SWA_TYPE_NONE)
at /home/burger/llama.cpp/src/llama-kv-cache-unified.cpp:35
#1 0x00007ffff7c7be93 in llama_memory_hybrid::llama_memory_hybrid(llama_model const&, ggml_type, ggml_type, bool, unsigned int, unsigned int, unsigned int, llama_swa_type, ggml_type, ggml_type, unsigned int, unsigned int, bool, std::function<bool (int)>&&, std::function<bool (int)>&&) (this=0x55555747dce0, model=..., type_k=GGML_TYPE_F16, type_v=GGML_TYPE_F16, v_trans=true, kv_size=4096, n_pad=32, n_swa=0, swa_type=LLAMA_SWA_TYPE_NONE, type_r=GGML_TYPE_F32,
type_s=GGML_TYPE_F32, rs_size=1, n_seq_max=1, offload=true, filter_attn=..., filter_recr=...) at /home/burger/llama.cpp/src/llama-memory-hybrid.cpp:47
#2 0x00007ffff7ceeebc in llama_model::create_memory (this=0x555556a08c90, params=..., cparams=...) at /home/burger/llama.cpp/src/llama-model.cpp:16646
#3 0x00007ffff7beb321 in llama_context::llama_context (this=0x5555575a5010, model=..., params=...) at /home/burger/llama.cpp/src/llama-context.cpp:208
#4 0x00007ffff7bf4b93 in llama_init_from_model (model=0x555556a08c90, params=...) at /home/burger/llama.cpp/src/llama-context.cpp:2253
#5 0x00005555557901ef in common_init_from_params (params=...) at /home/burger/llama.cpp/common/common.cpp:916
#6 0x00005555555d7231 in main (argc=7, argv=0x7fffffffe0f8) at /home/burger/llama.cpp/tools/main/main.cpp:140
(gdb) f 2
#2 0x00007ffff7ceeebc in llama_model::create_memory (this=0x555556a08c90, params=..., cparams=...) at /home/burger/llama.cpp/src/llama-model.cpp:16646
16646 /* filter_recr */ (arch == LLM_ARCH_FALCON_H1) ? [&](int32_t) { return true; } : (llama_memory_hybrid::layer_filter_cb)nullptr);