
Misc. bug: Hybrid models failing to load with assert GGML_ASSERT(kv_size % n_pad == 0) #14724

@dinerburger

Description

Name and Version

version: 5913 (225e7a1)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

libllama (core library)

Command line

llama-cli -m /storage/models/textgen/ibm-granite_granite-4.0-tiny-preview-bf16.gguf --no-mmap --jinja -sys 'You are a helpful assistant'

Problem description & steps to reproduce

Looks like the argument that passes unified as true was put in the wrong place, maybe: in the buggy call the non-zero kv_size lands in the bool unified slot (converting to true) and the literal 1 lands in the kv_size slot. That matches the GDB frame below (unified=true, kv_size=1), and with kv_size = 1 and n_pad = 32, 1 % 32 != 0, which trips GGML_ASSERT(kv_size % n_pad == 0). The call should perhaps instead look like this:

    mem_attn(new llama_kv_cache_unified(
        model,
        filter_attn == nullptr ?
            [&](int32_t il) { return !hparams.is_recurrent(il); }
            : filter_attn,
        type_k,
        type_v,
        v_trans,
        offload,
        1,          // unified: must come before kv_size
        kv_size,
        n_seq_max,
        n_pad,
        n_swa,
        swa_type
    )),
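
For reference, here is the parameter order of the llama_kv_cache_unified constructor as it can be read off frame #0 of the backtrace below. Types and names are transcribed from the GDB output rather than from the header, so treat them as approximate:

    // order reconstructed from GDB frame #0; names/types approximate
    llama_kv_cache_unified(
        const llama_model         & model,
        std::function<bool (int)> && filter,
        ggml_type                   type_k,
        ggml_type                   type_v,
        bool                        v_trans,
        bool                        offload,
        bool                        unified,    // sits right before kv_size
        unsigned int                kv_size,    // must satisfy kv_size % n_pad == 0
        unsigned int                n_seq_max,
        unsigned int                n_pad,
        unsigned int                n_swa,
        llama_swa_type              swa_type);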

First Bad Commit

Looks like the breakage was introduced by the high-throughput mode PR.
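
Because unified is a bool sitting immediately before kv_size, the mis-ordered call compiles without a diagnostic at default warning levels: the non-zero kv_size converts implicitly to true and the literal 1 converts to unsigned int. A minimal standalone sketch of the silent mix-up (ctor_like is a hypothetical stand-in, not llama.cpp code):

    #include <cstdint>
    #include <cstdio>

    // hypothetical stand-in for the tail of the real constructor's parameter list
    static void ctor_like(bool unified, uint32_t kv_size, uint32_t n_pad) {
        std::printf("unified=%d kv_size=%u n_pad=%u kv_size%%n_pad=%u\n",
                    unified, kv_size, n_pad, kv_size % n_pad);
    }

    int main() {
        uint32_t kv_size = 4096;
        // arguments in the buggy order: kv_size fills the bool slot, 1 fills kv_size
        ctor_like(/*unified=*/kv_size, /*kv_size=*/1, /*n_pad=*/32);
        // prints: unified=1 kv_size=1 n_pad=32 kv_size%n_pad=1
        // and 1 % 32 != 0 is exactly what GGML_ASSERT(kv_size % n_pad == 0) rejects
        return 0;
    }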

Relevant log output

GDB output from the error:

#0  llama_kv_cache_unified::llama_kv_cache_unified(llama_model const&, std::function<bool (int)>&&, ggml_type, ggml_type, bool, bool, bool, unsigned int, unsigned int, unsigned int, unsigned int, llama_swa_type) (this=0x5555585f7e60, 
    model=..., filter=..., type_k=GGML_TYPE_F16, type_v=GGML_TYPE_F16, v_trans=true, offload=true, unified=true, kv_size=1, n_seq_max=1, n_pad=32, n_swa=0, swa_type=LLAMA_SWA_TYPE_NONE)
    at /home/burger/llama.cpp/src/llama-kv-cache-unified.cpp:35
#1  0x00007ffff7c7be93 in llama_memory_hybrid::llama_memory_hybrid(llama_model const&, ggml_type, ggml_type, bool, unsigned int, unsigned int, unsigned int, llama_swa_type, ggml_type, ggml_type, unsigned int, unsigned int, bool, std::function<bool (int)>&&, std::function<bool (int)>&&) (this=0x55555747dce0, model=..., type_k=GGML_TYPE_F16, type_v=GGML_TYPE_F16, v_trans=true, kv_size=4096, n_pad=32, n_swa=0, swa_type=LLAMA_SWA_TYPE_NONE, type_r=GGML_TYPE_F32, 
    type_s=GGML_TYPE_F32, rs_size=1, n_seq_max=1, offload=true, filter_attn=..., filter_recr=...) at /home/burger/llama.cpp/src/llama-memory-hybrid.cpp:47
#2  0x00007ffff7ceeebc in llama_model::create_memory (this=0x555556a08c90, params=..., cparams=...) at /home/burger/llama.cpp/src/llama-model.cpp:16646
#3  0x00007ffff7beb321 in llama_context::llama_context (this=0x5555575a5010, model=..., params=...) at /home/burger/llama.cpp/src/llama-context.cpp:208
#4  0x00007ffff7bf4b93 in llama_init_from_model (model=0x555556a08c90, params=...) at /home/burger/llama.cpp/src/llama-context.cpp:2253
#5  0x00005555557901ef in common_init_from_params (params=...) at /home/burger/llama.cpp/common/common.cpp:916
#6  0x00005555555d7231 in main (argc=7, argv=0x7fffffffe0f8) at /home/burger/llama.cpp/tools/main/main.cpp:140
(gdb) f 2
#2  0x00007ffff7ceeebc in llama_model::create_memory (this=0x555556a08c90, params=..., cparams=...) at /home/burger/llama.cpp/src/llama-model.cpp:16646
16646                         /* filter_recr       */ (arch == LLM_ARCH_FALCON_H1) ? [&](int32_t) { return true; } : (llama_memory_hybrid::layer_filter_cb)nullptr);
