
Conversation

compilade (Collaborator)

Follow-up to #14725, which didn't fully fix the underlying problem of not considering cparams.kv_unified.

Since #14959, inference with hybrid models has been broken (except when using -kvu), due to hybrid memory not passing cparams.kv_unified properly.
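For context, here is a minimal, self-contained sketch of this class of bug (toy types only, not llama.cpp's actual constructors or this PR's exact change): a hybrid memory wrapper builds its attention-side KV cache without forwarding the caller's cparams.kv_unified flag, so the cache ends up with a layout the rest of the pipeline doesn't expect.

```cpp
// Toy illustration (hypothetical names, not llama.cpp's real API):
// the hybrid memory must forward the caller's kv_unified setting to the
// attention KV cache instead of letting it fall back to a default.
#include <cstdio>

struct cparams_t {
    bool kv_unified = false; // default without -kvu: split (non-unified) KV cache
};

struct kv_cache_unified {
    bool unified;
    explicit kv_cache_unified(bool unified) : unified(unified) {}
};

struct memory_hybrid {
    kv_cache_unified attn;

    // The bug pattern: constructing attn without cparams.kv_unified.
    // The fix pattern: forward the flag, as done here.
    explicit memory_hybrid(const cparams_t & cparams) : attn(cparams.kv_unified) {}
};

int main() {
    cparams_t cparams;          // -kvu not passed => kv_unified == false
    memory_hybrid mem(cparams); // the attention cache must honor that setting
    std::printf("attn unified: %s\n", mem.attn.unified ? "true" : "false");
    return 0;
}
```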

Reproduction of the problem: attempt to run llama-perplexity with any hybrid model.

$ ./bin/llama-perplexity -f /workspace/wikitext-2-raw/wiki.test.raw -m /workspace/gguf/LFM2-350M-BF16.gguf --chunks 10

On master, this fails with an assertion:

/workspace/llama.cpp/ggml/src/ggml.c:3740: GGML_ASSERT(mask->ne[1] >= a->ne[1]) failed

With this PR, this is no longer a problem. I've tested this with https://huggingface.co/LiquidAI/LFM2-350M.


compilade requested a review from ggerganov on August 3, 2025 05:05
compilade added the bugfix (fixes an issue or bug) label on Aug 3, 2025
CISC merged commit 11a3811 into master on Aug 3, 2025 (45 of 47 checks passed)
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request on Aug 5, 2025