UPSTREAM PR #19531: Kimi Linear (correct conv state update + block implementation) by loci-dev · Pull Request #1165 · auroralabs-loci/llama.cpp

loci-dev · 2026-02-12T02:17:59Z

Note

Source pull request: ggml-org/llama.cpp#19531

The current implementation updates the conv state incorrectly, which corrupts state when llama-server runs sequences in parallel. This PR fixes that.

./build/bin/llama-server -c 16384 --parallel 8 --mmap -m ~/Kimi-Linear-48B-A3B-Instruct-GGUF/Kimi-Linear-48B-A3B-Instruct-jp-imatrix.IQ3_M.gguf -ngl 100
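To illustrate the kind of invariant involved, here is a minimal sketch in Python, not the llama.cpp code. It assumes a causal conv with K taps whose state is the previous K-1 inputs, and that each parallel sequence must own its state. The names `conv_step`, `run_alone`, and `run_parallel` are hypothetical, and the PR description does not say exactly how the original update went wrong.

```python
def conv_step(state, x, weights):
    """One causal conv1d step for a single channel.

    state:   the previous K-1 inputs, oldest first
    x:       the new input
    weights: K kernel taps, oldest first
    Returns (output, new_state).
    """
    window = state + [x]
    y = sum(w * v for w, v in zip(weights, window))
    # The new state is the last K-1 inputs of THIS sequence only.
    return y, window[1:]


def run_alone(tokens, weights):
    """Reference: process one sequence by itself."""
    state = [0.0] * (len(weights) - 1)
    out = []
    for x in tokens:
        y, state = conv_step(state, x, weights)
        out.append(y)
    return out


def run_parallel(seqs, weights):
    """Interleave several sequences, keeping one state per sequence id."""
    states = {sid: [0.0] * (len(weights) - 1) for sid in seqs}
    outputs = {sid: [] for sid in seqs}
    max_len = max(len(tokens) for tokens in seqs.values())
    for t in range(max_len):
        for sid, tokens in seqs.items():
            if t < len(tokens):
                y, states[sid] = conv_step(states[sid], tokens[t], weights)
                outputs[sid].append(y)
    return outputs
```

With per-sequence state, interleaving sequences gives the same outputs as running each one alone. If the state were shared or written to the wrong slot, the outputs of one sequence would depend on another, which is the sort of corruption that only shows up with `--parallel` greater than 1.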

This PR also includes the block implementation, which speeds up prompt processing (pp) by 20% and saves VRAM.

ymcki and others added 30 commits December 2, 2025 08:35

kimi linear model implementation

27baad4

kimi linear convert_hf_to_gguf

84f822c

kimi linear constants.py tensor_mapping.py

57cca52

Kimi Linear ggml.h

6167f39

kimi linear ggml-cpu

26a6553

Kimi Linear ggml-cuda

bf42bc0

Kimi Linear ggml.c

d73d3e5

kimi linear src/llama

e308026

remove "const int64_t n_seq_tokens = q->ne[2];" to get rid of unused variable warning

139548d

remove type mismatch warning

83d328d

read MoE params

772ca88

removed some hard coded code

9f1265f

removed all hard code

a0269af

use DeepseekV2 tokenizer

ef5bc30

removed unnecessary internal methods called by the old set_vocab of KimiLinear

ae9771d

rewrite get_vocab for KimiLinear. Removed all kda_scan code

f9a11d7

removed all traces of kda_scan

776294c

reduce OP count by 1 due to removal of kda_scan

f67a42d

Move KIMI_LINEAR to llm_arch_is_hybrid to enable KV cache

f85e5c7

set n_embd_head_k/v to ensure kv cache works

8bd617e

don't quantize conv1d of Kimi Linear

a4020d8

Kimi Linear backend agnostic

66c0c5d

removed LOG_INFO

aba181e

naive chunking form implemented

cfed14e

fixed some comments

e3542ff

add Kimi-K2 specific tokens to be recognized as EOG

67bee56

sync fork from b7240 to b7243

30d883c

Merge branch 'ggml-org:master' into Kimi-Linear

40f6118

build_kda_autoregressive is implemented to replace build_kda_recurrent for faster inference. sync'd to b7682

1099cbf

replaced Akk and Aqk with mul_mat and clamp

f99913d

loci-dev force-pushed the main branch 9 times, most recently from 6495042 to 61b4303 Compare February 28, 2026 02:16

loci-dev force-pushed the main branch 9 times, most recently from 0db6c47 to 8019888 Compare March 8, 2026 02:17

loci-dev force-pushed the main branch 9 times, most recently from 6fa8e23 to f2637dc Compare March 15, 2026 02:18

loci-dev force-pushed the main branch 3 times, most recently from 3c7b997 to 5ac00d6 Compare March 17, 2026 02:18

loci-dev wants to merge 94 commits into main from loci/pr-19531-Kimi-Linear

3 participants