Merged
95 commits
27baad4
kimi linear model implementation
ymcki Dec 2, 2025
84f822c
kimi linear convert_hf_to_gguf
ymcki Dec 2, 2025
57cca52
kimi linear constants.py tensor_mapping.py
ymcki Dec 2, 2025
6167f39
Kimi Linear ggml.h
ymcki Dec 2, 2025
26a6553
kimi linear ggml-cpu
ymcki Dec 2, 2025
bf42bc0
Kimi Linear ggml-cuda
ymcki Dec 2, 2025
d73d3e5
Kimi Linear ggml.c
ymcki Dec 2, 2025
e308026
kimi linear src/llama
ymcki Dec 2, 2025
139548d
remove "const int64_t n_seq_tokens = q->ne[2];" to get rid of unused …
ymcki Dec 2, 2025
83d328d
remove type mismatch warning
ymcki Dec 2, 2025
772ca88
read MoE params
ymcki Dec 2, 2025
9f1265f
removed some hard coded code
ymcki Dec 5, 2025
a0269af
removed all hard code
ymcki Dec 6, 2025
ef5bc30
use DeepseekV2 tokenizer
ymcki Dec 14, 2025
ae9771d
removed unnecessary internal methods called by the old set_vocab of K…
ymcki Dec 18, 2025
f9a11d7
rewrite get_vocab for KimiLinear. Removed all kda_scan code
ymcki Dec 18, 2025
776294c
removed all traces of kda_scan
ymcki Dec 18, 2025
f67a42d
reduce OP count by 1 due to removal of kda_scan
ymcki Dec 18, 2025
f85e5c7
Move KIMI_LINEAR to llm_arch_is_hybrid to enable KV cache
ymcki Jan 2, 2026
8bd617e
set n_embd_head_k/v to ensure kv cache works
ymcki Jan 3, 2026
a4020d8
don't quantize conv1d of Kimi Linear
ymcki Jan 3, 2026
66c0c5d
Kimi Linear backend agnostic
ymcki Jan 5, 2026
aba181e
removed LOG_INFO
ymcki Jan 5, 2026
cfed14e
naive chunking form implemented
ymcki Jan 6, 2026
e3542ff
fixed some comments
ymcki Jan 6, 2026
67bee56
add Kimi-K2 specific tokens to be recognized as EOG
ymcki Jan 6, 2026
30d883c
sync fork from b7240 to b7243
ymcki Jan 6, 2026
40f6118
Merge branch 'ggml-org:master' into Kimi-Linear
ymcki Jan 7, 2026
1099cbf
build_kda_autoregressive is implemented to replace build_kda_recurren…
ymcki Jan 7, 2026
f99913d
replaced Akk and Aqk with mul_mat and clamp
ymcki Jan 8, 2026
6977ddb
Merge branch 'ggml-org:master' into Kimi-Linear
ymcki Jan 9, 2026
6150bb7
no clamp version
ymcki Jan 9, 2026
d26fe50
Moved Aqk computation out of the loop
ymcki Jan 10, 2026
dce064c
fixed typo and split wkv_b into wk_b and wv_b
ymcki Jan 10, 2026
426a82d
Merge branch 'ggml-org:master' into Kimi-Linear
ymcki Jan 11, 2026
b9360c7
MLA KV cache support
ymcki Jan 11, 2026
5f2b8dd
Merge branch 'master' of github.com:ymcki/llama.cpp into Kimi-Linear
ymcki Jan 11, 2026
10be797
Merge branch 'Kimi-Linear' of github.com:ymcki/llama.cpp into Kimi-Li…
ymcki Jan 11, 2026
6ae66fc
fix trailing spaces
ymcki Jan 11, 2026
93afbed
moved const llama_model & model; around to follow qwen3next format an…
ymcki Jan 11, 2026
59182f5
fix trailing whitespace
ymcki Jan 11, 2026
58d1ee5
removed trailing whitespaces in empty line + make sure indentation is …
ymcki Jan 11, 2026
4f6ef2c
try to make lint happy
ymcki Jan 11, 2026
719d374
remove blank lines to make lint happy
ymcki Jan 11, 2026
ac85cb1
removed at least blank line containing white space
ymcki Jan 12, 2026
4faf26c
fixed flake8 complaints locally
ymcki Jan 12, 2026
22bc582
return ggml_tensor * pair in kda_autoregressive and kda_chunking as i…
ymcki Jan 12, 2026
217e7ce
Merge branch 'ggml-org:master' into Kimi-Linear
ymcki Jan 13, 2026
6ba78d1
removed Kimi-Linear specific change that causes failure at server-win…
ymcki Jan 13, 2026
fe9d248
removed private: from kimi_linear to make build checks happy
ymcki Jan 13, 2026
18ae7f4
removed unnecessary ggml_cont before ggml_reshape
ymcki Jan 13, 2026
2882915
created static function causal_conv1d to abstract similar code for q/k/v
ymcki Jan 14, 2026
c163dff
sync fork and comment fixing in kimi-linear.cpp
ymcki Jan 14, 2026
0aea18e
merged dt_bias to SSM_DT. Do -exp(log_A) in convert_hf_to_gguf.py.
ymcki Jan 16, 2026
f3d118d
reverted to original
ymcki Jan 16, 2026
c26c121
Merge branch 'ggml-org:master' into Kimi-Linear
ymcki Jan 16, 2026
e87ac9b
Merge branch 'master' of github.com:ymcki/llama.cpp into Kimi-Linear
ymcki Jan 16, 2026
0298731
Merge branch 'ggml-org:master' into Kimi-Linear
ymcki Jan 21, 2026
e55caf5
Merge branch 'master' of github.com:ymcki/llama.cpp into Kimi-Linear
ymcki Jan 21, 2026
560190a
fixed find_hparam calls. Fixed e_score_correction_bias to use bias in…
ymcki Jan 21, 2026
a8147a1
Merge branch 'Kimi-Linear' of github.com:ymcki/llama.cpp into Kimi-Li…
ymcki Jan 21, 2026
ae8d710
remove DT_B from constants.py. remove one comment line in llama-model…
ymcki Jan 21, 2026
38c6f5e
Merge branch 'ggml-org:master' into Kimi-Linear
ymcki Jan 25, 2026
92f4949
Merge branch 'ggml-org:master' into Kimi-Linear
ymcki Jan 26, 2026
7fb54dd
Merge branch 'ggml-org:master' into Kimi-Linear
ymcki Jan 26, 2026
bb02b5d
Merge branch 'ggml-org:master' into Kimi-Linear
ymcki Jan 26, 2026
f1525b3
new class llm_graph_input_mem_hybrid_k to get around the new MLA chan…
ymcki Jan 27, 2026
0de4680
remove ssm_o_norm_b
ymcki Jan 27, 2026
0444a4f
remove ssm_o_norm_b
ymcki Jan 27, 2026
a6b2c45
changed hparams.kda_head_dim to hparams.n_embd_head_kda. added TODO c…
ymcki Jan 29, 2026
6216273
removed all ggml_cont before ggml_reshape_4d
ymcki Jan 29, 2026
005c340
Whitespace
pwilkin Jan 30, 2026
aaf05bd
replaced all hparams.get with find_hparams
ymcki Jan 31, 2026
2a62df6
Merge branch 'Kimi-Linear' of github.com:ymcki/llama.cpp into Kimi-Li…
ymcki Jan 31, 2026
2c8cd84
added new names for n_experts, n_experts_used and score_func in TextM…
ymcki Feb 1, 2026
11282a0
use is_mla to switch between different mem_hybrid types
ymcki Feb 1, 2026
4bb4286
fixed logical errors in convert_hf_to_gguf.py pointed out by CISC
ymcki Feb 3, 2026
07f9979
Merge branch 'ggml-org:master' into Kimi-Linear
ymcki Feb 3, 2026
efaea45
removed if else for required parameters kv_lora_rank and qk_rope_head…
ymcki Feb 3, 2026
000fded
add back ggml_cont for Vcur
ymcki Feb 3, 2026
8ec5b08
minor changes
ymcki Feb 3, 2026
82215a0
removed extra line in llama-vocab.cpp. Added back the comment in llam…
ymcki Feb 3, 2026
a82103e
f16 gguf cannot run without context length
ymcki Feb 4, 2026
6456393
made a mistake of adding back n_ctx parsing
ymcki Feb 5, 2026
17cd6e8
4x4 16x16 blocks computation for Akk and Aqk
ymcki Feb 6, 2026
97f229c
sync to latest plus replace chunkify with get_slice_2d
ymcki Feb 6, 2026
cc16e49
Merge branch 'ggml-org:master' into Kimi-Linear
ymcki Feb 6, 2026
06f0728
replace ggml_acc with ggml_set for vulkan compatibility
ymcki Feb 7, 2026
906abc3
Merge branch 'master' of github.com:ymcki/llama.cpp into Kimi-Linear
ymcki Feb 7, 2026
3dfebbb
Merge branch 'Kimi-Linear' of github.com:ymcki/llama.cpp into Kimi-Li…
ymcki Feb 7, 2026
19cf704
Merge branch 'ggml-org:master' into Kimi-Linear
ymcki Feb 9, 2026
63a15e3
fix conv state update for llama-server parallel serving
ymcki Feb 12, 2026
b2d02ad
Merge branch 'ggml-org:master' into Kimi-Linear
ymcki Feb 12, 2026
6286253
revert back to normal implementation
ymcki Feb 13, 2026
a46782c
Merge branch 'ggml-org:master' into Kimi-Linear
ymcki Feb 13, 2026
7 changes: 5 additions & 2 deletions src/models/kimi-linear.cpp
@@ -41,8 +41,11 @@ static ggml_tensor * causal_conv1d(ggml_cgraph * gf, ggml_context * ctx0, ggml_t
         conv_x->nb[1], conv_x->nb[2], n_seq_tokens * conv_x->nb[0]);
     ggml_build_forward_expand(gf,
         ggml_cpy(ctx0, last_conv_x,
-            ggml_view_1d(ctx0, conv_states_all, conv_state_size * n_seqs,
-                (kv_head * n_embd_r_total + qkv * conv_state_size) * ggml_element_size(conv_states_all))));
+            ggml_view_3d(ctx0, conv_states_all,
+                d_conv - 1, d_inner, n_seqs,
+                (d_conv - 1) * ggml_element_size(conv_states_all), // nb1: contiguous within one channel's conv taps
+                n_embd_r_total * ggml_element_size(conv_states_all), // nb2: stride between sequences (skip over K,V states)
+                (kv_head * n_embd_r_total + qkv * conv_state_size) * ggml_element_size(conv_states_all)))); // offset to first seq's Q/K/V state
     // Reshape conv weight: GGUF [d_conv, 1, d_inner, 1] -> ggml_ssm_conv expects [d_conv, d_inner]
     // GGUF stores as [d_conv, 1, d_inner, 1] with memory layout w[conv_step + channel * d_conv]
     // vLLM stores as [d_inner, d_conv] with memory layout w[channel * d_conv + conv_step]
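The state addressed by the `ggml_view_3d` above is just the last `d_conv - 1` inputs of each channel, kept separately per sequence. A minimal pure-Python sketch of that mechanism, illustrative only — the function name, argument shapes, and list-based layout are assumptions for exposition, not llama.cpp's API:

```python
# Illustrative sketch (not llama.cpp code): a causal depthwise conv1d whose
# carried-over state is the last d_conv-1 inputs of each channel -- the
# quantity the ggml_view_3d above addresses per sequence in the state cache.
def causal_conv1d_step(x, weight, state):
    """x: d_inner channels, each a list of new token inputs.
    weight: d_inner rows of d_conv taps.
    state: d_inner rows holding the d_conv-1 trailing inputs from the
    previous call (all zeros before the first call)."""
    d_conv = len(weight[0])
    out, new_state = [], []
    for ch_x, ch_w, ch_s in zip(x, weight, state):
        # Prepend the cached state so the first output token still sees
        # its full causal window of d_conv inputs.
        full = ch_s + ch_x
        row = [sum(full[t + k] * ch_w[k] for k in range(d_conv))
               for t in range(len(ch_x))]
        out.append(row)
        # Roll the state: keep only the last d_conv-1 inputs for next call.
        new_state.append(full[-(d_conv - 1):])
    return out, new_state
```

Carrying the state between calls makes chunked processing bitwise-identical to a single pass over the whole sequence, which is the property the per-sequence conv-state cache (and the parallel-serving fix above) relies on.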