Kimi-Linear support (backend agnostic + MLA KV cache) #18755
Merged
Changes from 42 commits
Commits
84 commits
27baad4 kimi linear model implementation (ymcki)
84f822c kimi linear convert_hf_to_gguf (ymcki)
57cca52 kimi linear constants.py tensor_mapping.py (ymcki)
6167f39 Kimi Linear ggml.h (ymcki)
26a6553 kimi linear ggml-cpu (ymcki)
bf42bc0 Kimi Linear ggml-cuda (ymcki)
d73d3e5 Kimi Linear ggml.c (ymcki)
e308026 kimi linear src/llama (ymcki)
139548d remove "const int64_t n_seq_tokens = q->ne[2];" to get rid of unused … (ymcki)
83d328d remove type mismatch warning (ymcki)
772ca88 read MoE params (ymcki)
9f1265f removed some hard coded code (ymcki)
a0269af removed all hard code (ymcki)
ef5bc30 use DeepseekV2 tokenizer (ymcki)
ae9771d removed unnecessary internal methods called by the old set_vocab of K… (ymcki)
f9a11d7 rewrite get_vocab for KimiLinear. Removed all kda_scan code (ymcki)
776294c removed all traces of kda_scan (ymcki)
f67a42d reduce OP count by 1 due to removal of kda_scan (ymcki)
f85e5c7 Move KIMI_LINEAR to llm_arch_is_hybrid to enable KV cache (ymcki)
8bd617e set n_embd_head_k/v to ensure kv cache works (ymcki)
a4020d8 don't quantize conv1d of Kimi Linear (ymcki)
66c0c5d Kimi Linear backend agnostic (ymcki)
aba181e removed LOG_INFO (ymcki)
cfed14e naive chunking form implemented (ymcki)
e3542ff fixed some comments (ymcki)
67bee56 add Kimi-K2 specific tokens to be recognized as EOG (ymcki)
30d883c sync fork from b7240 to b7243 (ymcki)
40f6118 Merge branch 'ggml-org:master' into Kimi-Linear (ymcki)
1099cbf build_kda_autoregressive is implemented to replace build_kda_recurren… (ymcki)
f99913d replaced Akk and Aqk with mul_mat and clamp (ymcki)
6977ddb Merge branch 'ggml-org:master' into Kimi-Linear (ymcki)
6150bb7 no clamp version (ymcki)
d26fe50 Moved Aqk computation out of the loop (ymcki)
dce064c fixed typo and split wkv_b into wk_b and wv_b (ymcki)
426a82d Merge branch 'ggml-org:master' into Kimi-Linear (ymcki)
b9360c7 MLA KV cache support (ymcki)
5f2b8dd Merge branch 'master' of github.com:ymcki/llama.cpp into Kimi-Linear (ymcki)
10be797 Merge branch 'Kimi-Linear' of github.com:ymcki/llama.cpp into Kimi-Li… (ymcki)
6ae66fc fix trailing spaces (ymcki)
93afbed moved const llama_model & model; around to follow qwen3next format an… (ymcki)
59182f5 fix trailing whitespace (ymcki)
58d1ee5 removed traling whitespaces in empty line + make sure indentation is … (ymcki)
4f6ef2c try to make lint happy (ymcki)
719d374 remove blank lines to make lint happy (ymcki)
ac85cb1 removed at least blank line containing white space (ymcki)
4faf26c fixed flake8 complaints locally (ymcki)
22bc582 return ggml_tensor * pair in kda_autoregressive and kda_chunking as i… (ymcki)
217e7ce Merge branch 'ggml-org:master' into Kimi-Linear (ymcki)
6ba78d1 removed Kimi-Linear specific change that causes failure at server-win… (ymcki)
fe9d248 removed private: from kimi_linear to make build checks happy (ymcki)
18ae7f4 removed unnecessary ggml_cont before ggml_reshape (ymcki)
2882915 created static function causal_conv1d to abtract similar code for q/k/v (ymcki)
c163dff sync fork and comment fixing in kimi-linear.cpp (ymcki)
0aea18e merged dt_bias to SSM_DT. Do -exp(log_A) in convert_hf_to_gguf.py. (ymcki)
f3d118d reverted to original (ymcki)
c26c121 Merge branch 'ggml-org:master' into Kimi-Linear (ymcki)
e87ac9b Merge branch 'master' of github.com:ymcki/llama.cpp into Kimi-Linear (ymcki)
0298731 Merge branch 'ggml-org:master' into Kimi-Linear (ymcki)
e55caf5 Merge branch 'master' of github.com:ymcki/llama.cpp into Kimi-Linear (ymcki)
560190a fixed find_hparam calls. Fixed e_score_correction_bias to use bias in… (ymcki)
a8147a1 Merge branch 'Kimi-Linear' of github.com:ymcki/llama.cpp into Kimi-Li… (ymcki)
ae8d710 remove DT_B from constants.py. remove one comment line in llama-model… (ymcki)
38c6f5e Merge branch 'ggml-org:master' into Kimi-Linear (ymcki)
92f4949 Merge branch 'ggml-org:master' into Kimi-Linear (ymcki)
7fb54dd Merge branch 'ggml-org:master' into Kimi-Linear (ymcki)
bb02b5d Merge branch 'ggml-org:master' into Kimi-Linear (ymcki)
f1525b3 new class llm_graph_input_mem_hybrid_k to get around the new MLA chan… (ymcki)
0de4680 remove ssm_o_norm_b (ymcki)
0444a4f remove ssm_o_norm_b (ymcki)
a6b2c45 changed hparams.kda_head_dim to hparams.n_embd_head_kda. added TODO c… (ymcki)
6216273 removed all ggml_cont b4 ggml_reshape_4d (ymcki)
005c340 Whitespace (pwilkin)
aaf05bd replaced all hparams.get with find_hparams (ymcki)
2a62df6 Merge branch 'Kimi-Linear' of github.com:ymcki/llama.cpp into Kimi-Li… (ymcki)
2c8cd84 added new names for n_experts, n_experts_used and score_func in TextM… (ymcki)
11282a0 use is_mla to switch between different mem_hybrid types (ymcki)
4bb4286 fixed logical errors in convert_hf_to_gguf.py pointed out by CISC (ymcki)
07f9979 Merge branch 'ggml-org:master' into Kimi-Linear (ymcki)
efaea45 removed if else for required parameters kv_lora_rank and qk_rope_head… (ymcki)
000fded add back ggml_cont for Vcur (ymcki)
8ec5b08 minor changes (ymcki)
82215a0 removed extra line in llama-vocab.cpp. Added back the comment in llam… (ymcki)
a82103e f16 gguf cannot run without context length (ymcki)
6456393 made a mistake of adding back n_ctx parsing (ymcki)