UPSTREAM PR #19531: Kimi Linear (correct conv state update + block implementation) #1165

Open
loci-dev wants to merge 94 commits into main from loci/pr-19531-Kimi-Linear

@loci-dev
Note

Source pull request: ggml-org/llama.cpp#19531

The current implementation updates the convolution (conv) state incorrectly, which corrupts state when llama-server handles requests in parallel. This PR fixes that.

./build/bin/llama-server -c 16384 --parallel 8 --mmap -m ~/Kimi-Linear-48B-A3B-Instruct-GGUF/Kimi-Linear-48B-A3B-Instruct-jp-imatrix.IQ3_M.gguf -ngl 100
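
For readers unfamiliar with why a conv state update matters under `--parallel`, here is a minimal sketch of the general invariant, not the code from this PR. It assumes a causal 1-D convolution with a kernel of width K, where each server slot keeps the last K-1 inputs of its own sequence. All names (`conv_step`, `new_slots`, `process`, `KERNEL`) are hypothetical and chosen for illustration only.

```python
def conv_step(state, new_inputs, kernel):
    """Causal 1-D convolution over one sequence (illustrative only).

    state      -- the last K-1 inputs already seen by this sequence
    new_inputs -- the inputs arriving in this batch for this sequence
    kernel     -- K weights
    Returns (outputs, new_state).
    """
    k = len(kernel)
    # History followed by the new inputs: every output sees a full window of K values.
    window = state + new_inputs
    outputs = [
        sum(w * x for w, x in zip(kernel, window[i:i + k]))
        for i in range(len(new_inputs))
    ]
    # Keep the last K-1 values of the whole window, so a batch shorter than
    # K-1 still carries older history forward instead of dropping it.
    new_state = window[len(window) - (k - 1):]
    return outputs, new_state


KERNEL = [0.5, 0.25, 0.25]  # hypothetical weights, K = 3


def new_slots(n_slots):
    """One independent, zero-initialised conv state per parallel slot."""
    return {slot: [0.0] * (len(KERNEL) - 1) for slot in range(n_slots)}


def process(slots, slot, new_inputs):
    """Run one batch for one slot and write the state back to that slot only."""
    outputs, slots[slot] = conv_step(slots[slot], new_inputs, KERNEL)
    return outputs
```

Under these assumptions, interleaving batches from several slots should give each sequence the same outputs as processing it alone in one pass. If the state were written to the wrong slot, or truncated to only the newest inputs, the interleaved results would diverge, which is the kind of symptom the description above calls state corruption.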

This PR also includes the block implementation, which speeds up prompt processing (pp) by 20% and saves VRAM.

@loci-dev loci-dev force-pushed the main branch 9 times, most recently from 6495042 to 61b4303 Compare February 28, 2026 02:16
@loci-dev loci-dev force-pushed the main branch 9 times, most recently from 0db6c47 to 8019888 Compare March 8, 2026 02:17
@loci-dev loci-dev force-pushed the main branch 9 times, most recently from 6fa8e23 to f2637dc Compare March 15, 2026 02:18
@loci-dev loci-dev force-pushed the main branch 3 times, most recently from 3c7b997 to 5ac00d6 Compare March 17, 2026 02:18
3 participants