Conversation

@ngxson (Collaborator) commented Jul 24, 2025

Fix a mistake in the Kimi K2 chat template: the add_ass check was placed inside the loop, which caused the formatted text to contain the assistant prompt in the wrong places.

Also fix incorrect code ordering in llama-arch.cpp.
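
For context, here is a minimal sketch of what the fix amounts to, assuming a simplified message loop; it is not the actual llama.cpp code, and the per-message token layout other than `<|im_assistant|>assistant<|im_middle|>` (quoted later in this thread) is an assumption about the Kimi K2 template:

```cpp
// Minimal sketch of the bug and the fix, not the actual llama.cpp code.
// The per-message token layout is an assumption about the Kimi K2 template;
// only "<|im_assistant|>assistant<|im_middle|>" is quoted in this thread.
#include <sstream>
#include <string>
#include <vector>

struct chat_msg {
    std::string role;    // "user", "assistant", "system"
    std::string content;
};

std::string format_kimi_k2(const std::vector<chat_msg> & msgs, bool add_ass) {
    std::ostringstream ss;
    for (const auto & m : msgs) {
        ss << "<|im_" << m.role << "|>" << m.role << "<|im_middle|>"
           << m.content << "<|im_end|>";
        // BUG (before this PR): the add_ass check lived here, inside the loop,
        // so the assistant prompt was appended after every message.
    }
    // FIX: append the assistant generation prompt once, after all messages,
    // and only when the caller requests it.
    if (add_ass) {
        ss << "<|im_assistant|>assistant<|im_middle|>";
    }
    return ss.str();
}
```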

@jukofyork (Collaborator)

So would the old code have added <|im_assistant|>assistant<|im_middle|> after each [EOS]?

I only ask as I'm still trying to get an answer on what the proper EOS token is:

https://huggingface.co/moonshotai/Kimi-K2-Instruct/discussions/31

and this might explain why this guy got:

Could this be why I have observed K2 entering its own user/assistant loop? I've never seen this phenomenon before.

Here's an example I saved:

(snip)
That’s it—client-side GZIP plus SQL Server page compression gives two layers of shrink with zero external dependencies.<|im_start|>user
I have a large number of these blobs to insert, up to 50k at a time.
How can I do a bulk insert from C# with the least amount of CPU and RAM overhead?
<|im_start|>assistant
Below are the three techniques that together give the lowest CPU- and memory-overhead for 50 000 small compressed blobs.
(snip)

For clarity, all of the above tokens, including the im_start tokens and the user and assistant words, are from K2.

yet I never saw this (likely because I was using the --jinja option)?

@CISC (Collaborator) commented Jul 24, 2025

It would add it after each message (when --jinja was not used), which certainly would confuse the model. :)
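
Concretely, with add_ass inside the loop the rendered prompt would look roughly like this (illustrative only, using the token layout assumed in the sketch above, not verbatim llama.cpp output):

```
<|im_user|>user<|im_middle|>Hi<|im_end|><|im_assistant|>assistant<|im_middle|>
<|im_assistant|>assistant<|im_middle|>Hello!<|im_end|><|im_assistant|>assistant<|im_middle|>
```

With the fix, `<|im_assistant|>assistant<|im_middle|>` appears only once, at the very end of the prompt.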

CISC merged commit 820de57 into ggml-org:master on Jul 24, 2025
49 of 51 checks passed
taronaeo pushed a commit to taronaeo/llama.cpp-s390x that referenced this pull request Jul 25, 2025
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jul 25, 2025
* origin/master:
docs : update HOWTO-add-model.md for ModelBase and new model classes (ggml-org#14874)
ggml : remove invalid portPos specifiers from dot files (ggml-org#14838)
context : restore preemptive sched reset when LLAMA_SET_ROWS=0 (ggml-org#14870)
mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (ggml-org#14503)
rpc : check for null buffers in get/set/copy tensor endpoints (ggml-org#14868)
sched : fix multiple evaluations of the same graph with pipeline parallelism (ggml-org#14855)
musa: upgrade musa sdk to rc4.2.0 (ggml-org#14498)
sync : ggml
cmake : fix usage issues (ggml/1257)
ggml-cpu : remove stdlib include from repack.cpp (ggml/1276)
context : perform output reorder lazily upon access after sync (ggml-org#14853)
chat : fix kimi-k2 chat template (ggml-org#14852)
sycl: fixed semantics of block offset calculation (ggml-org#14814)
llama : fix MiniCPM inference after Granite Four changes (ggml-org#14850)
docs: add libcurl-dev install hint for Linux distros (ggml-org#14801)
metal : fix fusion across different encoders (ggml-org#14849)
sycl: fix undefined variable in work group size check (ggml-org#14843)
convert : text-only support for GLM-4.1V-9B-Thinking (ggml-org#14823)
CUDA: fix overflow in FA, tune performance (ggml-org#14840)
CUDA: fix compilation with GGML_CUDA_F16 (ggml-org#14837)