
enforce response_format and json_schema for Kimi K2 #18851

Open
akoumjian wants to merge 1 commit into ggml-org:master from akoumjian:fix/kimi-chat-response-format-grammar

Conversation

@akoumjian

This PR aims to fix an issue I encountered using response_format with Kimi K2 Instruct 0905.
Using the /v1/chat/completions endpoint in llama-server, I noticed that I was receiving responses which were not adhering to the submitted json_schema.

Simplest reproduction:

  1. Build llama.cpp without this PR's changes
  2. Download a version of https://huggingface.co/unsloth/Kimi-K2-Instruct-0905
  3. Start llama-server. Do not manually specify a chat template file.
  ./build/bin/llama-server \
    --host 127.0.0.1 --port 5840 \
    --model /path/to/Kimi-K2-Instruct-0905-...-00001-of-00013.gguf \
    --ctx-size 8192 \
    --n-gpu-layers 0
  4. Send a request that contains a json_schema as part of response_format
curl -sS http://127.0.0.1:5840/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d ' {
      "model": "any",
      "temperature": 0.1,
      "max_tokens": 64,
      "response_format": {
        "type": "json_schema",
        "json_schema": {
          "schema": {
            "type": "object",
            "properties": {"ok": {"type": "boolean"}},
            "required": ["ok"],
            "additionalProperties": false
          }
        }
      },
      "messages": [
        {"role": "system", "content": "Return the JSON wrapped in a ```json code fence```."},
        {"role": "user", "content": "Return ok=true as a JSON struct."}
      ]
    } '

You are likely to receive a response like the following, backticks and json declaration included:

  ```json
      {"ok": true}
  ```

This should not be possible if a grammar is being created and enforced.
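For context, the schema in the request above should compile to a small GBNF grammar roughly like the following (a hand-written sketch, not the exact output of llama.cpp's schema-to-grammar converter). Any output wrapped in a ```` ```json ```` fence could not match it:

```
root    ::= "{" space "\"ok\"" space ":" space boolean "}" space
boolean ::= "true" | "false"
space   ::= " "?
```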

The issue is that for chat completions, when the Kimi format is detected, the request is routed to the Kimi-specific handler (common_chat_params_init_kimi_k2). Unlike the generic handler, which generates a grammar from the schema in response_format, this handler only handled tool grammars.
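Conceptually, the fix mirrors what the generic handler already does. A minimal Python sketch of the idea (the names here are illustrative; the real code is C++ and lives in the llama.cpp chat handling, not this function):

```python
def json_schema_to_grammar(schema: dict) -> str:
    # stand-in for llama.cpp's schema-to-GBNF converter; returns a
    # placeholder string here just to make the sketch self-contained
    return f"<grammar for properties {sorted(schema.get('properties', {}))}>"

def init_kimi_params(body: dict) -> dict:
    params = {"grammar": None}
    # previously this path only built grammars for tool calls; the fix
    # also honors response_format, like the generic handler does
    rf = body.get("response_format") or {}
    if rf.get("type") == "json_schema":
        schema = rf["json_schema"]["schema"]
        params["grammar"] = json_schema_to_grammar(schema)
    return params

body = {
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "schema": {"type": "object", "properties": {"ok": {"type": "boolean"}}}
        },
    }
}
print(init_kimi_params(body)["grammar"] is not None)  # True
```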

This revealed a second issue with the Kimi flags, which included an open bracket in the tool separator and a closing bracket in the tool end marker. Because those characters appear in the template tag definitions, the trim_suffix call was removing the ending bracket and producing invalid JSON strings, e.g. {"ok": true. I have modified the trim_suffix approach, but it is ugly and I'm hoping someone with better intuition will have a better solution. I see there is an autoparser PR (#18675), but I have tested it and it does not resolve the original issue.
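The pitfall can be sketched in a few lines of Python (the marker string here is illustrative, not the real Kimi template tag): if the end marker's final character coincides with the JSON payload's closing brace, a naive suffix trim truncates valid JSON.

```python
def trim_suffix(s: str, suffix: str) -> str:
    # naive suffix trimming, analogous to the behavior described above
    return s[: -len(suffix)] if suffix and s.endswith(suffix) else s

# an end marker that itself ends with '}' (illustrative)
end_marker = "}"
output = '{"ok": true}'
# the payload's own closing brace matches the marker and gets eaten
print(trim_suffix(output, end_marker))  # {"ok": true  -- no longer valid JSON
```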

AI was used in the following ways for this PR:

  • Locating and describing the source of the issue, after some iterative debugging sessions
  • Suggesting possible fixes, namely including the grammar creation in the Kimi-specific path and, following that, how I might check for which characters to trim
  • I did have it create the boilerplate for the regression test, but all it really does is verify that the custom Kimi path was selected and that a grammar was created

As requested, I ran the whole test suite, which passed. Perplexity is obviously not affected.

@pwilkin
Contributor

pwilkin commented Jan 15, 2026

Thanks for the feedback regarding the autoparser, I'll make sure to verify the json_schema / grammar generation paths.

@akoumjian akoumjian force-pushed the fix/kimi-chat-response-format-grammar branch from 53ddc91 to 3829263 on January 27, 2026 at 14:51
@akoumjian
Author

Please let me know if there's anything else to consider here. I believe it is highly likely that other code paths are also failing to enforce the response_format grammars, given that this logic is implemented in many model-specific conditionals.


Labels

testing Everything test related
