
enforce response_format and json_schema for Kimi K2 #18851

Open
akoumjian wants to merge 1 commit into ggml-org:master from akoumjian:fix/kimi-chat-response-format-grammar

Conversation

@akoumjian

This PR aims to fix an issue I encountered using response_format with Kimi K2 Instruct 0905.
Using the /v1/chat/completions endpoint in llama-server, I noticed that I was receiving responses which were not adhering to the submitted json_schema.

Simplest reproduction:

  1. Build llama.cpp without this PR's changes
  2. Download a version of https://huggingface.co/unsloth/Kimi-K2-Instruct-0905
  3. Start llama-server. Do not manually specify a chat template file.
  ./build/bin/llama-server \
    --host 127.0.0.1 --port 5840 \
    --model /path/to/Kimi-K2-Instruct-0905-...-00001-of-00013.gguf \
    --ctx-size 8192 \
    --n-gpu-layers 0
  4. Send a request that contains a json_schema as part of response_format
curl -sS http://127.0.0.1:5840/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d ' {
      "model": "any",
      "temperature": 0.1,
      "max_tokens": 64,
      "response_format": {
        "type": "json_schema",
        "json_schema": {
          "schema": {
            "type": "object",
            "properties": {"ok": {"type": "boolean"}},
            "required": ["ok"],
            "additionalProperties": false
          }
        }
      },
      "messages": [
        {"role": "system", "content": "Return the JSON wrapped in a ```json code fence```."},
        {"role": "user", "content": "Return ok=true as a JSON struct."}
      ]
    } '

You are likely to receive a response like the following, backticks and json declaration included:

  ```json
      {"ok": true}
  ```

This should not be possible if a grammar is being created and enforced.
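For context, the schema in the request above should compile to a small GBNF grammar roughly like the following (a hand-written sketch, not the exact output of llama.cpp's schema-to-grammar converter). Any output wrapped in a ```` ```json ```` fence could not match it:

```
root    ::= "{" space "\"ok\"" space ":" space boolean "}" space
boolean ::= "true" | "false"
space   ::= " "?
```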

The issue is that for chat completions, when the Kimi format is detected, the request is routed to the Kimi-specific handler (common_chat_params_init_kimi_k2). Unlike the generic handler, which generates a grammar from the schema in response_format, this handler only handled tool grammars.
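Conceptually, the fix mirrors what the generic handler already does. A minimal Python sketch of the idea (the names here are illustrative; the real code is C++ and lives in the llama.cpp chat handling, not this function):

```python
def json_schema_to_grammar(schema: dict) -> str:
    # stand-in for llama.cpp's schema-to-GBNF converter; returns a
    # placeholder string here just to make the sketch self-contained
    return f"<grammar for properties {sorted(schema.get('properties', {}))}>"

def init_kimi_params(body: dict) -> dict:
    params = {"grammar": None}
    # previously this path only built grammars for tool calls; the fix
    # also honors response_format, like the generic handler does
    rf = body.get("response_format") or {}
    if rf.get("type") == "json_schema":
        schema = rf["json_schema"]["schema"]
        params["grammar"] = json_schema_to_grammar(schema)
    return params

body = {
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "schema": {"type": "object", "properties": {"ok": {"type": "boolean"}}}
        },
    }
}
print(init_kimi_params(body)["grammar"] is not None)  # True
```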

This revealed a second issue with the Kimi flags, which included an open bracket in the tool separator and a closing bracket in the tool end marker. Because those characters appear in the template tag definitions, the trim_suffix call was removing the ending bracket and producing invalid JSON strings, e.g. {"ok": true. I have modified the trim_suffix approach, but it is ugly and I'm hoping someone with better intuition will have a better solution. I see there is an autoparser PR (#18675), but I have tested it and it does not resolve the original issue.
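The pitfall can be sketched in a few lines of Python (the marker string here is illustrative, not the real Kimi template tag): if the end marker's final character coincides with the JSON payload's closing brace, a naive suffix trim truncates valid JSON.

```python
def trim_suffix(s: str, suffix: str) -> str:
    # naive suffix trimming, analogous to the behavior described above
    return s[: -len(suffix)] if suffix and s.endswith(suffix) else s

# an end marker that itself ends with '}' (illustrative)
end_marker = "}"
output = '{"ok": true}'
# the payload's own closing brace matches the marker and gets eaten
print(trim_suffix(output, end_marker))  # {"ok": true  -- no longer valid JSON
```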

AI was used in the following ways for this PR:

  • Locating and describing the source of the issue, after some iterative debugging sessions
  • Suggesting possible fixes, namely including the grammar creation in the Kimi-specific path and, following that, how I might check for which characters to trim
  • I did have it create the boilerplate for the regression test, but all it really does is verify that the custom Kimi path was selected and that a grammar was created

As requested, I ran the whole test suite, which passed. Perplexity is obviously not affected.

@pwilkin
Contributor

pwilkin commented Jan 15, 2026

Thanks for the feedback regarding the autoparser, I'll make sure to verify the json_schema / grammar generation paths.

@akoumjian akoumjian force-pushed the fix/kimi-chat-response-format-grammar branch from 53ddc91 to 3829263 on January 27, 2026 at 14:51
@akoumjian
Author

Please let me know if there's anything else to consider here. I believe it is highly likely that other code paths are also failing to enforce the response_format grammars, given that this logic is implemented in many model-specific conditionals.


Labels

testing Everything test related
