
Conversation

@hksdpc255
Contributor

Port from ggml-org/llama.cpp#17376

Closes #955 (comment)

@fernandaspets

Yeah, I saw the mainline PR also changed another file (common/chat-parser-xml-toolcall.cpp). The rest was merged in another PR yesterday or the day before.

@magikRUKKOLA

LGTM

@calvin2021y

Unfortunately, this PR still hasn't solved the problem. I've tested it on Zed, Jan.ai, and Claude Code Router. GLM, GPT‑OSS, and the like all work stably, but Kimi still cannot reliably invoke tools (it does not always work).

I used ubergarm/Kimi‑K2‑Thinking‑Q4_X for testing, and it’s the most intelligent model I’ve seen so far.

Zed sometimes produces results like this:

[Screenshot 2025-11-25 at 02 13 00]

Claude Code

[Screenshot 2025-11-25 at 02 13 56]

GLM, GPT-OSS, and Qwen3-Coder never show this kind of error, even for small models.

@magikRUKKOLA

@calvin2021y

That is, it introduced the ':' at the beginning of the tool call?
I am asking because I've seen something similar today (but only once), after updating to the latest master. I never noticed such a thing before, though.

@magikRUKKOLA

@calvin2021y

Actually, I just discovered that ':' is the null command in bash. It always results in exit code 0. So if the model does something like:

: && echo ok

this is finicky but perfectly fine.

The only problem is if the model outputs just ':' and nothing else. That would be an error. But you didn't provide any prompt, so we can't debug.

@magikRUKKOLA

magikRUKKOLA commented Nov 24, 2025

@calvin2021y

but Kimi still cannot reliably invoke the tool. (not always work)

For starters, I don't see the exact logs. What tool call is the LLM trying to perform? This one?

: && ls -la

That's a perfectly fine tool call. You can see for yourself if you run it in bash.

@ikawrakow
Owner

@magikRUKKOLA

@calvin2021y tends to claim that something does not work and then not respond to requests for clarification. If there aren't more details by tomorrow morning, I'll just delete the comment. In the meantime, you can safely ignore it.

@calvin2021y

calvin2021y commented Nov 24, 2025

Loading/unloading takes more than 10 minutes, and testing takes even more time.

It is hard for me to test since I am not sure how to record Zed's request body. ik_llama.cpp's --log-enable does not seem to show the raw request body.

@magikRUKKOLA

@calvin2021y

Loading/unloading takes more than 10 minutes, and testing takes even more time.

You do know about the --mlock option, right?

@magikRUKKOLA

@calvin2021y

--log-enable does not seem to show the raw request body.

Try this:

--verbose-prompt --verbosity 2

@sousekd

sousekd commented Nov 24, 2025

I just tested @ubergarm's Kimi-K2-Thinking Q4_X on the latest main branch using a fresh install of Jan.AI with the MCP tools "sequential-thinking" (built-in) and Tavily (search, extract, crawl) enabled. It was able to reason, search the web multiple times, extract relevant content from the news site, and return the results. It was also able to call the tools in subsequent requests, and the responses did not include the extra <|im_end|>.

I am running ik-llama with:

 --jinja
 --chat-template-file /mnt/templates/Kimi-K2-Thinking.txt
 --special

I took the template file from the llama.cpp repo. The original one from Moonshot AI, mentioned in @ubergarm's model README, did not work.

I am not at all discounting @calvin2021y's or @Lissanro's findings in #955; I'm just adding another data point.

Many thanks to everyone involved.

@hksdpc255 hksdpc255 deleted the kimi-k2-fix branch November 25, 2025 07:42
@magikRUKKOLA

@calvin2021y

Oh, now I see what you were talking about. I spent some time with various tool calls and K2-Thinking, and I can say that yes, not only does the problem of empty tool calls exist, but sometimes, instead of the ':' bash operator, it uses the '>' redirection, which truncates the files it tries to write into. That looks like madness. Sometimes it makes empty tool calls, etc. At this point I am not sure at all whether it's a problem with the ik_llama.cpp implementation or a problem with the model itself. Do we have a baseline anywhere?

@magikRUKKOLA

magikRUKKOLA commented Nov 25, 2025

@calvin2021y

Oh, this is crazy. What stupid tool calls it's making right now:


  Let me write the code now.I'll create a simplified version that removes
  ncurses and interactive mode while preserving streaming syntax highlighting
  and all other functionality. Let me start by removing the UI manager files
  and updating the core components: <tool_call
  id="lgnIv9MzRKFthwmkpKR4bdOVfAqPp7r3" name="execute_root_bash">

    >/root/utils/hilow/high/ui_manager.hpp

  </tool_call>

  <tool_response tool_call_id="lgnIv9MzRKFthwmkpKR4bdOVfAqPp7r3"
  name="execute_root_bash">

    exit code: 0

  </tool_response>

[EDIT]: sometimes, instead of '> ', it's doing ': && '. WTF is that? This is embarrassing.

@magikRUKKOLA

@calvin2021y

Uh oh! You were completely right! I do see the stupid empty tool calls. Example:


  But the error says line 27 has an issue. That means the actual file on disk
  might be different from what was shown. Let me check.

  Actually, looking at the  fix.log  more carefully:

    In file included from main.cpp:5:
    ui_manager.hpp:27:9: error: 'SyntaxHighlighter' does not name a type
       27 |         SyntaxHighlighter::StreamingHighlighter highlighter;

  This suggests that  ui_manager.hpp  is NOT empty on disk, but contains
  something at line 27. The user might have shown me the intended content, but
  the actual file on disk is different.

  Let me check what's actually in these files:Looking at the error logs, I can
  see two main issues:

  1. Config class mismatch:  config.cpp  implements  is_interactive()  and
  set_interactive() , but they're not declared in  config.hpp
  2. SyntaxHighlighter errors: The compiler can't find
  SyntaxHighlighter::StreamingHighlighter

  Let me check what's actually on disk: <tool_call
  id="anUAOx3Rf0POFUQrC4I4I56SDqLvuNWO" name="execute_root_bash">



  </tool_call>

  <tool_response tool_call_id="anUAOx3Rf0POFUQrC4I4I56SDqLvuNWO"
  name="execute_root_bash">

    Error: Empty root_bash code provided

  </tool_response>

  Looking at the error messages in  bug.log :

    config.cpp:113:6: error: no declaration matches ‘bool
  Config::is_interactive(

@magikRUKKOLA

Okay. I suggest we do the following.

There are two approaches:

  1. Collect the input data (including the seed) and manually observe the bugs, such as empty tool calls, during the flow.
  2. Agree on a benchmark of some kind to automatically test whether a certain commit introduced a bug.

Which should we pursue?



Development

Successfully merging this pull request may close these issues.

Feature Request: Support for Kimi K2 Thinking tool calling

7 participants