[Bugfix][Tool Parser] Fix Kimi-K2.5 parser accuracy, buffer limits, and token leaks by karanb192 · Pull Request #37384 · vllm-project/vllm

karanb192 · 2026-03-18T04:29:24Z

Summary

Fixes three critical issues in the Kimi-K2.5 tool parser (gh-37184):

8 KB hard limit -> 512 KB configurable ([Bug]: kimi k2 tool parser has a 8k buffer limit for args #34442): The old max_section_chars = 8192 blocked tool calls with large JSON payloads (e.g., file content > 8 KB). The new default is 512 KB, configurable via the KIMI_PARSER_SECTION_MAX environment variable.
Token leakage ([Bug]: Kimi-K2.5 outputs only '!!!!!!!!!!' in reasoning field, content is always null #36763, [Bug]: Kimi-K2.5 output a single </think> tag in the content when "enable_thinking" is set to false #36969): Text between section markers and tool-call markers leaked into the reasoning/content channel, causing the !!!!! noise in reasoning output and </think> tag leaks. The rewrite properly suppresses all inter-section noise.
87% -> near-100% accuracy ([Bug]: Kimi-K2.5 Tool Parser Critical Issues - 87% Accuracy, 8KB Limit, Token Leakage #37184): The old diff logic relied on fragile delta-text heuristics (e.g., searching for "} in delta_text) that broke with special characters, nested JSON, and split tokens. The rewrite rebuilds arguments from current_text on every delta using _diff_and_emit_args, making the diff always correct regardless of token boundaries.

Key changes

Area	Before	After
Buffer limit	8 KB hardcoded	512 KB, env-var configurable
Marker buffer	1024 bytes, unbounded growth	256 bytes, capped
Argument diffing	Delta-text heuristics	Rebuilt from `current_text`
State helpers	Inline, scattered	`_diff_and_emit_args`, `_try_emit_name`, `_handle_call_end`
Section-begin check (non-streaming)	Only plural variant	All variants (singular + plural)
Streaming regex	No `re.DOTALL` on portion regex	`re.DOTALL` for multi-line args
Single-delta tool calls	Silently dropped	Handled via Phase B late setup

Files changed

vllm/tool_parsers/kimi_k2_tool_parser.py - Parser rewrite
tests/tool_parsers/test_kimi_k2_tool_parser.py - Updated tests for new buffer limits + safety valve test

Closes #37184
Related: #34442, #36763, #36969

Test plan

mergify · 2026-03-18T04:30:00Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @karanb192.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

gemini-code-assist

Code Review

The pull request refactors the KimiK2ToolParser to enhance its correctness and support larger payloads, addressing several known issues. Key changes include implementing a single source of truth for streaming state, introducing a configurable safety valve for tool section size (default 512 KB), and improving marker handling to prevent token leakage. The streaming logic has been rewritten into a phased approach for clearer state management and more robust tool call parsing. A critical review comment highlights that the new argument diffing logic in _diff_and_emit_args does not correctly handle non-monotonic changes, which could lead to client-side JSON corruption, and suggests logging an error and returning None in such scenarios.

gemini-code-assist · 2026-03-18T04:37:06Z

vllm/tool_parsers/kimi_k2_tool_parser.py

+        else:
+            new_part = cur_args


The current diffing logic for tool arguments does not correctly handle non-monotonic changes, which can occur if the model backtracks or revises its output. In such cases, cur_args may no longer start with already. The current implementation sends the full cur_args as a delta, which will likely corrupt the client's state if the client simply appends argument deltas, leading to malformed JSON.

For example, if the client has accumulated {"a": 1 and the model backtracks to produce {"b": 2, this logic would send {"b": 2 as the delta. The client would append this, resulting in the invalid JSON {"a": 1{"b": 2.

To prevent client-side corruption, it is safer to avoid sending a potentially corrupting delta. I suggest logging an error and returning None in this scenario.

else: # This indicates a non-monotonic change in the tool arguments, which # can happen if the model backtracks. Simply sending the new # arguments as a delta will likely corrupt the client's state, as # clients typically append argument deltas. logger.error( "Non-monotonic change detected in tool call arguments. " "Previous: %r, Current: %r. Aborting delta to prevent " "client-side corruption.", already, cur_args, ) # Returning None to prevent sending a corrupting delta. A more # advanced implementation might compute a proper diff or signal a # reset to the client. return None

…nd token leaks Rewrite the streaming logic in KimiK2ToolParser to fix three critical issues reported in vllm-projectgh-37184. Increases buffer limit from 8KB to 512KB (configurable), fixes token leakage between reasoning and tool sections, and rebuilds argument diffing from current_text for near-100% accuracy. Closes: vllm-projectgh-37184 Related: vllm-projectgh-34442, vllm-projectgh-36763, vllm-projectgh-36969 Signed-off-by: karanb192 <karan@example.com>

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

…ewline in tool call ID (vllm-project#38441) The model occasionally emits a stray \n between <|tool_call_begin|> and the function name, e.g.: <|tool_call_begin|> functions.edit:15<|tool_call_argument_begin|>{...} Because Python regex does not match \n with . by default, both stream_tool_call_portion_regex and stream_tool_call_name_regex silently failed to match, causing the entire tool call to be dropped during streaming. Fix: - Add a leading \s* to both streaming regexes so any leading whitespace/newlines before the tool_call_id are consumed. - Compile both regexes with re.DOTALL so . inside the capture group spans newlines. This is distinct from PR vllm-project#37384 which only adds re.DOTALL (without leading \s*) to the portion regex and does not fix stream_tool_call_name_regex. Tests added: - test_stream_tool_call_portion_regex_handles_leading_newline: unit test that both regexes match inputs with a leading \n. - test_streaming_tool_call_with_newline_after_begin_token: end-to-end streaming simulation reproducing the exact scenario in the issue. Why this is not a duplicate: checked open PRs vllm-project#37384, vllm-project#37445, vllm-project#32504, vllm-project#24847, vllm-project#26918, vllm-project#36891. None add the leading \s* prefix to handle whitespace/newlines preceding the tool_call_id capture group, and none fix stream_tool_call_name_regex with re.DOTALL. Co-authored-by: GitHub Copilot

…ewline in tool call ID (vllm-project#38441) The model occasionally emits a stray \n between <|tool_call_begin|> and the function name, e.g.: <|tool_call_begin|> functions.edit:15<|tool_call_argument_begin|>{...} Because Python regex does not match \n with . by default, both stream_tool_call_portion_regex and stream_tool_call_name_regex silently failed to match, causing the entire tool call to be dropped during streaming. Fix: - Add a leading \s* to both streaming regexes so any leading whitespace/newlines before the tool_call_id are consumed. - Compile both regexes with re.DOTALL so . inside the capture group spans newlines. This is distinct from PR vllm-project#37384 which only adds re.DOTALL (without leading \s*) to the portion regex and does not fix stream_tool_call_name_regex. Tests added: - test_stream_tool_call_portion_regex_handles_leading_newline: unit test that both regexes match inputs with a leading \n. - test_streaming_tool_call_with_newline_after_begin_token: end-to-end streaming simulation reproducing the exact scenario in the issue. Why this is not a duplicate: checked open PRs vllm-project#37384, vllm-project#37445, vllm-project#32504, whitespace/newlines preceding the tool_call_id capture group, and none fix stream_tool_call_name_regex with re.DOTALL. Co-authored-by: GitHub Copilot Signed-off-by: saif <contact@saifmb.com>

mergify bot added needs-rebase bug Something isn't working labels Mar 18, 2026

gemini-code-assist bot reviewed Mar 18, 2026

View reviewed changes

mergify bot removed the needs-rebase label Mar 18, 2026

mergify bot added the tool-calling label Mar 20, 2026

github-project-automation bot added this to Tool Calling Mar 20, 2026

saifmb0 mentioned this pull request Mar 28, 2026

[Bugfix][Tool Parser] Fix Kimi-K2 streaming regex to handle leading newline before tool call ID #38443

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bugfix][Tool Parser] Fix Kimi-K2.5 parser accuracy, buffer limits, and token leaks#37384

[Bugfix][Tool Parser] Fix Kimi-K2.5 parser accuracy, buffer limits, and token leaks#37384
karanb192 wants to merge 1 commit intovllm-project:mainfrom
karanb192:fix/kimi-k25-parser-critical-fixes

karanb192 commented Mar 18, 2026 •

edited

Loading

Uh oh!

mergify bot commented Mar 18, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

karanb192 commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key changes

Files changed

Test plan

Uh oh!

mergify bot commented Mar 18, 2026

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

karanb192 commented Mar 18, 2026 •

edited

Loading