
[Bugfix][Tool Parser] Fix Kimi-K2.5 parser accuracy, buffer limits, and token leaks#37384

Open
karanb192 wants to merge 1 commit into vllm-project:main from karanb192:fix/kimi-k25-parser-critical-fixes

Conversation


karanb192 commented Mar 18, 2026

Summary

Fixes three critical issues in the Kimi-K2.5 tool parser (gh-37184): degraded streaming accuracy (~87%), a hardcoded 8 KB buffer limit that truncated large tool arguments, and token leakage between the reasoning and tool sections.

Key changes

| Area | Before | After |
| --- | --- | --- |
| Buffer limit | 8 KB, hardcoded | 512 KB, env-var configurable |
| Marker buffer | 1024 bytes, unbounded growth | 256 bytes, capped |
| Argument diffing | Delta-text heuristics | Rebuilt from current_text |
| State helpers | Inline, scattered | _diff_and_emit_args, _try_emit_name, _handle_call_end |
| Section-begin check (non-streaming) | Only plural variant | All variants (singular + plural) |
| Streaming regex | No re.DOTALL on portion regex | re.DOTALL for multi-line args |
| Single-delta tool calls | Silently dropped | Handled via Phase B late setup |
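The configurable safety valve can be pictured with a minimal sketch. The constant and environment-variable names below (e.g. VLLM_KIMI_K25_TOOL_SECTION_LIMIT) are illustrative assumptions, not the PR's actual identifiers:

```python
import os

# Illustrative defaults; the PR's actual constant names may differ.
DEFAULT_TOOL_SECTION_LIMIT = 512 * 1024  # 512 KB (was 8 KB, hardcoded)
MARKER_BUFFER_CAP = 256  # bytes held back for a possibly-split marker


def tool_section_limit() -> int:
    """Resolve the buffer cap, overridable via an environment variable."""
    raw = os.environ.get("VLLM_KIMI_K25_TOOL_SECTION_LIMIT")
    if raw is None:
        return DEFAULT_TOOL_SECTION_LIMIT
    try:
        return max(1, int(raw))
    except ValueError:
        # Fall back to the default on a malformed value.
        return DEFAULT_TOOL_SECTION_LIMIT


def within_safety_valve(buffered: str) -> bool:
    """True while the buffered tool section is under the configured cap."""
    return len(buffered.encode("utf-8")) <= tool_section_limit()
```

The point of the valve is that a malformed tool section can no longer grow the buffer without bound, while well-formed payloads far larger than the old 8 KB cap pass through untouched.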

Files changed

  • vllm/tool_parsers/kimi_k2_tool_parser.py - Parser rewrite
  • tests/tool_parsers/test_kimi_k2_tool_parser.py - Updated tests for new buffer limits + safety valve test

Closes #37184
Related: #34442, #36763, #36969

Test plan

  • All existing unit tests pass (updated for new behavior)
  • test_large_tool_args_no_forced_exit - verifies 10 KB payloads are NOT truncated
  • test_malformed_tool_section_safety_valve - verifies configurable safety valve works
  • test_complete_tool_call_single_delta - verifies complete tool calls in single delta are emitted
  • test_token_leak_between_section_and_tool_begin - verifies no token leakage
  • test_streaming_tool_call_markers_not_leaked - verifies no marker leakage
  • test_streaming_multiple_tool_calls_not_leaked - verifies multi-tool scenarios
  • Pre-commit hooks pass (ruff check, ruff format, mypy, SPDX headers)
  • Rebased on latest main (no merge conflicts)
  • Integration test with K2-Vendor-Verifier (requires GPU)


mergify bot commented Mar 18, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @karanb192.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase and bug (Something isn't working) labels on Mar 18, 2026
Contributor

gemini-code-assist bot left a comment


Code Review

The pull request refactors the KimiK2ToolParser to enhance its correctness and support larger payloads, addressing several known issues. Key changes include implementing a single source of truth for streaming state, introducing a configurable safety valve for tool section size (default 512 KB), and improving marker handling to prevent token leakage. The streaming logic has been rewritten into a phased approach for clearer state management and more robust tool call parsing. A critical review comment highlights that the new argument diffing logic in _diff_and_emit_args does not correctly handle non-monotonic changes, which could lead to client-side JSON corruption, and suggests logging an error and returning None in such scenarios.

Comment on lines +228 to +229
else:
new_part = cur_args


critical

The current diffing logic for tool arguments does not correctly handle non-monotonic changes, which can occur if the model backtracks or revises its output. In such cases, cur_args may no longer start with the already-emitted prefix (already). The current implementation then sends the full cur_args as a delta, which will corrupt the state of any client that simply appends argument deltas, producing malformed JSON.

For example, if the client has accumulated {"a": 1 and the model backtracks to produce {"b": 2, this logic would send {"b": 2 as the delta. The client would append this, resulting in the invalid JSON {"a": 1{"b": 2.

To prevent client-side corruption, it is safer to avoid sending a potentially corrupting delta. I suggest logging an error and returning None in this scenario.

        else:
            # This indicates a non-monotonic change in the tool arguments, which
            # can happen if the model backtracks. Simply sending the new
            # arguments as a delta will likely corrupt the client's state, as
            # clients typically append argument deltas.
            logger.error(
                "Non-monotonic change detected in tool call arguments. "
                "Previous: %r, Current: %r. Aborting delta to prevent "
                "client-side corruption.",
                already,
                cur_args,
            )
            # Returning None to prevent sending a corrupting delta. A more
            # advanced implementation might compute a proper diff or signal a
            # reset to the client.
            return None

mergify bot removed the needs-rebase label on Mar 18, 2026
…nd token leaks

Rewrite the streaming logic in KimiK2ToolParser to fix three critical
issues reported in vllm-project#37184. Increases the buffer limit from 8 KB to 512 KB
(configurable), fixes token leakage between reasoning and tool sections,
and rebuilds argument diffing from current_text for near-100% accuracy.

Closes: vllm-project#37184
Related: vllm-project#34442, vllm-project#36763, vllm-project#36969
Signed-off-by: karanb192 <karan@example.com>
erkintelnyx pushed a commit to erkintelnyx/vllm that referenced this pull request Mar 18, 2026
erkintelnyx pushed a commit to erkintelnyx/vllm that referenced this pull request Mar 19, 2026
mergify bot added the tool-calling label on Mar 20, 2026
saifmb0 added a commit to saifmb0/vllm that referenced this pull request Mar 28, 2026
…ewline in tool call ID (vllm-project#38441)

The model occasionally emits a stray \n between <|tool_call_begin|>
and the function name, e.g.:

    <|tool_call_begin|>
    functions.edit:15<|tool_call_argument_begin|>{...}

Because Python regex does not match \n with . by default, both
stream_tool_call_portion_regex and stream_tool_call_name_regex
silently failed to match, causing the entire tool call to be dropped
during streaming.

Fix:
- Add a leading \s* to both streaming regexes so any leading
  whitespace/newlines before the tool_call_id are consumed.
- Compile both regexes with re.DOTALL so . inside the capture group
  spans newlines.

This is distinct from PR vllm-project#37384 which only adds re.DOTALL (without
leading \s*) to the portion regex and does not fix stream_tool_call_name_regex.

Tests added:
- test_stream_tool_call_portion_regex_handles_leading_newline: unit
  test that both regexes match inputs with a leading \n.
- test_streaming_tool_call_with_newline_after_begin_token: end-to-end
  streaming simulation reproducing the exact scenario in the issue.

Why this is not a duplicate: checked open PRs vllm-project#37384, vllm-project#37445, vllm-project#32504,
vllm-project#24847, vllm-project#26918, vllm-project#36891. None add the leading \s* prefix to handle
whitespace/newlines preceding the tool_call_id capture group, and none
fix stream_tool_call_name_regex with re.DOTALL.

Co-authored-by: GitHub Copilot
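The regex failure described in this commit is easy to reproduce. The patterns below are simplified stand-ins for stream_tool_call_portion_regex, not the exact vLLM patterns:

```python
import re

# A chunk with a stray newline between the begin token and the tool-call id.
chunk = "<|tool_call_begin|>\nfunctions.edit:15<|tool_call_argument_begin|>{}"

# Without re.DOTALL, '.' does not match '\n', so the pattern never matches
# and the whole tool call would be silently dropped during streaming.
naive = re.compile(
    r"<\|tool_call_begin\|>(.+?)<\|tool_call_argument_begin\|>"
)

# With a leading \s* to consume whitespace/newlines before the id, and
# re.DOTALL so the capture group can span newlines, the id is recovered.
fixed = re.compile(
    r"<\|tool_call_begin\|>\s*(.+?)<\|tool_call_argument_begin\|>",
    re.DOTALL,
)

assert naive.search(chunk) is None
match = fixed.search(chunk)
assert match is not None and match.group(1) == "functions.edit:15"
```

Note that \s* alone would already handle this particular chunk; re.DOTALL additionally lets the captured portion span newlines inside multi-line arguments, which is the part PR #37384 covers.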

Labels

bug (Something isn't working), tool-calling


Development

Successfully merging this pull request may close these issues.

[Bug]: Kimi-K2.5 Tool Parser Critical Issues - 87% Accuracy, 8KB Limit, Token Leakage
