
[Bugfix][Model] Prevent special token leakage in KimiK2ToolParser streaming mode#28543

Merged
chaunceyjiang merged 1 commit into vllm-project:main from jscaldwell55:fix/kimi-k2-tool-parser-token-leak
Nov 17, 2025

Conversation


@jscaldwell55 jscaldwell55 commented Nov 12, 2025

Summary

This PR fixes a bug in streaming output routing (MoonshotAI/Kimi-K2#89) where special tokens and intermediate text can leak into the reasoning_delta field during streaming mode (stream=True) before the tool section is fully detected.

As identified in the issue and comment, agent frameworks like LangChain and AutoGPT fail when internal markers (e.g., <|tool_calls_section_begin|>) appear in reasoning_delta. This fix ensures reasoning_delta contains only natural-language reasoning text, aligning with the expected format from the Kimi K2 paper (Appendix B) and downstream SDKs.


The Problem

The parser's extract_tool_calls_streaming method lacked section-level state management, causing text between <|tool_calls_section_begin|> and <|tool_call_begin|> to emit as reasoning content when it should be suppressed.

Specific Leak Scenario

When the model streams output like:

"Reasoning... <|tool_calls_section_begin|> spurious text <|tool_call_begin|>..."

The deltas are processed as:

  1. "Reasoning... " → DeltaMessage(content="Reasoning... ") (correct)
  2. "<|tool_calls_section_begin|>" → stripped, no output (correct)
  3. " spurious text " → DeltaMessage(content=" spurious text ") (LEAKED!)
  4. "<|tool_call_begin|>..." → starts tool call parsing (correct)
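The failed routing above can be reproduced with a minimal, self-contained sketch; the function name is ours, not the parser's actual API:

```python
def old_check_leaks(current_text: str) -> bool:
    """Sketch of the old routing: emit the delta as reasoning content
    whenever the tool-call begin/end counts are balanced."""
    starts = current_text.count("<|tool_call_begin|>")
    ends = current_text.count("<|tool_call_end|>")
    return starts == ends

# Between the section marker and the first tool call, both counts are
# still 0, so the balanced check routes " spurious text " to content.
assert old_check_leaks("Reasoning... <|tool_calls_section_begin|> spurious text ")
```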

Prior Behavior

The original condition at line 156-162 checked if tool call counts were balanced:

if cur_tool_start_count == cur_tool_end_count:  # Both are 0!
    return DeltaMessage(content=delta_text)  # ← LEAKS TEXT

This incorrectly assumed "balanced counts = reasoning mode", failing to account for being inside the tool section but before any tool call begins.


Solution

Core Changes

1. Added Section-Level State Machine

  • in_tool_section: bool flag to track if we're between <|tool_calls_section_begin|> and <|tool_calls_section_end|>
  • Explicit state transitions: REASONING ↔ TOOL_SECTION ↔ TOOL_CALL_ACTIVE

2. Implemented Rolling Buffer for Split Markers

  • token_buffer: str accumulates deltas to detect markers split across chunks
  • Example: "<|tool_calls_sec" + "tion_begin|>" → correctly detected
  • Buffer size capped at 1024 bytes with overflow protection (an empirical worst case: roughly twice the longest marker at ~30 chars, plus a safety margin for multi-byte unicode and partial marker overlap)
  • Added _buffer_overflow_logged flag to log overflow warning only once
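A sketch of the rolling-buffer idea (function name, return shape, and the exact stripping strategy are ours): accumulate deltas, scan the accumulated text for complete markers, and keep only a bounded tail so a marker split across chunks is still detected:

```python
MARKERS = ("<|tool_calls_section_begin|>", "<|tool_call_begin|>")
BUFFER_MAX = 1024  # cap from the PR description

def feed(buffer: str, delta: str):
    """Append a delta; return (markers found, remaining buffer)."""
    buffer += delta
    found = [m for m in MARKERS if m in buffer]
    for m in found:
        buffer = buffer.replace(m, "")
    if len(buffer) > BUFFER_MAX:  # overflow protection
        buffer = buffer[-BUFFER_MAX:]
    return found, buffer

# A marker split across two chunks is only detected once both arrive.
found, buf = feed("", "<|tool_calls_sec")
assert found == []
found, buf = feed(buf, "tion_begin|>")
assert found == ["<|tool_calls_section_begin|>"]
```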

3. Content Suppression Logic

if self.in_tool_section and cur_tool_start_count == 0:
    logger.debug("In tool section but no tool calls started yet. Suppressing: %s", delta_text)
    return DeltaMessage(content="")  # Suppresses leak while preserving return type

4. Marker Variant Support

  • Supports both <|tool_calls_section_begin|> (plural) and <|tool_call_section_begin|> (singular)
  • Handles potential format variations from model output
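One way to accept both variants is a single regex with an optional `s`; the pattern name is illustrative:

```python
import re

# Matches both <|tool_calls_section_begin|> and <|tool_call_section_begin|>.
SECTION_BEGIN = re.compile(r"<\|tool_calls?_section_begin\|>")

assert SECTION_BEGIN.search("<|tool_calls_section_begin|>")
assert SECTION_BEGIN.search("<|tool_call_section_begin|>")
assert not SECTION_BEGIN.search("<|tool_call_begin|>")
```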

5. Error Recovery

  • Tracks section_char_count to detect malformed tool sections
  • Force-exits tool section if it exceeds 8192 chars without proper structure
  • Prevents indefinite content suppression from malformed output
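The recovery check reduces to a running counter against a cap; a sketch using the 8192-char limit stated above (names are ours):

```python
MAX_SECTION_CHARS = 8192  # limit from the PR description

def track_section(section_char_count: int, delta_text: str):
    """Accumulate characters seen inside the tool section and signal a
    forced exit once the cap is exceeded, so suppression cannot run
    indefinitely on malformed output."""
    section_char_count += len(delta_text)
    return section_char_count, section_char_count > MAX_SECTION_CHARS

count, force_exit = track_section(0, "x" * 100)
assert not force_exit
count, force_exit = track_section(count, "y" * 8192)
assert force_exit  # malformed section exceeded the cap
```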

6. State Reset Mechanism

  • Added reset_streaming_state() public method
  • Clears all state between requests to prevent leakage when parser is reused

7. Function Contract Preservation

  • Changed suppression logic to return DeltaMessage(content="") instead of None
  • Maintains consistent return type for downstream iterator patterns
  • Prevents potential breaking changes for consumers expecting DeltaMessage

Files Changed

Modified

vllm/entrypoints/openai/tool_parsers/kimi_k2_tool_parser.py

  • Added section-level state variables (in_tool_section, token_buffer, section_char_count, buffer_max_size, max_section_chars, _buffer_overflow_logged)
  • Implemented _check_and_strip_markers() helper for buffer processing
  • Added _reset_section_state() and reset_streaming_state() methods
  • Updated extract_tool_calls_streaming() with:
    • Buffer management (lines 187-266)
    • Section state transitions (lines 222-236)
    • Content suppression checks (lines 254-260, 343-346)
    • Error recovery (lines 255-266)

tests/tool_use/test_kimi_k2_tool_parser.py

  • Added 9 new test cases covering:
    • test_token_leak_between_section_and_tool_begin() - Main bug: leak prevention
    • test_split_markers_across_deltas() - Buffer functionality
    • test_marker_variants() - Singular/plural support
    • test_reentry_to_reasoning_after_tool_section() - State transitions
    • test_empty_tool_section() - Edge case
    • test_malformed_tool_section_recovery() - Error recovery
    • test_state_reset() - State management
    • test_section_begin_noise_tool_begin_same_chunk() - Same-chunk suppression
    • test_stream_ends_without_section_end_marker() - EOF handling

Testing

  • Added 9 new unit tests in tests/tool_use/test_kimi_k2_tool_parser.py (pytest-based)
  • Ran targeted tests locally: pytest -s -v tests/tool_use/test_kimi_k2_tool_parser.py – all pass
  • Ran broader suite: pytest -s -v tests/tool_use/ – no regressions
  • Verified on CPU; awaiting CI for GPU validation (as not all tests pass locally on CPU per guidelines)
  • Tests cover main leak bug, split markers, variants, state transitions, edge cases (empty/malformed), same-chunk suppression, EOF handling, and error recovery

Scope

  • Affects: Only KimiK2ToolParser in streaming mode
  • No impact: Other tool parsers, non-streaming mode, non-K2 models


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively addresses the token leakage bug by introducing a state machine and a buffer to manage streaming output. The changes are well-structured and accompanied by a comprehensive new test suite. However, I've identified a critical issue in the new state transition logic. It fails to handle cases where a tool section both begins and ends within the same data chunk, which can leave the parser in a corrupted state. I've provided a comment with a suggested fix for this logic and recommended adding a new test case to cover this scenario. Addressing this is crucial to ensure the fix is fully robust.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@jscaldwell55 jscaldwell55 force-pushed the fix/kimi-k2-tool-parser-token-leak branch from d193e17 to 0427a6c on November 12, 2025 10:28

mergify bot commented Nov 12, 2025

Documentation preview: https://vllm--28543.org.readthedocs.build/en/28543/

@mergify mergify bot added the documentation Improvements or additions to documentation label Nov 12, 2025
@jscaldwell55 jscaldwell55 force-pushed the fix/kimi-k2-tool-parser-token-leak branch from 0427a6c to e188c9b on November 12, 2025 10:30
- Add section-level state machine (in_tool_section flag)
- Implement rolling buffer for split marker detection (1KB cap)
- Suppress content between section_begin and tool_call_begin
- Support marker variants (plural/singular)
- Add error recovery for malformed sections (8KB limit)
- Preserve function contract (always return DeltaMessage)
- Fix critical bug #1: Handle both begin/end markers in same chunk
  (Changed elif to if on line 237 to prevent state corruption)
- Fix critical bug #2: Defer section exit when tool_call_end present
  (Prevents dropping final tool arguments and token leakage)
- Include 12 comprehensive tests (3 new tests for edge cases)

Fixes bug where text between <|tool_calls_section_begin|> and
<|tool_call_begin|> leaks into reasoning_delta during streaming mode.

Also fixes two critical edge cases:
1. Section begin and end markers appearing in same chunk would leave
   parser stuck in in_tool_section=True, causing subsequent content
   to be incorrectly suppressed.
2. Tool_call_end and section_end in same chunk would cause early
   return before tool parsing, dropping final tool arguments and
   leaking special tokens into reasoning channel.

Signed-off-by: Jscaldwell55 <jay.s.caldwell@gmail.com>
@jscaldwell55 jscaldwell55 force-pushed the fix/kimi-k2-tool-parser-token-leak branch from e188c9b to c3a801a on November 12, 2025 10:36
@chaunceyjiang
Collaborator

@jscaldwell55 Thanks~. LGTM.

Could you help test #24847 again?

@chaunceyjiang chaunceyjiang self-assigned this Nov 13, 2025

jscaldwell55 commented Nov 13, 2025

@chaunceyjiang

Tests were successful.

Run on Python 3.12.7:

pytest tests/tool_use/test_kimi_k2_tool_parser.py -v

Result: All 12 tests passed ✅
- All existing tests continue to pass
- 4 new tests for concatenated tool calls work correctly
- No regressions detected

✅ Compatibility Testing with PR #28543

Created a temporary branch merging both PRs to test for conflicts/interactions:
- PR #24847 Fixes concatenated tool calls regex
- PR #28543 Prevents token leakage in streaming mode

Result: All 24 combined tests passed ✅

As far as I can tell, both PRs can be safely merged.

@chaunceyjiang
Collaborator

@jscaldwell55 Thanks~

@chaunceyjiang chaunceyjiang enabled auto-merge (squash) November 17, 2025 05:54

@chaunceyjiang chaunceyjiang left a comment


Thanks~

@chaunceyjiang chaunceyjiang merged commit 6f37419 into vllm-project:main Nov 17, 2025
47 checks passed
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…eaming mode (vllm-project#28543)

Signed-off-by: Jscaldwell55 <jay.s.caldwell@gmail.com>
kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request Dec 1, 2025
@regismesquita

Seems like it is also happening on Kimi K2.5

@jscaldwell55
Contributor Author

@regismesquita Interesting, thanks for the heads-up. Will take a look.

@jscaldwell55
Contributor Author

Looks like they fixed it on the Kimi end; it might be caused by an issue somewhere else in the pipeline.

jscaldwell55 added a commit to jscaldwell55/vllm that referenced this pull request Mar 18, 2026
…y streaming state

The original streaming fix (PR vllm-project#28543) introduced a hard-coded 8KB
section limit that truncates large tool call arguments, breaking coding
use cases with Kimi-K2 and K2.5 models. This rewrite addresses the
regression while preserving all existing behavior.

Changes:
- Replace hard-coded 8KB limit with configurable 512KB default via
  VLLM_KIMI_TOOL_PARSER_MAX_SECTION_CHARS environment variable
- Consolidate 6 scattered instance variables into _StreamState dataclass
- Replace 7 copy-pasted deferred section exit checks with single
  try/finally cleanup
- Reduce rolling buffer from 1KB to 256 bytes (longest marker is 28
  chars)
- Add regression tests for large arguments, configurable limits,
  multi-turn reentry, and thinking+tools interleaving

Signed-off-by: Jay Caldwell <jay.s.caldwell@gmail.com>
erkintelnyx pushed a commit to erkintelnyx/vllm that referenced this pull request Mar 18, 2026
erkintelnyx pushed a commit to erkintelnyx/vllm that referenced this pull request Mar 19, 2026

Labels

documentation (Improvements or additions to documentation) · frontend · ready (ONLY add when PR is ready to merge/full CI is needed) · tool-calling

Projects

Status: Done


3 participants