[Bugfix][Tool Parser] Fix Kimi-K2.5 parser accuracy, buffer limits, and token leaks#37384
[Bugfix][Tool Parser] Fix Kimi-K2.5 parser accuracy, buffer limits, and token leaks#37384karanb192 wants to merge 1 commit intovllm-project:mainfrom
Conversation
|
This pull request has merge conflicts that must be resolved before it can be |
There was a problem hiding this comment.
Code Review
The pull request refactors the KimiK2ToolParser to enhance its correctness and support larger payloads, addressing several known issues. Key changes include implementing a single source of truth for streaming state, introducing a configurable safety valve for tool section size (default 512 KB), and improving marker handling to prevent token leakage. The streaming logic has been rewritten into a phased approach for clearer state management and more robust tool call parsing. A critical review comment highlights that the new argument diffing logic in _diff_and_emit_args does not correctly handle non-monotonic changes, which could lead to client-side JSON corruption, and suggests logging an error and returning None in such scenarios.
| else: | ||
| new_part = cur_args |
There was a problem hiding this comment.
The current diffing logic for tool arguments does not correctly handle non-monotonic changes, which can occur if the model backtracks or revises its output. In such cases, cur_args may no longer start with already. The current implementation sends the full cur_args as a delta, which will likely corrupt the client's state if the client simply appends argument deltas, leading to malformed JSON.
For example, if the client has accumulated {"a": 1 and the model backtracks to produce {"b": 2, this logic would send {"b": 2 as the delta. The client would append this, resulting in the invalid JSON {"a": 1{"b": 2.
To prevent client-side corruption, it is safer to avoid sending a potentially corrupting delta. I suggest logging an error and returning None in this scenario.
else:
# This indicates a non-monotonic change in the tool arguments, which
# can happen if the model backtracks. Simply sending the new
# arguments as a delta will likely corrupt the client's state, as
# clients typically append argument deltas.
logger.error(
"Non-monotonic change detected in tool call arguments. "
"Previous: %r, Current: %r. Aborting delta to prevent "
"client-side corruption.",
already,
cur_args,
)
# Returning None to prevent sending a corrupting delta. A more
# advanced implementation might compute a proper diff or signal a
# reset to the client.
return None…nd token leaks Rewrite the streaming logic in KimiK2ToolParser to fix three critical issues reported in vllm-projectgh-37184. Increases buffer limit from 8KB to 512KB (configurable), fixes token leakage between reasoning and tool sections, and rebuilds argument diffing from current_text for near-100% accuracy. Closes: vllm-projectgh-37184 Related: vllm-projectgh-34442, vllm-projectgh-36763, vllm-projectgh-36969 Signed-off-by: karanb192 <karan@example.com>
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
…ewline in tool call ID (vllm-project#38441) The model occasionally emits a stray \n between <|tool_call_begin|> and the function name, e.g.: <|tool_call_begin|> functions.edit:15<|tool_call_argument_begin|>{...} Because Python regex does not match \n with . by default, both stream_tool_call_portion_regex and stream_tool_call_name_regex silently failed to match, causing the entire tool call to be dropped during streaming. Fix: - Add a leading \s* to both streaming regexes so any leading whitespace/newlines before the tool_call_id are consumed. - Compile both regexes with re.DOTALL so . inside the capture group spans newlines. This is distinct from PR vllm-project#37384 which only adds re.DOTALL (without leading \s*) to the portion regex and does not fix stream_tool_call_name_regex. Tests added: - test_stream_tool_call_portion_regex_handles_leading_newline: unit test that both regexes match inputs with a leading \n. - test_streaming_tool_call_with_newline_after_begin_token: end-to-end streaming simulation reproducing the exact scenario in the issue. Why this is not a duplicate: checked open PRs vllm-project#37384, vllm-project#37445, vllm-project#32504, vllm-project#24847, vllm-project#26918, vllm-project#36891. None add the leading \s* prefix to handle whitespace/newlines preceding the tool_call_id capture group, and none fix stream_tool_call_name_regex with re.DOTALL. Co-authored-by: GitHub Copilot
…ewline in tool call ID (vllm-project#38441) The model occasionally emits a stray \n between <|tool_call_begin|> and the function name, e.g.: <|tool_call_begin|> functions.edit:15<|tool_call_argument_begin|>{...} Because Python regex does not match \n with . by default, both stream_tool_call_portion_regex and stream_tool_call_name_regex silently failed to match, causing the entire tool call to be dropped during streaming. Fix: - Add a leading \s* to both streaming regexes so any leading whitespace/newlines before the tool_call_id are consumed. - Compile both regexes with re.DOTALL so . inside the capture group spans newlines. This is distinct from PR vllm-project#37384 which only adds re.DOTALL (without leading \s*) to the portion regex and does not fix stream_tool_call_name_regex. Tests added: - test_stream_tool_call_portion_regex_handles_leading_newline: unit test that both regexes match inputs with a leading \n. - test_streaming_tool_call_with_newline_after_begin_token: end-to-end streaming simulation reproducing the exact scenario in the issue. Why this is not a duplicate: checked open PRs vllm-project#37384, vllm-project#37445, vllm-project#32504, vllm-project#24847, vllm-project#26918, vllm-project#36891. None add the leading \s* prefix to handle whitespace/newlines preceding the tool_call_id capture group, and none fix stream_tool_call_name_regex with re.DOTALL. Co-authored-by: GitHub Copilot
…ewline in tool call ID (vllm-project#38441) The model occasionally emits a stray \n between <|tool_call_begin|> and the function name, e.g.: <|tool_call_begin|> functions.edit:15<|tool_call_argument_begin|>{...} Because Python regex does not match \n with . by default, both stream_tool_call_portion_regex and stream_tool_call_name_regex silently failed to match, causing the entire tool call to be dropped during streaming. Fix: - Add a leading \s* to both streaming regexes so any leading whitespace/newlines before the tool_call_id are consumed. - Compile both regexes with re.DOTALL so . inside the capture group spans newlines. This is distinct from PR vllm-project#37384 which only adds re.DOTALL (without leading \s*) to the portion regex and does not fix stream_tool_call_name_regex. Tests added: - test_stream_tool_call_portion_regex_handles_leading_newline: unit test that both regexes match inputs with a leading \n. - test_streaming_tool_call_with_newline_after_begin_token: end-to-end streaming simulation reproducing the exact scenario in the issue. Why this is not a duplicate: checked open PRs vllm-project#37384, vllm-project#37445, vllm-project#32504, vllm-project#24847, vllm-project#26918, vllm-project#36891. None add the leading \s* prefix to handle whitespace/newlines preceding the tool_call_id capture group, and none fix stream_tool_call_name_regex with re.DOTALL. Co-authored-by: GitHub Copilot
…ewline in tool call ID (vllm-project#38441) The model occasionally emits a stray \n between <|tool_call_begin|> and the function name, e.g.: <|tool_call_begin|> functions.edit:15<|tool_call_argument_begin|>{...} Because Python regex does not match \n with . by default, both stream_tool_call_portion_regex and stream_tool_call_name_regex silently failed to match, causing the entire tool call to be dropped during streaming. Fix: - Add a leading \s* to both streaming regexes so any leading whitespace/newlines before the tool_call_id are consumed. - Compile both regexes with re.DOTALL so . inside the capture group spans newlines. This is distinct from PR vllm-project#37384 which only adds re.DOTALL (without leading \s*) to the portion regex and does not fix stream_tool_call_name_regex. Tests added: - test_stream_tool_call_portion_regex_handles_leading_newline: unit test that both regexes match inputs with a leading \n. - test_streaming_tool_call_with_newline_after_begin_token: end-to-end streaming simulation reproducing the exact scenario in the issue. Why this is not a duplicate: checked open PRs vllm-project#37384, vllm-project#37445, vllm-project#32504, whitespace/newlines preceding the tool_call_id capture group, and none fix stream_tool_call_name_regex with re.DOTALL. Co-authored-by: GitHub Copilot Signed-off-by: saif <contact@saifmb.com>
Summary
Fixes three critical issues in the Kimi-K2.5 tool parser (gh-37184):
max_section_chars = 8192blocked tool calls with large JSON payloads (e.g., file content > 8 KB). The new default is 512 KB, configurable via theKIMI_PARSER_SECTION_MAXenvironment variable.!!!!!noise in reasoning output and</think>tag leaks. The rewrite properly suppresses all inter-section noise."}in delta_text) that broke with special characters, nested JSON, and split tokens. The rewrite rebuilds arguments fromcurrent_texton every delta using_diff_and_emit_args, making the diff always correct regardless of token boundaries.Key changes
current_text_diff_and_emit_args,_try_emit_name,_handle_call_endre.DOTALLon portion regexre.DOTALLfor multi-line argsFiles changed
vllm/tool_parsers/kimi_k2_tool_parser.py- Parser rewritetests/tool_parsers/test_kimi_k2_tool_parser.py- Updated tests for new buffer limits + safety valve testCloses #37184
Related: #34442, #36763, #36969
Test plan
test_large_tool_args_no_forced_exit- verifies 10 KB payloads are NOT truncatedtest_malformed_tool_section_safety_valve- verifies configurable safety valve workstest_complete_tool_call_single_delta- verifies complete tool calls in single delta are emittedtest_token_leak_between_section_and_tool_begin- verifies no token leakagetest_streaming_tool_call_markers_not_leaked- verifies no marker leakagetest_streaming_multiple_tool_calls_not_leaked- verifies multi-tool scenarios