Skip to content

[Misc] Add 20 regression tests for 11 tool parser bug fixes#38172

Open
bbrowning wants to merge 1 commit intovllm-project:mainfrom
bbrowning:tool-parser-regression-tests
Open

[Misc] Add 20 regression tests for 11 tool parser bug fixes#38172
bbrowning wants to merge 1 commit intovllm-project:mainfrom
bbrowning:tool-parser-regression-tests

Conversation

@bbrowning
Copy link
Copy Markdown
Contributor

Purpose

Claude Code and I audited recent tool parser bug-fix PRs (Sept 2025 until now) and found that several landed without corresponding test coverage. This is purely additive test coverage to prevent regressions as we refactor, cleanup, and redesign some of these areas.

Test Plan

pytest -sv tests/tool_parsers

Test Result

All the new tests passed, and all the old ones continue to pass.

510 passed, 1 skipped, 42 xfailed, 2 warnings

Audited recent tool parser bug-fix PRs and found that several
landed without corresponding test coverage. Added unit tests
for each fix to prevent regressions.

- Mistral: fast detokenization text detection (PR vllm-project#37209)
- Qwen3Coder: malformed XML crash, anyOf double-encoding,
  speculative decode streaming (PRs vllm-project#36774, vllm-project#36032, vllm-project#35615)
- DeepSeekV32: delimiter preservation with fast detokenization,
  skip_special_tokens adjustment (PR vllm-project#33964)
- GLM-4 MoE: zero-argument tool calls, transformers 5.x delimiter
  handling, Unicode character preservation (PRs vllm-project#32321, vllm-project#31622, vllm-project#30920)
- MiniMax M2: anyOf nullable parameter handling for non-null and
  null values (PR vllm-project#32342)
- Step3p5: MTP-style variable-chunk and multi-token streaming
  (PR vllm-project#33690)
- Kimi K2: native tool call ID extraction and multi-turn ID
  continuity (PR vllm-project#32768)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify bot added deepseek Related to DeepSeek models qwen Related to Qwen models tool-calling bug Something isn't working labels Mar 26, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive suite of regression tests across multiple tool parsers, including DeepSeekV32, GLM4-MoE, KimiK2, MinimaxM2, Mistral, Qwen3, and Step3p5. These tests address various edge cases and potential issues such as delimiter preservation, skip_special_tokens logic, handling of zero-argument and malformed tool calls, Unicode character preservation, native tool call ID extraction, anyOf nullable parameter parsing, fast detokenization, and streaming behavior with multi-parameter and variable-sized chunks. A review comment highlights a malformed JSON string in a MinimaxM2 test, which needs to be corrected to ensure the test functions as intended.

@bbrowning
Copy link
Copy Markdown
Contributor Author

Gemini got confused counting JSON braces within the xml tags within the Python strings of the test, but I double-checked the test it highlighted just to be sure. And then gave it a thumbs-down for good measure, in case they use that to improve training.

Copy link
Copy Markdown
Contributor

@sfeng33 sfeng33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for the thorough work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working deepseek Related to DeepSeek models qwen Related to Qwen models tool-calling

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants