[CI/Build] Add common tool call parser test suite by bbrowning · Pull Request #27599 · vllm-project/vllm

bbrowning · 2025-10-27T18:57:45Z

Purpose

This adds a common test suite for tool call parsers and wires all of the existing tool call parsers that had no tests into the common suite. It doesn't yet adapt existing tool call parser tests to fit into the common suite nor augment tool call parsers that already had tests with the new set of common tests. Those tasks can come later, as this PR is already quite large.

Not all of the existing tool call parsers pass every test in the common suite. The tests that fail today are marked as xfail; each one is an opportunity to identify and fix a gap in the corresponding parser, with the goal of eventually reaching zero expected-to-fail tests in the common suite for every parser.

Given how many tests are here, the default_tokenizer fixture that tests use when tokenizing strings was also changed to be module-scoped. This keeps test execution fast by avoiding the need to instantiate a new, identical tokenizer for every individual test function.
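For readers unfamiliar with fixture scoping, here is a minimal sketch of the idea, assuming pytest. `FakeTokenizer` and `build_tokenizer` are hypothetical stand-ins invented for illustration; the real default_tokenizer fixture loads an actual tokenizer and is not reproduced here.

```python
import pytest


class FakeTokenizer:
    """Hypothetical stand-in for a tokenizer that is expensive to construct."""

    def tokenize(self, text: str) -> list[str]:
        return text.split()


def build_tokenizer() -> FakeTokenizer:
    """Hypothetical factory; a real fixture would load a pretrained tokenizer."""
    return FakeTokenizer()


@pytest.fixture(scope="module")
def default_tokenizer() -> FakeTokenizer:
    # scope="module": pytest builds this once per test module and hands the
    # same object to every test in that module that requests it.
    return build_tokenizer()


def test_tokenizes_tool_name(default_tokenizer):
    assert default_tokenizer.tokenize("get weather") == ["get", "weather"]


def test_tokenizes_empty_string(default_tokenizer):
    # Receives the same tokenizer instance as the test above.
    assert default_tokenizer.tokenize("") == []
```

With the default function scope, the fixture body would run again for each test; with module scope it runs once per file, which matters when a suite has many small tests.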

I used Claude Code to help me write the example model outputs for every added test and to help write the initial version of the common set of tests based on existing patterns in our other tool call parser tests.

Test Plan

Run all the newly added tool parser tests:

pytest \
  tests/entrypoints/openai/tool_parsers/test_deepseekv3_tool_parser.py \
  tests/entrypoints/openai/tool_parsers/test_granite_20b_fc_tool_parser.py \
  tests/entrypoints/openai/tool_parsers/test_granite_tool_parser.py \
  tests/entrypoints/openai/tool_parsers/test_internlm2_tool_parser.py \
  tests/entrypoints/openai/tool_parsers/test_longcat_tool_parser.py \
  tests/entrypoints/openai/tool_parsers/test_phi4mini_tool_parser.py \
  tests/entrypoints/openai/tool_parsers/test_qwen3xml_tool_parser.py \
  tests/entrypoints/openai/tool_parsers/test_step3_tool_parser.py

Test Result

tests/entrypoints/openai/tool_parsers/test_deepseekv3_tool_parser.py ...............x. [ 12%]
tests/entrypoints/openai/tool_parsers/test_granite_20b_fc_tool_parser.py ..........x...... [ 24%]
tests/entrypoints/openai/tool_parsers/test_granite_tool_parser.py ..........xx..x.x.... [ 39%]
tests/entrypoints/openai/tool_parsers/test_internlm2_tool_parser.py ..x.x.x.x.x.x..xx [ 51%]
tests/entrypoints/openai/tool_parsers/test_longcat_tool_parser.py ..............x.. [ 63%]
tests/entrypoints/openai/tool_parsers/test_phi4mini_tool_parser.py x.x.x.xxx.x.x..xx [ 75%]
tests/entrypoints/openai/tool_parsers/test_qwen3xml_tool_parser.py ..x.x.x.x.x.x.x.x [ 87%]
tests/entrypoints/openai/tool_parsers/test_step3_tool_parser.py ...xxx.x.x.x.x..x [100%]

===================== 99 passed, 41 xfailed, 2 warnings in 9.91s =====================

Each of those xfailed tests is a bug in one of the tool call parsers that we'll want to track down. The expected failures are marked as strict, so a test fails if it unexpectedly passes; that keeps the list of expected failures in sync with the real state of the parsers.
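As an illustration of what strict expected failures look like in pytest, here is a minimal sketch. The `extract_tool_name` helper and its known gap are hypothetical, made up for this example; how this PR actually attaches the xfail markers to each parser's tests is not shown here.

```python
from typing import Optional

import pytest


def extract_tool_name(model_output: str) -> Optional[str]:
    """Toy extractor (hypothetical) with a known gap: leading whitespace is not handled."""
    prefix = "<tool>"
    if model_output.startswith(prefix):
        return model_output[len(prefix):].split("</tool>")[0]
    return None


def test_plain_tool_call():
    assert extract_tool_name("<tool>get_weather</tool>") == "get_weather"


@pytest.mark.xfail(strict=True, reason="known gap: leading whitespace not handled")
def test_tool_call_with_leading_whitespace():
    # Reported as xfailed today. Because strict=True, fixing the helper without
    # removing this marker turns the unexpected pass into a test failure.
    assert extract_tool_name("  <tool>get_weather</tool>") == "get_weather"
```

Without strict=True, an unexpected pass is only reported as XPASS and the suite stays green, so stale markers can linger unnoticed.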

gemini-code-assist

Code Review

This pull request introduces a valuable common test suite for tool call parsers, which will greatly improve testing consistency and help identify gaps in different parser implementations. The structure with a configuration dataclass and a test mixin is well-designed. My review focuses on strengthening some of the new common tests to make them more robust and comprehensive. Specifically, I've suggested improvements to test_various_data_types to validate parsed values and to test_streaming_reconstruction for a more complete comparison between streaming and non-streaming outputs.
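The "configuration dataclass plus test mixin" structure mentioned in the review can be sketched roughly as follows. Every name here (`ParserTestConfig`, `CommonParserTests`, `ToyParser`, `make_parser`) is hypothetical and chosen for illustration; the actual classes in common_tests.py may be organized differently.

```python
from dataclasses import dataclass


@dataclass
class ParserTestConfig:
    """Hypothetical per-parser configuration: sample model output and expectations."""

    single_tool_call_output: str
    expected_tool_name: str


class ToyParser:
    """Hypothetical parser for outputs where each call is a line like 'CALL name'."""

    def extract(self, model_output: str) -> list[str]:
        return [
            line[len("CALL "):]
            for line in model_output.splitlines()
            if line.startswith("CALL ")
        ]


class CommonParserTests:
    """Mixin with tests shared by all parsers; subclasses supply config and parser.

    The class name does not start with 'Test', so pytest does not collect the
    mixin itself, only its concrete subclasses.
    """

    config: ParserTestConfig

    def make_parser(self) -> ToyParser:
        raise NotImplementedError

    def test_single_tool_call(self):
        names = self.make_parser().extract(self.config.single_tool_call_output)
        assert names == [self.config.expected_tool_name]


class TestToyParser(CommonParserTests):
    # A parser-specific test module only needs to provide its own config
    # and a way to construct its parser.
    config = ParserTestConfig(
        single_tool_call_output="CALL get_weather",
        expected_tool_name="get_weather",
    )

    def make_parser(self) -> ToyParser:
        return ToyParser()
```

The appeal of this layout is that adding coverage for another parser is mostly a matter of writing example model outputs, while the assertions live in one place.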

tests/entrypoints/openai/tool_parsers/common_tests.py

DarkLight1337 · 2025-10-28T04:34:22Z

cc @aarnphm @chaunceyjiang

bbrowning · 2025-10-28T14:11:22Z

I also had tests for the mistral tool parser locally, but decided to hold off on adding them until #19425 lands. That PR adds an initial mistral tool parser test suite and is already quite large, so I didn't want to cause a rebase headache there.

bbrowning · 2025-10-28T15:31:41Z

I created #27661 to track the overall arc I'm working towards here for broader context as to why I'm adding a common test suite and expanding the tests across all parsers. To briefly recap, these tests serve double duty of identifying existing bugs across parsers and de-risking a future refactor of tool call parsers by ensuring we have comprehensive test coverage.

bbrowning · 2025-10-30T16:16:02Z

The pre-commit check failed here due to #27811. I confirmed the ruff failures there were unrelated to this change.

bbrowning · 2025-10-31T15:49:26Z

Looks like this CI run failed with the same flake I previously reported as #27576.

bbrowning · 2025-10-31T17:27:47Z

Adding a note to myself and any future reviewers: if #27747 lands before this PR, this PR needs to update the location of the new tests it adds to align with the test reorganization in #27747.

This tightens up the data type checking in the common tool call parser test suite to ensure that parsers not only parse function arguments of various data types, but also parse them into the expected Python type. The XML-based parsers do not support parsing into any data type other than string, so a flag was added to control this stricter behavior: tool call parsers that cannot parse arguments into their native non-string types are excluded from this check. Signed-off-by: Ben Browning <bbrownin@redhat.com>
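A rough sketch of what such a flag-controlled type check could look like is below. The function name `check_argument_types` and the flag name `supports_typed_arguments` are assumptions made for this example, not the names used in the PR.

```python
import json


def check_argument_types(
    arguments_json: str,
    expected_types: dict,
    supports_typed_arguments: bool = True,
) -> None:
    """Assert that parsed tool call arguments are present and, optionally, typed.

    supports_typed_arguments is a hypothetical flag: when False (for example,
    for a parser that only ever yields strings), only argument presence is checked.
    """
    parsed = json.loads(arguments_json)
    for name, expected_type in expected_types.items():
        assert name in parsed, f"missing argument {name!r}"
        if supports_typed_arguments:
            # Compare exact types: isinstance(True, int) is True in Python,
            # which would let a bool slip through where an int is expected.
            actual_type = type(parsed[name])
            assert actual_type is expected_type, (
                f"{name!r} parsed as {actual_type.__name__}, "
                f"expected {expected_type.__name__}"
            )


# A parser that preserves native types passes the strict check.
check_argument_types(
    '{"count": 3, "enabled": true, "city": "Paris"}',
    {"count": int, "enabled": bool, "city": str},
)

# A string-only parser is checked for presence only.
check_argument_types(
    '{"count": "3"}', {"count": int}, supports_typed_arguments=False
)
```

Keeping the flag in per-parser configuration lets the same shared test run everywhere while still being strict for the parsers that can satisfy it.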

bbrowning · 2025-12-03T18:39:33Z

Rebased this on top of latest main to fix a small conflict in conftest.py. All added tests still pass.

This just updates a few import paths to match the relocation of ToolParserManager and the renaming/moving of AnyTokenizer to TokenizerLike, both of which happened after this PR was initially opened. Signed-off-by: Ben Browning <bbrownin@redhat.com>

mergify · 2026-02-12T20:37:41Z

Hi @bbrowning, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: Ben Browning <bbrownin@redhat.com>

bbrowning · 2026-02-23T20:11:00Z

@sfeng33 This is an older PR of mine that adds a minimal "standard" tool calling test suite, initially focused only on adding tests for the parsers that have none today. I'm happy to rethink this or pivot to another approach if you have input on how we should best add a minimal standard set of tool parser unit tests that verify basic functionality in expected scenarios.

bbrowning requested review from DarkLight1337, NickLucche, aarnphm, robertgshaw2-redhat and simon-mo as code owners October 27, 2025 18:57

bbrowning force-pushed the 20251006-tool-parser-tests branch from bc9a4d0 to 3dfa4d8 Compare October 27, 2025 18:58

mergify bot added the deepseek (Related to DeepSeek models), qwen (Related to Qwen models), and tool-calling labels Oct 27, 2025

github-project-automation bot added this to Tool Calling Oct 27, 2025

gemini-code-assist bot reviewed Oct 27, 2025

chaunceyjiang self-assigned this Oct 28, 2025

bbrowning mentioned this pull request Oct 28, 2025

[RFC]: Consolidated tool call parser implementations by type (JSON, Python, XML, Harmony) #27661

Open

10 tasks

DarkLight1337 added the ready (ONLY add when PR is ready to merge/full CI is needed) label Oct 30, 2025

bbrowning added 2 commits December 3, 2025 13:35

bbrowning force-pushed the 20251006-tool-parser-tests branch from ffdc985 to 6588078 Compare December 3, 2025 18:38

bbrowning added 2 commits February 12, 2026 15:21

Merge remote-tracking branch 'upstream/main' into 20251006-tool-parse…

9dc4946

…r-tests

Update import paths in tool parser tests

27cd24d

bbrowning added 2 commits February 12, 2026 15:49

Adjust import order for pre-commit

814eace

Signed-off-by: Ben Browning <bbrownin@redhat.com>

Merge branch 'main' into 20251006-tool-parser-tests

1475486

bbrowning wants to merge 6 commits into vllm-project:main from bbrowning:20251006-tool-parser-tests
