fix: clear response_format when tool_choice is auto to allow tool calls by he-yufeng · Pull Request #39969 · vllm-project/vllm

he-yufeng · 2026-04-16T05:40:44Z

Summary

When a request includes both tools and response_format (e.g. json_object) with tool_choice: "auto" (the default), constrained JSON decoding from response_format forces the model to produce JSON content and prevents it from generating tool call tokens. The model returns tool_calls: [] and answers directly as JSON.

This was already fixed for tool_choice: "required" in #32006, but the tool_choice: "auto" case was missed.

Fix

In adjust_request(), when get_json_schema_from_tools returns None (the "auto" case) but tools are present, clear response_format so the model can freely choose between tool calls and text output. The tool parser handles extraction regardless of output format.

Fixes #39929

Changes

1 file changed — vllm/tool_parsers/abstract_tool_parser.py

# Before: response_format only cleared for "required"/"forced function"
if json_schema_from_tool is not None:
    request.response_format = None

# After: also cleared for "auto" when tools are present
if json_schema_from_tool is not None:
    request.response_format = None
elif isinstance(request, ChatCompletionRequest):
    # tool_choice: "auto" -- clear response_format so constrained
    # decoding doesn't prevent the model from generating tool calls.
    request.response_format = None

Test plan

Code review: the elif only triggers when json_schema_from_tool is None (i.e. tool_choice="auto") and request.tools is non-empty (checked at line 80)
With tool_choice="auto" + response_format=json_object + tools → model can now return tool_calls
Without tools, adjust_request returns early (line 80-81), so response_format is preserved
tool_choice="required" path unchanged (still goes through the if branch)

gemini-code-assist

Code Review

This pull request updates the adjust_request method in abstract_tool_parser.py to clear the response_format for ChatCompletionRequest instances, ensuring that constrained decoding does not interfere with tool call generation. A review comment identifies a potential regression where this change would incorrectly clear the response format even when tool_choice is set to 'none', and suggests refining the condition to specifically target 'auto' tool selection.

gemini-code-assist · 2026-04-16T05:42:09Z

                    description="Response format for tool calling",
                    strict=True,
                )
+        elif isinstance(request, ChatCompletionRequest):


The current implementation clears response_format for any ChatCompletionRequest where json_schema_from_tool is None. However, json_schema_from_tool is None for both tool_choice="auto" and tool_choice="none" (as defined in get_json_schema_from_tools in vllm/tool_parsers/utils.py).

If a user provides tools but explicitly sets tool_choice="none" while also requesting a specific response_format (e.g., json_object), this change will incorrectly clear their response_format. This results in a regression where the model's output is no longer constrained to JSON even though tool calling is disabled. The condition should explicitly check for tool_choice == "auto".

Suggested change

elif isinstance(request, ChatCompletionRequest):

elif isinstance(request, ChatCompletionRequest) and request.tool_choice == "auto":

sfeng33 · 2026-04-16T20:01:05Z

                    description="Response format for tool calling",
                    strict=True,
                )
+        elif isinstance(request, ChatCompletionRequest):


nit - can be more explicit here

elif isinstance(request, ChatCompletionRequest) and request.tool_choice in ("auto", None):

sfeng33 · 2026-04-16T20:01:40Z

Can you please fix DCO?

chaunceyjiang · 2026-04-17T05:57:07Z

                    description="Response format for tool calling",
                    strict=True,
                )
+        elif isinstance(request, ChatCompletionRequest):


I think an E2E test is needed.

When both tools and response_format are set with tool_choice=auto, constrained JSON decoding prevents the model from generating tool call tokens. Already fixed for required in vllm-project#32006 but auto was missed. tool_choice=none is deliberately left untouched because the caller explicitly opted out of tool calls. Fixes vllm-project#39929 Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>

he-yufeng · 2026-04-22T12:43:29Z

Thanks for the reviews! Addressed:

@sfeng33 — tightened the condition to tool_choice in ("auto", None) so tool_choice="none" (caller explicitly opted out of tools) keeps response_format intact. DCO fixed via sign-off.
@chaunceyjiang — added unit tests covering all four cases in tests/tool_parsers/test_openai_tool_parser.py:
- tool_choice="auto" + response_format → cleared
- tool_choice unset + tools → cleared (auto is the default)
- tool_choice="none" + response_format → preserved
- no tools + response_format → preserved

Since adjust_request lives in the base ToolParser class, the OpenAIToolParser exercises it directly. Let me know if you'd like the same check added as a protocol-level test in tests/tool_use/ — happy to move it.

he-yufeng · 2026-05-13T23:18:01Z

Hi, checking in on this after the latest update. I addressed the DCO issue and the review feedback by preserving response_format for tool_choice="none", then added the focused parser tests for the covered cases. GitHub currently shows DCO, pre-commit, RTD, and the summary check passing. Happy to adjust further if you prefer a protocol-level test instead.

sfeng33 · 2026-05-14T00:56:25Z

Thanks for the work! The change LGTM, deferring to @chaunceyjiang for feedback on the test coverage.

ToolParser.adjust_request's strict structural-tag path (added in vllm-project#40894, gated by VLLM_ENFORCE_STRICT_TOOL_CALLING) installs structural_tag on a pre-existing StructuredOutputsParams via in-place attribute assignment and returns without nulling response_format. The in-place set bypasses StructuredOutputsParams.__post_init__, so the params keep a prior mutually-exclusive constraint (json/regex/choice/grammar/json_object, or one lowered from response_format) next to the new structural_tag. On the next re-validation this trips the one-constraint invariant, so a strict-mode request that also carries a structured-output constraint or a response_format fails with: ValueError: You can only use one kind of structured outputs constraint but multiple are specified This affects any parser that installs a structural tag -- currently DeepSeek-V4 and Qwen3-Coder via get_structural_tag. The env var is off by default, and a request with no pre-existing constraint is unaffected. Fix: rebuild structured_outputs with only the structural tag (preserving the whitespace / additional-properties knobs) and null response_format, mirroring Step 2 of the same method. This "tool constraint wins, response_format dropped" resolution already exists in Step 2, the DeepSeek-V3.2 override (vllm-project#41178), and for required/auto in vllm-project#32006 / vllm-project#39969; the in-place-vs-rebuild trade-off was discussed on vllm-project#40894 and vllm-project#43155 (whose Kimi path already rebuilds). Repro / regression test (CPU, no model required): pytest tests/tool_use/test_strict_tool_calling_adjust_request.py The added tests enable strict mode, give a parser a structural tag, and send tools together with a response_format or a structured_outputs.json constraint (tool_choice auto and required). On the pre-fix code adjust_request leaves two constraints, and to_sampling_params raises the ValueError above; with this change structured_outputs holds only the structural tag, response_format is None, and the user's whitespace knobs are preserved. The conflict tests fail without this patch and pass with it; the no-pre-existing-constraint case passes either way. Equivalently over HTTP: with strict mode on, a tool_choice="auto" request that also sets response_format returns HTTP 400 (the error above) before this change and a normal tool call after; a required-tool request is unaffected because that path already rebuilds. Signed-off-by: Ace Eldeib <aeldeib@coreweave.com>

ToolParser.adjust_request's strict structural-tag path (added in vllm-project#40894, gated by VLLM_ENFORCE_STRICT_TOOL_CALLING) installs structural_tag on a pre-existing StructuredOutputsParams via in-place attribute assignment and returns without nulling response_format. The in-place set bypasses StructuredOutputsParams.__post_init__, so the params keep a prior mutually-exclusive constraint (json/regex/choice/grammar/json_object, or one lowered from response_format) next to the new structural_tag. On the next re-validation this trips the one-constraint invariant, so a strict-mode request that also carries a structured-output constraint or a response_format fails with: ValueError: You can only use one kind of structured outputs constraint but multiple are specified This affects any parser that installs a structural tag -- currently DeepSeek-V4 and Qwen3-Coder via get_structural_tag. The env var is off by default, and a request with no pre-existing constraint is unaffected. Fix: rebuild structured_outputs with only the structural tag (preserving the whitespace / additional-properties knobs) and null response_format, mirroring Step 2 of the same method. This "tool constraint wins, response_format dropped" resolution already exists in Step 2 and the DeepSeek-V3.2 override (vllm-project#41178), and is the intent of the open auto-path fix vllm-project#39969; the in-place-vs-rebuild trade-off was discussed on vllm-project#40894 and vllm-project#43155 (whose Kimi path already rebuilds). Repro / regression test (CPU, no model required): pytest tests/tool_use/test_strict_tool_calling_adjust_request.py The added tests enable strict mode, give a parser a structural tag, and send tools together with a response_format or a structured_outputs.json constraint (tool_choice auto and required). On the pre-fix code adjust_request leaves two constraints, and to_sampling_params raises the ValueError above; with this change structured_outputs holds only the structural tag, response_format is None, and the user's whitespace knobs are preserved. The conflict tests fail without this patch and pass with it; the no-pre-existing-constraint case passes either way. Equivalently over HTTP: with strict mode on, a tool_choice="auto" request that also sets response_format returns HTTP 400 (the error above) before this change and a normal tool call after; a required-tool request is unaffected because that path already rebuilds. Signed-off-by: Ace Eldeib <aeldeib@coreweave.com>

he-yufeng requested review from aarnphm and chaunceyjiang as code owners April 16, 2026 05:40

mergify Bot added the tool-calling label Apr 16, 2026

github-project-automation Bot added this to Tool Calling Apr 16, 2026

gemini-code-assist Bot reviewed Apr 16, 2026

View reviewed changes

sfeng33 reviewed Apr 16, 2026

View reviewed changes

chaunceyjiang reviewed Apr 17, 2026

View reviewed changes

he-yufeng force-pushed the fix/auto-tool-choice-response-format branch from 6340873 to 8cef40c Compare April 22, 2026 12:43

he-yufeng requested a review from bbrowning as a code owner April 22, 2026 12:43

alexeldeib mentioned this pull request May 31, 2026

[Bugfix] Clear conflicting structured outputs in strict tool calling #44134

Draft

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: clear response_format when tool_choice is auto to allow tool calls#39969

fix: clear response_format when tool_choice is auto to allow tool calls#39969
he-yufeng wants to merge 1 commit into
vllm-project:mainfrom
he-yufeng:fix/auto-tool-choice-response-format

he-yufeng commented Apr 16, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Apr 16, 2026

Uh oh!

sfeng33 Apr 16, 2026

Uh oh!

sfeng33 commented Apr 16, 2026

Uh oh!

chaunceyjiang Apr 17, 2026

Uh oh!

he-yufeng commented Apr 22, 2026

Uh oh!

he-yufeng commented May 13, 2026

Uh oh!

sfeng33 commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	elif isinstance(request, ChatCompletionRequest):
	elif isinstance(request, ChatCompletionRequest) and request.tool_choice == "auto":

Uh oh!

Conversation

he-yufeng commented Apr 16, 2026

Summary

Fix

Changes

Test plan

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

sfeng33 Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

sfeng33 commented Apr 16, 2026

Uh oh!

chaunceyjiang Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

he-yufeng commented Apr 22, 2026

Uh oh!

he-yufeng commented May 13, 2026

Uh oh!

sfeng33 commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants