Skip to content

fix: clear response_format when tool_choice is auto to allow tool calls#39969

Open
he-yufeng wants to merge 1 commit into
vllm-project:mainfrom
he-yufeng:fix/auto-tool-choice-response-format
Open

fix: clear response_format when tool_choice is auto to allow tool calls#39969
he-yufeng wants to merge 1 commit into
vllm-project:mainfrom
he-yufeng:fix/auto-tool-choice-response-format

Conversation

@he-yufeng
Copy link
Copy Markdown
Contributor

Summary

When a request includes both tools and response_format (e.g. json_object) with tool_choice: "auto" (the default), constrained JSON decoding from response_format forces the model to produce JSON content and prevents it from generating tool call tokens. The model returns tool_calls: [] and answers directly as JSON.

This was already fixed for tool_choice: "required" in #32006, but the tool_choice: "auto" case was missed.

Fix

In adjust_request(), when get_json_schema_from_tools returns None (the "auto" case) but tools are present, clear response_format so the model can freely choose between tool calls and text output. The tool parser handles extraction regardless of output format.

Fixes #39929

Changes

1 file changed — vllm/tool_parsers/abstract_tool_parser.py

# Before: response_format only cleared for "required"/"forced function"
if json_schema_from_tool is not None:
    request.response_format = None

# After: also cleared for "auto" when tools are present
if json_schema_from_tool is not None:
    request.response_format = None
elif isinstance(request, ChatCompletionRequest):
    # tool_choice: "auto" -- clear response_format so constrained
    # decoding doesn't prevent the model from generating tool calls.
    request.response_format = None

Test plan

  • Code review: the elif only triggers when json_schema_from_tool is None (i.e. tool_choice="auto") and request.tools is non-empty (checked at line 80)
  • With tool_choice="auto" + response_format=json_object + tools → model can now return tool_calls
  • Without tools, adjust_request returns early (line 80-81), so response_format is preserved
  • tool_choice="required" path unchanged (still goes through the if branch)

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the adjust_request method in abstract_tool_parser.py to clear the response_format for ChatCompletionRequest instances, ensuring that constrained decoding does not interfere with tool call generation. A review comment identifies a potential regression where this change would incorrectly clear the response format even when tool_choice is set to 'none', and suggests refining the condition to specifically target 'auto' tool selection.

description="Response format for tool calling",
strict=True,
)
elif isinstance(request, ChatCompletionRequest):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation clears response_format for any ChatCompletionRequest where json_schema_from_tool is None. However, json_schema_from_tool is None for both tool_choice="auto" and tool_choice="none" (as defined in get_json_schema_from_tools in vllm/tool_parsers/utils.py).

If a user provides tools but explicitly sets tool_choice="none" while also requesting a specific response_format (e.g., json_object), this change will incorrectly clear their response_format. This results in a regression where the model's output is no longer constrained to JSON even though tool calling is disabled. The condition should explicitly check for tool_choice == "auto".

Suggested change
elif isinstance(request, ChatCompletionRequest):
elif isinstance(request, ChatCompletionRequest) and request.tool_choice == "auto":

description="Response format for tool calling",
strict=True,
)
elif isinstance(request, ChatCompletionRequest):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - can be more explicit here

elif isinstance(request, ChatCompletionRequest) and request.tool_choice in ("auto", None):

@sfeng33
Copy link
Copy Markdown
Collaborator

sfeng33 commented Apr 16, 2026

Can you please fix DCO?

description="Response format for tool calling",
strict=True,
)
elif isinstance(request, ChatCompletionRequest):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an E2E test is needed.

When both tools and response_format are set with tool_choice=auto,
constrained JSON decoding prevents the model from generating tool
call tokens. Already fixed for required in vllm-project#32006 but auto was missed.
tool_choice=none is deliberately left untouched because the caller
explicitly opted out of tool calls.

Fixes vllm-project#39929

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
@he-yufeng he-yufeng force-pushed the fix/auto-tool-choice-response-format branch from 6340873 to 8cef40c Compare April 22, 2026 12:43
@he-yufeng he-yufeng requested a review from bbrowning as a code owner April 22, 2026 12:43
@he-yufeng
Copy link
Copy Markdown
Contributor Author

Thanks for the reviews! Addressed:

  • @sfeng33 — tightened the condition to tool_choice in ("auto", None) so tool_choice="none" (caller explicitly opted out of tools) keeps response_format intact. DCO fixed via sign-off.
  • @chaunceyjiang — added unit tests covering all four cases in tests/tool_parsers/test_openai_tool_parser.py:
    • tool_choice="auto" + response_format → cleared
    • tool_choice unset + tools → cleared (auto is the default)
    • tool_choice="none" + response_format → preserved
    • no tools + response_format → preserved

Since adjust_request lives in the base ToolParser class, the OpenAIToolParser exercises it directly. Let me know if you'd like the same check added as a protocol-level test in tests/tool_use/ — happy to move it.

@he-yufeng
Copy link
Copy Markdown
Contributor Author

Hi, checking in on this after the latest update. I addressed the DCO issue and the review feedback by preserving response_format for tool_choice="none", then added the focused parser tests for the covered cases. GitHub currently shows DCO, pre-commit, RTD, and the summary check passing. Happy to adjust further if you prefer a protocol-level test instead.

@sfeng33
Copy link
Copy Markdown
Collaborator

sfeng33 commented May 14, 2026

Thanks for the work! The change LGTM, deferring to @chaunceyjiang for feedback on the test coverage.

alexeldeib added a commit to alexeldeib/vllm that referenced this pull request May 31, 2026
ToolParser.adjust_request's strict structural-tag path (added in vllm-project#40894, gated by
VLLM_ENFORCE_STRICT_TOOL_CALLING) installs structural_tag on a pre-existing
StructuredOutputsParams via in-place attribute assignment and returns without
nulling response_format. The in-place set bypasses
StructuredOutputsParams.__post_init__, so the params keep a prior
mutually-exclusive constraint (json/regex/choice/grammar/json_object, or one
lowered from response_format) next to the new structural_tag. On the next
re-validation this trips the one-constraint invariant, so a strict-mode request
that also carries a structured-output constraint or a response_format fails with:

    ValueError: You can only use one kind of structured outputs constraint
    but multiple are specified

This affects any parser that installs a structural tag -- currently DeepSeek-V4
and Qwen3-Coder via get_structural_tag. The env var is off by default, and a
request with no pre-existing constraint is unaffected.

Fix: rebuild structured_outputs with only the structural tag (preserving the
whitespace / additional-properties knobs) and null response_format, mirroring
Step 2 of the same method. This "tool constraint wins, response_format dropped"
resolution already exists in Step 2, the DeepSeek-V3.2 override (vllm-project#41178), and for
required/auto in vllm-project#32006 / vllm-project#39969; the in-place-vs-rebuild trade-off was discussed
on vllm-project#40894 and vllm-project#43155 (whose Kimi path already rebuilds).

Repro / regression test (CPU, no model required):

    pytest tests/tool_use/test_strict_tool_calling_adjust_request.py

The added tests enable strict mode, give a parser a structural tag, and send
tools together with a response_format or a structured_outputs.json constraint
(tool_choice auto and required). On the pre-fix code adjust_request leaves two
constraints, and to_sampling_params raises the ValueError above; with this change
structured_outputs holds only the structural tag, response_format is None, and
the user's whitespace knobs are preserved. The conflict tests fail without this
patch and pass with it; the no-pre-existing-constraint case passes either way.

Equivalently over HTTP: with strict mode on, a tool_choice="auto" request that
also sets response_format returns HTTP 400 (the error above) before this change
and a normal tool call after; a required-tool request is unaffected because that
path already rebuilds.

Signed-off-by: Ace Eldeib <aeldeib@coreweave.com>
alexeldeib added a commit to alexeldeib/vllm that referenced this pull request May 31, 2026
ToolParser.adjust_request's strict structural-tag path (added in vllm-project#40894, gated by
VLLM_ENFORCE_STRICT_TOOL_CALLING) installs structural_tag on a pre-existing
StructuredOutputsParams via in-place attribute assignment and returns without
nulling response_format. The in-place set bypasses
StructuredOutputsParams.__post_init__, so the params keep a prior
mutually-exclusive constraint (json/regex/choice/grammar/json_object, or one
lowered from response_format) next to the new structural_tag. On the next
re-validation this trips the one-constraint invariant, so a strict-mode request
that also carries a structured-output constraint or a response_format fails with:

    ValueError: You can only use one kind of structured outputs constraint
    but multiple are specified

This affects any parser that installs a structural tag -- currently DeepSeek-V4
and Qwen3-Coder via get_structural_tag. The env var is off by default, and a
request with no pre-existing constraint is unaffected.

Fix: rebuild structured_outputs with only the structural tag (preserving the
whitespace / additional-properties knobs) and null response_format, mirroring
Step 2 of the same method. This "tool constraint wins, response_format dropped"
resolution already exists in Step 2 and the DeepSeek-V3.2 override (vllm-project#41178), and
is the intent of the open auto-path fix vllm-project#39969; the in-place-vs-rebuild trade-off
was discussed on vllm-project#40894 and vllm-project#43155 (whose Kimi path already rebuilds).

Repro / regression test (CPU, no model required):

    pytest tests/tool_use/test_strict_tool_calling_adjust_request.py

The added tests enable strict mode, give a parser a structural tag, and send
tools together with a response_format or a structured_outputs.json constraint
(tool_choice auto and required). On the pre-fix code adjust_request leaves two
constraints, and to_sampling_params raises the ValueError above; with this change
structured_outputs holds only the structural tag, response_format is None, and
the user's whitespace knobs are preserved. The conflict tests fail without this
patch and pass with it; the no-pre-existing-constraint case passes either way.

Equivalently over HTTP: with strict mode on, a tool_choice="auto" request that
also sets response_format returns HTTP 400 (the error above) before this change
and a normal tool call after; a required-tool request is unaffected because that
path already rebuilds.

Signed-off-by: Ace Eldeib <aeldeib@coreweave.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[Bug]: response_format suppresses tool calls when tool_choice: "auto" — constrained decoding prevents tool generation

3 participants