fix: clear response_format when tool_choice is auto to allow tool calls#39969
fix: clear response_format when tool_choice is auto to allow tool calls#39969he-yufeng wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request updates the adjust_request method in abstract_tool_parser.py to clear the response_format for ChatCompletionRequest instances, ensuring that constrained decoding does not interfere with tool call generation. A review comment identifies a potential regression where this change would incorrectly clear the response format even when tool_choice is set to 'none', and suggests refining the condition to specifically target 'auto' tool selection.
| description="Response format for tool calling", | ||
| strict=True, | ||
| ) | ||
| elif isinstance(request, ChatCompletionRequest): |
There was a problem hiding this comment.
The current implementation clears response_format for any ChatCompletionRequest where json_schema_from_tool is None. However, json_schema_from_tool is None for both tool_choice="auto" and tool_choice="none" (as defined in get_json_schema_from_tools in vllm/tool_parsers/utils.py).
If a user provides tools but explicitly sets tool_choice="none" while also requesting a specific response_format (e.g., json_object), this change will incorrectly clear their response_format. This results in a regression where the model's output is no longer constrained to JSON even though tool calling is disabled. The condition should explicitly check for tool_choice == "auto".
| elif isinstance(request, ChatCompletionRequest): | |
| elif isinstance(request, ChatCompletionRequest) and request.tool_choice == "auto": |
| description="Response format for tool calling", | ||
| strict=True, | ||
| ) | ||
| elif isinstance(request, ChatCompletionRequest): |
There was a problem hiding this comment.
nit - can be more explicit here
elif isinstance(request, ChatCompletionRequest) and request.tool_choice in ("auto", None):
|
Can you please fix DCO? |
| description="Response format for tool calling", | ||
| strict=True, | ||
| ) | ||
| elif isinstance(request, ChatCompletionRequest): |
There was a problem hiding this comment.
I think an E2E test is needed.
When both tools and response_format are set with tool_choice=auto, constrained JSON decoding prevents the model from generating tool call tokens. Already fixed for required in vllm-project#32006 but auto was missed. tool_choice=none is deliberately left untouched because the caller explicitly opted out of tool calls. Fixes vllm-project#39929 Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
6340873 to
8cef40c
Compare
|
Thanks for the reviews! Addressed:
Since |
|
Hi, checking in on this after the latest update. I addressed the DCO issue and the review feedback by preserving |
|
Thanks for the work! The change LGTM, deferring to @chaunceyjiang for feedback on the test coverage. |
ToolParser.adjust_request's strict structural-tag path (added in vllm-project#40894, gated by VLLM_ENFORCE_STRICT_TOOL_CALLING) installs structural_tag on a pre-existing StructuredOutputsParams via in-place attribute assignment and returns without nulling response_format. The in-place set bypasses StructuredOutputsParams.__post_init__, so the params keep a prior mutually-exclusive constraint (json/regex/choice/grammar/json_object, or one lowered from response_format) next to the new structural_tag. On the next re-validation this trips the one-constraint invariant, so a strict-mode request that also carries a structured-output constraint or a response_format fails with: ValueError: You can only use one kind of structured outputs constraint but multiple are specified This affects any parser that installs a structural tag -- currently DeepSeek-V4 and Qwen3-Coder via get_structural_tag. The env var is off by default, and a request with no pre-existing constraint is unaffected. Fix: rebuild structured_outputs with only the structural tag (preserving the whitespace / additional-properties knobs) and null response_format, mirroring Step 2 of the same method. This "tool constraint wins, response_format dropped" resolution already exists in Step 2, the DeepSeek-V3.2 override (vllm-project#41178), and for required/auto in vllm-project#32006 / vllm-project#39969; the in-place-vs-rebuild trade-off was discussed on vllm-project#40894 and vllm-project#43155 (whose Kimi path already rebuilds). Repro / regression test (CPU, no model required): pytest tests/tool_use/test_strict_tool_calling_adjust_request.py The added tests enable strict mode, give a parser a structural tag, and send tools together with a response_format or a structured_outputs.json constraint (tool_choice auto and required). On the pre-fix code adjust_request leaves two constraints, and to_sampling_params raises the ValueError above; with this change structured_outputs holds only the structural tag, response_format is None, and the user's whitespace knobs are preserved. The conflict tests fail without this patch and pass with it; the no-pre-existing-constraint case passes either way. Equivalently over HTTP: with strict mode on, a tool_choice="auto" request that also sets response_format returns HTTP 400 (the error above) before this change and a normal tool call after; a required-tool request is unaffected because that path already rebuilds. Signed-off-by: Ace Eldeib <aeldeib@coreweave.com>
ToolParser.adjust_request's strict structural-tag path (added in vllm-project#40894, gated by VLLM_ENFORCE_STRICT_TOOL_CALLING) installs structural_tag on a pre-existing StructuredOutputsParams via in-place attribute assignment and returns without nulling response_format. The in-place set bypasses StructuredOutputsParams.__post_init__, so the params keep a prior mutually-exclusive constraint (json/regex/choice/grammar/json_object, or one lowered from response_format) next to the new structural_tag. On the next re-validation this trips the one-constraint invariant, so a strict-mode request that also carries a structured-output constraint or a response_format fails with: ValueError: You can only use one kind of structured outputs constraint but multiple are specified This affects any parser that installs a structural tag -- currently DeepSeek-V4 and Qwen3-Coder via get_structural_tag. The env var is off by default, and a request with no pre-existing constraint is unaffected. Fix: rebuild structured_outputs with only the structural tag (preserving the whitespace / additional-properties knobs) and null response_format, mirroring Step 2 of the same method. This "tool constraint wins, response_format dropped" resolution already exists in Step 2 and the DeepSeek-V3.2 override (vllm-project#41178), and is the intent of the open auto-path fix vllm-project#39969; the in-place-vs-rebuild trade-off was discussed on vllm-project#40894 and vllm-project#43155 (whose Kimi path already rebuilds). Repro / regression test (CPU, no model required): pytest tests/tool_use/test_strict_tool_calling_adjust_request.py The added tests enable strict mode, give a parser a structural tag, and send tools together with a response_format or a structured_outputs.json constraint (tool_choice auto and required). On the pre-fix code adjust_request leaves two constraints, and to_sampling_params raises the ValueError above; with this change structured_outputs holds only the structural tag, response_format is None, and the user's whitespace knobs are preserved. The conflict tests fail without this patch and pass with it; the no-pre-existing-constraint case passes either way. Equivalently over HTTP: with strict mode on, a tool_choice="auto" request that also sets response_format returns HTTP 400 (the error above) before this change and a normal tool call after; a required-tool request is unaffected because that path already rebuilds. Signed-off-by: Ace Eldeib <aeldeib@coreweave.com>
Summary
When a request includes both
toolsandresponse_format(e.g.json_object) withtool_choice: "auto"(the default), constrained JSON decoding fromresponse_formatforces the model to produce JSON content and prevents it from generating tool call tokens. The model returnstool_calls: []and answers directly as JSON.This was already fixed for
tool_choice: "required"in #32006, but thetool_choice: "auto"case was missed.Fix
In
adjust_request(), whenget_json_schema_from_toolsreturnsNone(the"auto"case) but tools are present, clearresponse_formatso the model can freely choose between tool calls and text output. The tool parser handles extraction regardless of output format.Fixes #39929
Changes
1 file changed —
vllm/tool_parsers/abstract_tool_parser.pyTest plan
elifonly triggers whenjson_schema_from_tool is None(i.e.tool_choice="auto") andrequest.toolsis non-empty (checked at line 80)tool_choice="auto"+response_format=json_object+ tools → model can now return tool_callsadjust_requestreturns early (line 80-81), so response_format is preservedtool_choice="required"path unchanged (still goes through theifbranch)