Skip to content

[Responses API] Unified tool_choice + structured output via triggered tags#1

Closed
will-deines wants to merge 0 commit intomainfrom
feat/tool-choice-required
Closed

[Responses API] Unified tool_choice + structured output via triggered tags#1
will-deines wants to merge 0 commit intomainfrom
feat/tool-choice-required

Conversation

@will-deines
Copy link
Copy Markdown
Owner

@will-deines will-deines commented Mar 17, 2026

Supersedes #35904 by absorbing all of its changes and extending with tool_choice + function tool support via triggered tags.

Summary

Unifies tool_choice enforcement, function tool tags, and structured output content constraints into a single prepare_structured_tag() code path for GPT-OSS Harmony models in the Responses API. Previously, tool_choice="required" raised NotImplementedError, content constraints (json_schema) clobbered reasoning output, and json_object format was silently ignored.

Addresses upstream: #33966 (tool_choice=required) | #23120 (structured output not enforced with GPT-OSS) | #26288 (streaming schema alias bug) | #34857 (Responses API roadmap)


Approach: TriggeredTagsFormat structural tags

Rather than using a separate EBNF grammar (see Alternatives below), we extend prepare_structured_tag() to be the single authority for all generation constraints. xgrammar's TriggeredTagsFormat natively supports:

  • Channel structure — analysis, commentary, final channels
  • Tool enforcement — function tool tags on both commentary and analysis channels
  • Argument validation — JSON schema constraints on tool call arguments
  • Content constraints — json_schema/regex/grammar scoped to the <|channel|>final region
  • Channel blocking — omitting <|channel|>final from triggers forces tool calls

This means StructuredOutputsParams keeps its existing mutual-exclusivity invariant (one constraint type = structural_tag), and no custom grammar composition is needed.

This is the same TriggeredTagsFormat pattern used by #36891 (Kimi K2 guided decoding, 75%→100% schema accuracy) and #28148 (GPT-OSS chat format GD). The pattern is converging as the preferred mechanism for guided tool calling in vLLM.

How channel blocking works

xgrammar's triggered_tags use prefix matching. When the model generates <|channel|>:

  • If triggers include <|channel|>analysis and <|channel|>commentary to= but NOT <|channel|>final, the model can only continue with analysis... or commentary to=...
  • final is rejected because no trigger matches that continuation
  • The model must call a tool or hit max_tokens

Relationship to other PRs

PRs we supersede or absorb

PR Title Status How we relate
#35904 Structured output + reasoning via structural tag embedding Open Fully absorbed. All changes integrated: content constraint embedding, _constraint_to_content_format(), struct_out is None branch, json_object handling, streaming fix, developer message injection.

PRs we align with (same TriggeredTagsFormat approach)

PR Title Status Relationship
#28148 Chat format GD for tool calling with GPT-OSS Open Closest aligned PR. Also uses TriggeredTagsFormat on the reasoning parser to guide Harmony's chat format for tool calling. Our PR is narrower in scope (tool_choice enforcement + content constraints) while vllm-project#28148 constrains the entire chat format. If vllm-project#28148 merges, it would subsume our tool tag logic but our content constraint embedding and struct_out is None fixes would still be needed.
#36891 Kimi K2 guided decoding for tool_choice=auto Open Validates approach. First tool parser to use TriggeredTagsFormat for tool_choice="auto" — achieved 75%→100% schema accuracy. Demonstrates the pattern works for tool calling. Different model family, no file overlap.
#25515 Structure_Tag support for GPT-OSS tool-call in CoT Merged Foundation we build on. Introduced prepare_structured_tag() on the reasoning parser and the no_func_reaonsing_tag base structure.

PRs we diverge from (EBNF grammar approach)

PR Title Status Why we diverge
#33306 tool_choice=required for GPT-OSS Harmony via EBNF Open Uses EBNF grammar in tool_parser.adjust_request() which sets .grammar on structured_outputs. This conflicts with the reasoning parser's .structural_tag — only one can win. Additionally, BFCL eval showed 3.25% overall on gpt-oss-20b (vs. 20.88% baseline), suggesting the EBNF grammar over-constrains the model. Our triggered_tags approach avoids this conflict entirely.

Complementary PRs (different model families / code paths)

PR Title Status Notes
#35936 tool_choice=required fallback to tool_parser Open Fixes generic required code path for non-GPT-OSS models (Qwen3 XML). No overlap.
#32202 Structural tag support for Hermes tool calling Open Same pattern applied to Hermes parser. Complementary.
#37081 Mistral guidance via Lark grammar Open Different grammar backend (llguidance). Low overlap.
#36841 Fix crash when required exceeds max_tokens Merged Defensive fix we benefit from.
#37258 Same fix for Responses API Merged Same.

What changed (relative to main)

tool_choice + function tools (new in this PR)

Change File
from_function_tool_to_tag() — creates commentary + analysis tags per function tool with JSON schema content constraints gptoss_reasoning_parser.py
tag_with_function_tools() — deep-copies base tag, adds function tool triggers + tags gptoss_reasoning_parser.py
prepare_structured_tag() extended with tool_choice, function_tools, final_content_format params gptoss_reasoning_parser.py
Base class signature updated with new params (backward-compatible defaults) abs_reasoning_parsers.py
Function tool extraction from request, wired to reasoning parser serving.py
Removed tool_choice != "auto" NotImplementedError in _make_request_with_harmony() serving.py

Absorbed from #35904 (structured output + reasoning)

Change File
_constraint_to_content_format() — converts json/regex/grammar/choice/json_object to xgrammar content format serving.py
_extract_response_format_schema() — extracts JSON schema from request serving.py
Content constraint embedding in `< channel
struct_out is None branch — reasoning tags always applied, even without structured output serving.py
inject_response_formats() — Harmony cookbook developer message injection harmony_utils.py
Developer message injection when json_schema requested serving.py
json_object format handling (was silently ignored) protocol.py
Streaming .model_dump() fix (schema alias bug) serving.py

Combination matrix

tool_choice function_tools final_content_format Result
required yes any Tool tags (both channels, JSON schema args). No final.
required no any Builtin tool tags only. No final.
auto yes yes Tool tags + final with content constraint
auto yes no Tool tags + final with any_text
auto no yes Analysis + final with content constraint
auto/none no no Analysis only (existing default)
none any any Analysis + optional final. No tool tags.

Decisions we made

1. Why not EBNF grammar?

What we chose: Extend prepare_structured_tag() with TriggeredTagsFormat to handle function tools and tool_choice natively — the same mechanism already used for channel structure and builtin tools.

Alternative: #33306 uses an EBNF grammar set via tool_parser.adjust_request(). This sets .grammar on structured_outputs, which conflicts with the reasoning parser's .structural_tag — two systems fight over structured_outputs and only one field can be active.

Why we chose this:

2. Why supersede vllm-project#35904?

What we chose: Absorb all changes from #35904 into this PR.

Why: Both PRs modify prepare_structured_tag() and the structural tag preparation block in serving.py. They'd conflict mechanically on every overlapping file. The changes are complementary (content constraints + tool_choice), and merging them produces a cleaner result. This PR is a strict superset.

3. Embed content constraints inside structural tag

(From vllm-project#35904) When a content constraint (json_schema, regex, etc.) is present alongside a reasoning parser, convert it to an xgrammar content format dict, embed it in the <|channel|>final tag, and clear the original constraint fields.

Why: xgrammar's TagFormat.content already supports this composition natively. The mutual-exclusivity invariant on StructuredOutputsParams is load-bearing across the entire structured output stack — relaxing it has a large blast radius.

4. Apply reasoning channel tags unconditionally

(From vllm-project#35904) When struct_out is None and a reasoning parser is active, create StructuredOutputsParams(structural_tag=...) with reasoning channel tags.

Why critical for us: Without this, tool_choice and function_tools params never reach prepare_structured_tag() when no structured output is requested — the most common case for function tool calling.

5. Developer message injection per Harmony cookbook

(From vllm-project#35904) When json_schema is requested, auto-inject a # Response Formats section into the Harmony developer message.

Why: The Harmony cookbook requires BOTH grammar enforcement AND prompt guidance for structured output.


Files changed

File Change
vllm/reasoning/gptoss_reasoning_parser.py New function tool helpers + extended prepare_structured_tag()
vllm/reasoning/abs_reasoning_parsers.py Updated base class signature with new params
vllm/entrypoints/openai/responses/serving.py Removed NotImplementedError; 3-branch structural tag block; new helpers; streaming fix
vllm/entrypoints/openai/responses/protocol.py json_object format handling
vllm/entrypoints/openai/parser/harmony_utils.py inject_response_formats()
tests/v1/structured_output/test_gptoss_structural_tags.py 10 existing + 13 new test cases
tests/entrypoints/openai/responses/test_structured_output.py NEW_constraint_to_content_format tests
tests/entrypoints/openai/responses/test_response_formats.py NEW_extract_response_format_schema tests
tests/entrypoints/openai/responses/test_sampling_params.py Extended with json_object test
tests/entrypoints/openai/parser/test_harmony_utils.py Extended with inject_response_formats tests

Test plan

  • Unit tests pass: 48/48 passing
  • No regressions in existing structural tag tests
  • Backward compatible — all new params default to None
  • E2E: Responses API + Harmony + tool_choice="required" + function tools
  • E2E: Responses API + Harmony + tool_choice="auto" (regression)
  • E2E: Responses API + Harmony + text.format.type=json_schema + reasoning
  • BFCL eval: validate tool calling accuracy (motivated by [Frontend] Add tool_choice=required support for GPT-OSS Harmony models vllm-project/vllm#33306's poor results)

🤖 Generated with Claude Code

@github-actions
Copy link
Copy Markdown

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors.

You ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@will-deines will-deines force-pushed the feat/tool-choice-required branch from c811b09 to e58c4f4 Compare March 17, 2026 14:32
@will-deines will-deines changed the base branch from garrio-release to main March 17, 2026 14:32
@will-deines will-deines force-pushed the feat/tool-choice-required branch from e58c4f4 to a97954b Compare March 17, 2026 16:16
@will-deines will-deines changed the title feat(tools): implement tool_choice=required for GPT-OSS Harmony + Responses API [Responses API] Unified tool_choice + structured output via triggered tags Mar 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant