[Responses API] Unified tool_choice + structured output via triggered tags by will-deines · Pull Request #1 · will-deines/vllm

will-deines · 2026-03-17T14:03:32Z

Supersedes #35904 by absorbing all of its changes and extending with tool_choice + function tool support via triggered tags.

Summary

Unifies tool_choice enforcement, function tool tags, and structured output content constraints into a single prepare_structured_tag() code path for GPT-OSS Harmony models in the Responses API. Previously, tool_choice="required" raised NotImplementedError, content constraints (json_schema) clobbered reasoning output, and json_object format was silently ignored.

Addresses upstream: #33966 (tool_choice=required) | #23120 (structured output not enforced with GPT-OSS) | #26288 (streaming schema alias bug) | #34857 (Responses API roadmap)

Approach: TriggeredTagsFormat structural tags

Rather than using a separate EBNF grammar (see Alternatives below), we extend prepare_structured_tag() to be the single authority for all generation constraints. xgrammar's TriggeredTagsFormat natively supports:

Channel structure — analysis, commentary, final channels
Tool enforcement — function tool tags on both commentary and analysis channels
Argument validation — JSON schema constraints on tool call arguments
Content constraints — json_schema/regex/grammar scoped to the <|channel|>final region
Channel blocking — omitting <|channel|>final from triggers forces tool calls

This means StructuredOutputsParams keeps its existing mutual-exclusivity invariant (one constraint type = structural_tag), and no custom grammar composition is needed.

This is the same TriggeredTagsFormat pattern used by #36891 (Kimi K2 guided decoding, 75%→100% schema accuracy) and #28148 (GPT-OSS chat format GD). The pattern is converging as the preferred mechanism for guided tool calling in vLLM.

How channel blocking works

xgrammar's triggered_tags use prefix matching. When the model generates <|channel|>:

If triggers include <|channel|>analysis and <|channel|>commentary to= but NOT <|channel|>final, the model can only continue with analysis... or commentary to=...
final is rejected because no trigger matches that continuation
The model must call a tool or hit max_tokens

Relationship to other PRs

PRs we supersede or absorb

PR	Title	Status	How we relate
#35904	Structured output + reasoning via structural tag embedding	Open	Fully absorbed. All changes integrated: content constraint embedding, `_constraint_to_content_format()`, `struct_out is None` branch, `json_object` handling, streaming fix, developer message injection.

PRs we align with (same TriggeredTagsFormat approach)

PR	Title	Status	Relationship
#28148	Chat format GD for tool calling with GPT-OSS	Open	Closest aligned PR. Also uses `TriggeredTagsFormat` on the reasoning parser to guide Harmony's chat format for tool calling. Our PR is narrower in scope (tool_choice enforcement + content constraints) while vllm-project#28148 constrains the entire chat format. If vllm-project#28148 merges, it would subsume our tool tag logic but our content constraint embedding and `struct_out is None` fixes would still be needed.
#36891	Kimi K2 guided decoding for tool_choice=auto	Open	Validates approach. First tool parser to use `TriggeredTagsFormat` for `tool_choice="auto"` — achieved 75%→100% schema accuracy. Demonstrates the pattern works for tool calling. Different model family, no file overlap.
#25515	Structure_Tag support for GPT-OSS tool-call in CoT	Merged	Foundation we build on. Introduced `prepare_structured_tag()` on the reasoning parser and the `no_func_reaonsing_tag` base structure.

PRs we diverge from (EBNF grammar approach)

PR	Title	Status	Why we diverge
#33306	tool_choice=required for GPT-OSS Harmony via EBNF	Open	Uses EBNF grammar in `tool_parser.adjust_request()` which sets `.grammar` on `structured_outputs`. This conflicts with the reasoning parser's `.structural_tag` — only one can win. Additionally, BFCL eval showed 3.25% overall on gpt-oss-20b (vs. 20.88% baseline), suggesting the EBNF grammar over-constrains the model. Our triggered_tags approach avoids this conflict entirely.

Complementary PRs (different model families / code paths)

PR	Title	Status	Notes
#35936	tool_choice=required fallback to tool_parser	Open	Fixes generic `required` code path for non-GPT-OSS models (Qwen3 XML). No overlap.
#32202	Structural tag support for Hermes tool calling	Open	Same pattern applied to Hermes parser. Complementary.
#37081	Mistral guidance via Lark grammar	Open	Different grammar backend (llguidance). Low overlap.
#36841	Fix crash when required exceeds max_tokens	Merged	Defensive fix we benefit from.
#37258	Same fix for Responses API	Merged	Same.

What changed (relative to main)

tool_choice + function tools (new in this PR)

Change	File
`from_function_tool_to_tag()` — creates commentary + analysis tags per function tool with JSON schema content constraints	`gptoss_reasoning_parser.py`
`tag_with_function_tools()` — deep-copies base tag, adds function tool triggers + tags	`gptoss_reasoning_parser.py`
`prepare_structured_tag()` extended with `tool_choice`, `function_tools`, `final_content_format` params	`gptoss_reasoning_parser.py`
Base class signature updated with new params (backward-compatible defaults)	`abs_reasoning_parsers.py`
Function tool extraction from request, wired to reasoning parser	`serving.py`
Removed `tool_choice != "auto"` NotImplementedError in `_make_request_with_harmony()`	`serving.py`

Absorbed from #35904 (structured output + reasoning)

Change	File
`_constraint_to_content_format()` — converts json/regex/grammar/choice/json_object to xgrammar content format	`serving.py`
`_extract_response_format_schema()` — extracts JSON schema from request	`serving.py`
Content constraint embedding in `<	channel
`struct_out is None` branch — reasoning tags always applied, even without structured output	`serving.py`
`inject_response_formats()` — Harmony cookbook developer message injection	`harmony_utils.py`
Developer message injection when json_schema requested	`serving.py`
`json_object` format handling (was silently ignored)	`protocol.py`
Streaming `.model_dump()` fix (schema alias bug)	`serving.py`

Combination matrix

tool_choice	function_tools	final_content_format	Result
required	yes	any	Tool tags (both channels, JSON schema args). No final.
required	no	any	Builtin tool tags only. No final.
auto	yes	yes	Tool tags + final with content constraint
auto	yes	no	Tool tags + final with any_text
auto	no	yes	Analysis + final with content constraint
auto/none	no	no	Analysis only (existing default)
none	any	any	Analysis + optional final. No tool tags.

Decisions we made

1. Why not EBNF grammar?

What we chose: Extend prepare_structured_tag() with TriggeredTagsFormat to handle function tools and tool_choice natively — the same mechanism already used for channel structure and builtin tools.

Alternative: #33306 uses an EBNF grammar set via tool_parser.adjust_request(). This sets .grammar on structured_outputs, which conflicts with the reasoning parser's .structural_tag — two systems fight over structured_outputs and only one field can be active.

Why we chose this:

Triggered tags are already the mechanism for channel structure and builtin tools ([GPT-OSS] Structure_Tag support for gpt-oss tool-call in cot vllm-project/vllm#25515). Adding function tools and tool_choice extends the same system rather than introducing a competing one.
xgrammar handles the compilation natively — no hand-written EBNF grammar to maintain.
[Frontend] Add tool_choice=required support for GPT-OSS Harmony models vllm-project/vllm#33306's BFCL eval showed 3.25% overall on gpt-oss-20b (vs. 20.88% baseline), suggesting the EBNF grammar may over-constrain the model.
[Tool Parser] Kimi K2: guided decoding for tool_choice="auto" — 75% → 100% schema accuracy vllm-project/vllm#36891 (Kimi K2) and [Frontend] [gpt-oss] Chat format GD for tool calling with gptoss vllm-project/vllm#28148 (GPT-OSS chat format) demonstrate this is the direction the ecosystem is converging toward.

2. Why supersede vllm-project#35904?

What we chose: Absorb all changes from #35904 into this PR.

Why: Both PRs modify prepare_structured_tag() and the structural tag preparation block in serving.py. They'd conflict mechanically on every overlapping file. The changes are complementary (content constraints + tool_choice), and merging them produces a cleaner result. This PR is a strict superset.

3. Embed content constraints inside structural tag

(From vllm-project#35904) When a content constraint (json_schema, regex, etc.) is present alongside a reasoning parser, convert it to an xgrammar content format dict, embed it in the <|channel|>final tag, and clear the original constraint fields.

Why: xgrammar's TagFormat.content already supports this composition natively. The mutual-exclusivity invariant on StructuredOutputsParams is load-bearing across the entire structured output stack — relaxing it has a large blast radius.

4. Apply reasoning channel tags unconditionally

(From vllm-project#35904) When struct_out is None and a reasoning parser is active, create StructuredOutputsParams(structural_tag=...) with reasoning channel tags.

Why critical for us: Without this, tool_choice and function_tools params never reach prepare_structured_tag() when no structured output is requested — the most common case for function tool calling.

5. Developer message injection per Harmony cookbook

(From vllm-project#35904) When json_schema is requested, auto-inject a # Response Formats section into the Harmony developer message.

Why: The Harmony cookbook requires BOTH grammar enforcement AND prompt guidance for structured output.

Files changed

File	Change
`vllm/reasoning/gptoss_reasoning_parser.py`	New function tool helpers + extended `prepare_structured_tag()`
`vllm/reasoning/abs_reasoning_parsers.py`	Updated base class signature with new params
`vllm/entrypoints/openai/responses/serving.py`	Removed NotImplementedError; 3-branch structural tag block; new helpers; streaming fix
`vllm/entrypoints/openai/responses/protocol.py`	`json_object` format handling
`vllm/entrypoints/openai/parser/harmony_utils.py`	`inject_response_formats()`
`tests/v1/structured_output/test_gptoss_structural_tags.py`	10 existing + 13 new test cases
`tests/entrypoints/openai/responses/test_structured_output.py`	NEW — `_constraint_to_content_format` tests
`tests/entrypoints/openai/responses/test_response_formats.py`	NEW — `_extract_response_format_schema` tests
`tests/entrypoints/openai/responses/test_sampling_params.py`	Extended with `json_object` test
`tests/entrypoints/openai/parser/test_harmony_utils.py`	Extended with `inject_response_formats` tests

Test plan

Unit tests pass: 48/48 passing
No regressions in existing structural tag tests
Backward compatible — all new params default to None
E2E: Responses API + Harmony + tool_choice="required" + function tools
E2E: Responses API + Harmony + tool_choice="auto" (regression)
E2E: Responses API + Harmony + text.format.type=json_schema + reasoning
BFCL eval: validate tool calling accuracy (motivated by [Frontend] Add tool_choice=required support for GPT-OSS Harmony models vllm-project/vllm#33306's poor results)

🤖 Generated with Claude Code

github-actions · 2026-03-17T14:03:47Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors.

You ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

will-deines force-pushed the feat/tool-choice-required branch from c811b09 to e58c4f4 Compare March 17, 2026 14:32

will-deines changed the base branch from garrio-release to main March 17, 2026 14:32

will-deines closed this Mar 17, 2026

will-deines force-pushed the feat/tool-choice-required branch from e58c4f4 to a97954b Compare March 17, 2026 16:16

will-deines changed the title ~~feat(tools): implement tool_choice=required for GPT-OSS Harmony + Responses API~~ [Responses API] Unified tool_choice + structured output via triggered tags Mar 17, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Responses API] Unified tool_choice + structured output via triggered tags#1

[Responses API] Unified tool_choice + structured output via triggered tags#1
will-deines wants to merge 0 commit intomainfrom
feat/tool-choice-required

will-deines commented Mar 17, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

will-deines commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Approach: TriggeredTagsFormat structural tags

How channel blocking works

Relationship to other PRs

PRs we supersede or absorb

PRs we align with (same TriggeredTagsFormat approach)

PRs we diverge from (EBNF grammar approach)

Complementary PRs (different model families / code paths)

What changed (relative to main)

tool_choice + function tools (new in this PR)

Absorbed from #35904 (structured output + reasoning)

Combination matrix

Decisions we made

1. Why not EBNF grammar?

2. Why supersede vllm-project#35904?

3. Embed content constraints inside structural tag

4. Apply reasoning channel tags unconditionally

5. Developer message injection per Harmony cookbook

Files changed

Test plan

Uh oh!

github-actions bot commented Mar 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

will-deines commented Mar 17, 2026 •

edited

Loading