[Responses API] Unified tool_choice + structured output via triggered tags#1
[Responses API] Unified tool_choice + structured output via triggered tags#1will-deines wants to merge 0 commit intomainfrom
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run You ask your reviewers to trigger select CI tests on top of Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀 |
c811b09 to
e58c4f4
Compare
e58c4f4 to
a97954b
Compare
Summary
Unifies
tool_choiceenforcement, function tool tags, and structured output content constraints into a singleprepare_structured_tag()code path for GPT-OSS Harmony models in the Responses API. Previously,tool_choice="required"raisedNotImplementedError, content constraints (json_schema) clobbered reasoning output, andjson_objectformat was silently ignored.Addresses upstream: #33966 (tool_choice=required) | #23120 (structured output not enforced with GPT-OSS) | #26288 (streaming schema alias bug) | #34857 (Responses API roadmap)
Approach: TriggeredTagsFormat structural tags
Rather than using a separate EBNF grammar (see Alternatives below), we extend
prepare_structured_tag()to be the single authority for all generation constraints. xgrammar'sTriggeredTagsFormatnatively supports:<|channel|>finalregion<|channel|>finalfrom triggers forces tool callsThis means
StructuredOutputsParamskeeps its existing mutual-exclusivity invariant (one constraint type =structural_tag), and no custom grammar composition is needed.This is the same
TriggeredTagsFormatpattern used by #36891 (Kimi K2 guided decoding, 75%→100% schema accuracy) and #28148 (GPT-OSS chat format GD). The pattern is converging as the preferred mechanism for guided tool calling in vLLM.How channel blocking works
xgrammar's triggered_tags use prefix matching. When the model generates
<|channel|>:<|channel|>analysisand<|channel|>commentary to=but NOT<|channel|>final, the model can only continue withanalysis...orcommentary to=...finalis rejected because no trigger matches that continuationRelationship to other PRs
PRs we supersede or absorb
_constraint_to_content_format(),struct_out is Nonebranch,json_objecthandling, streaming fix, developer message injection.PRs we align with (same TriggeredTagsFormat approach)
TriggeredTagsFormaton the reasoning parser to guide Harmony's chat format for tool calling. Our PR is narrower in scope (tool_choice enforcement + content constraints) while vllm-project#28148 constrains the entire chat format. If vllm-project#28148 merges, it would subsume our tool tag logic but our content constraint embedding andstruct_out is Nonefixes would still be needed.TriggeredTagsFormatfortool_choice="auto"— achieved 75%→100% schema accuracy. Demonstrates the pattern works for tool calling. Different model family, no file overlap.prepare_structured_tag()on the reasoning parser and theno_func_reaonsing_tagbase structure.PRs we diverge from (EBNF grammar approach)
tool_parser.adjust_request()which sets.grammaronstructured_outputs. This conflicts with the reasoning parser's.structural_tag— only one can win. Additionally, BFCL eval showed 3.25% overall on gpt-oss-20b (vs. 20.88% baseline), suggesting the EBNF grammar over-constrains the model. Our triggered_tags approach avoids this conflict entirely.Complementary PRs (different model families / code paths)
requiredcode path for non-GPT-OSS models (Qwen3 XML). No overlap.What changed (relative to main)
tool_choice + function tools (new in this PR)
from_function_tool_to_tag()— creates commentary + analysis tags per function tool with JSON schema content constraintsgptoss_reasoning_parser.pytag_with_function_tools()— deep-copies base tag, adds function tool triggers + tagsgptoss_reasoning_parser.pyprepare_structured_tag()extended withtool_choice,function_tools,final_content_formatparamsgptoss_reasoning_parser.pyabs_reasoning_parsers.pyserving.pytool_choice != "auto"NotImplementedError in_make_request_with_harmony()serving.pyAbsorbed from #35904 (structured output + reasoning)
_constraint_to_content_format()— converts json/regex/grammar/choice/json_object to xgrammar content formatserving.py_extract_response_format_schema()— extracts JSON schema from requestserving.pystruct_out is Nonebranch — reasoning tags always applied, even without structured outputserving.pyinject_response_formats()— Harmony cookbook developer message injectionharmony_utils.pyserving.pyjson_objectformat handling (was silently ignored)protocol.py.model_dump()fix (schema alias bug)serving.pyCombination matrix
Decisions we made
1. Why not EBNF grammar?
What we chose: Extend
prepare_structured_tag()withTriggeredTagsFormatto handle function tools and tool_choice natively — the same mechanism already used for channel structure and builtin tools.Alternative: #33306 uses an EBNF grammar set via
tool_parser.adjust_request(). This sets.grammaronstructured_outputs, which conflicts with the reasoning parser's.structural_tag— two systems fight overstructured_outputsand only one field can be active.Why we chose this:
2. Why supersede vllm-project#35904?
What we chose: Absorb all changes from #35904 into this PR.
Why: Both PRs modify
prepare_structured_tag()and the structural tag preparation block inserving.py. They'd conflict mechanically on every overlapping file. The changes are complementary (content constraints + tool_choice), and merging them produces a cleaner result. This PR is a strict superset.3. Embed content constraints inside structural tag
(From vllm-project#35904) When a content constraint (json_schema, regex, etc.) is present alongside a reasoning parser, convert it to an xgrammar content format dict, embed it in the
<|channel|>finaltag, and clear the original constraint fields.Why: xgrammar's
TagFormat.contentalready supports this composition natively. The mutual-exclusivity invariant onStructuredOutputsParamsis load-bearing across the entire structured output stack — relaxing it has a large blast radius.4. Apply reasoning channel tags unconditionally
(From vllm-project#35904) When
struct_out is Noneand a reasoning parser is active, createStructuredOutputsParams(structural_tag=...)with reasoning channel tags.Why critical for us: Without this,
tool_choiceandfunction_toolsparams never reachprepare_structured_tag()when no structured output is requested — the most common case for function tool calling.5. Developer message injection per Harmony cookbook
(From vllm-project#35904) When json_schema is requested, auto-inject a
# Response Formatssection into the Harmony developer message.Why: The Harmony cookbook requires BOTH grammar enforcement AND prompt guidance for structured output.
Files changed
vllm/reasoning/gptoss_reasoning_parser.pyprepare_structured_tag()vllm/reasoning/abs_reasoning_parsers.pyvllm/entrypoints/openai/responses/serving.pyvllm/entrypoints/openai/responses/protocol.pyjson_objectformat handlingvllm/entrypoints/openai/parser/harmony_utils.pyinject_response_formats()tests/v1/structured_output/test_gptoss_structural_tags.pytests/entrypoints/openai/responses/test_structured_output.py_constraint_to_content_formatteststests/entrypoints/openai/responses/test_response_formats.py_extract_response_format_schemateststests/entrypoints/openai/responses/test_sampling_params.pyjson_objecttesttests/entrypoints/openai/parser/test_harmony_utils.pyinject_response_formatstestsTest plan
tool_choice="required"+ function toolstool_choice="auto"(regression)text.format.type=json_schema+ reasoning🤖 Generated with Claude Code