[Responses API] Structured output + reasoning via structural tag embedding #35904
will-deines wants to merge 8 commits into vllm-project:main
Conversation
Code Review
This pull request significantly enhances the structured output capabilities of the Responses API, particularly for reasoning models, by embedding content constraints within structural tags, introducing json_object format support, fixing a streaming bug related to Pydantic model serialization, and improving robustness with reasoning channel tags. A security review found no vulnerabilities. The changes are well-designed, thoroughly tested, and well-documented, with no high or critical severity issues identified by the code review.
This pull request has merge conflicts that must be resolved before it can be merged.
…d tags

Extend prepare_structured_tag() to be the single authority for all generation constraints in GPT-OSS Harmony models: channel structure, tool enforcement, argument validation, and content constraints.

tool_choice=required support:
- New from_function_tool_to_tag() and tag_with_function_tools() helpers
- prepare_structured_tag() extended with tool_choice, function_tools params
- Channel blocking: omit <|channel|>final trigger to force tool calls
- Remove NotImplementedError for non-auto tool_choice in Harmony path

Absorbed from upstream PR vllm-project#35904 (structured output + reasoning):
- Content constraint embedding in <|channel|>final tag
- _constraint_to_content_format() and _extract_response_format_schema()
- struct_out is None branch (reasoning tags always applied)
- inject_response_formats() for Harmony cookbook compliance
- json_object format handling (was silently ignored)
- Streaming .model_dump() alias bug fix

Signed-off-by: Will Deines <will@garr.io>
…dding

When a user requests JSON schema enforcement (text.format.type=json_schema) with a reasoning model (GPT-OSS), the grammar constraint was never scoped to the final output channel. This caused grammar bitmasks to be applied from token 0, clobbering reasoning output.

Fix by embedding content constraints (json_schema, json_object, regex, grammar, choice) inside the structural tag's <|channel|>final region using xgrammar's native TriggeredTagsFormat support. This ensures grammar enforcement only applies within the final output region, not during reasoning.

Also:
- Handle text.format.type=json_object (was silently ignored)
- Fix streaming + json_schema alias bug (.model_dump() dropped schema alias)
- Apply reasoning channel tags even when no structured output is requested

Signed-off-by: Will Deines <will@garr.io>
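The embedding this commit describes can be pictured as a triggered-tags payload along these lines. This is a hand-written sketch: the key names approximate xgrammar's structural-tag format and are not copied from vLLM, so the exact shape may differ.

```python
# Sketch of a structural tag that leaves the analysis (reasoning) channel
# unconstrained while scoping a JSON schema to the final channel only.
# Key names are illustrative approximations of xgrammar's format.
structural_tag = {
    "type": "triggered_tags",
    "triggers": ["<|channel|>"],
    "tags": [
        {
            # Reasoning channel: any text allowed, no grammar bitmask.
            "begin": "<|channel|>analysis<|message|>",
            "content": {"type": "any_text"},
            "end": "<|end|>",
        },
        {
            # Final channel: grammar enforcement starts only here.
            "begin": "<|channel|>final<|message|>",
            "content": {
                "type": "json_schema",
                "json_schema": {"type": "object"},
            },
            "end": "<|return|>",
        },
    ],
}
```

The point of the shape is that the schema constraint lives inside one tag, so the decoder only masks tokens between that tag's begin and end markers.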
…nstraints

When creating a new StructuredOutputsParams with the structural_tag, use dataclasses.replace() to clear content constraint fields while preserving user-specified options like disable_any_whitespace, disable_fallback, disable_additional_properties, and whitespace_pattern.

Signed-off-by: Will Deines <will@garr.io>
… message

Per the Harmony cookbook, structured output requires both grammar enforcement (structural tags) and prompt guidance (a # Response Formats section in the developer message). This injects the response format schema into the developer message when json_schema is requested, creating a developer message even without custom tools if needed.

Signed-off-by: Will Deines <will@garr.io>
Handled by .git/info/exclude on feature branches, force-added on production/garrio-release. Signed-off-by: Will Deines <will@garr.io>
StructuredOutputsParams.json can be str | dict | None but the return type is dict | None. Parse the string case with json.loads so mypy is satisfied and string schemas work correctly at runtime. Signed-off-by: Will Deines <will@garr.io>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4fa603e0a4
```python
if self.use_harmony:
    if request.stream:
        context = StreamingHarmonyContext(messages, available_tools)
    else:
        context = HarmonyContext(messages, available_tools)
```
Move structural-tag setup out of the non-Harmony branch
For GPT-OSS Responses requests, self.use_harmony is always true (vllm/entrypoints/openai/responses/serving.py:278), so execution takes this branch and never reaches the new reasoning_parser.prepare_structured_tag(...) logic under else. In the main target scenario of this change, text.format=json_schema/json_object therefore stays as a normal guided-decoding constraint on sampling_params instead of being embedded into <|channel|>final, so reasoning tokens are still constrained from token 0 exactly as before.
Fixed in a22f008. Dedented the structural tag block out of the else branch so it now runs unconditionally after context selection, for both Harmony and non-Harmony paths.
```python
dev_instructions = request.instructions
if response_format_schema is not None:
    dev_instructions = inject_response_formats(
        dev_instructions, response_format_schema
    )
```
Preserve injected schema when system instructions are enabled
The new schema guidance is only appended to dev_instructions here, but get_developer_message() drops instructions whenever VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS is set (vllm/entrypoints/openai/parser/harmony_utils.py:127). _construct_harmony_system_input_message() still sends the unmodified request.instructions to get_system_message() (serving.py:1211-1218), so in that deployment mode the # Response Formats section never reaches either prompt. GPT-OSS users running with system-instruction mode therefore lose the prompt-side schema guidance this patch is supposed to add.
Fixed in a22f008. Added a separate response_format_section parameter to get_developer_message() so the schema is passed independently of instructions. When VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS is enabled, user instructions are routed to the system message as before, but the # Response Formats section still reaches the developer message. Added TestGetDeveloperMessageResponseFormats with 4 tests covering both modes.
…em-instructions mode

Fix two bugs identified in PR review:

1. The structural tag setup block was nested inside the `else` branch of `if self.use_harmony:`, making it unreachable for GPT-OSS (the primary target). Dedent the block so it runs unconditionally after context selection.
2. The `# Response Formats` schema section was lost when VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS was enabled, because get_developer_message() dropped all instructions in that mode. Add a separate response_format_section parameter so the schema is always included in the developer message regardless of the system-instructions flag.

Signed-off-by: Will Deines <will@garr.io>
…structured-output Signed-off-by: Will Deines <will@garr.io>
Summary

- Scope grammar constraints to the final channel: when a user requests JSON schema enforcement (`text.format.type=json_schema`) with a GPT-OSS reasoning model, the grammar constraint is now scoped to the `<|channel|>final` region via xgrammar's `TriggeredTagsFormat`. Previously, grammar bitmasks were applied from token 0, clobbering reasoning output.
- Support the `json_object` format: `text.format.type=json_object` was silently ignored in the Responses API. It now produces `StructuredOutputsParams(json_object=True)`, matching chat completions behavior.
- Fix streaming serialization: removed a `.model_dump()` in the streaming path that dropped the `schema` → `schema_` Pydantic alias, causing `ResponseCreatedEvent` deserialization failures.
- Apply reasoning channel tags even when no structured output is requested (the `struct_out is None` branch).
- Inject `# Response Formats` into the developer message: per the Harmony cookbook, structured output requires both grammar enforcement (structural tags) and prompt guidance (a `# Response Formats` section in the developer message telling the model what schema to produce). When `json_schema` is requested, the schema is now automatically injected into the Harmony developer message, creating one even if no custom tools are present.
- Review fix: the structural tag setup block was nested inside the `else` branch of `if self.use_harmony:`, making it dead code for GPT-OSS models (the primary target). Dedented so it runs unconditionally after context selection.
- Review fix: the `# Response Formats` section was lost when `VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS` was enabled, because `get_developer_message()` dropped all instructions in that mode. Added a separate `response_format_section` parameter so the schema always reaches the developer message independently of instructions.

Approach
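For reference, a request that exercises the json_schema path described above might look like the following. The body shape follows the OpenAI Responses API; the model name and schema are placeholders.

```python
# Illustrative Responses API request body using text.format with a JSON
# schema; field layout follows the OpenAI Responses API, values are
# placeholders for this PR's target scenario (a GPT-OSS reasoning model).
request_body = {
    "model": "openai/gpt-oss-20b",
    "input": "Extract the city from: 'I live in Paris.'",
    "text": {
        "format": {
            "type": "json_schema",
            "name": "city_extraction",
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    },
}
```

With this PR, the schema above is enforced only inside `<|channel|>final`, so the model's reasoning channel is left unconstrained.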
Rather than modifying `StructuredOutputsParams` to allow multiple simultaneous constraint types (which would require deep changes to validation, backends, and dispatch), we embed the content constraint inside the structural tag's `<|channel|>final` tag.

xgrammar's `TagFormat.content` field already accepts a discriminated union of `JSONSchemaFormat`, `GrammarFormat`, `RegexFormat`, etc. (defined in `xgrammar/structural_tag.py`). The infrastructure to "apply JSON schema grammar only within the `<|channel|>final` region" already exists; we just wire it up from the Responses API.

This means:

- `StructuredOutputsParams` keeps its existing mutual-exclusivity invariant (one constraint type)
- The single constraint set is `structural_tag`, which internally contains both reasoning channel enforcement AND the content constraint scoped to the final channel
- User options (`disable_any_whitespace`, `disable_fallback`, etc.) are preserved via `dataclasses.replace()`

Prompt guidance via `# Response Formats`

The Harmony cookbook is explicit that structured output requires two complementary mechanisms:

1. Prompt guidance: a `# Response Formats` section in the developer message telling the model what schema to follow
2. Grammar enforcement at decode time

The cookbook states: "This prompt alone will, however, only influence the model's behavior but doesn't guarantee the full adherence to the schema." Grammar enforcement is the complement. This PR implements both sides: structural tags handle grammar enforcement (path 2), and `inject_response_formats()` handles prompt guidance (path 1).

Per the cookbook's role specification, `# Response Formats` belongs in the developer message (which holds instructions, function tools, and output format schemas), not the system message. When `json_schema` is requested but no custom tools are present, we now create a developer message specifically for the response format section.

Decisions We Made That Can Be Debated
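The content formats that can be embedded in the final tag can be sketched as plain dicts. The key names below are assumptions modeled on xgrammar's structural_tag module, not copied from it.

```python
# Illustrative content-format dicts for the <|channel|>final tag, one per
# constraint kind the PR mentions (json_schema, regex, grammar). Exact key
# names are assumptions.
json_schema_fmt = {"type": "json_schema", "json_schema": {"type": "object"}}
regex_fmt = {"type": "regex", "pattern": r"\d{4}-\d{2}-\d{2}"}
grammar_fmt = {"type": "grammar", "grammar": 'root ::= "yes" | "no"'}

# Each dict is a member of a discriminated union keyed on "type".
for fmt in (json_schema_fmt, regex_fmt, grammar_fmt):
    assert "type" in fmt
```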
1. Embed constraint inside structural tag vs. allow multiple constraint types on `StructuredOutputsParams`

What we chose: When a reasoning parser is active and a content constraint (json_schema, regex, grammar, choice) is present, we convert the content constraint into an xgrammar `content` format dict, embed it in the `<|channel|>final` tag within the structural tag, then clear the original constraint fields. The final `StructuredOutputsParams` has only `structural_tag` set.

Alternative: Modify `StructuredOutputsParams` to support multiple simultaneous constraint types (e.g. `structural_tag` + `json`). This would avoid the mid-pipeline mutation pattern where we clear fields after embedding them, but requires changes to validation logic, backend dispatch in `StructuredOutputManager`, and every guided decoding backend's understanding of what "one constraint" means.

Why we chose this: xgrammar's `TagFormat.content` field already supports this composition natively; the infrastructure exists and is tested. The mutual-exclusivity invariant on `StructuredOutputsParams` is load-bearing across the entire structured output stack, and relaxing it has a large blast radius.

What reviewers might disagree with: The mid-pipeline mutation (clearing `json`/`regex`/etc. after embedding) means `StructuredOutputsParams` no longer reflects what the user originally requested. If downstream code inspects these fields (e.g., for logging, metrics, or error messages), it will see `None` instead of the original constraint. An alternative could be to construct a fresh `StructuredOutputsParams(structural_tag=...)` rather than mutating via `dataclasses.replace()`.

2. Fix `text.format` path rather than redirecting users to `structured_outputs` field

What we chose: We fix the standard OpenAI `text.format` path so that `json_schema`, `json_object`, and streaming all work correctly. Users can use either `text.format` (OpenAI-compatible) or the vLLM-specific `structured_outputs` field (#33709).

Alternative: Only support structured output through the vLLM-specific `structured_outputs` field and treat `text.format` as a passthrough/echo-only field (the status quo before this PR, where `json_object` was silently ignored).

Context: This is an area of active debate. In #33709, @yeqcharlotte and @chaunceyjiang questioned why structured output wasn't going through `text.format` instead of a separate field. In #33381, @chaunceyjiang argued vLLM-specific extensions should go through the OpenResponses extension mechanism. Meanwhile, @alecsolder defended the separate field for cross-provider reusability and separation of concerns. In #19097, `vllm_`-prefixed types were proposed but the RFC was auto-closed without implementation.

Why we chose this: Users coming from the OpenAI SDK will naturally use `text.format.type=json_schema`; it should just work. The `structured_outputs` field is additive for vLLM-specific capabilities (grammar, regex, choice) that `text.format` can't express. Fixing both paths costs little and prevents user confusion.

3. Remove `.model_dump()` vs. add `by_alias=True` for streaming alias bug

What we chose: Remove the `.model_dump()` call in the streaming path and pass the `ResponsesResponse` Pydantic object directly to `ResponseCreatedEvent`, matching how `ResponseCompletedEvent` already works. This is the approach from #34611.

Alternative: Keep `.model_dump()` but add `by_alias=True` so Pydantic serializes `schema_` as `"schema"`. This is the approach from #26356, which has community confirmation that it works.

Why we chose this: Removing the unnecessary dict round-trip eliminates the entire class of alias bugs rather than patching one instance. This is consistent with @qandrew's own #26185 which previously removed a `.model_dump()` call on the `ResponseCompletedEvent` path for the same category of issue. The `by_alias=True` approach is fragile: any future alias field would break again if someone forgets the flag.

4. Apply reasoning channel tags even when no structured output is requested

What we chose: When `struct_out is None` and a reasoning parser is active, we now create a `StructuredOutputsParams(structural_tag=...)` with just the reasoning channel tags. Previously, the `prepare_structured_tag()` block was only entered when `struct_out` was already a `StructuredOutputsParams` instance.

Alternative: Keep the existing behavior where reasoning channel tags are only applied when the user explicitly requests some form of structured output.

Why we chose this: Without structural tags, GPT-OSS models emit raw Harmony format (`<|channel|>analysis<|message|>...`) that the reasoning parser must post-hoc parse. With structural tags, xgrammar enforces the channel structure at decode time, which is more robust and enables future optimizations. This also means the reasoning parser's `is_reasoning_end` state machine (which has had multi-turn bugs per #34454) is supplemented by grammar-level enforcement.

What reviewers might disagree with: This changes default behavior for all GPT-OSS requests that don't request structured output. If a model produces valid output without structural tags but would be over-constrained with them, this could cause regressions. We don't have e2e validation of this path yet.
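The alias pitfall behind decision 3 can be reproduced without Pydantic. The toy class below mimics how a Python field name (`schema_`) and its wire alias (`"schema"`) diverge when an object is round-tripped through a plain dict; it is a stand-in for the real Pydantic behavior, not vLLM code.

```python
class ToyModel:
    # Toy stand-in for a Pydantic model: schema_ is the Python attribute,
    # "schema" is the name consumers expect on the wire.
    ALIASES = {"schema_": "schema"}

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def model_dump(self, by_alias: bool = False) -> dict:
        # Without by_alias, the Python-side name leaks into the payload.
        return {
            (self.ALIASES.get(k, k) if by_alias else k): v
            for k, v in self.__dict__.items()
        }


m = ToyModel(schema_={"type": "object"})
assert "schema" not in m.model_dump()            # the streaming bug
assert "schema" in m.model_dump(by_alias=True)   # the patch-one-instance fix
```

Passing the object itself, with no dict round-trip, sidesteps the flag entirely, which is why this PR removes the `.model_dump()` call instead.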
5. `json_object` mapped to `{"type": "object"}` in structural tag content

What we chose: In `_constraint_to_content_format()`, `json_object=True` is converted to `{"type": "json_schema", "json_schema": {"type": "object"}}` for embedding in the structural tag.

Alternative: Map it to a dedicated `json_object` content format type if xgrammar supports one, or skip embedding entirely and let the existing `json_object` handling in the structured output backend handle it outside the structural tag.

Why we chose this: xgrammar's `TagFormat.content` expects one of its known format types (json_schema, regex, grammar, etc.). `{"type": "object"}` is the minimal JSON schema that enforces "output must be a JSON object", semantically equivalent to `json_object` mode. This ensures the constraint is properly scoped to the `<|channel|>final` region for reasoning models rather than being applied globally.

6. Adding
`final_content_format` parameter to the base class `prepare_structured_tag()`

What we chose: We added `final_content_format: dict | None = None` as an optional parameter on `ReasoningParser.prepare_structured_tag()` in the base class, with a default of `None` that preserves backward compatibility.

Alternative: Only add the parameter on `GPTOSSReasoningParser` and handle the dispatch in `serving.py` with a type check or capability flag. Or create a separate method like `prepare_structured_tag_with_constraint()`.

Why we chose this: The base class change is backward-compatible (default `None`; existing implementations don't need changes). The concept of "scope this content constraint to the model's final output region" is generic; it's not GPT-OSS-specific. Other reasoning models (Qwen3, DeepSeek-R1, future models) with structural tag support would benefit from the same interface. Keeping it on the base class establishes a clean contract.

What reviewers might disagree with: This couples content constraint format knowledge (the xgrammar dict format) to the reasoning parser interface. If vLLM ever supports a non-xgrammar structured output backend, this dict format may not apply. A more abstract interface (e.g., passing `StructuredOutputsParams` directly) might be more future-proof.

Related Issues, PRs, and RFCs
Directly Addressed by This PR
- `schema` field becomes `None` in streaming with json_schema: the `schema_`/`schema` alias bug in streaming. The `.model_dump()` removal in this PR fixes it.
- Removes `.model_dump()` in the streaming path. We adopt this approach.
- An alternative fix (`by_alias=True`). We prefer #34611's approach (pass objects directly).
- The `text` type response_format received a `type: "text"` passthrough. Our `json_object` handling follows the same pattern.
- The `json_object` gap in the Responses API could produce similar errors. Our Step 1 prevents this.

Foundation This PR Builds On
- `structured_outputs` for responses API: added the `structured_outputs` field to ResponsesRequest. Our work builds on this.
- The `to_sampling_params()` infrastructure on ResponsesRequest.
- `Parser`/`ParserManager` and the structural tag preparation block we're extending.

Related PRs (same problem space)
- A `reasoning_ended` gate that prevents structural tag bitmasks from being applied during reasoning. Our approach inherently avoids this problem: the grammar handles reasoning/content boundaries internally via triggers, so the external `reasoning_ended` gate doesn't interfere.
- `test_gptoss_structural_tags.py` follows the consolidated layout.
- Extends `prepare_structured_tag()` with `tool_choice` + `function_tools` params, building on the `final_content_format` infrastructure introduced here.

Related RFCs
- The `structured_outputs` field. We reuse `StructuredOutputsParams` rather than creating new types.
- The `structured_outputs` field (already merged in #33709); we also fix the standard `text.format` path. No new protocol extensions.
- `structured_outputs` as instance field on ResponsesRequest: promotes `structured_outputs` from a local var to a field for tool parser mutation. Compatible with our changes; we support both the field path and the `text.format` path.

Changes
- `vllm/entrypoints/openai/responses/protocol.py`: `json_object` handling in `to_sampling_params()`
- `vllm/entrypoints/openai/parser/harmony_utils.py`: `inject_response_formats()` helper; add `response_format_section` param to `get_developer_message()` so schema is preserved independently of instructions (fixes system-instructions mode)
- `vllm/entrypoints/openai/responses/serving.py`: `_extract_response_format_schema()` and `_constraint_to_content_format()` helpers; inject response formats into developer message; dedent structural tag block out of `else` branch so it runs for both Harmony and non-Harmony paths; split response format from instructions at developer message call site; fix streaming `.model_dump()`
- `vllm/reasoning/abs_reasoning_parsers.py`: `final_content_format` param to `prepare_structured_tag()` base class
- `vllm/reasoning/gptoss_reasoning_parser.py`: `final_content_format`; append `<…` (truncated in original)
- `tests/entrypoints/openai/responses/test_structured_output.py`: `_constraint_to_content_format` tests
- `tests/v1/structured_output/test_gptoss_structural_tags.py`
- `tests/entrypoints/openai/responses/test_sampling_params.py`: `json_object` test
- `tests/entrypoints/openai/responses/test_response_formats.py`: `_extract_response_format_schema()` tests
- `tests/entrypoints/openai/parser/test_harmony_utils.py`: `TestInjectResponseFormats`, tests for `inject_response_formats()`; add `TestGetDeveloperMessageResponseFormats`, tests for `response_format_section` param behavior with/without system-instructions mode

Test plan
pytest tests/entrypoints/openai/responses/test_structured_output.py tests/v1/structured_output/test_gptoss_structural_tags.py tests/entrypoints/openai/responses/test_sampling_params.py tests/entrypoints/openai/responses/test_response_formats.py tests/entrypoints/openai/parser/test_harmony_utils.py::TestInjectResponseFormats tests/entrypoints/openai/parser/test_harmony_utils.py::TestGetDeveloperMessageResponseFormats -v

- `TestGetDeveloperMessageResponseFormats`: verifies the response format is preserved/dropped correctly with and without system-instructions mode

Out of Scope (follow-ups)
- `serving_chat.py` never calls `prepare_structured_tag()`. The same fix pattern applies but is a separate PR targeting the chat completions path.
- `strict` field forwarding from `ResponseFormatTextJSONSchemaConfig`: low priority, vLLM always enforces strictly.
- Whether `structured_outputs` should go through the extension mechanism.
- `structured_outputs` as instance field (#33249): promotes `structured_outputs` from a local variable to a field for tool parser mutation. Compatible with our changes but an independent concern.