
[Responses API] Structured output + reasoning via structural tag embedding#35904

Open
will-deines wants to merge 8 commits into vllm-project:main from will-deines:worktree-responses-structured-output

Conversation


@will-deines will-deines commented Mar 3, 2026

Recreated from #35873, which was closed when the fork was temporarily made private.

Summary

  • Embed content constraints in structural tags: When a user requests JSON schema enforcement (text.format.type=json_schema) with a GPT-OSS reasoning model, the grammar constraint is now scoped to the <|channel|>final region via xgrammar's TriggeredTagsFormat. Previously, grammar bitmasks were applied from token 0, clobbering reasoning output.
  • Handle json_object format: text.format.type=json_object was silently ignored in the Responses API. Now produces StructuredOutputsParams(json_object=True), matching chat completions behavior.
  • Fix streaming + json_schema alias bug: Remove .model_dump() in the streaming path that dropped the schema_/schema Pydantic alias, causing ResponseCreatedEvent deserialization failures.
  • Apply reasoning channel tags unconditionally: When a reasoning parser is active but no structured output is requested, reasoning channel tags are still applied (the struct_out is None branch).
  • Auto-inject # Response Formats into developer message: Per the Harmony cookbook, structured output requires both grammar enforcement (structural tags) and prompt guidance (a # Response Formats section in the developer message telling the model what schema to produce). When json_schema is requested, the schema is now automatically injected into the Harmony developer message, creating one even if no custom tools are present.
  • Fix structural tag block unreachable for GPT-OSS: The structural tag setup block was nested inside the else branch of if self.use_harmony:, making it dead code for GPT-OSS models (the primary target). Dedented so it runs unconditionally after context selection.
  • Preserve response format schema in system-instructions mode: The # Response Formats section was lost when VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS was enabled, because get_developer_message() dropped all instructions in that mode. Added a separate response_format_section parameter so the schema always reaches the developer message independently of instructions.

Approach

Rather than modifying StructuredOutputsParams to allow multiple simultaneous constraint types (which would require deep changes to validation, backends, and dispatch), we embed the content constraint inside the structural tag's <|channel|>final tag.

xgrammar's TagFormat.content field already accepts a discriminated union of JSONSchemaFormat, GrammarFormat, RegexFormat, etc. (defined in xgrammar/structural_tag.py). The infrastructure to "apply JSON schema grammar only within the <|channel|>final region" already exists — we just wire it up from the Responses API.

This means:

  • StructuredOutputsParams keeps its existing mutual-exclusivity invariant (one constraint type)
  • The constraint type used is structural_tag, which internally contains both reasoning channel enforcement AND the content constraint scoped to the final channel
  • xgrammar handles the compilation natively — no custom grammar composition needed
  • User-specified options (disable_any_whitespace, disable_fallback, etc.) are preserved via dataclasses.replace()
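Concretely, the tag set described above can be sketched as a plain dict. The field and type names here ("triggered_tags", "begin", "content", "any_text", etc.) are assumed from xgrammar's structural_tag module and Harmony's channel markers; this is an illustration, not vLLM's actual construction:

```python
import json

# Hypothetical user constraint from text.format.type=json_schema.
user_schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}

structural_tag = {
    "type": "structural_tag",
    "format": {
        "type": "triggered_tags",
        "triggers": ["<|channel|>"],
        "tags": [
            {
                # Reasoning channel: structure enforced, content free-form.
                "begin": "<|channel|>analysis<|message|>",
                "content": {"type": "any_text"},
                "end": "<|end|>",
            },
            {
                # Final channel: the user's JSON schema applies only here.
                "begin": "<|channel|>final<|message|>",
                "content": {"type": "json_schema", "json_schema": user_schema},
                "end": "<|return|>",
            },
        ],
    },
}

# The whole tag set is what ends up in StructuredOutputsParams.structural_tag.
structural_tag_json = json.dumps(structural_tag)
```

The key point is that the content constraint lives inside one tag of the tag set, so the grammar is only active between that tag's begin and end markers.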

Prompt guidance via # Response Formats

The Harmony cookbook is explicit that structured output requires two complementary mechanisms:

  1. Prompt guidance: A # Response Formats section in the developer message telling the model what schema to follow
  2. Grammar enforcement: Constrained decoding ensuring output matches the schema

The cookbook states: "This prompt alone will, however, only influence the model's behavior but doesn't guarantee the full adherence to the schema." — grammar enforcement is the complement. This PR implements both sides: structural tags handle grammar enforcement (path 2), and inject_response_formats() handles prompt guidance (path 1).

Per the cookbook's role specification, # Response Formats belongs in the developer message (which holds instructions, function tools, and output format schemas), not the system message. When json_schema is requested but no custom tools are present, we now create a developer message specifically for the response format section.

Decisions We Made That Can Be Debated

1. Embed constraint inside structural tag vs. allow multiple constraint types on StructuredOutputsParams

What we chose: When a reasoning parser is active and a content constraint (json_schema, regex, grammar, choice) is present, we convert the content constraint into an xgrammar content format dict, embed it in the <|channel|>final tag within the structural tag, then clear the original constraint fields. The final StructuredOutputsParams has only structural_tag set.

Alternative: Modify StructuredOutputsParams to support multiple simultaneous constraint types (e.g. structural_tag + json). This would avoid the mid-pipeline mutation pattern where we clear fields after embedding them, but requires changes to validation logic, backend dispatch in StructuredOutputManager, and every guided decoding backend's understanding of what "one constraint" means.

Why we chose this: xgrammar's TagFormat.content field already supports this composition natively — the infrastructure exists and is tested. The mutual-exclusivity invariant on StructuredOutputsParams is load-bearing across the entire structured output stack, and relaxing it has a large blast radius.

What reviewers might disagree with: The mid-pipeline mutation (clearing json/regex/etc. after embedding) means StructuredOutputsParams no longer reflects what the user originally requested. If downstream code inspects these fields (e.g., for logging, metrics, or error messages), it will see None instead of the original constraint. An alternative could be to construct a fresh StructuredOutputsParams(structural_tag=...) rather than mutating via dataclasses.replace().

2. Fix text.format path rather than redirecting users to structured_outputs field

What we chose: We fix the standard OpenAI text.format path so that json_schema, json_object, and streaming all work correctly. Users can use either text.format (OpenAI-compatible) or the vLLM-specific structured_outputs field (#33709).

Alternative: Only support structured output through the vLLM-specific structured_outputs field and treat text.format as a passthrough/echo-only field (the status quo before this PR, where json_object was silently ignored).

Context: This is an area of active debate. In #33709, @yeqcharlotte and @chaunceyjiang questioned why structured output wasn't going through text.format instead of a separate field. In #33381, @chaunceyjiang argued vLLM-specific extensions should go through the OpenResponses extension mechanism. Meanwhile, @alecsolder defended the separate field for cross-provider reusability and separation of concerns. In #19097, vllm_-prefixed types were proposed but the RFC was auto-closed without implementation.

Why we chose this: Users coming from the OpenAI SDK will naturally use text.format.type=json_schema — it should just work. The structured_outputs field is additive for vLLM-specific capabilities (grammar, regex, choice) that text.format can't express. Fixing both paths costs little and prevents user confusion.

3. Remove .model_dump() vs. add by_alias=True for streaming alias bug

What we chose: Remove the .model_dump() call in the streaming path and pass the ResponsesResponse Pydantic object directly to ResponseCreatedEvent, matching how ResponseCompletedEvent already works. This is the approach from #34611.

Alternative: Keep .model_dump() but add by_alias=True so Pydantic serializes schema_ as "schema". This is the approach from #26356, which has community confirmation of working.

Why we chose this: Removing the unnecessary dict round-trip eliminates the entire class of alias bugs rather than patching one instance. This is consistent with @qandrew's own #26185 which previously removed a .model_dump() call on the ResponseCompletedEvent path for the same category of issue. The by_alias=True approach is fragile — any future alias field would break again if someone forgets the flag.
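A Pydantic-free sketch of the failure mode (TextFormat and validate_event are stand-ins; in Pydantic v2, `schema` shadows a BaseModel attribute, so the field is declared `schema_` with a `"schema"` alias, and a default `model_dump()` emits the Python field name unless `by_alias=True` is passed):

```python
class TextFormat:  # stand-in for the Pydantic response-format model
    def __init__(self, schema_: dict):
        self.schema_ = schema_

    def model_dump(self, by_alias: bool = False) -> dict:
        # Mimics Pydantic: field name by default, alias only on request.
        key = "schema" if by_alias else "schema_"
        return {key: self.schema_}


def validate_event(payload: dict) -> dict:
    # Stand-in for ResponseCreatedEvent validation: requires "schema".
    if "schema" not in payload:
        raise ValueError("missing required field: schema")
    return payload


fmt = TextFormat({"type": "object"})
validate_event(fmt.model_dump(by_alias=True))  # alias applied: validates

try:
    validate_event(fmt.model_dump())  # default dump: "schema_" leaks through
except ValueError:
    pass  # the streaming-path bug in miniature
```

Passing the Pydantic object directly, as this PR does, never materializes the dict at all, so there is no alias decision to get wrong.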

4. Apply reasoning channel tags even when no structured output is requested

What we chose: When struct_out is None and a reasoning parser is active, we now create a StructuredOutputsParams(structural_tag=...) with just the reasoning channel tags. Previously, the prepare_structured_tag() block was only entered when struct_out was already a StructuredOutputsParams instance.

Alternative: Keep the existing behavior where reasoning channel tags are only applied when the user explicitly requests some form of structured output.

Why we chose this: Without structural tags, GPT-OSS models emit raw Harmony format (<|channel|>analysis<|message|>...) that the reasoning parser must post-hoc parse. With structural tags, xgrammar enforces the channel structure at decode time, which is more robust and enables future optimizations. This also means the reasoning parser's is_reasoning_end state machine (which has had multi-turn bugs per #34454) is supplemented by grammar-level enforcement.

What reviewers might disagree with: This changes default behavior for all GPT-OSS requests that don't request structured output. If a model produces valid output without structural tags but would be over-constrained with them, this could cause regressions. We don't have e2e validation of this path yet.

5. json_object mapped to {"type": "object"} in structural tag content

What we chose: In _constraint_to_content_format(), json_object=True is converted to {"type": "json_schema", "json_schema": {"type": "object"}} for embedding in the structural tag.

Alternative: Map it to a dedicated json_object content format type if xgrammar supports one, or skip embedding entirely and let the existing json_object handling in the structured output backend handle it outside the structural tag.

Why we chose this: xgrammar's TagFormat.content expects one of its known format types (json_schema, regex, grammar, etc.). {"type": "object"} is the minimal JSON schema that enforces "output must be a JSON object" — semantically equivalent to json_object mode. This ensures the constraint is properly scoped to the <|channel|>final region for reasoning models rather than being applied globally.

6. Adding final_content_format parameter to the base class prepare_structured_tag()

What we chose: We added final_content_format: dict | None = None as an optional parameter on ReasoningParser.prepare_structured_tag() in the base class, with a default of None that preserves backward compatibility.

Alternative: Only add the parameter on GPTOSSReasoningParser and handle the dispatch in serving.py with a type check or capability flag. Or create a separate method like prepare_structured_tag_with_constraint().

Why we chose this: The base class change is backward-compatible (default None, existing implementations don't need changes). The concept of "scope this content constraint to the model's final output region" is generic — it's not GPT-OSS-specific. Other reasoning models (Qwen3, DeepSeek-R1, future models) with structural tag support would benefit from the same interface. Keeping it on the base class establishes a clean contract.

What reviewers might disagree with: This couples content constraint format knowledge (xgrammar dict format) to the reasoning parser interface. If vLLM ever supports a non-xgrammar structured output backend, this dict format may not apply. A more abstract interface (e.g., passing StructuredOutputsParams directly) might be more future-proof.

Related Issues, PRs, and RFCs

Directly Addressed by This PR

| # | Title | Status | How This PR Relates |
|---|---|---|---|
| #34857 | Responses API & Tool Calling H1 2026 roadmap | Open | Explicitly lists "guided decode and structured outputs" as a focus area. This PR delivers that. |
| #23120 | Structured output not correctly enforced with GPT-OSS | Open | Root cause: grammar bitmasks applied from token 0 without structural tag channel separation. This PR fixes the Responses API path. |
| #26288 | schema field becomes None in streaming with json_schema | Closed | Root-cause analysis of the schema_/schema alias bug in streaming. The .model_dump() removal in this PR fixes it. |
| #34611 | Fix ResponseCreatedEvent ValidationError for json_schema in streaming | Open | Proposes removing .model_dump() in the streaming path. We adopt this approach. |
| #26356 | Fix json schema alias serializing when streaming | Open | Alternative fix (add by_alias=True). We prefer #34611's approach (pass objects directly). |
| #26822 | Fix crash when text type response_format received | Merged | Added validation for type: "text" passthrough. Our json_object handling follows the same pattern. |
| #26639 | ValueError: No valid structured output parameter found | Closed (by #26822) | The json_object gap in the Responses API could produce similar errors. Our json_object handling prevents this. |

Foundation This PR Builds On

| # | Title | Status | Relevance |
|---|---|---|---|
| #33709 | Enable generic structured_outputs for responses API | Merged | Added the structured_outputs field to ResponsesRequest. Our work builds on this. |
| #32609 | Add sampling parameters to Responses API | Merged | Established to_sampling_params() infrastructure on ResponsesRequest. |
| #32712 | Initial Parser for Responses API | Merged | Introduced Parser/ParserManager and the structural tag preparation block we're extending. |
| #34454 | Fix structured output in multi-turn GPT-OSS | Merged | Fixed premature grammar bitmask activation from previous-turn markers. Our structural tag approach inherently avoids this class of bug by constraining grammar to the `< |
| #32791 | chat.completions returns null for GPT-OSS multi-turn with json_object | Closed (by #34454) | Same root cause as #23120. Our approach prevents this by design. |

Related PRs (same problem space)

| # | Title | Status | Relevance |
|---|---|---|---|
| #37388 | Fix structural_tag bitmask not applied on reasoning models | Closed (not merged) | Attempted to fix the reasoning_ended gate that prevents structural tag bitmasks from being applied during reasoning. Our approach inherently avoids this problem: the grammar handles reasoning/content boundaries internally via triggers, so the external reasoning_ended gate doesn't interfere. |
| #36915 | Consolidate GPT-OSS reasoning parser tests | Merged | Reorganized test file structure; our new test_gptoss_structural_tags.py follows the consolidated layout. |
| #37433 | [Responses API] tool_choice support for GPT-OSS | Draft (ours) | Downstream dependent: extends prepare_structured_tag() with tool_choice + function_tools params, building on the final_content_format infrastructure introduced here. |

Related RFCs

| # | Title | Status | Design Decision |
|---|---|---|---|
| #19097 | RFC: Response format extensions for structured outputs | Closed | Led to the structured_outputs field. We reuse StructuredOutputsParams rather than creating new types. |
| #33381 | RFC: Align with openresponses spec | Open | Argues vLLM-specific extensions should go through the extension mechanism. Decision: we keep the existing structured_outputs field (already merged in #33709) and also fix the standard text.format path. No new protocol extensions. |
| #29632 | RFC: Force EOS when grammar terminates | Open | When the grammar is satisfied, the model may not produce EOS immediately. Out of scope for this PR but noted as a follow-up. |
| #16313 | Support structured output + tool call together | Open | Tool calls + JSON schema in one request. Our structural tag approach naturally supports this since tool channels and the final channel are independent tags. |
| #33249 | Add structured_outputs as instance field on ResponsesRequest | Open | Promotes structured_outputs from a local variable to a field for tool parser mutation. Compatible with our changes; we support both the field path and the text.format path. |

Changes

| File | Change |
|---|---|
| vllm/entrypoints/openai/responses/protocol.py | Add json_object handling in to_sampling_params() |
| vllm/entrypoints/openai/parser/harmony_utils.py | Add inject_response_formats() helper; add response_format_section param to get_developer_message() so the schema is preserved independently of instructions (fixes system-instructions mode) |
| vllm/entrypoints/openai/responses/serving.py | Add _extract_response_format_schema() and _constraint_to_content_format() helpers; inject response formats into the developer message; dedent the structural tag block out of the else branch so it runs for both Harmony and non-Harmony paths; split response format from instructions at the developer message call site; fix the streaming .model_dump() |
| vllm/reasoning/abs_reasoning_parsers.py | Add final_content_format param to prepare_structured_tag() base class |
| vllm/reasoning/gptoss_reasoning_parser.py | Implement final_content_format — append `< |
| tests/entrypoints/openai/responses/test_structured_output.py | New — unit tests for _constraint_to_content_format |
| tests/v1/structured_output/test_gptoss_structural_tags.py | New — structural tag tests including constraint embedding |
| tests/entrypoints/openai/responses/test_sampling_params.py | Extend with json_object test |
| tests/entrypoints/openai/responses/test_response_formats.py | New — tests for _extract_response_format_schema() |
| tests/entrypoints/openai/parser/test_harmony_utils.py | Add TestInjectResponseFormats (tests for inject_response_formats()); add TestGetDeveloperMessageResponseFormats (tests for response_format_section param behavior with/without system-instructions mode) |

Test plan

  • Unit tests pass: pytest tests/entrypoints/openai/responses/test_structured_output.py tests/v1/structured_output/test_gptoss_structural_tags.py tests/entrypoints/openai/responses/test_sampling_params.py tests/entrypoints/openai/responses/test_response_formats.py tests/entrypoints/openai/parser/test_harmony_utils.py::TestInjectResponseFormats tests/entrypoints/openai/parser/test_harmony_utils.py::TestGetDeveloperMessageResponseFormats -v
  • TestGetDeveloperMessageResponseFormats: verifies response format preserved/dropped correctly with and without system-instructions mode
  • No regressions in responses unit tests
  • Pre-commit passes on all changed files
  • E2e with GPT-OSS model: verify json_schema + reasoning produces valid JSON with reasoning properly separated
  • E2e with Qwen3: verify json_schema, json_object, and streaming all work (non-regression)

Out of Scope (follow-ups)

  • Chat Completions has the same gap (#23120) — serving_chat.py never calls prepare_structured_tag(). Same fix pattern applies but is a separate PR targeting the chat completions path.
  • strict field forwarding from ResponseFormatTextJSONSchemaConfig — low priority, vLLM always enforces strictly.
  • Force EOS when grammar terminates (#29632) — separate design discussion affecting all APIs.
  • OpenResponses alignment (#33381) — policy decision about whether structured_outputs should go through extension mechanism.
  • structured_outputs as instance field (#33249) — promotes structured_outputs from local variable to field for tool parser mutation. Compatible with our changes but independent concern.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request significantly enhances the structured output capabilities of the Responses API, particularly for reasoning models, by embedding content constraints within structural tags, introducing json_object format support, fixing a streaming bug related to Pydantic model serialization, and improving robustness with reasoning channel tags. A security review found no vulnerabilities. The changes are well-designed, thoroughly tested, and well-documented, with no high or critical severity issues identified by the code review.

will-deines pushed a commit to will-deines/vllm that referenced this pull request Mar 3, 2026
@will-deines will-deines force-pushed the worktree-responses-structured-output branch from 6b69e1a to 76faff3 Compare March 4, 2026 20:12

mergify bot commented Mar 5, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @will-deines.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 5, 2026
will-deines pushed a commit to will-deines/vllm that referenced this pull request Mar 17, 2026
…d tags

Extend prepare_structured_tag() to be the single authority for all
generation constraints in GPT-OSS Harmony models: channel structure,
tool enforcement, argument validation, and content constraints.

tool_choice=required support:
- New from_function_tool_to_tag() and tag_with_function_tools() helpers
- prepare_structured_tag() extended with tool_choice, function_tools params
- Channel blocking: omit <|channel|>final trigger to force tool calls
- Remove NotImplementedError for non-auto tool_choice in Harmony path

Absorbed from upstream PR vllm-project#35904 (structured output + reasoning):
- Content constraint embedding in <|channel|>final tag
- _constraint_to_content_format() and _extract_response_format_schema()
- struct_out is None branch (reasoning tags always applied)
- inject_response_formats() for Harmony cookbook compliance
- json_object format handling (was silently ignored)
- Streaming .model_dump() alias bug fix

Signed-off-by: Will Deines <will@garr.io>
…dding

When a user requests JSON schema enforcement (text.format.type=json_schema)
with a reasoning model (GPT-OSS), the grammar constraint was never scoped
to the final output channel. This caused grammar bitmasks to be applied
from token 0, clobbering reasoning output.

Fix by embedding content constraints (json_schema, json_object, regex,
grammar, choice) inside the structural tag's <|channel|>final region
using xgrammar's native TriggeredTagsFormat support. This ensures grammar
enforcement only applies within the final output region, not during
reasoning.

Also:
- Handle text.format.type=json_object (was silently ignored)
- Fix streaming + json_schema alias bug (.model_dump() dropped schema alias)
- Apply reasoning channel tags even when no structured output is requested

Signed-off-by: Will Deines <will@garr.io>
…nstraints

When creating a new StructuredOutputsParams with the structural_tag,
use dataclasses.replace() to clear content constraint fields while
preserving user-specified options like disable_any_whitespace,
disable_fallback, disable_additional_properties, and whitespace_pattern.

Signed-off-by: Will Deines <will@garr.io>
… message

Per the Harmony cookbook, structured output requires both grammar
enforcement (structural tags) and prompt guidance (a # Response Formats
section in the developer message). This injects the response format
schema into the developer message when json_schema is requested,
creating a developer message even without custom tools if needed.

Signed-off-by: Will Deines <will@garr.io>
Handled by .git/info/exclude on feature branches, force-added on
production/garrio-release.

Signed-off-by: Will Deines <will@garr.io>
@will-deines will-deines force-pushed the worktree-responses-structured-output branch from 76faff3 to c85af4a Compare March 18, 2026 13:48
@mergify mergify bot removed the needs-rebase label Mar 18, 2026
StructuredOutputsParams.json can be str | dict | None but the return
type is dict | None.  Parse the string case with json.loads so mypy
is satisfied and string schemas work correctly at runtime.

Signed-off-by: Will Deines <will@garr.io>
@will-deines will-deines force-pushed the worktree-responses-structured-output branch from c85af4a to a2893f6 Compare March 18, 2026 14:33
@will-deines will-deines marked this pull request as ready for review March 18, 2026 17:08

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4fa603e0a4


Comment on lines +496 to +500:

```python
if self.use_harmony:
    if request.stream:
        context = StreamingHarmonyContext(messages, available_tools)
    else:
        context = HarmonyContext(messages, available_tools)
```

P1: Move structural-tag setup out of the non-Harmony branch

For GPT-OSS Responses requests, self.use_harmony is always true (vllm/entrypoints/openai/responses/serving.py:278), so execution takes this branch and never reaches the new reasoning_parser.prepare_structured_tag(...) logic under else. In the main target scenario of this change, text.format=json_schema/json_object therefore stays as a normal guided-decoding constraint on sampling_params instead of being embedded into <|channel|>final, so reasoning tokens are still constrained from token 0 exactly as before.


Author

Fixed in a22f008. Dedented the structural tag block out of the else branch so it now runs unconditionally after context selection, for both Harmony and non-Harmony paths.

Comment on lines +1245 to +1249:

```python
dev_instructions = request.instructions
if response_format_schema is not None:
    dev_instructions = inject_response_formats(
        dev_instructions, response_format_schema
    )
```

P2: Preserve injected schema when system instructions are enabled

The new schema guidance is only appended to dev_instructions here, but get_developer_message() drops instructions whenever VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS is set (vllm/entrypoints/openai/parser/harmony_utils.py:127). _construct_harmony_system_input_message() still sends the unmodified request.instructions to get_system_message() (serving.py:1211-1218), so in that deployment mode the # Response Formats section never reaches either prompt. GPT-OSS users running with system-instruction mode therefore lose the prompt-side schema guidance this patch is supposed to add.


Author

Fixed in a22f008. Added a separate response_format_section parameter to get_developer_message() so the schema is passed independently of instructions. When VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS is enabled, user instructions are routed to the system message as before, but the # Response Formats section still reaches the developer message. Added TestGetDeveloperMessageResponseFormats with 4 tests covering both modes.

…em-instructions mode

Fix two bugs identified in PR review:

1. The structural tag setup block was nested inside the `else` branch of
   `if self.use_harmony:`, making it unreachable for GPT-OSS (the primary
   target). Dedent the block so it runs unconditionally after context selection.

2. The `# Response Formats` schema section was lost when
   VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS was enabled, because
   get_developer_message() dropped all instructions in that mode. Add a
   separate response_format_section parameter so the schema is always
   included in the developer message regardless of the system-instructions
   flag.

Signed-off-by: Will Deines <will@garr.io>
@will-deines will-deines force-pushed the worktree-responses-structured-output branch from a22f008 to dd23bae Compare March 18, 2026 18:31
…structured-output

Signed-off-by: Will Deines <will@garr.io>

mergify bot commented Mar 25, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @will-deines.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 25, 2026