[Bugfix][Structured Output] Fix structural_tag bitmask not applied on reasoning models #37388

Closed
CatherineSue wants to merge 1 commit into vllm-project:main from CatherineSue:fix/structural-tag-reasoning-bitmask

Conversation

@CatherineSue
Contributor

Purpose

Fix structural_tag constraints being silently ignored on reasoning models (e.g., gpt-oss with openai_gptoss reasoning parser).

When a reasoning model uses a structural_tag constraint (e.g., triggered_tags format), should_fill_bitmask() and should_advance() in StructuredOutputManager return False during the reasoning phase. This happens because reasoning_ended is computed once from prompt tokens (which do not contain the reasoning end sequence) and never updated during generation. As a result, the grammar bitmask is never applied, the grammar state is never advanced, triggers never fire, and the constraint is completely ignored.

structural_tag grammars handle reasoning/content boundaries internally via triggers — the grammar itself knows when to allow free text (reasoning) and when to constrain output. The external reasoning_ended gate prevents this from working.

Fix: Add an early return in both should_fill_bitmask() and should_advance() when the request uses a structural_tag constraint, bypassing the reasoning_ended gate. Other constraint types (json_schema, regex, etc.) are unaffected and keep the existing behavior.
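The gate and the proposed early return can be modeled in a few lines. This is a minimal sketch under stated assumptions, not vLLM's actual StructuredOutputManager: `ToyRequest`, `ToyManager`, the `bypass_structural_tag` switch, and the single `<reasoning_end>` sentinel token are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyRequest:
    """Hypothetical stand-in for a structured-output request."""
    constraint_type: str  # e.g. "structural_tag", "json_schema", "regex"
    reasoning_ended: Optional[bool] = None  # computed once, then cached


class ToyManager:
    """Illustrative model of the gate; NOT the real vLLM class."""

    def __init__(self, has_reasoner: bool, bypass_structural_tag: bool):
        self.has_reasoner = has_reasoner
        # Assumption: this flag models "with the PR" vs. "without the PR".
        self.bypass_structural_tag = bypass_structural_tag

    def should_fill_bitmask(self, request: ToyRequest, prompt_tokens: List[str]) -> bool:
        if not self.has_reasoner:
            return True
        # Proposed early return: structural_tag handles the
        # reasoning/content boundary itself via triggers.
        if self.bypass_structural_tag and request.constraint_type == "structural_tag":
            return True
        # Existing behavior as described above: derived from the prompt
        # only, and never refreshed during generation.
        if request.reasoning_ended is None:
            # Assumption: a single sentinel token marks the end of reasoning.
            request.reasoning_ended = "<reasoning_end>" in prompt_tokens
        return request.reasoning_ended


prompt = ["user", "List exactly 2 fruits."]
without_fix = ToyManager(has_reasoner=True, bypass_structural_tag=False)
with_fix = ToyManager(has_reasoner=True, bypass_structural_tag=True)

print(without_fix.should_fill_bitmask(ToyRequest("structural_tag"), prompt))
print(with_fix.should_fill_bitmask(ToyRequest("structural_tag"), prompt))
print(with_fix.should_fill_bitmask(ToyRequest("json_schema"), prompt))
```

In this toy model only the structural_tag request changes behavior when the bypass is on; a json_schema request still waits on the cached reasoning_ended value, which mirrors the "other constraint types are unaffected" claim.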

Test Plan

Tested with gpt-oss-120b (tp=4) via vllm serve HTTP Chat Completions with a structural_tag response_format:

curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/raid/models/openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "List exactly 2 fruits."}],
    "response_format": {
      "type": "structural_tag",
      "format": {
        "type": "triggered_tags",
        "triggers": ["<|channel|>analysis", "<|channel|>final"],
        "tags": [
          {"begin": "<|channel|>analysis<|message|>", "content": {"type": "any_text"}, "end": "<|end|>"},
          {"begin": "<|channel|>final<|constrain|>json<|message|>", "content": {"type": "json_schema", "json_schema": {"type": "object", "properties": {"items": {"type": "array", "items": {"type": "string"}, "minItems": 2, "maxItems": 2}}, "required": ["items"], "additionalProperties": false}}, "end": ""}
        ],
        "at_least_one": true,
        "stop_after_first": false
      }
    },
    "temperature": 0
  }'

Test Result

Before — structural_tag trigger never fires, Harmony markers leak into content:

{
    "message": {
        "content": "<|channel|>final<|constrain|>json<|message|>{\"items\":[\"Apple\",\"Banana\"]}",
        "reasoning": "The user asks: \"List exactly 2 fruits.\" So we need to output exactly two fruit names..."
    },
    "finish_reason": "stop",
    "usage": {"completion_tokens": 98}
}

After — grammar bitmask applied, trigger fires, clean constrained JSON:

{
    "message": {
        "content": "{\"items\":[\"Apple\",\"Banana\"]}",
        "reasoning": "The user asks: \"List exactly 2 fruits.\" So we need to output exactly two fruit names..."
    },
    "finish_reason": "stop",
    "usage": {"completion_tokens": 78}
}

Note the 20 fewer completion tokens — the text-encoded <|channel|>final<|constrain|>json<|message|> markers are no longer generated.
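The before/after comparison can also be checked mechanically. The sketch below is one possible check, assuming the `content` strings shown in the two responses above; the helper name `is_clean_constrained_json` and the marker list are illustrative and not part of vLLM.

```python
import json

# Harmony marker strings as they appear in the "Before" content above.
HARMONY_MARKERS = ("<|channel|>", "<|constrain|>", "<|message|>")


def is_clean_constrained_json(content: str) -> bool:
    """True if content has no Harmony markers and matches the test schema:
    an object with only an "items" key holding exactly two strings."""
    if any(marker in content for marker in HARMONY_MARKERS):
        return False
    try:
        obj = json.loads(content)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict) or set(obj) != {"items"}:
        return False
    items = obj["items"]
    return (
        isinstance(items, list)
        and len(items) == 2
        and all(isinstance(item, str) for item in items)
    )


BEFORE = '<|channel|>final<|constrain|>json<|message|>{"items":["Apple","Banana"]}'
AFTER = '{"items":["Apple","Banana"]}'

print(is_clean_constrained_json(BEFORE))
print(is_clean_constrained_json(AFTER))
print(98 - 78)  # completion-token difference reported in the two responses
```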


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR: Fix structural_tag constraint silently ignored on reasoning models
  • The test plan: Tested with gpt-oss-120b via HTTP Chat Completions with structural_tag response_format
  • The test results: Before/after comparison with token count difference
  • (Optional) Documentation update: N/A
  • (Optional) Release notes: N/A

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request effectively addresses the bug where structural_tag constraints were being ignored in reasoning models. The changes correctly bypass the reasoning_ended gate for structural_tag grammars in both should_fill_bitmask and should_advance methods, ensuring that the grammar bitmask is always applied and the grammar state is advanced as intended. The inline comments clearly explain the rationale behind these changes. The provided test plan and results confirm the fix and demonstrate the intended behavior, including the reduction in completion tokens due to correct constraint application.

@CatherineSue CatherineSue force-pushed the fix/structural-tag-reasoning-bitmask branch from 47880d5 to f370be6 on March 18, 2026 05:41
…okens on reasoning models

When a reasoning model (e.g., gpt-oss with openai_gptoss parser) uses
a structural_tag constraint, should_advance() returns False during the
reasoning phase because reasoning_ended is never set to True from prompt
tokens alone. This prevents the grammar from tracking generated tokens,
so it never sees the trigger sequence and the constraint silently fails.

structural_tag grammars (triggered_tags format) need to track all tokens
from the start to maintain trigger state. For example, gpt-oss triggers
on <|channel|>final which comes after <|channel|>analysis...reasoning...
<|end|><|start|>assistant — the grammar must have seen all preceding
tokens to know it should fire.

Add an early return in should_advance() when the request uses a
structural_tag constraint, so the grammar tracks tokens during reasoning
without constraining them. should_fill_bitmask() is left unchanged —
the grammar only constrains output after reasoning_ended is set, keeping
reasoning tokens free.

Signed-off-by: Chang Su <chang.s.su@oracle.com>
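The commit message's argument, that the grammar must see every generated token for the trigger to fire, can be illustrated with a toy tracker. This is a sketch under stated assumptions: `ToyTriggerGrammar` and `trigger_fires` are hypothetical, the token split is invented for the example, and real structural_tag matching happens inside the grammar backend rather than by substring search.

```python
from typing import List


class ToyTriggerGrammar:
    """Hypothetical trigger tracker: fires once the trigger text has been seen."""

    def __init__(self, trigger: str):
        self.trigger = trigger
        self.seen = ""
        self.fired = False

    def advance(self, token: str) -> None:
        # Accumulate generated text and check for the trigger sequence.
        self.seen += token
        if self.trigger in self.seen:
            self.fired = True


def trigger_fires(tokens: List[str], should_advance: bool) -> bool:
    """Feed tokens to the grammar only when the gate allows advancing."""
    grammar = ToyTriggerGrammar("<|channel|>final")
    for token in tokens:
        if should_advance:
            grammar.advance(token)
    return grammar.fired


# Assumed tokenization of the sequence described in the commit message.
generated = [
    "<|channel|>", "analysis", "<|message|>", "reasoning text", "<|end|>",
    "<|start|>", "assistant", "<|channel|>", "final",
]

print(trigger_fires(generated, should_advance=False))  # gate closed for the whole run
print(trigger_fires(generated, should_advance=True))   # grammar tracks every token
```

With the gate closed the tracker never receives a token, so the trigger cannot fire; with advancing enabled it sees the full sequence and fires on the final channel marker.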
@CatherineSue CatherineSue force-pushed the fix/structural-tag-reasoning-bitmask branch from f370be6 to d461e4c on March 18, 2026 05:42
@CatherineSue
Contributor Author

CatherineSue commented Mar 18, 2026

I realized that I can use enable_in_reasoning: True. So closing this PR now.

Labels

bug, structured-output, v1

Projects

Status: Done
