[Bugfix][Structured Output] Fix structural_tag bitmask not applied on reasoning models #37388
Closed
CatherineSue wants to merge 1 commit into vllm-project:main from
Code Review
The pull request effectively addresses the bug where structural_tag constraints were being ignored in reasoning models. The changes correctly bypass the reasoning_ended gate for structural_tag grammars in both should_fill_bitmask and should_advance methods, ensuring that the grammar bitmask is always applied and the grammar state is advanced as intended. The inline comments clearly explain the rationale behind these changes. The provided test plan and results confirm the fix and demonstrate the intended behavior, including the reduction in completion tokens due to correct constraint application.
47880d5 to f370be6
…okens on reasoning models

When a reasoning model (e.g., gpt-oss with openai_gptoss parser) uses a structural_tag constraint, should_advance() returns False during the reasoning phase because reasoning_ended is never set to True from prompt tokens alone. This prevents the grammar from tracking generated tokens, so it never sees the trigger sequence and the constraint silently fails.

structural_tag grammars (triggered_tags format) need to track all tokens from the start to maintain trigger state. For example, gpt-oss triggers on <|channel|>final, which comes after <|channel|>analysis...reasoning... <|end|><|start|>assistant, so the grammar must have seen all preceding tokens to know it should fire.

Add an early return in should_advance() when the request uses a structural_tag constraint, so the grammar tracks tokens during reasoning without constraining them. should_fill_bitmask() is left unchanged: the grammar only constrains output after reasoning_ended is set, keeping reasoning tokens free.

Signed-off-by: Chang Su <chang.s.su@oracle.com>
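The behavior described in this commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the vLLM implementation: ToyTriggerGrammar, ToyRequest, the fixed flag, and the token list are hypothetical names invented for illustration. Only the function names should_advance and should_fill_bitmask and the reasoning_ended flag come from the text above.

```python
from dataclasses import dataclass, field

# Hypothetical trigger sequence, modeled on the gpt-oss example in the commit message.
TRIGGER = ["<|channel|>", "final"]


@dataclass
class ToyTriggerGrammar:
    """Toy stand-in for a triggered_tags grammar (not the real grammar backend).

    It fires once the trigger sequence appears among the tokens it was advanced on.
    """
    seen: list = field(default_factory=list)
    triggered: bool = False

    def accept(self, token: str) -> None:
        self.seen.append(token)
        if self.seen[-len(TRIGGER):] == TRIGGER:
            self.triggered = True


@dataclass
class ToyRequest:
    uses_structural_tag: bool
    # Computed from the prompt only, so it stays False in this toy model.
    reasoning_ended: bool = False
    grammar: ToyTriggerGrammar = field(default_factory=ToyTriggerGrammar)


def should_advance(req: ToyRequest, fixed: bool = True) -> bool:
    # Early return: structural_tag grammars track every generated token.
    if fixed and req.uses_structural_tag:
        return True
    return req.reasoning_ended


def should_fill_bitmask(req: ToyRequest) -> bool:
    # Left unchanged: tokens are constrained only after reasoning has ended.
    return req.reasoning_ended


def run(tokens, fixed: bool) -> bool:
    """Feed generated tokens through the gate and report whether the trigger fired."""
    req = ToyRequest(uses_structural_tag=True)
    for tok in tokens:
        if should_advance(req, fixed=fixed):
            req.grammar.accept(tok)
    return req.grammar.triggered


# Hypothetical tokenization of a reasoning phase followed by the final channel.
GENERATED = ["<|channel|>", "analysis", "...", "<|end|>",
             "<|start|>", "assistant", "<|channel|>", "final"]

print(run(GENERATED, fixed=False))  # False: the grammar never saw any token
print(run(GENERATED, fixed=True))   # True: the grammar tracked tokens and fired
```

With the early return disabled, the toy grammar is never advanced and the trigger cannot fire. With it enabled, the grammar sees the whole sequence while should_fill_bitmask still reports False during reasoning, so reasoning tokens stay unconstrained.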
f370be6 to d461e4c
I realized that I can use |
Purpose

Fix structural_tag constraints being silently ignored on reasoning models (e.g., gpt-oss with openai_gptoss reasoning parser).

When a reasoning model uses a structural_tag constraint (e.g., triggered_tags format), should_fill_bitmask() and should_advance() in StructuredOutputManager return False during the reasoning phase. This happens because reasoning_ended is computed once from prompt tokens (which do not contain the reasoning end sequence) and never updated during generation. As a result, the grammar bitmask is never applied, the grammar state is never advanced, triggers never fire, and the constraint is completely ignored.

structural_tag grammars handle reasoning/content boundaries internally via triggers: the grammar itself knows when to allow free text (reasoning) and when to constrain output. The external reasoning_ended gate prevents this from working.

Fix: Add an early return in both should_fill_bitmask() and should_advance() when the request uses a structural_tag constraint, bypassing the reasoning_ended gate. Other constraint types (json_schema, regex, etc.) are unaffected and keep the existing behavior.

Test Plan

Tested with gpt-oss-120b (tp=4) via vllm serve HTTP Chat Completions with a structural_tag response_format:

Test Result

Before: structural_tag trigger never fires, Harmony markers leak into content:

    {
      "message": {
        "content": "<|channel|>final<|constrain|>json<|message|>{\"items\":[\"Apple\",\"Banana\"]}",
        "reasoning": "The user asks: \"List exactly 2 fruits.\" So we need to output exactly two fruit names..."
      },
      "finish_reason": "stop",
      "usage": {"completion_tokens": 98}
    }

After: grammar bitmask applied, trigger fires, clean constrained JSON:

    {
      "message": {
        "content": "{\"items\":[\"Apple\",\"Banana\"]}",
        "reasoning": "The user asks: \"List exactly 2 fruits.\" So we need to output exactly two fruit names..."
      },
      "finish_reason": "stop",
      "usage": {"completion_tokens": 78}
    }

Note the 20 fewer completion tokens: the text-encoded <|channel|>final<|constrain|>json<|message|> markers are no longer generated.
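The difference between the Before and After responses can be checked mechanically. The following is a minimal sketch, assuming only that the content field should be marker-free and valid JSON. The helper check_content and the MARKERS tuple are hypothetical names, not part of vLLM. The two content strings are copied from the responses above.

```python
import json

# Text-encoded Harmony markers that appear in the "Before" content.
MARKERS = ("<|channel|>", "<|constrain|>", "<|message|>")


def check_content(content: str) -> dict:
    """Report which markers leaked into content and whether it parses as JSON."""
    leaked = [m for m in MARKERS if m in content]
    try:
        parsed = json.loads(content)
    except json.JSONDecodeError:
        parsed = None  # content is not valid JSON on its own
    return {"leaked_markers": leaked, "parsed": parsed}


before = '<|channel|>final<|constrain|>json<|message|>{"items":["Apple","Banana"]}'
after = '{"items":["Apple","Banana"]}'

print(check_content(before))  # all three markers leaked, parsed is None
print(check_content(after))   # no markers, parsed is {'items': ['Apple', 'Banana']}
```

A check like this could be run against the content field of each response to confirm that the constraint was applied.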