[Bugfix] Grammar ignored when reasoning ends within speculated tokens by sfbemerk · Pull Request #34241 · vllm-project/vllm

sfbemerk · 2026-02-10T12:48:28Z

Purpose

This PR attempts to fix a bug (#31858) that occurs when Speculative Decoding (such as MTP), Reasoning, and Structured Output / Grammar are used in combination. Typically, the grammar is not enforced during reasoning, only for the final answer. However, when the reasoning end token is generated among the speculated tokens, the draft tokens that follow it are not validated against the grammar, which leads to an invalid final answer.
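To make the failure mode concrete, here is a minimal, hypothetical Python sketch of the validation rule the fix aims for. It is not vLLM's implementation: `num_valid_draft_tokens`, the `token_allowed` grammar oracle, and the token IDs are illustrative assumptions. Draft tokens up to and including the reasoning end token are unconstrained; every draft token after it in the same speculative step must pass the grammar.

```python
from typing import Callable, Sequence

# Hypothetical grammar oracle: given the answer tokens accepted so far and a
# candidate token, return whether the grammar allows that token next.
TokenAllowed = Callable[[list[int], int], bool]


def num_valid_draft_tokens(
    draft_tokens: Sequence[int],
    reasoning_ended: bool,
    end_think_id: int,
    token_allowed: TokenAllowed,
) -> int:
    """Count how many leading draft tokens survive grammar validation.

    Tokens up to and including the reasoning end token are not constrained.
    Every draft token after it must be checked, even though reasoning ended
    inside the same speculative step (the case the bug report describes).
    """
    in_answer = reasoning_ended
    answer: list[int] = []
    for i, tok in enumerate(draft_tokens):
        if not in_answer:
            if tok == end_think_id:
                in_answer = True  # grammar applies from the next token on
            continue
        if not token_allowed(answer, tok):
            return i  # reject this token and every draft token after it
        answer.append(tok)
    return len(draft_tokens)


# Toy setup (assumed IDs): the answer must start with "{" (LBRACE).
END_THINK, LBRACE, BACKTICK = 99, 1, 2


def must_start_with_brace(answer: list[int], tok: int) -> bool:
    return bool(answer) or tok == LBRACE


# Reasoning ends at index 1; the backtick after it violates the grammar.
print(num_valid_draft_tokens([5, END_THINK, BACKTICK, LBRACE], False, END_THINK, must_start_with_brace))  # 2
```

Without the grammar check after `END_THINK`, all four draft tokens would be accepted and the answer would start with a markdown fence, which mirrors the invalid output shown in the test results.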

Test Plan

In general, the bug seems to be independent of the specific SpecDecode method. I originally observed it with DeepSeek models and MTP, but for testing I recommend a smaller model such as Qwen3-8B, with the same model as the draft model. This gives high acceptance rates and therefore a high likelihood that the original bug appears.

vllm serve "Qwen/Qwen3-8B" \
  --max-model-len 40960 \
  --reasoning-parser qwen3 \
  --speculative-config '{"method":"draft_model","model":"Qwen/Qwen3-8B","num_speculative_tokens":5}'

The test request should use response_format=json_schema and a prompt that lures the model into generating something other than pure JSON, e.g. the following example payload:


{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {
      "role": "user",
      "content": "Imagine a Fantasy hero (10). Return valid json, wrapped in markdown fences: ```json\n[...]\n```"
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "hero",
      "schema": {
        "$defs": {
          "CharacterRole": {"enum": ["mage", "warrior", "healer"], "title": "CharacterRole", "type": "string"}
        },
        "properties": {
          "name": {"description": "Character name", "title": "Name", "type": "string"},
          "age": {"description": "Character age", "title": "Age", "type": "integer"},
          "role": {"allOf": [{"$ref": "#/$defs/CharacterRole"}], "description": "Character class"}
        },
        "required": ["name", "age", "role"],
        "title": "Character",
        "type": "object"
      }
    }
  }
}
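A minimal Python sketch for sending this payload, assuming `vllm serve` is listening on its default address `http://localhost:8000` and exposes the OpenAI-compatible `/v1/chat/completions` route. The helper names `build_payload` and `send` are illustrative; only the standard library is used, so no client package is required.

```python
import json
import urllib.request

# Assumption: default `vllm serve` host and port.
URL = "http://localhost:8000/v1/chat/completions"

# Same schema as the example payload above.
HERO_SCHEMA = {
    "$defs": {
        "CharacterRole": {"enum": ["mage", "warrior", "healer"], "title": "CharacterRole", "type": "string"}
    },
    "properties": {
        "name": {"description": "Character name", "title": "Name", "type": "string"},
        "age": {"description": "Character age", "title": "Age", "type": "integer"},
        "role": {"allOf": [{"$ref": "#/$defs/CharacterRole"}], "description": "Character class"},
    },
    "required": ["name", "age", "role"],
    "title": "Character",
    "type": "object",
}


def build_payload() -> dict:
    """Build the chat request; the prompt deliberately asks for markdown fences."""
    return {
        "model": "Qwen/Qwen3-8B",
        "messages": [{
            "role": "user",
            "content": "Imagine a Fantasy hero (10). Return valid json, wrapped in markdown fences: ```json\n[...]\n```",
        }],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "hero", "schema": HERO_SCHEMA},
        },
    }


def send(payload: dict, url: str = URL) -> str:
    """POST the payload and return the final answer's `content` field."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the bug depends on the reasoning end token landing inside a speculative step, it may not show up on every request; calling `send(build_payload())` several times makes a reproduction more likely.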

Test Result

Without the bugfix, the content field contains invalid JSON, e.g. because of markdown fences:

"content": "```json\n{\n\n\"name\": \"Eldrin the Flameheart\",\n\"age\": 32,\n\"role\": \"warrior\"\n}```"

With the bugfix, the content field contains valid JSON that satisfies the requested grammar:

"content": "{\n\n\"name\": \"Eldrin the Flameheart\",\n\"age\": 32,\n\"role\": \"warrior\"\n}"
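A small, hand-rolled Python check for this specific `hero` schema can distinguish the two results above automatically. It is a sketch, not a general JSON Schema validator (the third-party `jsonschema` package would be the usual choice for that); `satisfies_hero_schema` is a hypothetical helper name.

```python
import json

ROLES = {"mage", "warrior", "healer"}  # CharacterRole enum from the schema


def satisfies_hero_schema(content: str) -> bool:
    """Check that `content` is a JSON object with a string name, an integer
    age, and a role from the CharacterRole enum (extra keys are allowed,
    matching the schema, which does not forbid additional properties)."""
    try:
        obj = json.loads(content)
    except json.JSONDecodeError:
        return False  # e.g. markdown fences around the JSON
    if not isinstance(obj, dict):
        return False
    if not all(key in obj for key in ("name", "age", "role")):
        return False
    age, role = obj["age"], obj["role"]
    return (
        isinstance(obj["name"], str)
        and isinstance(age, int) and not isinstance(age, bool)  # bool is an int subclass
        and isinstance(role, str) and role in ROLES
    )


without_fix = "```json\n{\n\n\"name\": \"Eldrin the Flameheart\",\n\"age\": 32,\n\"role\": \"warrior\"\n}```"
with_fix = "{\n\n\"name\": \"Eldrin the Flameheart\",\n\"age\": 32,\n\"role\": \"warrior\"\n}"
print(satisfies_hero_schema(without_fix))  # False
print(satisfies_hero_schema(with_fix))     # True
```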

I am happy to receive feedback and suggestions on how to improve this PR: the interplay of spec decode, grammar, reasoning, and async scheduling seems to be quite complex. I found the first bugfix attempts as commits in the vllm-chutes fork, but had to make a few more additions.

gemini-code-assist

Code Review

The pull request effectively addresses a complex bug involving the interaction of speculative decoding, reasoning, and structured output. The new test case, test_reasoning_spec_decode_grammar_comprehensive, is well-structured and crucial for validating the fix across various scenarios. The logic introduced to manage speculative tokens and apply grammar constraints during the reasoning-to-structured-output transition appears sound and robust. However, there are several instances where full copies of request.all_token_ids are created, which could lead to significant performance overhead and increased memory usage, especially for long sequences. Optimizing these operations to avoid unnecessary list copying would be a critical improvement.
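A minimal sketch of the kind of change the review asks for, assuming `request.all_token_ids` supports `len()` and indexing like a list; `find_token_from` and its parameters are hypothetical and not part of vLLM. Instead of copying the whole token list on every step, the sketch scans only the tokens from a given offset onward, in place.

```python
from typing import Sequence


def find_token_from(token_ids: Sequence[int], start: int, target: int) -> int:
    """Return the index of the first `target` at or after `start`, or -1.

    The sequence is read in place, so no copy is allocated and the work is
    proportional to the number of tokens after `start`, not the full length.
    """
    for i in range(max(start, 0), len(token_ids)):
        if token_ids[i] == target:
            return i
    return -1


# The copying pattern the review warns about allocates O(len) per call:
#   tail = list(token_ids)[start:]

all_ids = [11, 12, 99, 13]
print(find_token_from(all_ids, 1, 99))  # 2
print(find_token_from(all_ids, 3, 99))  # -1
```

The design choice is simply to pass an offset (for example, the length before the latest step's tokens were appended) rather than materializing a new list; whether that offset is available at the relevant call sites is an assumption.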

Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com>

mergify Bot added the structured-output, v1, and bug (Something isn't working) labels Feb 10, 2026

github-project-automation Bot added this to Structured Output Feb 10, 2026

sfbemerk changed the title from "[Bugfix] Grammar ignored when reasoning ends in speculated tokens" to "[Bugfix] Grammar ignored when reasoning ends within speculated tokens" Feb 10, 2026

gemini-code-assist Bot reviewed Feb 10, 2026


sfbemerk force-pushed the bugfix/specdecode-grammar-reasoning-pr branch 2 times, most recently from f81c6c0 to b61dd96, February 16, 2026 20:39

MTP grammar fixes.

38e3859

Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com>

sfbemerk force-pushed the bugfix/specdecode-grammar-reasoning-pr branch 2 times, most recently from 5473327 to e6fb6e5, February 16, 2026 23:08

Benjamin Merkel added 2 commits February 17, 2026 08:23

Update bugfix

f03e652

Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com>

Add test

5a5e6b5

Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com>

sfbemerk force-pushed the bugfix/specdecode-grammar-reasoning-pr branch from e6fb6e5 to 5a5e6b5, February 17, 2026 07:26

sfbemerk marked this pull request as ready for review February 17, 2026 07:42

sfbemerk requested review from ApostaC, WoosukKwon, aarnphm, alexm-redhat, benchislett, heheda12345, mgoin, njhill, orozery, robertgshaw2-redhat, russellb and ywang96 as code owners February 17, 2026 07:42

sfbemerk mentioned this pull request Feb 20, 2026

Bug: Speculative Decoding (MTP) Causes </think> Detection Failure in Structured Output + Reasoning Mode #34650

Open

nbethala mentioned this pull request Feb 20, 2026

fix: DeepSeek-R1 structured-output reasoning end detection (scheduler + parser) #34978

Closed

fergusfinn mentioned this pull request Mar 2, 2026

[Performance] Add is_reasoning_end_streaming() override to GptOssReasoningParser #35745

Merged

sfbemerk closed this Mar 5, 2026

github-project-automation Bot moved this to Done in Structured Output Mar 5, 2026

sfbemerk mentioned this pull request Mar 5, 2026

[Bugfix] Grammar was ignored when reasoning ended within speculated tokens #36138

Open


sfbemerk wants to merge 3 commits into vllm-project:main from sfbemerk:bugfix/specdecode-grammar-reasoning-pr

2 participants