[BugFix] Fix spec decode + structured outputs + preemption edge case by njhill · Pull Request #30916 · vllm-project/vllm

njhill · 2025-12-18T01:58:37Z

Fix an edge case that can be triggered when using spec decode with structured outputs.

A particular sequence of preemption and skipped drafting can leave the scheduler's request.spec_token_ids stale. Bitmask generation then fails, because those stale draft tokens are out of sync with the grammar.
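The failure mode can be illustrated with a toy model. Everything below is hypothetical: LiteralGrammar, Request and the token values are chosen for this sketch and are not vLLM's classes. The exact preemption sequence that triggers the real bug is also not reproduced. The sketch only shows that draft tokens which were valid against one grammar state get rejected once the grammar has advanced, if nothing overwrites them in the meantime.

```python
from dataclasses import dataclass, field


@dataclass
class LiteralGrammar:
    """Hypothetical toy grammar: the output must match `target` exactly."""
    target: list[int]
    pos: int = 0

    def accepts(self, tokens: list[int]) -> bool:
        # Check whether `tokens` can follow the current state without advancing it,
        # similar in spirit to validating draft tokens against a copy of the grammar.
        return self.target[self.pos:self.pos + len(tokens)] == tokens

    def advance(self, token: int) -> None:
        # Commit one output token and move the grammar state forward.
        assert self.target[self.pos] == token
        self.pos += 1


@dataclass
class Request:
    """Hypothetical stand-in for a scheduler-side request."""
    grammar: LiteralGrammar
    spec_token_ids: list[int] = field(default_factory=list)


req = Request(LiteralGrammar(target=[101, 330, 220, 42]))
req.grammar.advance(101)

# Step N: the drafter proposes tokens that are valid for the current grammar state.
req.spec_token_ids = [330, 220]
assert req.grammar.accepts(req.spec_token_ids)

# The grammar then advances, but drafting is skipped for the next step,
# so nothing overwrites spec_token_ids.
req.grammar.advance(330)

# The leftover drafts were proposed against an older grammar state and no longer fit.
print(req.grammar.accepts(req.spec_token_ids))  # False
```

In vLLM the grammar is backed by a real structured-output backend and the check happens during bitmask generation; this sketch only models the ordering problem.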

When triggered, vLLM crashes with an error like this:

(EngineCore_DP0 pid=1549249)   File "/home/nickhill/workspace/vllm2/vllm/vllm/v1/engine/core.py", line 920, in _process_engine_step
(EngineCore_DP0 pid=1549249)     outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=1549249)                               ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1549249)   File "/home/nickhill/workspace/vllm2/vllm/vllm/v1/engine/core.py", line 344, in step
(EngineCore_DP0 pid=1549249)     grammar_output = self.scheduler.get_grammar_bitmask(scheduler_output)
(EngineCore_DP0 pid=1549249)                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1549249)   File "/home/nickhill/workspace/vllm2/vllm/vllm/v1/core/sched/scheduler.py", line 1022, in get_grammar_bitmask
(EngineCore_DP0 pid=1549249)     bitmask = self.structured_output_manager.grammar_bitmask(
(EngineCore_DP0 pid=1549249)               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1549249)   File "/home/nickhill/workspace/vllm2/vllm/vllm/v1/structured_output/__init__.py", line 277, in grammar_bitmask
(EngineCore_DP0 pid=1549249)     assert accepted, (token, req_id, scheduled_spec_decode_tokens)
(EngineCore_DP0 pid=1549249)            ^^^^^^^^
(EngineCore_DP0 pid=1549249) AssertionError: (220, '80', {'80': [330, 220]})

This PR fixes that case by ensuring that request.spec_token_ids is cleared when drafting is skipped for a given step, rather than potentially being left set to a prior step's draft tokens.
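A minimal sketch of the effect of the fix, assuming a simplified producer/consumer split between the model runner and the scheduler. DraftTokenIds here is a stripped-down stand-in, and take_drafts_old, take_drafts_new and apply_drafts are hypothetical names loosely modeled on the take_draft_token_ids change described in the review comment; they are not vLLM's actual code. The pre-fix shape reports nothing when drafting is skipped, so the scheduler never overwrites the old drafts. The post-fix shape reports an explicit empty draft for each request, which clears them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DraftTokenIds:
    """Simplified stand-in for one step's draft tokens (field names are assumptions)."""
    req_ids: list[str]
    draft_token_ids: list[list[int]]


@dataclass
class Request:
    req_id: str
    spec_token_ids: list[int] = field(default_factory=list)


def take_drafts_old(req_ids: list[str],
                    proposals: Optional[list[list[int]]]) -> Optional[DraftTokenIds]:
    # Hypothetical pre-fix behavior: nothing is reported when drafting was skipped.
    if proposals is None:
        return None
    return DraftTokenIds(req_ids, proposals)


def take_drafts_new(req_ids: list[str],
                    proposals: Optional[list[list[int]]]) -> DraftTokenIds:
    # Hypothetical post-fix behavior: a skipped step yields an empty draft per request.
    if proposals is None:
        proposals = [[] for _ in req_ids]
    return DraftTokenIds(req_ids, proposals)


def apply_drafts(requests: dict[str, Request], drafts: Optional[DraftTokenIds]) -> None:
    # Scheduler side: overwrite spec_token_ids for every request named in `drafts`.
    if drafts is None:
        return  # stale drafts from an earlier step survive
    for req_id, tokens in zip(drafts.req_ids, drafts.draft_token_ids):
        requests[req_id].spec_token_ids = tokens


final: dict[str, list[int]] = {}
for take in (take_drafts_old, take_drafts_new):
    reqs = {"80": Request("80")}
    apply_drafts(reqs, take(["80"], [[330, 220]]))  # step N: drafts proposed
    apply_drafts(reqs, take(["80"], None))          # step N+1: drafting skipped
    final[take.__name__] = reqs["80"].spec_token_ids

print(final)  # {'take_drafts_old': [330, 220], 'take_drafts_new': []}
```

Returning an explicit empty result, rather than None, keeps the consumer logic uniform: the scheduler always overwrites, so there is no code path in which a previous step's drafts can linger.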

It will be covered by tests once we extend the test_async_scheduling.py test matrix in #29821.

Note that this bug/fix doesn't actually apply to the async scheduling case.

Signed-off-by: Nick Hill <nhill@redhat.com>

gemini-code-assist

Code Review

This pull request addresses an edge case in speculative decoding when used with structured outputs. The issue arises when drafting is skipped, which could leave stale spec_token_ids in the scheduler, leading to failures in bitmask generation for structured outputs. The fix correctly modifies take_draft_token_ids to return an empty DraftTokenIds object when no new draft tokens are proposed, ensuring the scheduler's state is properly cleared. The change is well-targeted and effectively resolves the bug. The code looks good.


[BugFix] Fix spec decode + structured outputs + preemption edge case

222ddc0

Signed-off-by: Nick Hill <nhill@redhat.com>

njhill requested a review from benchislett December 18, 2025 01:58

njhill added the bug Something isn't working label Dec 18, 2025

mergify bot added the v1 label Dec 18, 2025

gemini-code-assist bot reviewed Dec 18, 2025


njhill mentioned this pull request Dec 18, 2025

[Perf] Async Scheduling + Speculative Decoding + Structured Outputs #29821

Merged

njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 18, 2025

benchislett approved these changes Dec 18, 2025


njhill merged commit b0b77c4 into vllm-project:main Dec 18, 2025
52 of 53 checks passed

njhill deleted the fix-sd-so-edgecase branch December 18, 2025 21:00

yugong333 pushed a commit to yugong333/vllm that referenced this pull request Dec 22, 2025

[BugFix] Fix spec decode + structured outputs + preemption edge case (v…

792769d

…llm-project#30916) Signed-off-by: Nick Hill <nhill@redhat.com>

Majid-Taheri pushed a commit to Majid-Taheri/vllm that referenced this pull request Dec 23, 2025

[BugFix] Fix spec decode + structured outputs + preemption edge case (v…

78ebd76

…llm-project#30916) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Ubuntu <mjtaheri68@gmail.com>

njhill mentioned this pull request Jan 8, 2026

[BugFix] Fix spec decoding edge case bugs #31944

Merged

rain2bow mentioned this pull request Jan 13, 2026

feature: discussion of spec decoding related issues baidu/vLLM-Kunlun#107

Open

dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026

[BugFix] Fix spec decode + structured outputs + preemption edge case (v…

bf1ce62

…llm-project#30916) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

[BugFix] Fix spec decode + structured outputs + preemption edge case (v…

dc6ce8b

…llm-project#30916) Signed-off-by: Nick Hill <nhill@redhat.com>

njhill merged 1 commit into vllm-project:main from njhill:fix-sd-so-edgecase
