[BugFix] Fix spec decode + structured outputs + preemption edge case#30916

Merged
njhill merged 1 commit into vllm-project:main from njhill:fix-sd-so-edgecase
Dec 18, 2025

Conversation

@njhill
Member

@njhill njhill commented Dec 18, 2025

Fix an edge case that can be triggered when using spec decode with structured outputs.

A particular sequence of preemption and skipped drafting can leave the scheduler's request.spec_token_ids stale, which then fails bitmask generation because the stale tokens are out of sync with the grammar.

When triggered, vLLM crashes with an error like this:

(EngineCore_DP0 pid=1549249)   File "/home/nickhill/workspace/vllm2/vllm/vllm/v1/engine/core.py", line 920, in _process_engine_step
(EngineCore_DP0 pid=1549249)     outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=1549249)                               ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1549249)   File "/home/nickhill/workspace/vllm2/vllm/vllm/v1/engine/core.py", line 344, in step
(EngineCore_DP0 pid=1549249)     grammar_output = self.scheduler.get_grammar_bitmask(scheduler_output)
(EngineCore_DP0 pid=1549249)                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1549249)   File "/home/nickhill/workspace/vllm2/vllm/vllm/v1/core/sched/scheduler.py", line 1022, in get_grammar_bitmask
(EngineCore_DP0 pid=1549249)     bitmask = self.structured_output_manager.grammar_bitmask(
(EngineCore_DP0 pid=1549249)               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1549249)   File "/home/nickhill/workspace/vllm2/vllm/vllm/v1/structured_output/__init__.py", line 277, in grammar_bitmask
(EngineCore_DP0 pid=1549249)     assert accepted, (token, req_id, scheduled_spec_decode_tokens)
(EngineCore_DP0 pid=1549249)            ^^^^^^^^
(EngineCore_DP0 pid=1549249) AssertionError: (220, '80', {'80': [330, 220]})

This PR fixes that case by ensuring that request.spec_token_ids is cleared when drafting is skipped for a given step, rather than potentially leaving it set to a prior step's draft tokens.

It will be covered by tests once we extend the test_async_scheduling.py test matrix in #29821.

Note that this bug/fix doesn't actually apply to the async scheduling case.
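The failure mode above can be illustrated with a minimal sketch. This is not vLLM's actual code: the class names, `propose`, and `take_draft_token_ids` signatures here are simplified stand-ins, but the pattern matches the fix described, where the drafter always hands back a result (empty when drafting was skipped) so the scheduler clears stale draft tokens instead of reusing them.

```python
class Request:
    def __init__(self, req_id: str):
        self.req_id = req_id
        self.spec_token_ids: list[int] = []  # draft tokens for the next step


class Drafter:
    """Toy drafter; on some steps (e.g. after preemption) nothing is proposed."""

    def __init__(self):
        self._proposed: dict[str, list[int]] = {}

    def propose(self, req_id: str, token_ids: list[int]) -> None:
        self._proposed[req_id] = token_ids

    def take_draft_token_ids(self) -> dict[str, list[int]]:
        # Fixed behavior: always return a (possibly empty) mapping.
        # A buggy variant that returns None when nothing was proposed would
        # cause the scheduler to skip the update and keep stale tokens.
        drafts, self._proposed = self._proposed, {}
        return drafts


class Scheduler:
    def __init__(self, drafter: Drafter):
        self.drafter = drafter
        self.requests: dict[str, Request] = {}

    def add_request(self, req: Request) -> None:
        self.requests[req.req_id] = req

    def update_spec_tokens(self) -> None:
        drafts = self.drafter.take_draft_token_ids()
        for req_id, req in self.requests.items():
            # .get(..., []) clears spec_token_ids when drafting was skipped,
            # so stale drafts are never re-checked against the grammar.
            req.spec_token_ids = drafts.get(req_id, [])


drafter = Drafter()
sched = Scheduler(drafter)
req = Request("80")
sched.add_request(req)

# Step 1: drafter proposes tokens for request "80".
drafter.propose("80", [330, 220])
sched.update_spec_tokens()
assert req.spec_token_ids == [330, 220]

# Step 2: drafting is skipped. With the fix, the step-1 tokens are cleared
# rather than being validated against a grammar that has since advanced.
sched.update_spec_tokens()
assert req.spec_token_ids == []
```

In the buggy scenario, step 2 would leave `[330, 220]` in place, and the structured-output bitmask pass would assert on tokens the grammar no longer accepts, matching the `AssertionError: (220, '80', {'80': [330, 220]})` traceback above.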

@njhill njhill requested a review from benchislett December 18, 2025 01:58
@njhill njhill added the bug Something isn't working label Dec 18, 2025
@mergify mergify bot added the v1 label Dec 18, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses an edge case in speculative decoding when used with structured outputs. The issue arises when drafting is skipped, which could leave stale spec_token_ids in the scheduler, leading to failures in bitmask generation for structured outputs. The fix correctly modifies take_draft_token_ids to return an empty DraftTokenIds object when no new draft tokens are proposed, ensuring the scheduler's state is properly cleared. The change is well-targeted and effectively resolves the bug. The code looks good.

@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 18, 2025
@njhill njhill merged commit b0b77c4 into vllm-project:main Dec 18, 2025
52 of 53 checks passed
@njhill njhill deleted the fix-sd-so-edgecase branch December 18, 2025 21:00
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Dec 22, 2025
Majid-Taheri pushed a commit to Majid-Taheri/vllm that referenced this pull request Dec 23, 2025
…llm-project#30916)

Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Ubuntu <mjtaheri68@gmail.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…llm-project#30916)

Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

Labels

bug Something isn't working ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

2 participants