[BugFix] Fix spec decode + structured outputs + preemption edge case #30916
Merged
njhill merged 1 commit into vllm-project:main on Dec 18, 2025
Signed-off-by: Nick Hill <nhill@redhat.com>
Code Review
This pull request addresses an edge case in speculative decoding when used with structured outputs. The issue arises when drafting is skipped, which could leave stale spec_token_ids in the scheduler, leading to failures in bitmask generation for structured outputs. The fix correctly modifies take_draft_token_ids to return an empty DraftTokenIds object when no new draft tokens are proposed, ensuring the scheduler's state is properly cleared. The change is well-targeted and effectively resolves the bug. The code looks good.
benchislett approved these changes Dec 18, 2025
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Dec 22, 2025
…llm-project#30916) Signed-off-by: Nick Hill <nhill@redhat.com>
Majid-Taheri pushed a commit to Majid-Taheri/vllm that referenced this pull request Dec 23, 2025
…llm-project#30916) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Ubuntu <mjtaheri68@gmail.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…llm-project#30916) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
…llm-project#30916) Signed-off-by: Nick Hill <nhill@redhat.com>
Fix an edge case that can be triggered when using spec decode with structured outputs.
A particular sequence of preemption followed by drafting being skipped can leave the scheduler's request.spec_token_ids stale. Bitmask generation can then fail, because the stale draft tokens are out of sync with the grammar.
When triggered, vLLM crashes with an error like this:
This PR fixes that case by ensuring that request.spec_token_ids is cleared when drafting is skipped for a given step, rather than potentially leaving it set to a prior step's draft tokens.
It will be covered by tests once we extend the test_async_scheduling.py test matrix in #29821.
Note that this bug/fix doesn't actually apply to the async scheduling case.
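For readers unfamiliar with the failure mode, here is a minimal, self-contained sketch of the idea behind the fix. It is not the vLLM code: ToyRunner, ToyRequest, update_draft_token_ids and demo are hypothetical names, the DraftTokenIds fields are assumed, and an "empty" result is modeled as one empty draft list per request. It only illustrates why returning nothing on a skipped step can leave stale draft tokens in place, while returning an empty result clears them.

```python
from dataclasses import dataclass, field


@dataclass
class DraftTokenIds:
    # Toy stand-in for the object named in the review; fields are assumed.
    req_ids: list = field(default_factory=list)
    draft_token_ids: list = field(default_factory=list)


@dataclass
class ToyRequest:
    # Hypothetical scheduler-side request state.
    req_id: str
    spec_token_ids: list = field(default_factory=list)


class ToyRunner:
    """Hypothetical model runner that may skip drafting on some steps."""

    def __init__(self, req_ids, fixed):
        self.req_ids = list(req_ids)
        self.fixed = fixed
        self._drafts = None  # None means drafting was skipped this step

    def propose(self, drafts):
        self._drafts = drafts

    def take_draft_token_ids(self):
        drafts, self._drafts = self._drafts, None
        if drafts is None:
            if not self.fixed:
                # Pre-fix behaviour (as described): report nothing.
                return None
            # Post-fix idea: report an empty result so state is cleared.
            return DraftTokenIds(self.req_ids, [[] for _ in self.req_ids])
        return DraftTokenIds(self.req_ids, drafts)


def update_draft_token_ids(requests, drafts):
    # Hypothetical scheduler hook: with no result, nothing is overwritten.
    if drafts is None:
        return
    for req_id, ids in zip(drafts.req_ids, drafts.draft_token_ids):
        requests[req_id].spec_token_ids = list(ids)


def demo(fixed):
    requests = {"r0": ToyRequest("r0")}
    runner = ToyRunner(["r0"], fixed=fixed)
    # Step 1: drafting runs and proposes two tokens.
    runner.propose([[5, 6]])
    update_draft_token_ids(requests, runner.take_draft_token_ids())
    # Step 2: drafting is skipped (e.g. around a preemption).
    update_draft_token_ids(requests, runner.take_draft_token_ids())
    return requests["r0"].spec_token_ids
```

In this toy model, demo(False) leaves the step-1 drafts on the request after the skipped step, while demo(True) leaves an empty list, which is the state a grammar-driven bitmask builder would expect.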