
[BugFix] Fix spec decoding edge case bugs#31944

Merged
DarkLight1337 merged 1 commit into vllm-project:main from njhill:fix-spec-edgecases
Jan 8, 2026

Conversation


@njhill njhill commented Jan 8, 2026

There are a couple of edge case issues:

  • Since [Perf] Async Scheduling + Speculative Decoding + Structured Outputs #29821, it's again possible to hit the issue described in [BugFix] Fix spec decode + structured outputs + preemption edge case #30916. Our test does cover it, but it isn't guaranteed to trigger, so the regression was missed and is now manifesting as a flake. The fix here is to clear the request's spec_token_ids in the scheduler when it is preempted.
  • There's also an issue where a request can be temporarily excluded from the batch without being preempted. Such a "paused" request may still have valid drafted spec tokens, but these aren't handled properly upon resumption in the model runner's _update_states method: resumed requests are treated as new requests, and spec token handling isn't on that path.
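The first fix can be illustrated with a minimal sketch. All names here (Request, spec_token_ids, preempt) are simplified stand-ins for vLLM's scheduler internals, not its actual API; the point is only that stale draft tokens are dropped at preemption time.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Illustrative stand-in for a scheduled request."""
    req_id: str
    num_computed_tokens: int = 0
    spec_token_ids: list[int] = field(default_factory=list)


def preempt(request: Request, waiting: list[Request]) -> None:
    """Move a running request back to the waiting queue.

    After preemption the request's KV cache state is discarded, so any
    previously drafted spec tokens no longer correspond to computed
    state. Clearing them here is the essence of the first fix.
    """
    request.spec_token_ids.clear()  # drop stale drafts on preemption
    waiting.append(request)


req = Request("r0", num_computed_tokens=8, spec_token_ids=[5, 7])
waiting: list[Request] = []
preempt(req, waiting)
print(req.spec_token_ids)  # → []
```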

Signed-off-by: Nick Hill <nickhill123@gmail.com>
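The second fix can similarly be sketched: the draft-token bookkeeping is factored into one helper that both the running-request path and the resumed-request path call, so a "paused" request's drafts aren't silently dropped. Every name below is hypothetical shorthand for the refactoring described above, not vLLM's real _update_states code.

```python
class InputBatchSketch:
    """Toy stand-in for the model runner's per-request batch state."""

    def __init__(self) -> None:
        self.spec_token_ids: dict[str, list[int]] = {}

    def _update_spec_tokens(self, req_id: str, drafts: list[int]) -> None:
        # Single shared place that records draft tokens for a request.
        self.spec_token_ids[req_id] = list(drafts)

    def add_resumed_request(self, req_id: str, drafts: list[int]) -> None:
        # A resumed ("paused") request takes the new-request path;
        # the fix is to route it through the same helper so its still
        # valid drafts are preserved.
        self._update_spec_tokens(req_id, drafts)


batch = InputBatchSketch()
batch.add_resumed_request("r1", [11, 12])
print(batch.spec_token_ids["r1"])  # → [11, 12]
```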
Comment on lines +1518 to +1522
if (
    self.input_batch.num_computed_tokens_cpu[req_idx]
    >= self.input_batch.num_prompt_tokens[req_idx]
):
    num_decode_draft_tokens[req_idx] = len(draft_token_ids)
njhill (Member, Author):
This is a simplification that isn't directly related to the fix.
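The condition in the snippet above encodes when a request is in the decode phase: once every prompt token has been computed, and only then, its draft tokens are counted. A plain-function paraphrase (illustrative only; the real code indexes per-request arrays in the input batch):

```python
def num_decode_drafts(num_computed: int, num_prompt: int,
                      draft_token_ids: list[int]) -> int:
    """Count draft tokens only for requests past the prefill phase."""
    if num_computed >= num_prompt:   # decode phase: all prompt tokens done
        return len(draft_token_ids)
    return 0                         # still prefilling: drafts not counted


print(num_decode_drafts(10, 8, [1, 2, 3]))  # decoding → 3
print(num_decode_drafts(5, 8, [1, 2, 3]))   # prefilling → 0
```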

@gemini-code-assist gemini-code-assist bot (Contributor) left a comment:

Code Review

This pull request introduces two important bug fixes for edge cases in speculative decoding. The first fix correctly clears speculative token IDs when a request is preempted, preventing state corruption. The second fix addresses an issue with 'paused' requests by refactoring the speculative token update logic into a new method and ensuring it's applied to resumed requests. The changes are well-implemented, and the refactoring improves code clarity and maintainability. Overall, this is a solid contribution that enhances the robustness of the speculative decoding feature.

@njhill njhill added the ready label (ONLY add when PR is ready to merge / full CI is needed) Jan 8, 2026
@DarkLight1337 DarkLight1337 merged commit 287b37c into vllm-project:main Jan 8, 2026
51 checks passed
@njhill njhill deleted the fix-spec-edgecases branch January 8, 2026 17:22
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Feb 11, 2026
…fect (#6615)

### What this PR does / why we need it?
This PR corrects the patch from
[#5786](#5786);
otherwise it might not take effect when tp_size > 1.

The related changes in this patch have been merged into vLLM
[#31944](vllm-project/vllm#31944).

### Does this PR introduce _any_ user-facing change?
No.

Signed-off-by: Angazenn <supperccell@163.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1


2 participants