
[BugFix] Fix spec decoding edge case bugs#31944

Merged
DarkLight1337 merged 1 commit into vllm-project:main from njhill:fix-spec-edgecases
Jan 8, 2026

Conversation


@njhill njhill commented Jan 8, 2026

There are a couple of edge case issues:

  • Since [Perf] Async Scheduling + Speculative Decoding + Structured Outputs #29821, it's again possible to hit the issue described in [BugFix] Fix spec decode + structured outputs + preemption edge case #30916. Our test does cover it, but it isn't guaranteed to trigger, so the regression was missed and is now manifesting as a flake. The fix here is to clear the request's spec_token_ids in the scheduler when it is preempted.
  • There's also an issue where a request can be temporarily excluded from the batch without being preempted. Such a "paused" request may still have valid drafted spec tokens, but these aren't handled properly upon resumption in the model runner's _update_states method: resumed requests are treated as new requests, and spec token handling isn't on that path.
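The first fix can be illustrated with a minimal sketch. All names here (Request, spec_token_ids, preempt) are simplified stand-ins for vLLM's scheduler internals, not its actual API; the point is only that stale draft tokens are dropped at preemption time.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Illustrative stand-in for a scheduled request."""
    req_id: str
    num_computed_tokens: int = 0
    spec_token_ids: list[int] = field(default_factory=list)


def preempt(request: Request, waiting: list[Request]) -> None:
    """Move a running request back to the waiting queue.

    After preemption the request's KV cache state is discarded, so any
    previously drafted spec tokens no longer correspond to computed
    state. Clearing them here is the essence of the first fix.
    """
    request.spec_token_ids.clear()  # drop stale drafts on preemption
    waiting.append(request)


req = Request("r0", num_computed_tokens=8, spec_token_ids=[5, 7])
waiting: list[Request] = []
preempt(req, waiting)
print(req.spec_token_ids)  # → []
```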

Signed-off-by: Nick Hill <nickhill123@gmail.com>
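The second fix can similarly be sketched: the draft-token bookkeeping is factored into one helper that both the running-request path and the resumed-request path call, so a "paused" request's drafts aren't silently dropped. Every name below is hypothetical shorthand for the refactoring described above, not vLLM's real _update_states code.

```python
class InputBatchSketch:
    """Toy stand-in for the model runner's per-request batch state."""

    def __init__(self) -> None:
        self.spec_token_ids: dict[str, list[int]] = {}

    def _update_spec_tokens(self, req_id: str, drafts: list[int]) -> None:
        # Single shared place that records draft tokens for a request.
        self.spec_token_ids[req_id] = list(drafts)

    def add_resumed_request(self, req_id: str, drafts: list[int]) -> None:
        # A resumed ("paused") request takes the new-request path;
        # the fix is to route it through the same helper so its still
        # valid drafts are preserved.
        self._update_spec_tokens(req_id, drafts)


batch = InputBatchSketch()
batch.add_resumed_request("r1", [11, 12])
print(batch.spec_token_ids["r1"])  # → [11, 12]
```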
Comment on lines +1518 to +1522
if (
    self.input_batch.num_computed_tokens_cpu[req_idx]
    >= self.input_batch.num_prompt_tokens[req_idx]
):
    num_decode_draft_tokens[req_idx] = len(draft_token_ids)
njhill (Member, Author):
This is a simplification that isn't directly related to the fix.
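The condition in the snippet above encodes when a request is in the decode phase: once every prompt token has been computed, and only then, its draft tokens are counted. A plain-function paraphrase (illustrative only; the real code indexes per-request arrays in the input batch):

```python
def num_decode_drafts(num_computed: int, num_prompt: int,
                      draft_token_ids: list[int]) -> int:
    """Count draft tokens only for requests past the prefill phase."""
    if num_computed >= num_prompt:   # decode phase: all prompt tokens done
        return len(draft_token_ids)
    return 0                         # still prefilling: drafts not counted


print(num_decode_drafts(10, 8, [1, 2, 3]))  # decoding → 3
print(num_decode_drafts(5, 8, [1, 2, 3]))   # prefilling → 0
```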

@gemini-code-assist gemini-code-assist bot (Contributor) left a comment:

Code Review

This pull request introduces two important bug fixes for edge cases in speculative decoding. The first fix correctly clears speculative token IDs when a request is preempted, preventing state corruption. The second fix addresses an issue with 'paused' requests by refactoring the speculative token update logic into a new method and ensuring it's applied to resumed requests. The changes are well-implemented, and the refactoring improves code clarity and maintainability. Overall, this is a solid contribution that enhances the robustness of the speculative decoding feature.

@njhill njhill added the ready label (ONLY add when PR is ready to merge / full CI is needed) Jan 8, 2026
@DarkLight1337 DarkLight1337 merged commit 287b37c into vllm-project:main Jan 8, 2026
51 checks passed
@njhill njhill deleted the fix-spec-edgecases branch January 8, 2026 17:22
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Feb 11, 2026
…fect (#6615)

### What this PR does / why we need it?
This PR corrects the patch from
[#5786](#5786);
otherwise it might not take effect when tp_size > 1.

The related changes in this patch have been merged into vLLM
[#31944](vllm-project/vllm#31944).

### Does this PR introduce _any_ user-facing change?
No.

Signed-off-by: Angazenn <supperccell@163.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1


2 participants