
[Bugfix] Fix error with penalties when speculative decoding and structural output are enabled#26586

Merged
njhill merged 3 commits into vllm-project:main from
southfreebird:fix/logit-processors-with-spec-dec-and-structural-output
Oct 19, 2025

Conversation

@southfreebird (Contributor) commented on Oct 10, 2025

Fix an error that appears after #19482 when logit processors (such as penalties) are enabled together with speculative decoding and structured output. An example of the error:

File "/vllm/model_executor/layers/utils.py", line 45, in get_token_bin_counts_and_mask
     bin_counts.scatter_add_(1, tokens, torch.ones_like(tokens))
RuntimeError: Expected index [24, 4162] to be no larger than self [21, 201089] apart from dimension 1 and to be no larger size than src [24, 4162]
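
For context, here is a minimal standalone sketch of the failing call (not vLLM code; the shapes are taken from the error message above, and the variable names are placeholders):

import torch

vocab_size = 201089
padded_token_len = 4162

# bin_counts is sized for the 21 requests currently in the batch...
bin_counts = torch.zeros(21, vocab_size, dtype=torch.int64)
# ...but the token tensor still has 24 rows, so the index tensor is larger
# than self along dimension 0 (per the review below, stale speculative-token
# state is what produces this mismatch).
tokens = torch.zeros(24, padded_token_len, dtype=torch.int64)

# Raises: RuntimeError: Expected index [24, 4162] to be no larger than
# self [21, 201089] apart from dimension 1 ...
bin_counts.scatter_add_(1, tokens, torch.ones_like(tokens))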

…put are enabled

Signed-off-by: southfreebird <yvorott@gmail.com>

@gemini-code-assist (bot) left a comment

Code Review

This pull request addresses a critical bug that causes a RuntimeError when speculative decoding and structured output are used together with logit processors. The root cause is that stale speculative token data could persist in InputBatch if the scheduler drops all draft tokens for a request, leading to out-of-bounds errors in subsequent penalty calculations. The fix correctly ensures that InputBatch.spec_token_ids is always updated, even with an empty list of tokens, thus preventing state corruption. The change is logical, well-commented, and effectively resolves the issue. The implementation looks correct.
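
A rough sketch of the idea behind the fix (simplified, not the actual diff; spec_token_ids and scheduled_spec_decode_tokens mirror names used in this thread, while the function and the other names are placeholders):

def sync_spec_tokens(
    spec_token_ids: dict[str, list[int]],
    scheduled_spec_decode_tokens: dict[str, list[int]],
    running_req_ids: list[str],
) -> None:
    """Mirror the scheduler's draft tokens into the batch state every step."""
    for req_id in running_req_ids:
        # Overwrite unconditionally: if the scheduler dropped every draft token
        # (e.g. none of them satisfied the structured-output grammar), the entry
        # becomes an empty list instead of keeping the previous step's drafts.
        spec_token_ids[req_id] = scheduled_spec_decode_tokens.get(req_id, [])

Updating the entry only when the scheduler's list is non-empty is what allowed the previous step's draft tokens to leak into the next penalty computation.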

# meet the structural schema. This means that
# scheduler_output.scheduled_spec_decode_tokens might be empty,
# even when speculative decoding is enabled. So, we moved this line
# from the 'if' block above.

A collaborator commented on the diff:

Please rephrase the comment so that it explains the state of the code and not the change to the code. Comments about moved lines can become less meaningful over time with refactoring.
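
For example, a state-describing phrasing (a hypothetical suggestion, not necessarily what was committed) could read:

# Structured output may reject every draft token, so
# scheduler_output.scheduled_spec_decode_tokens can be empty even when
# speculative decoding is enabled. Update spec_token_ids unconditionally
# so that stale draft tokens from a previous step are cleared.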

Signed-off-by: southfreebird <yvorott@gmail.com>

@benchislett (Collaborator) left a comment

LGTM, Thanks!

@njhill added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Oct 18, 2025
@njhill enabled auto-merge (squash) on October 18, 2025 at 18:16

@njhill (Member) commented on Oct 18, 2025

@southfreebird could you rebase on latest main?

@njhill merged commit f6fdacd into vllm-project:main on Oct 19, 2025
46 checks passed
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…tural output are enabled (vllm-project#26586)

Signed-off-by: southfreebird <yvorott@gmail.com>
adabeyta pushed a commit to adabeyta/vllm that referenced this pull request Oct 20, 2025
…tural output are enabled (vllm-project#26586)

Signed-off-by: southfreebird <yvorott@gmail.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…tural output are enabled (vllm-project#26586)

Signed-off-by: southfreebird <yvorott@gmail.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
…tural output are enabled (vllm-project#26586)

Signed-off-by: southfreebird <yvorott@gmail.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
…tural output are enabled (vllm-project#26586)

Signed-off-by: southfreebird <yvorott@gmail.com>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…tural output are enabled (vllm-project#26586)

Signed-off-by: southfreebird <yvorott@gmail.com>

Labels

ready (ONLY add when PR is ready to merge/full CI is needed), v1

3 participants