[Bugfix] Fix error with penalties when speculative decoding and structural output are enabled #26586
…put are enabled Signed-off-by: southfreebird <yvorott@gmail.com>
Code Review
This pull request addresses a critical bug that causes a RuntimeError when speculative decoding and structured output are used together with logit processors. The root cause is that stale speculative token data could persist in InputBatch if the scheduler drops all draft tokens for a request, leading to out-of-bounds errors in subsequent penalty calculations. The fix correctly ensures that InputBatch.spec_token_ids is always updated, even with an empty list of tokens, thus preventing state corruption. The change is logical, well-commented, and effectively resolves the issue. The implementation looks correct.
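The failure mode described in the review can be illustrated with a toy model. The sketch below is not the vLLM implementation: `ToyInputBatch`, `update_conditionally`, and `update_always` are hypothetical names invented for illustration, and the only detail taken from the review is that `spec_token_ids` must be overwritten even when the scheduler keeps no draft tokens.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyInputBatch:
    """Hypothetical stand-in for persistent per-request batch state."""
    spec_token_ids: List[List[int]] = field(default_factory=list)


def update_conditionally(batch: ToyInputBatch, req_index: int,
                         scheduled: List[int]) -> None:
    # Pre-fix pattern (assumed): write only when draft tokens were scheduled.
    # If the scheduler dropped every draft token, the old entry survives.
    if scheduled:
        batch.spec_token_ids[req_index] = scheduled


def update_always(batch: ToyInputBatch, req_index: int,
                  scheduled: List[int]) -> None:
    # Post-fix pattern: always overwrite, even with an empty list,
    # so the entry reflects the current step.
    batch.spec_token_ids[req_index] = scheduled


# Step 1 left three draft tokens in the batch for request 0.
stale_batch = ToyInputBatch(spec_token_ids=[[11, 12, 13]])
fresh_batch = ToyInputBatch(spec_token_ids=[[11, 12, 13]])

# Step 2: the scheduler dropped all draft tokens for request 0.
update_conditionally(stale_batch, 0, [])
update_always(fresh_batch, 0, [])

print(stale_batch.spec_token_ids[0])  # draft tokens from step 1 are still there
print(fresh_batch.spec_token_ids[0])  # entry was cleared
```

In the conditional variant, any later code that sizes its work from `spec_token_ids` sees draft tokens that were never scheduled for the current step, which is the kind of mismatch the review ties to the out-of-bounds error in the penalty calculation.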
vllm/v1/worker/gpu_model_runner.py
# meet the structural schema. This means that
# scheduler_output.scheduled_spec_decode_tokens might be empty,
# even when speculative decoding is enabled. So, we moved this line
# from the 'if' block above.
Please rephrase the comment so that it explains the state of the code rather than the change to the code. Comments about moved lines become less meaningful over time as the code is refactored.
Signed-off-by: southfreebird <yvorott@gmail.com>
@southfreebird could you rebase on latest main?
…dec-and-structural-output
…tural output are enabled (vllm-project#26586) Signed-off-by: southfreebird <yvorott@gmail.com>
…tural output are enabled (vllm-project#26586) Signed-off-by: southfreebird <yvorott@gmail.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Fix an error that appears after #19482 when logit processors (such as penalties) are enabled together with speculative decoding and structured output. An example of the error:
Purpose
Test Plan
Test Result