[BugFix] Fix returned logprobs with spec decode + prefill chunking #29216
Merged
DarkLight1337 merged 1 commit into vllm-project:main on Nov 22, 2025
Conversation
Also:
- Fix error with bf16 models and "raw_logits" mode
- Use smaller model for test

Signed-off-by: Nick Hill <nhill@redhat.com>
Contributor
Code Review
This pull request addresses a bug where logprobs were incorrectly calculated during speculative decoding with prefill chunking. The main fix in vllm/v1/worker/gpu_model_runner.py correctly computes token offsets for logprobs before any tokens are discarded, which resolves the issue. Additionally, the PR includes a fix for handling bfloat16 models in raw_logits mode and updates the relevant test to use a smaller model and reliably trigger the chunking behavior. The changes are well-implemented and the logic is sound. I approve of this pull request.
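To illustrate the ordering issue the review describes, here is a minimal sketch (all names invented for illustration, not vLLM's actual code): offsets into a flattened logprobs tensor must be cumulative sums of the original per-request sampled-token counts, because the tensor was laid out before any rejected speculative tokens were discarded.

```python
# Hypothetical sketch: why logprob offsets must be computed BEFORE
# rejected speculative tokens are discarded. Names are invented.

# Tokens sampled per request this step, including speculative drafts.
sampled_counts = [3, 2, 4]
# Speculative tokens later rejected ("discarded") per request.
discarded = [1, 0, 2]

def offsets(counts):
    """Cumulative start offsets for each request's slice."""
    out, total = [], 0
    for c in counts:
        out.append(total)
        total += c
    return out

# Correct: the logprobs tensor uses the original sampled counts.
correct = offsets(sampled_counts)
# Buggy: offsets over post-discard counts no longer match the tensor layout.
buggy = offsets([s - d for s, d in zip(sampled_counts, discarded)])

print(correct)  # [0, 3, 5]
print(buggy)    # [0, 2, 4]
```

With the buggy offsets, every request after the first discard reads from inside an earlier request's slice.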
Contributor
Yes, this works, thanks for fixing @njhill !
ywang96 pushed a commit to ywang96/vllm that referenced this pull request on Nov 23, 2025
lpapavassiliou pushed a commit to lpapavassiliou/vllm that referenced this pull request on Nov 24, 2025
RunkaiTao pushed a commit to RunkaiTao/vllm that referenced this pull request on Nov 24, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request on Nov 29, 2025
kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request on Dec 1, 2025
The logprobs lists produced in this case were incorrect because sampled-token "discards" were incorrectly taken into account when computing offsets into the logprobs tensors; the offsets must be computed before any tokens are discarded.
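A small sketch of the symptom (values and names invented for illustration): when an earlier request discards a rejected draft token and offsets are recomputed afterwards, a later request's offset shifts into its neighbor's slice, so the wrong token's logprob is returned.

```python
# Hypothetical illustration of the misalignment. All values invented.

# Flattened per-token logprobs for two requests that sampled 3 and 2
# tokens respectively (speculative drafts included in the layout).
flat_logprobs = [-0.1, -0.5, -2.0, -0.3, -1.2]

# Request 1's slice really starts at offset 3 in the flat tensor.
correct_offset = 3
# If request 0 discards one rejected draft token and offsets are then
# recomputed, request 1 appears to start at 2, inside request 0's slice.
buggy_offset = 2

print(flat_logprobs[correct_offset])  # -0.3, request 1's first token
print(flat_logprobs[buggy_offset])    # -2.0, actually one of request 0's tokens
```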
Also:
- Fix error with bf16 models and "raw_logits" mode
- Use smaller model for test
Original PR which had this mistake: #26060
cc @TheEpicDolphin