Skip to content

[bugfix] Fix prompt logprobs on request eviction during chunked prefill#41411

Merged
ywang96 merged 9 commits into
vllm-project:mainfrom
joa-stdn:joachim/fix_prompt_logprobs
May 4, 2026
Merged

[bugfix] Fix prompt logprobs on request eviction during chunked prefill#41411
ywang96 merged 9 commits into
vllm-project:mainfrom
joa-stdn:joachim/fix_prompt_logprobs

Conversation

@joa-stdn
Copy link
Copy Markdown
Contributor

@joa-stdn joa-stdn commented Apr 30, 2026

Summary

  • Fix prompt logprobs computation for chunked prefill: the computed_prefill < prompt_lens - 1 check incorrectly skipped the last prompt token, causing prompt logprobs to not be computed when they should be. Changed to computed_prefill < prompt_lens.
  • Move in_progress_prompt_logprobs_cpu state from InputBatch dict to CachedRequestState, ensuring prompt logprobs accumulation is tied to the request lifecycle rather than the batch.
  • Include prompt_logprobs in VllmRunner test helper output and add None handling to _logprobs_match for prompt logprob entries.
  • Add prompt_logprobs=2 test cases to async scheduling e2e tests.

Test Plan

  • Added dict(prompt_logprobs=2) and dict(prompt_logprobs=2, logprobs=2) to both test_without_spec_decoding and test_with_eagle3_spec_decoding
  • pytest tests/v1/e2e/general/test_async_scheduling.py -v

@joa-stdn joa-stdn requested a review from njhill as a code owner April 30, 2026 20:39
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added v1 bug Something isn't working labels Apr 30, 2026
@joa-stdn joa-stdn force-pushed the joachim/fix_prompt_logprobs branch from 8d24122 to 6bd85f8 Compare April 30, 2026 20:40
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the management of prompt logprobs by moving the in_progress_prompt_logprobs_cpu state from a dictionary within InputBatch to the CachedRequestState object. This change streamlines how logprob tensors are tracked across prefill steps and ensures proper cleanup during request removal or updates. I have no feedback to provide.

@joa-stdn joa-stdn force-pushed the joachim/fix_prompt_logprobs branch from 2fba291 to f445e2d Compare April 30, 2026 22:09
Copy link
Copy Markdown
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @joa-stdn, very clean fix!

Hopefully this same bug isn't in model runner v2, maybe you could check that too? (gpu/model_runner.py)

It would also be great to add or extend a test to cover this. You could look for example at https://github.com/vllm-project/vllm/blob/main/tests/v1/e2e/general/test_async_scheduling.py which forces preemptions.

Comment thread vllm/v1/worker/gpu_input_batch.py Outdated
@njhill njhill added the verified Run pre-commit for new contributors without triggering other tests label Apr 30, 2026
@joa-stdn joa-stdn force-pushed the joachim/fix_prompt_logprobs branch from b85cf89 to 271f156 Compare May 1, 2026 01:28
@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label May 1, 2026
Copy link
Copy Markdown
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @joa-stdn!

@njhill njhill enabled auto-merge (squash) May 1, 2026 14:51
@njhill
Copy link
Copy Markdown
Member

njhill commented May 1, 2026

The test changes are failing I think because we need to modify the test to also take prompt logprobs into account when comparing outputs.

@joa-stdn
Copy link
Copy Markdown
Contributor Author

joa-stdn commented May 1, 2026

Thanks a lot for your review!

Hopefully this same bug isn't in model runner v2

I just checked my repro in model runner v2 and everything is fine there!

The test changes are failing I think because we need to modify the test to also take prompt logprobs into account when comparing outputs.

Yeah thanks I'll look into it!

auto-merge was automatically disabled May 1, 2026 21:05

Head branch was pushed to by a user without write access

@joa-stdn joa-stdn force-pushed the joachim/fix_prompt_logprobs branch 2 times, most recently from d4c74bb to cae7f8f Compare May 2, 2026 01:50
@joa-stdn joa-stdn force-pushed the joachim/fix_prompt_logprobs branch from cae7f8f to d4c74bb Compare May 2, 2026 01:55
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented May 2, 2026

Hi @joa-stdn, the pre-commit checks have failed. Please run:

uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

@joa-stdn joa-stdn force-pushed the joachim/fix_prompt_logprobs branch from a27e561 to f4b4cfa Compare May 2, 2026 01:59
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented May 2, 2026

Documentation preview: https://vllm--41411.org.readthedocs.build/en/41411/

@joa-stdn joa-stdn force-pushed the joachim/fix_prompt_logprobs branch 2 times, most recently from 071cace to cc9d9e1 Compare May 4, 2026 00:07
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented May 4, 2026

Documentation preview: https://vllm--41411.org.readthedocs.build/en/41411/

@mergify mergify Bot added qwen Related to Qwen models intel-gpu Related to Intel GPU cpu Related to CPU backends labels May 4, 2026
joa-stdn and others added 7 commits May 4, 2026 00:09
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
@joa-stdn joa-stdn force-pushed the joachim/fix_prompt_logprobs branch from 815ef4b to 5bb1613 Compare May 4, 2026 00:09
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
@ywang96 ywang96 enabled auto-merge (squash) May 4, 2026 00:24
Copy link
Copy Markdown
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @joa-stdn, especially for also exposing and fixing the MRV2 bug!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working ci/build cpu Related to CPU backends deepseek Related to DeepSeek models documentation Improvements or additions to documentation frontend intel-gpu Related to Intel GPU kv-connector multi-modality Related to multi-modality (#4194) new-model Requests to new models nvidia qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm structured-output v1 verified Run pre-commit for new contributors without triggering other tests

Projects

Status: Done
Status: Done
Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants