[bugfix] Fix prompt logprobs on request eviction during chunked prefill #41411
Code Review
This pull request refactors the management of prompt logprobs by moving the in_progress_prompt_logprobs_cpu state from a dictionary within InputBatch to the CachedRequestState object. This change streamlines how logprob tensors are tracked across prefill steps and ensures proper cleanup during request removal or updates. I have no feedback to provide.
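A minimal sketch of why moving the state fixes eviction during chunked prefill. Everything here is hypothetical and simplified: lists of floats stand in for CPU tensors, a toy Batch stands in for InputBatch, and a toy RequestState stands in for CachedRequestState. It assumes that a request not scheduled in a given step is evicted from the batch while its per-request state is kept.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestState:
    """Hypothetical stand-in for a per-request object like CachedRequestState."""

    req_id: str
    # Partial prompt logprobs accumulated across prefill chunks; lives with the request.
    in_progress_prompt_logprobs: Optional[list[float]] = None


class Batch:
    """Hypothetical persistent batch that requests enter and leave between steps."""

    def __init__(self) -> None:
        self.req_ids: set[str] = set()
        # Old layout: partial prompt logprobs keyed by request id inside the batch.
        self.in_progress_by_req: dict[str, list[float]] = {}

    def add(self, req_id: str) -> None:
        self.req_ids.add(req_id)

    def evict(self, req_id: str) -> None:
        self.req_ids.discard(req_id)
        # Batch-scoped state is dropped together with the request's batch slot.
        self.in_progress_by_req.pop(req_id, None)


def accumulate_batch_scoped(batch: Batch, req_id: str, chunk: list[float]) -> list[float]:
    # Old behavior: the buffer is created on demand inside the batch.
    buf = batch.in_progress_by_req.setdefault(req_id, [])
    buf.extend(chunk)
    return buf


def accumulate_request_scoped(state: RequestState, chunk: list[float]) -> list[float]:
    # New behavior: the buffer hangs off the request state itself.
    if state.in_progress_prompt_logprobs is None:
        state.in_progress_prompt_logprobs = []
    state.in_progress_prompt_logprobs.extend(chunk)
    return state.in_progress_prompt_logprobs


batch, state = Batch(), RequestState("r0")
batch.add("r0")
accumulate_batch_scoped(batch, "r0", [-0.1, -0.2])  # first prefill chunk
accumulate_request_scoped(state, [-0.1, -0.2])
batch.evict("r0")  # request not scheduled this step
batch.add("r0")    # request returns for its next chunk
old = accumulate_batch_scoped(batch, "r0", [-0.3, -0.4])
new = accumulate_request_scoped(state, [-0.3, -0.4])
print(old)  # [-0.3, -0.4]: first chunk lost on eviction
print(new)  # [-0.1, -0.2, -0.3, -0.4]
```

Keying the partial buffer by request id inside the batch ties its lifetime to the batch slot; storing it on the request state ties it to the request, which is the property this refactor is after.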
njhill left a comment:
Thanks @joa-stdn, very clean fix!
Hopefully this same bug isn't in model runner v2 (gpu/model_runner.py); maybe you could check that too?
It would also be great to add or extend a test to cover this. You could look for example at https://github.com/vllm-project/vllm/blob/main/tests/v1/e2e/general/test_async_scheduling.py which forces preemptions.
I think the test changes are failing because we need to modify the test to also take prompt logprobs into account when comparing outputs.
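One way to take prompt logprobs into account when comparing outputs is a comparison that tolerates None entries. This is a hypothetical sketch, not the actual _logprobs_match helper: it assumes each prompt position holds either None (typically the first prompt token, which has no preceding context) or a dict mapping token ids to logprobs.

```python
import math
from typing import Optional

# Hypothetical simplified shape: one entry per prompt position, None where no logprob exists.
PromptLogprobs = list[Optional[dict[int, float]]]


def prompt_logprobs_match(a: PromptLogprobs, b: PromptLogprobs, atol: float = 1e-4) -> bool:
    if len(a) != len(b):
        return False
    for pos_a, pos_b in zip(a, b):
        # A None entry must line up with a None entry in the other output.
        if pos_a is None or pos_b is None:
            if pos_a is not pos_b:
                return False
            continue
        # Strict check: both runs must report the same candidate tokens at this position.
        if pos_a.keys() != pos_b.keys():
            return False
        for tok, lp in pos_a.items():
            if not math.isclose(lp, pos_b[tok], abs_tol=atol):
                return False
    return True


ref = [None, {11: -0.5, 12: -1.2}, {7: -0.01}]
print(prompt_logprobs_match(ref, [None, {11: -0.50001, 12: -1.2}, {7: -0.01}]))  # True
print(prompt_logprobs_match(ref, [{3: -0.1}, {11: -0.5, 12: -1.2}, {7: -0.01}]))  # False
```

Requiring identical token sets per position is deliberately strict; a real test may need to relax this when near-tied logprobs can reorder the top-k candidates.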
Thanks a lot for your review!
I just checked my repro in model runner v2 and everything is fine there!
Yeah, thanks, I'll look into it!
Hi @joa-stdn, the pre-commit checks have failed. Please run:
uv pip install 'pre-commit>=4.5.1'
pre-commit install
pre-commit run --all-files
Then commit the changes and push to your branch. For future commits, …
Documentation preview: https://vllm--41411.org.readthedocs.build/en/41411/
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
Summary
- The computed_prefill < prompt_lens - 1 check incorrectly skipped the last prompt token, so prompt logprobs were not computed when they should have been. Changed to computed_prefill < prompt_lens.
- Move the in_progress_prompt_logprobs_cpu state from an InputBatch dict to CachedRequestState, so that prompt logprobs accumulation is tied to the request lifecycle rather than the batch.
- Include prompt_logprobs in the VllmRunner test helper output and add None handling to _logprobs_match for prompt logprob entries.
- Add prompt_logprobs=2 test cases to the async scheduling e2e tests.
Test Plan
- Add dict(prompt_logprobs=2) and dict(prompt_logprobs=2, logprobs=2) to both test_without_spec_decoding and test_with_eagle3_spec_decoding
- pytest tests/v1/e2e/general/test_async_scheduling.py -v
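The off-by-one fix in the summary can be checked with a small walk over chunked-prefill steps. This is a sketch under assumptions: computed_prefill is taken to be the number of prompt tokens already prefilled before a step, chunks are a fixed size, and the two helper functions are hypothetical wrappers around the old and new conditions.

```python
def needs_prompt_logprobs_old(computed_prefill: int, prompt_len: int) -> bool:
    # Condition before the fix (as quoted in the PR summary).
    return computed_prefill < prompt_len - 1


def needs_prompt_logprobs_new(computed_prefill: int, prompt_len: int) -> bool:
    # Condition after the fix (as quoted in the PR summary).
    return computed_prefill < prompt_len


prompt_len, chunk_size = 5, 2
# Prefill steps start at token offsets 0, 2 and 4; the last step holds only the last prompt token.
chunk_starts = list(range(0, prompt_len, chunk_size))

old_steps = [c for c in chunk_starts if needs_prompt_logprobs_old(c, prompt_len)]
new_steps = [c for c in chunk_starts if needs_prompt_logprobs_new(c, prompt_len)]
print(old_steps)  # [0, 2]
print(new_steps)  # [0, 2, 4]
```

Only a step that starts exactly at prompt_len - 1 is treated differently, which matches the summary's description of the last prompt token being skipped.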