Skip to content

[Core][Bookkeeping] Update cu_num_accepted_tokens for all req_index#27629

Merged
njhill merged 3 commits intovllm-project:mainfrom
Jialin:logprob
Oct 30, 2025
Merged

[Core][Bookkeeping] Update cu_num_accepted_tokens for all req_index#27629
njhill merged 3 commits intovllm-project:mainfrom
Jialin:logprob

Conversation

@Jialin
Copy link
Collaborator

@Jialin Jialin commented Oct 28, 2025

Purpose

In this PR, we ensure cu_num_accepted_tokens is updated for all request_index. To avoid position skipped / shifted due to empty or None sampled_ids.

Test Plan & Test Result

> pytest tests/v1/sample/test_logprobs.py -k test_spec_decode_logprobs
> pytest tests/v1/sample/test_rejection_sampler.py

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the v1 label Oct 28, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly addresses a bug in the bookkeeping logic for speculative decoding. By moving the update of cu_num_accepted_tokens to before the check for empty sampled_ids, it ensures that the cumulative token counts are accurate for all requests, including those that don't produce new tokens in a given step. This prevents potential indexing errors and position shifts. The implementation is clean and effectively resolves the described issue. I have no further suggestions.

@Jialin Jialin requested review from 22quinn and njhill October 28, 2025 05:45
@Jialin
Copy link
Collaborator Author

Jialin commented Oct 28, 2025

Include authors and reviewers of #26060 to confirm if it's the right update.

CC @TheEpicDolphin @22quinn @njhill

@TheEpicDolphin
Copy link
Collaborator

Thanks for catching this @Jialin, looks good to me!

@Jialin
Copy link
Collaborator Author

Jialin commented Oct 28, 2025

Gentle nudge @njhill @22quinn @yeqcharlotte @houseroad

Copy link
Collaborator

@22quinn 22quinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@22quinn 22quinn enabled auto-merge (squash) October 28, 2025 17:45
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 28, 2025
Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching @Jialin!

Any chance we could extend test(s) added in #26060 to catch this case?

auto-merge was automatically disabled October 28, 2025 18:07

Head branch was pushed to by a user without write access

@Jialin
Copy link
Collaborator Author

Jialin commented Oct 28, 2025

Thanks for catching @Jialin!

Any chance we could extend test(s) added in #26060 to catch this case?

@njhill Skimmed throughput the tests added in #26060, but seems most of them are e2e testing, so it might be a bit hard to add unit test to cover this.

However, after this change, as we're only updating numpy arrays in bookkeeping now. A potential followup step is to introduce a numba jit function to further speed up bookkeeping. (And I could definitely add unit tests to cover this case after with the more self-contained jit function. WDYT?

@njhill
Copy link
Member

njhill commented Oct 28, 2025

Thanks for catching @Jialin!
Any chance we could extend test(s) added in #26060 to catch this case?

@njhill Skimmed throughput the tests added in #26060, but seems most of them are e2e testing, so it might be a bit hard to add unit test to cover this.

However, after this change, as we're only updating numpy arrays in bookkeeping now. A potential followup step is to introduce a numba jit function to further speed up bookkeeping. (And I could definitely add unit tests to cover this case after with the more self-contained jit function. WDYT?

@Jialin fair enough, I wasn't sure whether we could tweak the e2e test to trigger the bug. If that's nontrivial then sounds ok re leaving to future unit tests.

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
@njhill njhill merged commit 4574d48 into vllm-project:main Oct 30, 2025
46 checks passed
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants