[Core][Bookkeeping] Update cu_num_accepted_tokens for all req_index#27629
njhill merged 3 commits into vllm-project:main
Code Review
This pull request correctly addresses a bug in the bookkeeping logic for speculative decoding. By moving the update of cu_num_accepted_tokens to before the check for empty sampled_ids, it ensures that the cumulative token counts are accurate for all requests, including those that don't produce new tokens in a given step. This prevents potential indexing errors and position shifts. The implementation is clean and effectively resolves the described issue. I have no further suggestions.
Including the authors and reviewers of #26060 to confirm whether this is the right update.
Thanks for catching this @Jialin, looks good to me!
Gentle nudge @njhill @22quinn @yeqcharlotte @houseroad
Head branch was pushed to by a user without write access
@njhill Skimmed through the tests added in #26060, but most of them seem to be e2e tests, so it might be a bit hard to add a unit test to cover this. However, after this change we're only updating numpy arrays in bookkeeping, so a potential follow-up step is to introduce a numba jit function to further speed up bookkeeping. (And I could definitely add unit tests to cover this case afterwards, with the more self-contained jit function.) WDYT?
@Jialin fair enough, I wasn't sure whether we could tweak the e2e test to trigger the bug. If that's nontrivial then sounds ok re leaving to future unit tests. |
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
…llm-project#27629) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Purpose
In this PR, we ensure cu_num_accepted_tokens is updated for every req_index, so that positions are not skipped or shifted when a request's sampled_ids is empty or None.
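To illustrate the ordering issue described above, here is a minimal sketch. It is not the actual vLLM bookkeeping code: the function names, the `sampled_ids_per_req` argument, and the array layout (one leading zero followed by one cumulative entry per request) are assumptions made for this example. It only shows why updating the running total before the empty/None check keeps later entries aligned.

```python
import numpy as np


def cu_counts_skip_first(sampled_ids_per_req):
    """Hypothetical old ordering: skip empty/None before updating the total."""
    cu = np.zeros(len(sampled_ids_per_req) + 1, dtype=np.int64)
    for req_index, sampled_ids in enumerate(sampled_ids_per_req):
        if not sampled_ids:
            # The cumulative entry for this request is never written,
            # so it stays 0 and every later entry is computed from it.
            continue
        cu[req_index + 1] = cu[req_index] + len(sampled_ids)
    return cu


def cu_counts_update_first(sampled_ids_per_req):
    """Hypothetical fixed ordering: update the total for every req_index."""
    cu = np.zeros(len(sampled_ids_per_req) + 1, dtype=np.int64)
    for req_index, sampled_ids in enumerate(sampled_ids_per_req):
        num_accepted = len(sampled_ids) if sampled_ids else 0
        cu[req_index + 1] = cu[req_index] + num_accepted
        if not sampled_ids:
            continue
        # Any remaining per-request bookkeeping would go here.
    return cu


# Three requests in one step; the middle one produced no tokens.
step = [[11, 12], None, [13]]
buggy = cu_counts_skip_first(step).tolist()    # [0, 2, 0, 1]
fixed = cu_counts_update_first(step).tolist()  # [0, 2, 2, 3]
```

In the skip-first variant, the third request's cumulative count restarts from 0, which is the kind of position shift this PR aims to avoid.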
Test Plan & Test Result