-
Notifications
You must be signed in to change notification settings - Fork 5.9k
[SPEC V2] fix: skip stale state updates in spec-v2 overlap #23456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
+12
−2
Merged
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
46c995b
[SPEC V2] fix: skip stale state updates in spec-v2 overlap
alphabetc1 b9ebaa0
Merge branch 'main' into fix/spec_v2_fix
Qiaolin-Yu c2c2274
Merge branch 'main' into fix/spec_v2_fix
Qiaolin-Yu 0db112b
Merge branch 'main' into fix/spec_v2_fix
alphabetc1 5c9d4b9
Merge branch 'main' into fix/spec_v2_fix
Qiaolin-Yu File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -389,11 +389,21 @@ def _resolve_spec_overlap_token_ids( | |||||||||||||||||||
| stride = self.draft_worker.speculative_num_draft_tokens | ||||||||||||||||||||
|
|
||||||||||||||||||||
| for i, req in enumerate(batch.reqs): | ||||||||||||||||||||
| # -1 because prepare_for_decode pre-claimed the bonus slot. | ||||||||||||||||||||
| req.kv_committed_len += accept_lens[i] - 1 | ||||||||||||||||||||
| predict_tokens.append( | ||||||||||||||||||||
| next_token_ids[i * stride : i * stride + accept_lens[i]] | ||||||||||||||||||||
| ) | ||||||||||||||||||||
|
|
||||||||||||||||||||
| if req.is_retracted: | ||||||||||||||||||||
| # reset_for_retract() already zeroes committed/allocated KV. | ||||||||||||||||||||
| continue | ||||||||||||||||||||
|
|
||||||||||||||||||||
| if req.finished(): | ||||||||||||||||||||
| # -1 because prepare_for_decode pre-claimed the bonus slot. | ||||||||||||||||||||
| req.kv_committed_len -= 1 | ||||||||||||||||||||
| continue | ||||||||||||||||||||
|
Comment on lines
+400
to
+403
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Similarly to retracted requests, when a request is already finished, its accepted tokens should be excluded from the global
Suggested change
|
||||||||||||||||||||
|
|
||||||||||||||||||||
| # -1 because prepare_for_decode pre-claimed the bonus slot. | ||||||||||||||||||||
| req.kv_committed_len += accept_lens[i] - 1 | ||||||||||||||||||||
| req.spec_verify_ct += 1 | ||||||||||||||||||||
|
|
||||||||||||||||||||
| accepted_draft_tokens = result.num_accepted_drafts_per_req_cpu[i] | ||||||||||||||||||||
|
|
||||||||||||||||||||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When skipping a retracted request, its contribution to the global
result.num_accepted_tokens(calculated at line 358) should also be removed. This ensures that the batch-level speculative metrics (used inupdate_spec_metricsandreport_decode_stats) accurately reflect only the tokens that were actually committed to active requests, maintaining consistency between global and per-request statistics.