[Async][Spec Decoding] Zero-bubble async scheduling + spec decoding #32951
Merged: MatthewBonanni merged 112 commits into vllm-project:main from MatthewBonanni:async-eagle-mod, Mar 23, 2026 (+488 −209)
Commits
b141669 [Perf][Async] Implement zero-bubble async speculative decoding (izhuhaoran)
679054f skip compute_slot_mapping for async_spec_zero_bubble_mode (izhuhaoran)
527b55d remove seq_lens_cpu & num_computed_tokens_cpu for async_spec_zero_bub… (izhuhaoran)
8c2ce4c Get rid of async_spec_zero_bubble_mode option (MatthewBonanni)
1554ab7 Fully async version (MatthewBonanni)
315e7e6 Increase max_concurrent_batches (MatthewBonanni)
15d8ee9 Handle reordering (MatthewBonanni)
abb8ba7 Merge branch 'main' into async-eagle-mod (MatthewBonanni)
dd8b7c3 Fix (MatthewBonanni)
1573015 Merge branch 'main' into async-eagle-mod (MatthewBonanni)
317b452 Cleanup (MatthewBonanni)
e7e39ce Cleanup (MatthewBonanni)
6fb3042 Fix (MatthewBonanni)
7ff5674 Cleanup (MatthewBonanni)
c491421 Fix hybrid (MatthewBonanni)
a2e861e Disable mamba cache mode align (MatthewBonanni)
0c45e7e Treat num_accepted_tokens like num_computed_tokens (MatthewBonanni)
1eccc1d Treat num_accepted_tokens like num_computed_tokens (MatthewBonanni)
7005b52 Treat num_accepted_tokens like num_computed_tokens (MatthewBonanni)
4e4d8d5 Cleanup (MatthewBonanni)
c134792 Eliminate compute_slot_mapping (MatthewBonanni)
ea7f670 Rename compute_slot_mapping_gpu to compute_slot_mapping (MatthewBonanni)
a113c3f Restore comments (MatthewBonanni)
1153da9 Restore comments (MatthewBonanni)
e04f1f2 Add TODO comment (MatthewBonanni)
a224144 Fix (num_accepted_tokens shouldn't be int64) (MatthewBonanni)
c1f550b Cleanup (MatthewBonanni)
ab2d19a Fix (MatthewBonanni)
3c732eb Fix (MatthewBonanni)
843a151 Fix (MatthewBonanni)
4c522a2 Fix (MatthewBonanni)
c329c93 Cleanup (MatthewBonanni)
98c146b Use CpuGpuBuffer for arange (MatthewBonanni)
5f1d06f Make seq_lens GPU-only and introduce optimistic_seq_lens (MatthewBonanni)
050959c Restructure if block (MatthewBonanni)
718087f Fix TypeError (MatthewBonanni)
31c8674 Fix order of operations error (MatthewBonanni)
4c6bd9d Fix positions and seq_lens calculation (MatthewBonanni)
03adb2e Improve _get_cumsum_and_arange (MatthewBonanni)
e76d32e Rename (MatthewBonanni)
526fea3 Use query_pos (MatthewBonanni)
48df0ab Eliminate CPU-side num_computed_tokens from GPUModelRunner. Update op… (MatthewBonanni)
f3a2684 Fix M-RoPE and XD-RoPE (MatthewBonanni)
9f20781 Factor out _compute_batch_index_mapping (MatthewBonanni)
779ee44 Fix optimistic_seq_lens_cpu update (MatthewBonanni)
2fab694 Fix acceptance length (MatthewBonanni)
65f7b3d Fix (MatthewBonanni)
8cc914c Use CpuGpuBuffer (MatthewBonanni)
2dacc94 Merge branch 'main' into async-eagle-mod (MatthewBonanni)
fa849f8 Fix (MatthewBonanni)
ccc9575 Use preallocated buffer (MatthewBonanni)
bc13824 Use triton kernel instead of pytorch ops (MatthewBonanni)
3f0be40 Use buffers to avoid sync (MatthewBonanni)
664eaf1 Always use placeholders (MatthewBonanni)
66ae34b Comment (MatthewBonanni)
c645e05 Bugfix: add arange scratch buffer (MatthewBonanni)
04e023b Use buffers to prevent allocation on the fly (MatthewBonanni)
60ca5ec Re-add indices_match fast path (MatthewBonanni)
82da483 Clean up input batch (MatthewBonanni)
f7d028e Merge branch 'main' into async-eagle-mod (MatthewBonanni)
0cdf7de Simplify BatchIndexMapping and num_computed_tokens tracking (LucasWilkinson)
d0b43ec Merge pull request #2 from LucasWilkinson/simplify-batch-index-mapping (MatthewBonanni)
7fde95a Fix acceptance length (MatthewBonanni)
e20565d Deduplicate (MatthewBonanni)
fa02bb8 Make positions GPU-only (MatthewBonanni)
a74cd15 Eliminate has_prev_draft_tokens (MatthewBonanni)
6aa50c2 Add CPU correction (MatthewBonanni)
3cee2b2 Merge branch 'main' into async-eagle-mod (MatthewBonanni)
d4304e4 Skip unnecessary copy (MatthewBonanni)
56bd8f8 Rename (MatthewBonanni)
a694e92 Docstring (MatthewBonanni)
836fed2 Use optimistic_seq_lens_cpu and clean up runner (MatthewBonanni)
02e6ee2 Fix arange_size (MatthewBonanni)
39807a8 Merge branch 'main' into async-eagle-mod (MatthewBonanni)
09ffa74 Undo disable (MatthewBonanni)
1023895 Rename (MatthewBonanni)
168ecc2 Rename (MatthewBonanni)
79a2127 Docstring (MatthewBonanni)
d84e371 Add comment (MatthewBonanni)
504a7e1 Move to utils (MatthewBonanni)
83c52fd Comment (MatthewBonanni)
a1cd3c8 Clean up unnecessary change (MatthewBonanni)
f0f9978 Add comment (MatthewBonanni)
1c7d2a7 Restore fast path (MatthewBonanni)
74fb29a Remove unrelated fast path (MatthewBonanni)
5f22dce Clean up (MatthewBonanni)
65f18f1 Merge branch 'main' into async-eagle-mod (MatthewBonanni)
741f181 Update comment (MatthewBonanni)
1a31810 Merge branch 'main' into async-eagle-mod (MatthewBonanni)
865f2c1 Clean up commit_slot_mapping (dead code) (MatthewBonanni)
70c12ed Accumulate rejections and only issue copies when previous has been co… (MatthewBonanni)
279b867 Fix (MatthewBonanni)
790a8ab Revert "Accumulate rejections and only issue copies when previous has… (MatthewBonanni)
abc93ec Update CPU side with _finalize_async_spec_cpu_state (MatthewBonanni)
e2d5dbc Clean up (MatthewBonanni)
8980abb Clean up (MatthewBonanni)
7a3412a Comments (MatthewBonanni)
1cac626 Clean up (MatthewBonanni)
4192b65 Comment (MatthewBonanni)
1c2499d Use deferred_spec_decode_corrections per Lucas's suggestion (MatthewBonanni)
30dae23 Merge branch 'main' into async-eagle-mod (MatthewBonanni)
f852fcc Fix type (MatthewBonanni)
5152276 Clean up (MatthewBonanni)
39b8b38 Merge branch 'main' into async-eagle-mod (MatthewBonanni)
377d609 Fix mamba align (MatthewBonanni)
51eb871 Fix ngram during correction (MatthewBonanni)
6811111 Fix mamba (MatthewBonanni)
26741ad Fix ngram (MatthewBonanni)
dfd0896 Clean up (MatthewBonanni)
ed3fb14 Merge branch 'main' into async-eagle-mod (MatthewBonanni)
6bed7ac Fix (MatthewBonanni)
7e29a71 Merge branch 'main' into async-eagle-mod (MatthewBonanni)