Add Speculative Decoding Eagle3 topk > 1 #5318
Merged
65 commits
- 4cc5501 Add Speculative Decoding Eagle3 topk > 1 (qingquansong)
- 6d5247c Merge remote-tracking branch 'upstream/main' into qsong/sdtopk (qingquansong)
- 2cabcde Merge branch 'main' into qsong/sdtopk (hebiao064)
- a431024 Support Cuda Graph for Draft Decode when topk > 1 (hebiao064)
- 855755a Support CUDA Graph for Target Verfy (hebiao064)
- d29608a set metadata expand (hebiao064)
- 121021f fix (hebiao064)
- 2afa5fb update cuda graph (qingquansong)
- 202867a Fix problem which break normal path (hebiao064)
- e1eb605 switch to vllm merge state (qingquansong)
- 2122a0a Merge branch 'main' into qsong/sdtopk (qingquansong)
- d6e9cc9 clean up (qingquansong)
- 02a1a15 support deepseek (hebiao064)
- 3bf8e77 remove verify expand attention mask pad (qingquansong)
- 0bdb3b8 Merge branch 'main' into qsong/sdtopk (qingquansong)
- 64f1c0f switch to merge_state v2 (qingquansong)
- e73c9c4 update to triton (qingquansong)
- e140f6d addd mode (qingquansong)
- 6381207 Merge branch 'main' into qsong/sdtopk (qingquansong)
- 6c67661 rebase (qingquansong)
- 36668c2 Merge branch 'main' into qsong/sdtopk (qingquansong)
- 0ec0422 remove comment (hebiao064)
- e45368a Merge branch 'qsong/sdtopk' of https://github.com/qingquansong/sglang… (hebiao064)
- e78a96f cleanup (qingquansong)
- 57cec66 Merge branch 'qsong/sdtopk' of https://github.com/qingquansong/sglang… (hebiao064)
- ab16678 Merge branch 'main' into qsong/sdtopk (qingquansong)
- f133545 fix (hebiao064)
- 5f9a0d6 fix (hebiao064)
- 336009f fix test (hebiao064)
- beb981f Merge branch 'main' into qsong/sdtopk (hebiao064)
- 190dcfa fix (hebiao064)
- 0a2606a Revert "fix" (hebiao064)
- 1452022 Revert "fix test" (hebiao064)
- de0bb77 fix (hebiao064)
- 43377ab Merge branch 'main' into qsong/sdtopk (hebiao064)
- 669ae0d fix (hebiao064)
- d7ecff3 Merge remote-tracking branch 'upstream/main' into qsong/sdtopk (qingquansong)
- deb083c format (qingquansong)
- 30bfa5c remove submodule (qingquansong)
- 039d5b8 Merge branch 'main' into qsong/sdtopk (qingquansong)
- 48caaaf Merge branch 'main' into qsong/sdtopk (qingquansong)
- e16c2e2 fix (hebiao064)
- 5909701 fix rebase (qingquansong)
- 3fca5dd Merge branch 'main' into qsong/sdtopk (hebiao064)
- 92a9307 address comment about return_softmax_lse (hebiao064)
- 595dd70 Merge branch 'qsong/sdtopk' of https://github.com/qingquansong/sglang… (hebiao064)
- 352a83e Merge branch 'main' into qsong/sdtopk (qingquansong)
- 506653a fix (hebiao064)
- 7a40ffe Merge branch 'qsong/sdtopk' of https://github.com/qingquansong/sglang… (hebiao064)
- 68d43d2 Merge branch 'main' into qsong/sdtopk (qingquansong)
- ef01c2d enable fa3 for broader use case (qingquansong)
- cf7cb69 fix format and remove is_no_spec_infer_or_topk_one (hebiao064)
- 67691f2 support page size > 1 for top k = 1 (hebiao064)
- 788051f Merge branch 'main' into qsong/sdtopk (qingquansong)
- c1b6f87 fix (hebiao064)
- 6fe3ec0 Merge branch 'qsong/sdtopk' of https://github.com/qingquansong/sglang… (hebiao064)
- f648298 Merge branch 'main' into qsong/sdtopk (zhyncs)
- beee6a0 Update model_runner.py typo (hebiao064)
- cac34ec Merge branch 'main' into qsong/sdtopk (zhyncs)
- 6469e62 auto adjust draft_tokens = num_steps + 1 (hebiao064)
- 8af3ec1 Merge branch 'qsong/sdtopk' of https://github.com/qingquansong/sglang… (hebiao064)
- 67e0095 Merge branch 'main' into qsong/sdtopk (qingquansong)
- b966c6f add test for top k > 1 (hebiao064)
- 9985da0 Merge branch 'main' into qsong/sdtopk (zhyncs)
- 73c1868 Merge branch 'main' into qsong/sdtopk (qingquansong)
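The commit "auto adjust draft_tokens = num_steps + 1" ties the number of draft tokens to the number of draft steps. The diff is not shown on this page, so the following is only a minimal sketch of that rule. It assumes the adjustment applies when topk is 1, where the draft has no branching: each draft step proposes one token and the verify pass also carries the root (last accepted) token. `adjust_num_draft_tokens` is a hypothetical name, not an sglang function.

```python
def adjust_num_draft_tokens(topk: int, num_steps: int, num_draft_tokens: int) -> int:
    """Hypothetical helper illustrating the draft_tokens = num_steps + 1 rule.

    Assumption: with topk == 1 the draft is a single chain of num_steps
    proposed tokens plus the root token, so exactly num_steps + 1 tokens are
    verified. For topk > 1 the tree branches, and this sketch leaves the
    requested value unchanged.
    """
    if topk == 1 and num_draft_tokens != num_steps + 1:
        return num_steps + 1
    return num_draft_tokens


# With topk == 1 and 3 draft steps, a requested budget of 8 is reduced to 4.
print(adjust_num_draft_tokens(topk=1, num_steps=3, num_draft_tokens=8))  # prints 4
```

Clamping rather than raising an error matches the commit's wording ("auto adjust"), but whether the real code also logs a warning is not visible here.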
python/sglang/srt/layers/attention/flashattention_backend.py: 931 changes (781 additions, 150 deletions)
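Several commits ("switch to vllm merge state", "switch to merge_state v2", "address comment about return_softmax_lse") refer to merging attention states via the softmax log-sum-exp (LSE). Since the diff is not rendered here, this is only a minimal sketch of the standard LSE merge. It assumes the backend attends separately over the cached prefix and over the draft-tree tokens and then combines the two partial results. `attention_with_lse` and `merge_states` are hypothetical names, not sglang or vLLM APIs.

```python
import math
from typing import List, Sequence, Tuple


def attention_with_lse(
    q: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
) -> Tuple[List[float], float]:
    """Single-query softmax attention over one block of keys/values.

    Returns the normalized output and the log-sum-exp (LSE) of the raw scores.
    The usual 1/sqrt(head_dim) scaling is omitted to keep the sketch short.
    """
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    out = [
        sum(w * v[d] for w, v in zip(weights, values)) / denom
        for d in range(len(values[0]))
    ]
    return out, m + math.log(denom)


def merge_states(
    o_a: Sequence[float], lse_a: float, o_b: Sequence[float], lse_b: float
) -> Tuple[List[float], float]:
    """Combine two partial attention results computed over disjoint key sets.

    Each partial output is rescaled by its share of the total softmax mass,
    exp(lse_part - lse_total), where lse_total = log(exp(lse_a) + exp(lse_b)).
    """
    m = max(lse_a, lse_b)
    lse = m + math.log(math.exp(lse_a - m) + math.exp(lse_b - m))
    w_a = math.exp(lse_a - lse)
    w_b = math.exp(lse_b - lse)
    return [w_a * a + w_b * b for a, b in zip(o_a, o_b)], lse


# Usage: attention over a cached prefix and over draft-tree tokens, merged,
# should match attention over the concatenation of both key sets.
q = [0.5, -1.0]
prefix_k, prefix_v = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]
draft_k, draft_v = [[1.0, 1.0]], [[5.0, 6.0]]

o_prefix, lse_prefix = attention_with_lse(q, prefix_k, prefix_v)
o_draft, lse_draft = attention_with_lse(q, draft_k, draft_v)
merged, merged_lse = merge_states(o_prefix, lse_prefix, o_draft, lse_draft)
full, full_lse = attention_with_lse(q, prefix_k + draft_k, prefix_v + draft_v)
```

Because the merge only needs each part's output and LSE, the two parts can be computed by separate kernel calls (for example, one over paged prefix KV and one over the small draft tree) without materializing a combined mask; this is the usual motivation for returning the LSE from the attention kernel.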