[Meta] Unshift eagle prefill support with potential KV sharing from base to draft #1
rope change rope change rebase
Signed-off-by: morgendave <[email protected]>
Purpose
Unshift the tokens in EAGLE draft prefill and add support for KV sharing from the base model to the draft model. This removes the token shift in prefill, which makes prefix caching easier to compute and supports multimodal (MM) inputs.
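To make the difference between the two layouts concrete, here is a minimal sketch in plain Python. It is not the code in this PR: the helper names, the `IMAGE_PLACEHOLDER` id, and the token values are all hypothetical, and the shifted layout reflects the usual EAGLE convention of dropping the first prompt token and appending the newly sampled one. Under those assumptions it shows that the unshifted layout keeps each token, including MM placeholders, at its original position.

```python
# Hypothetical placeholder id standing in for an image token (assumption).
IMAGE_PLACEHOLDER = -1


def shifted_draft_inputs(token_ids, next_token_id):
    """Baseline layout (assumed): drop the first token, append the sampled one."""
    return token_ids[1:] + [next_token_id]


def unshifted_draft_inputs(token_ids):
    """Unshifted layout (assumed): draft sees tokens at their original positions."""
    return list(token_ids)


def placeholder_positions(token_ids):
    """Indices at which a multimodal placeholder token appears."""
    return [i for i, t in enumerate(token_ids) if t == IMAGE_PLACEHOLDER]


# Illustrative prompt: one text token, two image placeholders, two text tokens.
prompt = [101, IMAGE_PLACEHOLDER, IMAGE_PLACEHOLDER, 7, 8]
sampled = 9  # token sampled by the base model after the prompt

shifted = shifted_draft_inputs(prompt, sampled)
unshifted = unshifted_draft_inputs(prompt)

print("base placeholders:     ", placeholder_positions(prompt))
print("shifted placeholders:  ", placeholder_positions(shifted))
print("unshifted placeholders:", placeholder_positions(unshifted))
```

In the shifted layout every position holds the token that originally sat one slot later, so position-keyed data such as MM placeholder ranges, or cache blocks identified by their token contents, no longer lines up with the base model. The unshifted layout avoids that offset.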
Pending follow-up work
With auto batch size in lm_eval, MM may crash because of scheduling issues that leak extra or unexpected placeholder tokens into the input tokens. Roger might fix this using MM positions.
Test Plan
Local test with MT-Bench:
TEST A: unshift only
Baseline: original shifted prefill
Test Result
Baseline
TEST A