
@morgendave commented Jul 15, 2025

Purpose

Adds support for running the EAGLE draft model's prefill on unshifted tokens, together with KV sharing support. This generally removes the need to shift tokens during prefill, which simplifies prefix-cache computation for the draft model and enables multimodal (MM) support.
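To make the shift concrete, here is a toy illustration of the two prefill layouts. It assumes, as in common EAGLE-style drafting, that the shifted layout feeds target token i+1 at draft position i and fills the last slot with the token just sampled by the target model, and that prefix-cache blocks are keyed by the tokens they contain. The helper names (`shifted_draft_inputs`, `unshifted_draft_inputs`, `block_tokens`) are hypothetical; this is a sketch of the idea, not the PR's implementation.

```python
from typing import List


def shifted_draft_inputs(target_ids: List[int], next_token: int) -> List[int]:
    # Assumed shifted layout: draft position i sees target token i + 1, and the
    # final slot holds the token the target model just sampled.
    return target_ids[1:] + [next_token]


def unshifted_draft_inputs(target_ids: List[int]) -> List[int]:
    # Assumed unshifted layout: draft position i sees the same token as the
    # target model at position i, so the two sequences line up one-to-one.
    return list(target_ids)


def block_tokens(ids: List[int], block: int, block_size: int) -> List[int]:
    # Tokens that fall into one fixed-size KV-cache block (hypothetical helper).
    start = block * block_size
    return ids[start:start + block_size]


prompt = [10, 11, 12, 13, 14, 15, 16, 17]
sampled = 99  # hypothetical token sampled by the target model

print(block_tokens(shifted_draft_inputs(prompt, sampled), 0, 4))  # [11, 12, 13, 14]
print(block_tokens(unshifted_draft_inputs(prompt), 0, 4))         # [10, 11, 12, 13]
print(block_tokens(prompt, 0, 4))                                  # [10, 11, 12, 13]
```

In the shifted layout, draft block 0 already depends on token 14 from the target's block 1, so a cache entry keyed only by the target's block-0 tokens does not fully describe the draft's contents. In the unshifted layout the draft and target blocks hold the same tokens.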

Pending follow-up work
With auto batch size in lm_eval, MM runs may crash: a scheduling issue can leak extra or unexpected placeholder tokens into the input tokens. Roger may fix this using the MM positions.
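One way to detect the leak described above is to compare where placeholder tokens actually appear against the ranges the MM positions say they should occupy. The sketch below assumes placeholder tokens share a single token id and that MM positions are given as (offset, length) pairs; `PLACEHOLDER_ID` and `stray_placeholders` are hypothetical names, not vLLM APIs.

```python
from typing import List, Tuple

PLACEHOLDER_ID = 200_000  # hypothetical image-placeholder token id


def stray_placeholders(
    input_ids: List[int],
    mm_positions: List[Tuple[int, int]],
    placeholder_id: int = PLACEHOLDER_ID,
) -> List[int]:
    """Return indices of placeholder tokens outside every expected MM range.

    mm_positions holds (offset, length) pairs marking where each multimodal
    item's placeholder tokens are expected (assumed format).
    """
    expected = set()
    for offset, length in mm_positions:
        expected.update(range(offset, offset + length))
    return [
        i for i, tok in enumerate(input_ids)
        if tok == placeholder_id and i not in expected
    ]


P = PLACEHOLDER_ID
# One image expected at positions 1..3; the placeholder at index 5 has leaked.
print(stray_placeholders([1, P, P, P, 5, P, 7], [(1, 3)]))  # [5]
```

An empty result means every placeholder token is accounted for by some MM range.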

Test Plan

Local tests on MT-Bench
TEST A (unshift only, with --no-prefill-token-shift)

 CUDA_VISIBLE_DEVICES=4,5,6,7 VLLM_USE_V1=1 python examples/offline_inference/spec_decode.py  --num_spec_tokens 3 --num_prompts 80 --method eagle --model_dir /home/zhiweiz/local/models/scout_base_HF_20250605_201140 --eagle_dir /home/zhiweiz/local/models/scout_draft_HF_20250605_202942 --tp 4 --dataset-name hf --dataset-path philschmid/mt-bench --no-prefill-token-shift

Baseline (original shifted prefill)

 CUDA_VISIBLE_DEVICES=4,5,6,7 VLLM_USE_V1=1 python examples/offline_inference/spec_decode.py  --num_spec_tokens 3 --num_prompts 80 --method eagle --model_dir /home/zhiweiz/local/models/scout_base_HF_20250605_201140 --eagle_dir /home/zhiweiz/local/models/scout_draft_HF_20250605_202942 --tp 4 --dataset-name hf --dataset-path philschmid/mt-bench

Test Result

Baseline

total_num_output_tokens: 17757
num_drafts: 6348
num_draft_tokens: 19044
num_accepted_tokens: 11413
mean acceptance length: 2.80
--------------------------------------------------
acceptance at token 0: 0.79
acceptance at token 1: 0.58
acceptance at token 2: 0.42

TEST A

total_num_output_tokens: 17697
num_drafts: 6375
num_draft_tokens: 19125
num_accepted_tokens: 11329
mean acceptance length: 2.78
--------------------------------------------------
acceptance at token 0: 0.78
acceptance at token 1: 0.58
acceptance at token 2: 0.42
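The reported counters can be cross-checked against each other. The sketch below assumes that mean acceptance length is 1 + num_accepted_tokens / num_drafts (the accepted draft tokens per step plus the one token the target model always contributes) and that each per-position rate is divided by num_drafts. Under those assumptions it reproduces the reported 2.80 and 2.78, confirms num_draft_tokens = 3 × num_drafts for both runs, and puts the gap between the runs at under 1%.

```python
NUM_SPEC_TOKENS = 3  # matches --num_spec_tokens 3 in both commands

# Counters and per-position acceptance rates copied from the two runs above.
RUNS = {
    "Baseline": {"num_drafts": 6348, "num_draft_tokens": 19044,
                 "num_accepted_tokens": 11413, "per_position": [0.79, 0.58, 0.42]},
    "TEST A": {"num_drafts": 6375, "num_draft_tokens": 19125,
               "num_accepted_tokens": 11329, "per_position": [0.78, 0.58, 0.42]},
}


def mean_acceptance_length(num_accepted_tokens: int, num_drafts: int) -> float:
    # Assumed definition: accepted draft tokens per draft step, plus one
    # token from the target model.
    return 1 + num_accepted_tokens / num_drafts if num_drafts > 0 else 1.0


for name, run in RUNS.items():
    # Every draft step proposes exactly NUM_SPEC_TOKENS tokens.
    assert run["num_draft_tokens"] == run["num_drafts"] * NUM_SPEC_TOKENS
    # Per-position rates (rounded to 2 decimals) should sum to accepted / drafts.
    ratio = run["num_accepted_tokens"] / run["num_drafts"]
    assert abs(sum(run["per_position"]) - ratio) < 0.015
    length = mean_acceptance_length(run["num_accepted_tokens"], run["num_drafts"])
    print(f"{name}: mean acceptance length {length:.2f}")
# Baseline: mean acceptance length 2.80
# TEST A: mean acceptance length 2.78

base = mean_acceptance_length(11413, 6348)
test_a = mean_acceptance_length(11329, 6375)
print(f"relative change: {(test_a - base) / base:+.2%}")
```

At two decimals, the only per-position difference between the runs is at token 0 (0.79 vs 0.78).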
