[feat] Kimi K2/DeepSeek Support eagle3 #35966
leihuang-sketch wants to merge 3 commits into vllm-project:main
Conversation
Hi @leihuang-sketch, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Code Review
This pull request adds Eagle3 speculative decoding support for DeepSeek V2 and Kimi K2.5 models. The changes involve implementing the SupportsEagle3 interface, adding logic to extract auxiliary hidden states from specified layers, and plumbing this through the Kimi model. I've found a couple of critical issues related to pipeline parallelism in the deepseek_v2.py implementation that need to be addressed. The proposed changes to the layer iteration and index calculation should resolve these issues.
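The pipeline-parallelism issue the review flags is about index arithmetic: the auxiliary layers for EAGLE3 are specified as global layer ids, but each pipeline-parallel rank only owns a slice of the layers, so the ids must be filtered and shifted. A minimal sketch of that mapping, with hypothetical names (not vLLM's actual API):

```python
# Hypothetical sketch of mapping global EAGLE3 aux-layer ids to local
# indices under pipeline parallelism. Function and argument names are
# illustrative, not taken from vllm's deepseek_v2.py.
def get_local_aux_layers(aux_layer_ids, start_layer, end_layer):
    """Keep only the aux layers owned by this PP rank (which holds
    global layers [start_layer, end_layer)), shifted to local indices."""
    return [
        layer_id - start_layer
        for layer_id in aux_layer_ids
        if start_layer <= layer_id < end_layer
    ]

# A rank owning layers [30, 61) keeps only ids 30 and 59, as local 0 and 29:
print(get_local_aux_layers([2, 30, 59], 30, 61))  # -> [0, 29]
```

Without the `start_layer` offset, a later rank would index into layers it does not own, which is the kind of bug the review's proposed fix to the layer iteration addresses.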
Way over-commented. Please trim down the unnecessary comments. See also #36063 where I am refactoring how we do this, which should cut down on the amount of boilerplate needed to enable support here.
Simplify docstrings and remove redundant comments that duplicate what the code already expresses. Keep only essential technical notes that explain non-obvious implementation details. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@benchislett Thanks for the feedback! I've trimmed down the comments significantly. #36063 looks great – looking forward to seeing it merged so I can refactor accordingly.
Did you notice the accept rate you measured is way lower than the reported 2.746/3? @leihuang-sketch I personally ran some tests, using num_speculative_tokens=3 for example. The accept rate seems reasonable (~85%) when I use Kimi K2 + the K2 eagle3 draft model, but once I changed the checkpoints to K2.5 + the K2.5 eagle3 draft model, it drops to ~50%. There shouldn't be any difference here; still wondering... cc @benchislett
The accept rate on sglang is correct though, since the released K2.5 eagle3 draft model was measured on sglang.
@leihuang-sketch can you try to reproduce the issue seen by @oreo-wjx?
See also #36361 |
Closing in favor of #36361. I was able to get a custom EAGLE3 head working. However, https://huggingface.co/lightseekorg/kimi-k2.5-eagle3 did not work. I will assume there is a configuration issue with that model and move forward with the other PR.
I think I've figured it out. The SGLang number is consistent once you account for the fact that acc_len in SGLang includes the bonus token. So the actual accept rate for this Eagle3 draft model is ~1.746/3 on GSM8K.
Hi from novita.ai team 👋
Purpose
Improve throughput.
Test Plan
I just used the draft model
serve args:
test with gsm8k
Test Result
From metrics, the acceptance rate is 51.78%.
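To get a feel for what a 51.78% per-token acceptance rate means for throughput, a rough back-of-the-envelope model (an assumption, not how vLLM computes its metrics) treats each speculative token as accepted independently with probability p, so the expected number of accepted draft tokens per step at depth k is p + p² + … + pᵏ:

```python
# Rough independence model (an assumption, not vLLM's actual metric):
# expected accepted draft tokens per step at speculation depth k,
# given per-token acceptance probability p.
def expected_accepted(p, k):
    return sum(p ** i for i in range(1, k + 1))

p, k = 0.5178, 3
# Tokens generated per step = accepted draft tokens + 1 bonus token.
print(round(1 + expected_accepted(p, k), 2))  # -> 1.92
```

So even at ~52% acceptance, speculation still yields roughly 1.9 tokens per target-model step under this simplified model, versus 1 without speculation.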
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.