
Kimi k2.5 MLA based eagle3#36361

Merged
benchislett merged 6 commits into vllm-project:main from jhaotingc:kimi_k2_eagle3 on Mar 11, 2026

Conversation

@jhaotingc (Contributor) commented Mar 7, 2026

Purpose

@IzzyPutterman is the original author.

This adds support for Eagle3 draft models that share MLA instead of GQA for attention, so one can train Eagle3 heads for Kimi and Deepseek and use them across TRTLLM, SGL, and vLLM.

Test Plan

Acc benchmark:

lm_eval \
  --model local-completions \
  --model_args base_url=http://my_server:8001/v1/completions,model=/trt_llm_ci/data/llm-models/Kimi-K2.5-NVFP4,num_concurrent=16,tokenized_requests=False,trust_remote_code=True \
  --tasks gsm8k \
  --batch_size 16
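The lm_eval command above assumes a vLLM server is already listening on port 8001 with the Eagle3 draft head enabled. A minimal launch sketch (the draft-head path is a placeholder, and the exact speculative-config keys follow vLLM's JSON `--speculative-config` convention; adjust for your build):

```shell
# Hypothetical draft-head path; substitute your own Eagle3 checkpoint.
# num_speculative_tokens=3 matches the three per-position accept rates
# reported in the results below.
vllm serve /trt_llm_ci/data/llm-models/Kimi-K2.5-NVFP4 \
  --port 8001 \
  --trust-remote-code \
  --speculative-config '{"method": "eagle3", "model": "<path-to-eagle3-draft-head>", "num_speculative_tokens": 3}'
```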

Test Result

without Eagle3

local-completions ({'base_url': 'http://my_server:8001/v1/completions', 'model': '/trt_llm_ci/data/llm-models/Kimi-K2.5-NVFP4', 'num_concurrent': 16, 'tokenized_requests': False}), gen_kwargs: ({}), limit: None, num_fewshot: None, batch_size: 16
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9295|±  |0.0071|
|     |       |strict-match    |     5|exact_match|↑  |0.9295|±  |0.0071|


with Eagle3

local-completions ({'base_url': 'http://my_server:8001/v1/completions', 'model': '/trt_llm_ci/data/llm-models/Kimi-K2.5-NVFP4', 'num_concurrent': 16, 'tokenized_requests': False}), gen_kwargs: ({}), limit: None, num_fewshot: None, batch_size: 16
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9265|±  |0.0072|
|     |       |strict-match    |     5|exact_match|↑  |0.9257|±  |0.0072|


Acceptance:

  --- Weighted averages (by accepted tokens) ---
  Mean acceptance length:   2.785
  Avg draft accept rate:    59.5%
  Per-position accept rate: [0.826, 0.594, 0.365]
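As a sanity check on these numbers: if the per-position figures are marginal accept rates (the probability that draft position i is accepted, averaged over requests), then the mean acceptance length is one token from the target model's own verification step plus the expected number of accepted draft tokens. A small sketch of that relation (an interpretation of the printout above, not vLLM's actual accounting code):

```python
# Per-position draft accept rates from the benchmark printout above.
per_position_accept = [0.826, 0.594, 0.365]

# One token is always produced by the target model each step; each
# accepted draft token adds one more, so the expected (mean) acceptance
# length is 1 plus the sum of the marginal accept rates.
mean_acceptance_length = 1 + sum(per_position_accept)

print(round(mean_acceptance_length, 3))  # 2.785, matching the reported value
```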


@mergify mergify Bot added deepseek Related to DeepSeek models new-model Requests to new models speculative-decoding v1 labels Mar 7, 2026
mergify Bot (Contributor) commented Mar 7, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jhaotingc.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Mar 7, 2026

@gemini-code-assist (Bot) left a comment


Code Review

This pull request adds support for Eagle3 speculative decoding for Deepseek and Kimi models. The changes include a new deepseek_eagle3.py model implementation, and modifications to deepseek_v2.py to support auxiliary hidden state extraction. The configuration and model registry are also updated. My review focuses on a logical issue in the new deepseek_eagle3.py file where a condition is always false, leading to dead code and a bypassed assertion. I've provided a suggestion to fix this.

Note: Security Review did not run due to the size of the PR.

Comment thread vllm/model_executor/models/deepseek_eagle3.py
Izzy Putterman and others added 4 commits March 10, 2026 09:17
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
@benchislett (Collaborator) commented:

See #35966: why does this PR need a new model implementation and the other one doesn't?

@IzzyPutterman (Contributor) commented:

> See #35966: why does this PR need a new model implementation and the other one doesn't?

This one implements the MLA-based eagle3, not GQA like the other PR.
This is a rebase of my ancient PR: #30574

Comment thread vllm/model_executor/models/deepseek_eagle3.py
@jhaotingc jhaotingc changed the title Kimi k25 eagle3 Kimi k2.5 MLA based eagle3 Mar 10, 2026
@benchislett (Collaborator) commented:

I was able to get this working locally. Seems fine but https://huggingface.co/lightseekorg/kimi-k2.5-eagle3 does not run. Two other EAGLE heads, one internal and one public, work fine. Assuming there's an unrelated config issue with the broken one.

@benchislett benchislett added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 10, 2026
@benchislett benchislett enabled auto-merge (squash) March 10, 2026 21:15
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
auto-merge was automatically disabled March 10, 2026 23:38

Head branch was pushed to by a user without write access

@benchislett benchislett enabled auto-merge (squash) March 10, 2026 23:40
@jhaotingc (Contributor, PR author) commented:

The failed tests seem unrelated to our change, @benchislett.

@benchislett (Collaborator) commented:

Will rerun the failed tests a couple of times, and update from main once again if that doesn't work. Failing that, we'll force-merge tomorrow.

@leihuang-sketch commented:

@jhaotingc In my tests I found no performance improvement: throughput remained almost the same as without the eagle3 parameters. Could it be because the outputs for the gsm8k dataset are too short?

@leihuang-sketch commented:

LGTM

@benchislett benchislett merged commit 5573894 into vllm-project:main Mar 11, 2026
55 checks passed
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026
mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026

Labels

deepseek (Related to DeepSeek models), new-model (Requests to new models), ready (ONLY add when PR is ready to merge/full CI is needed), speculative-decoding, v1


4 participants