[Bugfix] Fixed an accuracy problem of sp with eagle3 by drslark · Pull Request #5816 · vllm-project/vllm-ascend

drslark · 2026-01-12T10:52:14Z

What this PR does / why we need it?

Fixes an accuracy problem that occurs when eagle3 is used together with sp.

The problem is described in #5825.

It also adds a more precise check for whether the drafter should use sp.

In addition, it makes the drafter's eager mode a real eager mode in the frontend, which avoids an fx-graph problem.

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

For simplicity, we tested it in the same way as in #5825.

We get the same result as eagle3 with sp disabled:

```text
--------------------------------------------------
total_num_output_tokens: 1000
num_drafts: 437
num_draft_tokens: 1311
num_accepted_tokens: 564
mean acceptance length: 2.29
--------------------------------------------------
acceptance at token 0: 0.62
acceptance at token 1: 0.40
acceptance at token 2: 0.27
acceptance at token 3: 0.00
acceptance at token 4: 0.00
acceptance at token 5: 0.00
```
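The reported numbers are internally consistent, which can be checked with a small sketch. It assumes that the mean acceptance length is computed as one bonus token per draft plus the average number of accepted draft tokens; that formula is an assumption here, not something stated in this PR, and the function name is hypothetical.

```python
# Sanity check of the statistics reported above.
# Assumption: mean acceptance length = 1 + accepted_tokens / drafts.


def mean_acceptance_length(num_drafts: int, num_accepted_tokens: int) -> float:
    """Hypothetical helper: one bonus token plus accepted draft tokens per draft."""
    return 1 + num_accepted_tokens / num_drafts


num_drafts = 437
num_draft_tokens = 1311
num_accepted_tokens = 564
per_position_rates = [0.62, 0.40, 0.27, 0.00, 0.00, 0.00]

# 1311 draft tokens over 437 drafts means 3 speculative tokens per draft,
# which matches positions 3-5 never being accepted.
tokens_per_draft = num_draft_tokens // num_drafts

mean_len = mean_acceptance_length(num_drafts, num_accepted_tokens)

# The per-position acceptance rates should sum to roughly
# accepted_tokens / drafts, up to the rounding of the printed values.
accepted_per_draft = num_accepted_tokens / num_drafts
rate_sum = sum(per_position_rates)

print(f"tokens per draft: {tokens_per_draft}")
print(f"mean acceptance length: {mean_len:.2f}")
print(f"sum of per-position rates: {rate_sum:.2f}")
```

Under that assumption, the three nonzero per-position rates account for all accepted tokens, and positions 3 to 5 stay at 0.00 because only three tokens are drafted per step.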

vLLM version: v0.13.0
vLLM main: vllm-project/vllm@2f4e654

github-actions · 2026-01-12T10:52:28Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

gemini-code-assist

Code Review

This pull request aims to fix an accuracy issue with sequence parallelism (sp) for eagle3. The changes involve introducing a new function split_inputs_tp_to_sp to handle data splitting for sequence parallelism, replacing the previous torch.ops.vllm.maybe_pad_and_reduce. The logic for determining if a model is a Mixture-of-Experts (MoE) model has also been updated to differentiate between the main and drafter models.

My review has identified a critical issue in the new split_inputs_tp_to_sp function. The function does not correctly handle padding, which can lead to the model processing uninitialized data and result in accuracy problems. I have provided a code suggestion to fix this.
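To make the padding concern concrete, here is a minimal sketch of splitting token rows across ranks for sequence parallelism. It is not the PR's `split_inputs_tp_to_sp`, whose code is not shown on this page; the function name `pad_and_split_for_sp`, its parameters, and the use of plain Python lists instead of tensors are all assumptions made for illustration. The point it shows is that padded rows are explicitly zero-filled, so no rank reads uninitialized data.

```python
from typing import List

Row = List[float]


def pad_and_split_for_sp(hidden: List[Row], tp_size: int, tp_rank: int) -> List[Row]:
    """Hypothetical helper: return the contiguous token shard for one rank.

    The token dimension is padded with zero rows up to a multiple of
    tp_size, so every rank receives the same number of rows.
    """
    if tp_size <= 0 or not 0 <= tp_rank < tp_size:
        raise ValueError("tp_rank must satisfy 0 <= tp_rank < tp_size")

    num_tokens = len(hidden)
    hidden_dim = len(hidden[0]) if hidden else 0

    # Number of rows needed to reach the next multiple of tp_size.
    pad = (-num_tokens) % tp_size

    # Zero-fill the padding explicitly instead of leaving it uninitialized.
    padded = hidden + [[0.0] * hidden_dim for _ in range(pad)]

    chunk = len(padded) // tp_size
    return padded[tp_rank * chunk:(tp_rank + 1) * chunk]


# Example: 5 tokens with hidden size 2, split across 4 ranks.
hidden = [[float(i), float(i)] for i in range(5)]
shards = [pad_and_split_for_sp(hidden, tp_size=4, tp_rank=r) for r in range(4)]

for rank, shard in enumerate(shards):
    print(rank, shard)
```

With 5 tokens and 4 ranks, the input is padded to 8 rows, so each rank gets 2 rows; the last rank's shard consists only of zero rows, and concatenating the shards and dropping the padding recovers the original input.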

Signed-off-by: drslark <slarksblood@qq.com>


github-actions bot added the module:core label Jan 12, 2026

drslark force-pushed the eagle3_sp branch from 0d9c32a to 46647dd Compare January 12, 2026 10:53

gemini-code-assist bot reviewed Jan 12, 2026


Comment thread vllm_ascend/spec_decode/eagle_proposer.py

drslark force-pushed the eagle3_sp branch 5 times, most recently from cf1a4dc to 0b69224 Compare January 13, 2026 11:57

[main][Bugfix] Fixed an accuracy problem of sp with eagle3

0df1cce

Signed-off-by: drslark <slarksblood@qq.com>

drslark force-pushed the eagle3_sp branch from 0b69224 to 0df1cce Compare January 13, 2026 12:04

wangxiyuan added the ready and ready-for-test labels Jan 13, 2026

wangxiyuan changed the title ~~[main][Bugfix] Fixed an accuracy problem of sp with eagle3~~ [Bugfix] Fixed an accuracy problem of sp with eagle3 Jan 14, 2026

wangxiyuan approved these changes Jan 14, 2026


wangxiyuan merged commit 48ec978 into vllm-project:main Jan 14, 2026
39 checks passed

drslark mentioned this pull request Jan 14, 2026

[Bug]: eagle3 with sp will cause an accuracy problem in drafter model #5825

Closed

yiz-liu mentioned this pull request Apr 11, 2026

[Bug]: Graph-capture logic for speculative decoding does not match the code implementation #8138

Open
