[0.13.0][cherry-pick][BugFix] Support setting tp=1 for the Eagle draft model to take effect#5804

Merged
wangxiyuan merged 1 commit into vllm-project:releases/v0.13.0 from zhaomingyu13:releases on Jan 13, 2026

Conversation

@zhaomingyu13 (Contributor) commented Jan 12, 2026

What this PR does / why we need it?

According to the official documentation, setting `"draft_tensor_parallel_size": 1` is supposed to apply to the Eagle3 draft model. However, actual debugging showed that the draft model's tensor parallel size (tp) always matched the target model's, so the tp setting for the draft model did not take effect as expected.

Note: This feature has not yet been tested in combination with SP and DP; support for those configurations will be adapted later.

Does this PR introduce any user-facing change?

No

How was this patch tested?

```python
from vllm import LLM, SamplingParams


def main():
    prompts = [
        "The future of AI is",
    ]

    # Create a sampling params object.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
    # Create an LLM with a tp=1 Eagle3 draft model under a tp=4 target model.
    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",
        tensor_parallel_size=4,
        gpu_memory_utilization=0.9,
        enforce_eager=True,
        speculative_config={
            "method": "eagle3",
            "model": "yuhuili/EAGLE3-LLaMA3.1-Instruct-8B",
            "draft_tensor_parallel_size": 1,
            "num_speculative_tokens": 3,
        },
    )

    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    print(f"Outputs: {outputs}")
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()
```

Fixes vllm-project/vllm#31345

Signed-off-by: zhaomingyu <zhaomingyu13@h-partners.com>
Co-authored-by: drslark <slarksblood@qq.com>
@gemini-code-assist bot left a comment

Code Review

This pull request aims to fix a bug where setting tensor_parallel_size=1 for the Eagle draft model was not taking effect. The approach of creating a temporary tensor parallel group and patching the global TP group during the draft model loading is sound. However, I've identified a critical issue in the implementation that prevents it from working as intended. The new tensor parallel group is being created with group_name="tp", which is the same name as the main tensor parallel group. This causes init_model_parallel_group to return the existing main TP group instead of creating a new one, rendering the patch ineffective. I have provided a suggestion to resolve this. The accompanying test additions and an unrelated but correct fix for UniformTypeKVCacheSpecs are well-implemented.
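The naming collision the review describes can be illustrated with a minimal standalone registry (a simplified stand-in, not vLLM's real `init_model_parallel_group`, which builds actual distributed process groups): when groups are keyed by `group_name`, reusing the name "tp" hands back the existing main TP group instead of creating the single-rank draft group.

```python
# Simplified stand-in for a name-keyed process-group registry.
_groups = {}

def init_model_parallel_group(ranks, group_name):
    # Returns the already-registered group when the name is reused.
    if group_name in _groups:
        return _groups[group_name]
    group = {"name": group_name, "ranks": list(ranks)}
    _groups[group_name] = group
    return group

main_tp = init_model_parallel_group([0, 1, 2, 3], group_name="tp")
# Buggy: reusing "tp" silently returns the existing 4-way group.
buggy_draft = init_model_parallel_group([0], group_name="tp")
# Fixed: a distinct name yields a genuinely new single-rank group.
fixed_draft = init_model_parallel_group([0], group_name="draft_tp")
```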

Comment thread: vllm_ascend/spec_decode/eagle_proposer.py
@wangxiyuan added the `ready` (ready for review) and `ready-for-test` (start test by label for PR) labels on Jan 12, 2026
@wangxiyuan merged commit 7c71736 into vllm-project:releases/v0.13.0 on Jan 13, 2026
17 checks passed
@wangxiyuan changed the title from "[BugFix] Support setting tp=1 for the Eagle draft model to take effect" to "[0.13.0][cherry-pick][BugFix] Support setting tp=1 for the Eagle draft model to take effect" on Jan 13, 2026