[Bugfix] Fix the issue of the acceptance rate decline for Qwen3-30B-A3B-EAGLE3#6139
Conversation
Code Review
This pull request aims to fix a bug causing a decline in the acceptance rate for the Qwen3-30B-A3B-EAGLE3 model by synchronizing code with an upstream version. The core changes are within vllm_ascend/spec_decode/eagle_proposer.py, where the logic for sharing token embeddings between the target and draft models has been significantly improved. The new implementation is more robust, handling multimodal models, various embedding layer names, and differentiating between EAGLE and MTP models. However, I've identified a critical issue where a refactoring accidentally removed the definition of an attribute that is still used elsewhere in the class, which would lead to a runtime error. My review includes a suggestion to fix this. Overall, the changes are a good step forward, but this critical issue must be addressed.
```diff
 assert len(draft_attn_layer_names) == 1
-self.attn_layer_name = list(draft_attn_layer_names)
-self.attn_layer_names = self.attn_layer_name
+self.attn_layer_names = list(draft_attn_layer_names)
```
It seems that while refactoring, the assignment to `self.attn_layer_name` was removed. However, this attribute is still used later in the `dummy_run` (line 345) and `_propose` (line 458) methods. This change will cause an `AttributeError` at runtime. To fix this, please restore the assignment.
```diff
 self.attn_layer_names = list(draft_attn_layer_names)
+self.attn_layer_name = self.attn_layer_names
```
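The suggested fix can be sketched as a minimal class (the class name here is hypothetical, not the real proposer): the new plural attribute is kept, and the legacy singular name is restored as an alias so that methods which still read it keep working.

```python
class EagleProposerSketch:
    """Hypothetical minimal sketch of the attribute handling in
    eagle_proposer.py; only the two attribute assignments are shown."""

    def __init__(self, draft_attn_layer_names):
        # The draft model is expected to contribute exactly one
        # attention layer name.
        assert len(draft_attn_layer_names) == 1
        self.attn_layer_names = list(draft_attn_layer_names)
        # Legacy alias: without this line, methods that still read
        # attn_layer_name (dummy_run, _propose) raise AttributeError.
        self.attn_layer_name = self.attn_layer_names
```

With the alias in place, both spellings resolve to the same list, so older call sites need no changes.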
```python
        "Keeping separate embedding weights from the target model."
    )
else:
    # MTP model
```
Please follow the original logic.
```python
elif (isinstance(target_embed_tokens.weight, torch.Tensor)
      and isinstance(self.model.model.embed_tokens.weight,
                     torch.Tensor)
      # TODO: Offload to CPU for comparison to avoid extra GPU memory
```
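The TODO above suggests doing the weight comparison on CPU. A minimal sketch of that idea, assuming both weights are plain `torch.Tensor`s (the helper name is hypothetical, not part of the PR):

```python
import torch


def embed_weights_match(target_weight: torch.Tensor,
                        draft_weight: torch.Tensor) -> bool:
    """Hypothetical helper: decide whether the draft model can share the
    target model's embedding weights instead of keeping its own copy."""
    if target_weight.shape != draft_weight.shape:
        return False
    # Move both tensors to CPU before the elementwise comparison so the
    # check does not allocate extra device (NPU/GPU) memory.
    return torch.equal(target_weight.cpu(), draft_weight.cpu())
```

`torch.equal` returns `True` only when shapes and all elements match exactly, which is the sharing criterion implied by the surrounding diff.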
This pull request has conflicts, please resolve those before we can evaluate the pull request.
…3B-EAGLE3
Signed-off-by: zhaomingyu <zhaomingyu13@h-partners.com>
Co-authored-by: drslark <slarkblood@qq.com>
Signed-off-by: zhaomingyu13 <zhaomingyu13@h-partners.com>
…3B-EAGLE3 (vllm-project#6139)

### What this PR does / why we need it?
Due to a long-term lack of synchronization with the upstream code, a bug that decreased the acceptance rate of the Qwen3-30B-A3B-EAGLE3 draft model was introduced while fixing vllm-project#5967. This change synchronizes with upstream and fixes the bug.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
```python
from vllm import LLM, SamplingParams


def main():
    prompts = [
        "The future of AI is",
    ]
    # Create a sampling params object.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
    # Create an LLM.
    llm = LLM(
        model="Qwen/Qwen3-30B-A3B",
        tensor_parallel_size=4,
        gpu_memory_utilization=0.9,
        enforce_eager=True,
        speculative_config={
            "method": "eagle3",
            "model": "AngelSlim/Qwen3-a3B_eagle3",
            "num_speculative_tokens": 3,
        },
    )
    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    print(f"Outputs: {outputs}")
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()
```

Signed-off-by: zhaomingyu <zhaomingyu13@h-partners.com>
Signed-off-by: zhaomingyu13 <zhaomingyu13@h-partners.com>
Co-authored-by: drslark <slarkblood@qq.com>