[bugfix] Prevent overwriting drafter's lm-head and embed_tokens #4134
HF-001 wants to merge 6 commits into vllm-project:main
Conversation
Signed-off-by: 01267596 <xiongkai123@cmbchina.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request aims to prevent overwriting the lm_head and embed_tokens of a drafter model if they are already initialized, which is particularly important for some EAGLE3 drafters. The changes correctly add checks for has_own_embed_tokens and has_own_lm_head flags before sharing these layers from the target model. However, I've identified a critical regression in the logic for handling lm_head for SpecDcodeType.EAGLE. The check that ensures the target model has an lm_head attribute before it's accessed has been removed, which will likely cause an AttributeError at runtime. My review includes a specific code suggestion to fix this issue.
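For illustration, here is a minimal sketch of the guard pattern the review describes. The `has_own_embed_tokens` / `has_own_lm_head` flag names come from the review above; the function name and exact attribute paths are assumptions for the sketch, not the PR's verbatim diff:

```python
def share_target_layers(draft_model, target_model):
    """Sketch: share target-model layers with the drafter only when the
    drafter did not ship its own trained weights (names illustrative)."""
    if not getattr(draft_model, "has_own_embed_tokens", False):
        # No trained embedding table in the drafter: reuse the target's.
        draft_model.model.embed_tokens = target_model.model.embed_tokens
    if not getattr(draft_model, "has_own_lm_head", False) and hasattr(
            target_model, "lm_head"):
        # No trained LM head in the drafter: reuse the target's.
        draft_model.lm_head = target_model.lm_head
```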
```python
if hasattr(model, "lm_head"):
    logger.info("Loading EAGLE LM head weights from the target model.")
    if supports_multimodal(model):
        self.model.lm_head = model.get_language_model().lm_head
    else:
        self.model.lm_head = model.lm_head
```
The logic for sharing lm_head for SpecDcodeType.EAGLE has been changed in a way that introduces a potential AttributeError. The assignments to self.model.lm_head are no longer guarded by hasattr(model, "lm_head"). If the target model does not have an lm_head attribute, this will cause a crash. The original logic was safer and should be restored.
Suggested change:

```python
if hasattr(model, "lm_head"):
    logger.info("Loading EAGLE LM head weights from the target model.")
    if supports_multimodal(model):
        self.model.lm_head = model.get_language_model().lm_head
    else:
        self.model.lm_head = model.lm_head
```
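Restoring the `hasattr` guard makes the sharing step a safe no-op for target models that expose no `lm_head` attribute, so the drafter falls back to its own head instead of crashing during weight loading.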
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Any progress? If this PR is still alive, please rebase onto main and make CI happy. Thanks!
@wangxiyuan I will handle this PR soon
Please rebase and fix the merge conflicts.
No update for a long time, closing this now. Feel free to reopen if it's still needed.
What this PR does / why we need it?
Some EAGLE3 drafters ship their own lm_head and/or embed_tokens layers, which the existing code ignores: it unconditionally overwrites them with the target model's weights. Preserving the drafter's own layers can greatly improve acceptance rates. See the corresponding vLLM PR: vllm-project/vllm#27737
Does this PR introduce any user-facing change?
How was this patch tested?
```bash
export CUDA_VISIBLE_DEVICES=0
export TP=1
export MODEL_PATH=/model/Llama-3.1-8B-Instruct
export MODEL_NAME=Llama-3.1-8B-Instruct
export PORT=10133
python3 -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 --port ${PORT} \
    --dtype bfloat16 \
    --model ${MODEL_PATH} \
    --served-model-name ${MODEL_NAME} \
    --tensor-parallel-size ${TP} \
    --gpu-memory-utilization 0.85 \
    --max-model-len 32768 \
    --trust-remote-code \
    --seed 42 \
    --speculative_config '{"method":"eagle3","model":"/model/EAGLE3-LLaMA3.1-Instruct-8B","num_speculative_tokens":5,"draft_tensor_parallel_size":1}'
```
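Once the server is up, a quick sanity request confirms that speculative decoding serves completions end to end. This is a minimal check, assuming the port (10133) and served model name from the launch command above, against vLLM's standard OpenAI-compatible completions endpoint:

```python
import requests

# Query the OpenAI-compatible completions endpoint started above.
resp = requests.post(
    "http://localhost:10133/v1/completions",
    json={
        "model": "Llama-3.1-8B-Instruct",
        "prompt": "The capital of France is",
        "max_tokens": 32,
        "temperature": 0,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```

Acceptance-rate changes, the metric this PR targets, then have to be read from the server-side speculative-decoding statistics rather than from the response body.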