[Bugfix][Speculative Decoding] Extend Eagle quantization config fix to llama_eagle.py #26590

Merged
robertgshaw2-redhat merged 3 commits into vllm-project:main from neuralmagic:fix-26042
Oct 13, 2025

Conversation

@rahul-tuli
Contributor

@rahul-tuli rahul-tuli commented Oct 10, 2025

This PR extends the quantization config fix from #25883 to llama_eagle.py.

Background

PR #25883 fixed an issue where Eagle3 drafter models were incorrectly using the verifier model's quantization config instead of their own. This caused problems when the drafter and verifier models had different quantization configurations.

The fix introduced a get_quant_config() method in LlamaDecoderLayer that can be overridden by Eagle subclasses to use the draft model's quantization config.

Changes

This PR applies the same pattern to additional Eagle drafters:

llama_eagle.py

  • Adds a get_quant_config() method to the LlamaDecoderLayer subclass in llama_eagle.py
  • The method retrieves the quantization config from the draft model instead of the verifier model
  • Uses VllmConfig.get_quantization_config() to obtain the draft model's quantization config correctly
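The override pattern described above can be sketched as follows. This is an illustrative stand-in, not the actual vLLM code: `StubVllmConfig`, `DecoderLayer`, and `EagleDecoderLayer` are simplified stubs, and the `draft=` keyword on `get_quantization_config` is a hypothetical simplification of the real `VllmConfig` API.

```python
# Illustrative sketch of the get_quant_config() override pattern from
# #25883, using stub classes. The real vLLM types (VllmConfig,
# LlamaDecoderLayer, etc.) carry far more state; only the dispatch
# pattern is shown here.
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubVllmConfig:
    """Stand-in for vllm.config.VllmConfig (hypothetical fields)."""
    verifier_quant: Optional[str] = None
    draft_quant: Optional[str] = None

    def get_quantization_config(self, *, draft: bool) -> Optional[str]:
        # Hypothetical helper: return the draft model's quant config
        # when asked, otherwise the verifier's.
        return self.draft_quant if draft else self.verifier_quant


class DecoderLayer:
    """Base layer: resolves to the verifier model's quantization config."""

    def get_quant_config(self, vllm_config: StubVllmConfig) -> Optional[str]:
        return vllm_config.get_quantization_config(draft=False)


class EagleDecoderLayer(DecoderLayer):
    """Eagle drafter layer: overrides to use the draft model's config."""

    def get_quant_config(self, vllm_config: StubVllmConfig) -> Optional[str]:
        return vllm_config.get_quantization_config(draft=True)


# An FP8 verifier paired with an unquantized Eagle drafter: before the
# fix, the drafter would have picked up the verifier's "fp8" config.
cfg = StubVllmConfig(verifier_quant="fp8", draft_quant=None)
print(DecoderLayer().get_quant_config(cfg))       # verifier config
print(EagleDecoderLayer().get_quant_config(cfg))  # draft config
```

The key point is that layer construction only calls `get_quant_config()`, so subclassing is enough to redirect Eagle layers to the draft model's config without touching the shared Llama layer code.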

Impact

The fix ensures that Eagle drafter models correctly use their own quantization configuration, preventing quantization mismatches when used with differently quantized verifier models.

Notes


Fixes: extends the fix from #25883
Related: #25883

Verification

CUDA_VISIBLE_DEVICES=0 python examples/offline_inference/spec_decode.py \
  --method "eagle" \
  --tp 1 \
  --print-output \
  --model-dir "RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic" \
  --eagle-dir "yuhuili/EAGLE-LLaMA3.1-Instruct-8B" \
  --dataset_name "hf" \
  --dataset_path "philschmid/mt-bench" \
  --num-spec-tokens 5 2>&1 | tee local/acceptance-rates-eagle.txt

Output:

--------------------------------------------------
--------------------------------------------------
total_num_output_tokens: 210113
num_drafts: 87017
num_draft_tokens: 435085
num_accepted_tokens: 123333
mean acceptance length: 2.42
--------------------------------------------------
acceptance at token 0: 0.68
acceptance at token 1: 0.38
acceptance at token 2: 0.20
acceptance at token 3: 0.10
acceptance at token 4: 0.06
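
As a sanity check, the summary metrics follow from the raw counters above. This derivation reflects my reading of the stats (each drafting step contributes one bonus token from the target model plus the accepted draft tokens); it is not taken from the script itself.

```python
# Recompute the summary metrics from the counters printed above.
num_drafts = 87017
num_draft_tokens = 435085
num_accepted_tokens = 123333

# Each drafting step yields 1 target-model token plus however many
# draft tokens were accepted, so the mean acceptance length is:
mean_acceptance_length = 1 + num_accepted_tokens / num_drafts
print(f"{mean_acceptance_length:.2f}")  # 2.42

# Overall per-token acceptance rate across all proposed draft tokens:
print(f"{num_accepted_tokens / num_draft_tokens:.2f}")  # 0.28
```

The recomputed 2.42 matches the reported mean acceptance length, which is a quick way to confirm the drafter is behaving sanely after the config change.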

@mergify mergify bot added the llama (Related to Llama models) and speculative-decoding labels Oct 10, 2025
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
@rahul-tuli rahul-tuli marked this pull request as ready for review October 13, 2025 13:16
@rahul-tuli rahul-tuli changed the title from "Extend: fix from #25883 to llama_eagle.py" to "[Bugfix][Speculative Decoding] Extend Eagle quantization config fix to llama_eagle.py" Oct 13, 2025

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Signed-off-by: Rahul Tuli <rtuli@redhat.com>
@jmkuebler
Contributor

Thx @rahul-tuli, the fix LGTM and solves #26402.

I wouldn't expect this to correctly handle cases where the Eagle head is actually quantized. For example, I believe the ignore list would need to be written taking into account that the layers are registered as further layers in the original model, etc. But that is beyond the pressing issue.

Thanks!

@markmc markmc added the ready (ONLY add when PR is ready to merge/full CI is needed) label Oct 13, 2025
@robertgshaw2-redhat robertgshaw2-redhat enabled auto-merge (squash) October 13, 2025 15:14
Member

@yewentao256 yewentao256 left a comment


LGTM, thanks for the work!

@shreyas269
Contributor

I tested these changes with quantized Eagle head. LGTM!

@robertgshaw2-redhat robertgshaw2-redhat merged commit e3b90c1 into vllm-project:main Oct 13, 2025
55 checks passed
