[Bugfix][ROCm] Fix Unsupported attention metadata type for speculative decoding in eagle.py#31714
Conversation
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Code Review
This pull request addresses a ValueError raised during speculative decoding on ROCm when the ROCM_ATTN backend is used. The fix correctly adds RocmAttentionMetadata to the list of allowed attention metadata types. My review includes a suggestion to make the fix more comprehensive by also covering the ROCM_AITER_UNIFIED_ATTN backend, which appears to have been overlooked and would hit a similar error.
```python
from vllm.v1.attention.backends.rocm_attn import RocmAttentionMetadata

rocm_types = [
    TritonAttentionMetadata,
    FlashAttentionMetadata,
    RocmAttentionMetadata,
]
```
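The check the PR repairs can be sketched as follows. This is a hedged, self-contained sketch, not vLLM's actual code: the metadata classes are stubbed out and the function name `check_rocm_attn_metadata` is a hypothetical stand-in for the type check inside `eagle.py`.

```python
# Illustrative stubs; the real classes live in vLLM's attention backends
# and carry actual metadata fields.
class TritonAttentionMetadata: ...
class FlashAttentionMetadata: ...
class RocmAttentionMetadata: ...


def check_rocm_attn_metadata(attn_metadata) -> None:
    """Reject attention metadata types the EAGLE path cannot handle.

    Before this fix, RocmAttentionMetadata was missing from the allowed
    tuple, so ROCM_ATTN failed with this ValueError during speculative
    decoding.
    """
    rocm_types = (
        TritonAttentionMetadata,
        FlashAttentionMetadata,
        RocmAttentionMetadata,
    )
    if not isinstance(attn_metadata, rocm_types):
        raise ValueError(
            f"Unsupported attention metadata type: {type(attn_metadata).__name__}"
        )


# With the fix applied, ROCm metadata passes the check.
check_rocm_attn_metadata(RocmAttentionMetadata())
```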
While this change correctly adds support for RocmAttentionMetadata, it appears that another ROCm attention backend, ROCM_AITER_UNIFIED_ATTN, might have been missed. This backend likely uses RocmAiterUnifiedAttentionMetadata, which is not included in rocm_types and could lead to a similar ValueError during speculative decoding. To ensure a more comprehensive fix, I recommend adding RocmAiterUnifiedAttentionMetadata to the list of allowed types.
```python
from vllm.v1.attention.backends.rocm_attn import RocmAttentionMetadata
from vllm.v1.attention.backends.rocm_aiter_unified_attn import RocmAiterUnifiedAttentionMetadata

rocm_types = [
    TritonAttentionMetadata,
    FlashAttentionMetadata,
    RocmAttentionMetadata,
    RocmAiterUnifiedAttentionMetadata,
]
```
RocmAiterUnifiedAttentionMetadata does not exist. RocmAiterUnifiedAttentionBackend shares the same metadata class, RocmAttentionMetadata, and the lm_eval scores are as follows:
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.9333 | ± 0.0144 |
| | | strict-match | 5 | exact_match ↑ | 0.9133 | ± 0.0163 |
`vllm serve` command used:

```shell
VLLM_USE_V1=1 \
VLLM_ROCM_USE_AITER=1 \
VLLM_ATTENTION_BACKEND=ROCM_AITER_UNIFIED_ATTN \
vllm serve meta-llama/Llama-3.3-70B-Instruct -tp 4 \
  --speculative-config '{"model": "yuhuili/EAGLE3-LLaMA3.3-Instruct-70B", "num_speculative_tokens": 3, "method": "eagle3", "draft_tensor_parallel_size": 1}'
```
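The point about shared metadata can be sketched as follows. The class and method names here are illustrative stubs, not vLLM's exact definitions: the AITER unified backend simply reports RocmAttentionMetadata as its metadata class, so the existing `RocmAttentionMetadata` entry in `rocm_types` already covers it and no separate metadata class is needed.

```python
# Illustrative stubs only (hypothetical shapes, not vLLM's real API).
class RocmAttentionMetadata: ...


class RocmAttentionBackend:
    @staticmethod
    def get_metadata_cls():
        # Both ROCm backends advertise the same metadata class.
        return RocmAttentionMetadata


class RocmAiterUnifiedAttentionBackend(RocmAttentionBackend):
    # No override: the AITER unified backend reuses RocmAttentionMetadata,
    # which is why RocmAiterUnifiedAttentionMetadata does not exist.
    pass
```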
vllm/v1/spec_decode/eagle.py (outdated diff):

```python
rocm_types = [
    TritonAttentionMetadata,
    FlashAttentionMetadata,
```
@vllmellm is FlashAttentionMetadata still relevant for ROCm platform?
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
tjtanaa
left a comment
LGTM. Thanks for cleaning up the conditions.
…e decoding in `eagle.py` (vllm-project#31714) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
…e decoding in `eagle.py` (vllm-project#31714) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Purpose
This PR fixes the issue mentioned in #30811 (comment).
Test Plan
Only `VLLM_ATTENTION_BACKEND=ROCM_ATTN` was tested:

```shell
VLLM_USE_V1=1 VLLM_ROCM_USE_AITER=0 VLLM_ATTENTION_BACKEND=ROCM_ATTN \
vllm serve meta-llama/Llama-3.3-70B-Instruct -tp 4 \
  --speculative-config '{"model": "yuhuili/EAGLE3-LLaMA3.3-Instruct-70B", "num_speculative_tokens": 3, "method": "eagle3", "draft_tensor_parallel_size": 1}'
```

```shell
lm_eval --model local-completions --tasks gsm8k \
  --model_args model=meta-llama/Llama-3.3-70B-Instruct,base_url=http://localhost:8000/v1/completions \
  --trust_remote_code --num_fewshot 5 --batch_size 128
```

Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.