Merged
8 changes: 6 additions & 2 deletions vllm/v1/spec_decode/eagle.py
@@ -27,7 +27,6 @@
 from vllm.platforms import current_platform
 from vllm.triton_utils import triton
 from vllm.utils.platform_utils import is_pin_memory_available
-from vllm.v1.attention.backends.flash_attn import FlashAttentionMetadata
 from vllm.v1.attention.backends.tree_attn import (
     TreeAttentionMetadata,
     TreeAttentionMetadataBuilder,
@@ -167,7 +166,12 @@ def __init__(
         # Determine allowed attention backends once during initialization.
         self.allowed_attn_types: tuple | None = None
         if current_platform.is_rocm():
-            rocm_types = [TritonAttentionMetadata, FlashAttentionMetadata]
+            from vllm.v1.attention.backends.rocm_attn import RocmAttentionMetadata
+
+            rocm_types = [
+                TritonAttentionMetadata,
+                RocmAttentionMetadata,
+            ]
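For context, the whitelist built here is later checked against the runtime attention metadata type. The sketch below illustrates that pattern with stand-in classes; `check_attn_metadata` and its error message are hypothetical simplifications, not vLLM's actual code:

```python
# Stand-ins for vLLM's attention metadata classes; bodies are illustrative.
class TritonAttentionMetadata: ...
class RocmAttentionMetadata: ...
class FlashAttentionMetadata: ...

def check_attn_metadata(metadata, allowed_attn_types):
    """Raise if the metadata type is not in the per-platform whitelist.

    allowed_attn_types is None when any backend is acceptable.
    """
    if allowed_attn_types is not None and not isinstance(metadata, allowed_attn_types):
        raise ValueError(
            f"Unsupported attention metadata type {type(metadata).__name__}; "
            f"expected one of {[t.__name__ for t in allowed_attn_types]}"
        )

# With this PR's list, ROCm metadata passes the check:
rocm_allowed = (TritonAttentionMetadata, RocmAttentionMetadata)
check_attn_metadata(RocmAttentionMetadata(), rocm_allowed)
```

A metadata class missing from `rocm_types` is exactly what produces the `ValueError` during speculative decoding that this PR fixes.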
Comment on lines +169 to +174
Contributor (severity: high):
While this change correctly adds support for RocmAttentionMetadata, it appears that another ROCm attention backend, ROCM_AITER_UNIFIED_ATTN, might have been missed. This backend likely uses RocmAiterUnifiedAttentionMetadata, which is not included in rocm_types and could lead to a similar ValueError during speculative decoding. To ensure a more comprehensive fix, I recommend adding RocmAiterUnifiedAttentionMetadata to the list of allowed types.

            from vllm.v1.attention.backends.rocm_attn import RocmAttentionMetadata
            from vllm.v1.attention.backends.rocm_aiter_unified_attn import RocmAiterUnifiedAttentionMetadata

            rocm_types = [
                TritonAttentionMetadata,
                FlashAttentionMetadata,
                RocmAttentionMetadata,
                RocmAiterUnifiedAttentionMetadata,
            ]

Contributor Author (@vllmellm, Jan 5, 2026):
RocmAiterUnifiedAttentionMetadata does not exist. RocmAiterUnifiedAttentionBackend shares the same metadata class as RocmAttentionBackend (RocmAttentionMetadata), and the lm_eval scores are as follows:

| Tasks | Version | Filter           | n-shot | Metric      | Value  | Stderr   |
|-------|---------|------------------|--------|-------------|--------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match | 0.9333 | ± 0.0144 |
|       |         | strict-match     | 5      | exact_match | 0.9133 | ± 0.0163 |

vllm serve command used:

VLLM_USE_V1=1 VLLM_ROCM_USE_AITER=1 VLLM_ATTENTION_BACKEND=ROCM_AITER_UNIFIED_ATTN vllm serve meta-llama/Llama-3.3-70B-Instruct -tp 4 --speculative-config '{"model": "yuhuili/EAGLE3-LLaMA3.3-Instruct-70B", "num_speculative_tokens": 3, "method":"eagle3", "draft_tensor_parallel_size":1}'
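The metadata-sharing claim above can be illustrated with a small sketch. The class names mirror vLLM's, and `get_metadata_cls` is modeled on vLLM's attention-backend interface, but the bodies here are simplified stand-ins:

```python
class RocmAttentionMetadata:
    """Stand-in for the ROCm attention metadata class."""

class RocmAttentionBackend:
    @staticmethod
    def get_metadata_cls():
        return RocmAttentionMetadata

class RocmAiterUnifiedAttentionBackend(RocmAttentionBackend):
    # Reuses the parent's metadata class; no separate
    # RocmAiterUnifiedAttentionMetadata type is defined.
    pass

# Metadata produced via the AITER unified backend is therefore already
# covered by an isinstance() check against RocmAttentionMetadata.
meta = RocmAiterUnifiedAttentionBackend.get_metadata_cls()()
assert isinstance(meta, RocmAttentionMetadata)
```

This is why adding RocmAttentionMetadata to `rocm_types` suffices for both the ROCM_ATTN and ROCM_AITER_UNIFIED_ATTN backends.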

             # ROCM_AITER_FA is an optional backend
             if find_spec(
                 AttentionBackendEnum.ROCM_AITER_FA.get_path(include_classname=False)
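The truncated context above probes for the optional backend with `importlib.util.find_spec`, which reports whether a module is importable without importing it. A self-contained sketch of that pattern (the probed module and appended class name below are stand-ins, not the real backend path, which vLLM derives from `AttentionBackendEnum.ROCM_AITER_FA.get_path(include_classname=False)`):

```python
from importlib.util import find_spec

# Start from the always-available metadata types, then extend the whitelist
# only when the optional backend's module is importable. "json" stands in
# for the optional backend module here so the sketch is runnable anywhere.
allowed = ["TritonAttentionMetadata", "RocmAttentionMetadata"]
if find_spec("json") is not None:
    allowed.append("AiterFlashAttentionMetadata")
print(allowed)
```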