[FIXBUG] Allow disabling rocm_aiter_fa backend for ROCm GPUs not compatible with AITER #22795
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of tests runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Code Review
This pull request correctly fixes a startup failure on incompatible ROCm GPUs by making the import of rocm_aiter_fa conditional based on an environment variable. The approach is sound. My review includes one high-severity suggestion to improve performance by caching the environment variable lookup, as it currently resides in a hot path.
vllm/v1/spec_decode/eagle.py
Outdated
```python
if os.environ.get("VLLM_ROCM_USE_AITER") == "1":
    from vllm.v1.attention.backends.rocm_aiter_fa import (
        AiterFlashAttentionMetadata)
    allowed_types += (AiterFlashAttentionMetadata, )
```
Calling os.environ.get() inside the propose method can introduce performance overhead, as this method is on a hot path during inference. It's better to check the environment variable only once when the module is imported.
I recommend defining a module-level constant at the top of the file:
```python
# At the top of vllm/v1/spec_decode/eagle.py
import os

_VLLM_ROCM_USE_AITER = os.environ.get("VLLM_ROCM_USE_AITER") == "1"
```

Then, you can use this constant here:

```python
if _VLLM_ROCM_USE_AITER:
    from vllm.v1.attention.backends.rocm_aiter_fa import (
        AiterFlashAttentionMetadata)
    allowed_types += (AiterFlashAttentionMetadata, )
```

This change will improve performance by avoiding repeated environment variable lookups.
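To make the caching suggestion concrete, here is a small self-contained sketch (function names are illustrative, not from vLLM) showing that the per-call and module-level forms agree, while the cached form reads the environment exactly once:

```python
import os

# Per-call lookup: performs a dictionary access on os.environ every invocation,
# which is wasteful on a hot path executed once per decode step.
def use_aiter_each_call() -> bool:
    return os.environ.get("VLLM_ROCM_USE_AITER") == "1"

# Cached at import time: the environment is read exactly once per process.
_USE_AITER = os.environ.get("VLLM_ROCM_USE_AITER") == "1"

def use_aiter_cached() -> bool:
    return _USE_AITER

# Both report the same answer for this process's environment.
print(use_aiter_each_call() == use_aiter_cached())  # True
```

One trade-off of the cached form: changes to the environment variable after import are not observed, which is acceptable here since such flags are read at startup.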
|
Hi @russellb, would you be so kind as to review this PR? Right now you can't start vLLM with ROCm on RDNA3 GPUs like the 7900 XTX.
vllm/v1/spec_decode/eagle.py
Outdated
```python
if os.environ.get("VLLM_ROCM_USE_AITER") == "1":
    from vllm.v1.attention.backends.rocm_aiter_fa import (
        AiterFlashAttentionMetadata)
    allowed_types += (AiterFlashAttentionMetadata, )
```
See the pre-commit failures under this line
vllm/v1/spec_decode/eagle.py
Outdated
```python
        (TritonAttentionMetadata, AiterFlashAttentionMetadata,
         FlashAttentionMetadata))
allowed_types = (TritonAttentionMetadata, FlashAttentionMetadata)
if os.environ.get("VLLM_ROCM_USE_AITER") == "1":
```
Is there any way you can make this more dynamic if it's known what device types would support this vs not?
@russellb
I think the architecture names can be used, but the list will always have to be expanded. Do you know of another mechanism for this?
For example:

```python
def _is_rocm_gpu_with_matrix_cores() -> bool:
    if not torch.cuda.is_available() or not torch.version.hip:
        return False
    try:
        device_properties = torch.cuda.get_device_properties(
            torch.cuda.current_device())
        gcn_arch_name = getattr(device_properties, "gcnArchName", "")
        supported_archs = ("gfx908", "gfx90a", "gfx940", "gfx941", "gfx942")
        return any(gcn_arch_name.startswith(arch) for arch in supported_archs)
    except (RuntimeError, AttributeError):
        return False
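The check above reduces to a prefix match on gcnArchName; a torch-free sketch of just that matching logic (helper name is illustrative):

```python
# Architectures with matrix cores, per the suggestion above.
SUPPORTED_ARCHS = ("gfx908", "gfx90a", "gfx940", "gfx941", "gfx942")

def arch_has_matrix_cores(gcn_arch_name: str) -> bool:
    # gcnArchName may carry feature suffixes, e.g. "gfx90a:sramecc+:xnack-",
    # so a prefix match is used rather than string equality.
    return any(gcn_arch_name.startswith(arch) for arch in SUPPORTED_ARCHS)

print(arch_has_matrix_cores("gfx90a:sramecc+:xnack-"))  # True
print(arch_has_matrix_cores("gfx1100"))  # False (RDNA3, e.g. the 7900 XTX)
```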
@JartX
Let's cache the value of os.environ.get, as its overhead is large, similar to
#17067
An alternative approach is to check whether aiter is installed using from importlib.util import find_spec. However, this is also a very costly operation, so it should only be called once, when a class is initialized or a file is imported.
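As a sketch of that one-time find_spec check (the module-level constant and helper names are hypothetical):

```python
from importlib.util import find_spec

# Evaluated once at import time; find_spec returns None when the package is
# not installed, so the hot path only reads a cached boolean afterwards.
_AITER_AVAILABLE = find_spec("aiter") is not None

def aiter_installed() -> bool:
    return _AITER_AVAILABLE
```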
yewentao256
left a comment
LGTM, thanks for the work!
cc @tjtanaa
@JartX Maybe let's do this instead: store the allowed_types in the EagleProposer class. I have written a simple script to time the overhead, and it is quite high, as this cost is incurred on every decode step. We usually decode a few thousand tokens per request (e.g. in thinking mode), so the cost is multiplied thousand-fold. Proposed solution:

```python
from importlib.util import find_spec

class EagleProposer:

    def __init__(
        self,
        vllm_config: VllmConfig,
        device: torch.device,
        runner=None,
    ):
        ...
        self.allowed_attn_types = ()
        if current_platform.is_rocm():
            self.allowed_attn_types += (TritonAttentionMetadata,
                                        FlashAttentionMetadata)
            if find_spec("aiter"):
                from vllm.v1.attention.backends.rocm_aiter_fa import (
                    AiterFlashAttentionMetadata)
                self.allowed_attn_types += (AiterFlashAttentionMetadata, )
        else:
            self.allowed_attn_types = (FlashAttentionMetadata,
                                       TreeAttentionMetadata)
        ...

    def propose(
        self,
        # [num_tokens]
        target_token_ids: torch.Tensor,
        # [num_tokens]
        target_positions: torch.Tensor,
        # [num_tokens, hidden_size]
        target_hidden_states: torch.Tensor,
        # [batch_size]
        next_token_ids: torch.Tensor,
        common_attn_metadata: CommonAttentionMetadata,
        sampling_metadata: SamplingMetadata,
        mm_embeds: Optional[list[torch.Tensor]] = None,
    ) -> torch.Tensor:
        ...
        assert isinstance(attn_metadata, self.allowed_attn_types)
        ...
```
```python
...
from vllm.platforms.rocm import on_mi3xx
...
if current_platform.is_rocm() and find_spec("aiter") and on_mi3xx:
    ...
```

Then we can revert all of the changes to eagle.py. This also handles the case where the …
Hi @tjtanaa, bad news: it crashes at another point after applying the last recommendation. I would say this error comes from somewhere else. Are you sure we can't choose either of the two solutions verified above?

@tjtanaa I think this is now the better way: everything goes smoothly and works like a charm.
```python
if self.use_cuda_graph and \
        batch_size <= self.cudagraph_batch_sizes[-1]:
```
NITs, can you revert all of the unrelated changes?
hi! @tjtanaa
These changes were included so I could pass pre-commit. I've only been contributing to the project for a short time, and @mgoin told me that pre-commit normally has to be used:
https://marketplace.visualstudio.com/items?itemName=elagil.pre-commit-helper
https://github.com/pre-commit/pre-commit
so that it correctly formats the file after the changes.
Sorry if this bothered you. Thank you very much for your time and dedication.
If you find that I have it configured incorrectly, please don't hesitate to let me know.
P.S.: If I remove the spaces and run the pre-commit check again, I get an error, so I have to use the fix; it then adds the spaces back and leaves everything OK.
@JartX Can you revert all of the unrelated changes? The changes in indentation and spaces?
vllm/v1/spec_decode/eagle.py
Outdated
```python
# The mypy errors are caused because mypy cannot infer the type of
# attn_metadata. We add this assert to help mypy.
assert isinstance(attn_metadata, FlashAttentionMetadata)
```
@JartX I tested using another backend. This will cause issues, as FlashAttentionMetadata is not a generic class.
TreeAttentionMetadata, AiterFlashAttentionMetadata, TritonAttentionMetadata, and FlashAttentionMetadata are 4 different classes.
I have opened a PR into your branch, JartX#1. It is a mypy fix through a Protocol class.
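For illustration, a Protocol-based fix typically looks like the sketch below (the protocol and attribute names are hypothetical, not taken from the actual PR): mypy accepts any metadata class that structurally provides the declared attributes, so several unrelated backend metadata classes can flow through one annotation.

```python
from dataclasses import dataclass
from typing import Protocol

class SupportsMaxSeqLen(Protocol):
    # Structural type: any class with this attribute matches, no inheritance
    # needed, which is exactly what unrelated metadata classes require.
    max_seq_len: int

@dataclass
class FlashMeta:  # stand-in for FlashAttentionMetadata
    max_seq_len: int

@dataclass
class TritonMeta:  # stand-in for TritonAttentionMetadata
    max_seq_len: int

def longest(meta: SupportsMaxSeqLen) -> int:
    # mypy is satisfied for both metadata classes through the Protocol.
    return meta.max_seq_len

print(longest(FlashMeta(max_seq_len=4096)))  # 4096
print(longest(TritonMeta(max_seq_len=2048)))  # 2048
```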
Merged. @tjtanaa Thank you very much for helping me with the development and testing. I have very limited hardware and am still getting up to speed on working with vLLM.
Thank you for your work on this PR as well @JartX 🥂
…aiter [Bugfix] Fix mypy error with Protocol
DarkLight1337
left a comment
LGTM if tests pass, thanks!
…patible with AITER (vllm-project#22795) Signed-off-by: JartX <sagformas@epdcenter.es> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: Duncan Moss <djm.moss@gmail.com>
This PR fixes an issue where vLLM failed to start on ROCm GPUs that are not compatible with the rocm_aiter_fa attention backend. An example of such a GPU is the AMD Radeon RX 7900 XTX, which uses the RDNA 3 architecture.
The bug was introduced in commit 1ee5ead, which hardcoded the loading of the vllm.v1.attention.backends.rocm_aiter_fa module in vllm/v1/spec_decode/eagle.py. This forced VLLM to fail on startup before it could even select a different attention backend.
To solve this, I've added a conditional check that allows the user to explicitly enable this backend. The rocm_aiter_fa module will now only be loaded if the environment variable VLLM_ROCM_USE_AITER is set to 1.
This change ensures that:
- Users with ROCm GPUs that are not compatible with the rocm_aiter_fa backend can use vLLM without any startup failures.
- Users who do need this backend can still enable it manually, preserving the original functionality.
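A minimal sketch of the opt-in gate described above (the parameterized helper is illustrative; the PR itself checks os.environ directly inside eagle.py):

```python
import os
from typing import Mapping

def aiter_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    # The aiter backend is only considered when explicitly enabled, so GPUs
    # that cannot import rocm_aiter_fa (e.g. RDNA3) never reach the import.
    return environ.get("VLLM_ROCM_USE_AITER") == "1"

print(aiter_enabled({"VLLM_ROCM_USE_AITER": "1"}))  # True
print(aiter_enabled({}))  # False
```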