[Bugfix][Hardware][AMD] Fix hardcoded device in AITER MLA and Fused MOE #31729
c0de128 wants to merge 1 commit into vllm-project:main
Conversation
/ci-run
Code Review
This pull request correctly addresses the issue of hardcoded device="cuda" in AITER MLA and Fused MOE components. The changes are well-implemented, replacing the hardcoded device with dynamic device information from existing tensors or function parameters. This is a crucial fix for improving ROCm compatibility and ensuring correctness in multi-GPU setups. The approach of adding a device parameter with a backward-compatible default is sound. Overall, this is a solid bugfix that improves the robustness and portability of the code.
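The pattern the review praises can be sketched as follows. This is a hypothetical minimal function (the real signatures live in `rocm_aiter_mla_sparse.py`): helper tensors are allocated on the input's device rather than a hardcoded `device="cuda"`.

```python
import torch

def mqa_logits_sketch(q: torch.Tensor):
    """Hypothetical sketch of the fix: helper tensors follow the input's
    device instead of a hardcoded device="cuda", so they land on the
    correct GPU in multi-GPU runs and work on ROCm builds of PyTorch."""
    # Before the fix: torch.arange(q.shape[0], device="cuda")
    positions = torch.arange(q.shape[0], device=q.device)
    # Before the fix: torch.empty(..., device="cuda")
    logits = torch.empty(q.shape[0], q.shape[0],
                         dtype=torch.float32, device=q.device)
    return positions, logits
```

Because the device is taken from `q`, the same code runs unchanged on CPU, CUDA, or ROCm, and on whichever GPU index `q` lives on.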
This pull request has merge conflicts that must be resolved before it can be merged.
Replace hardcoded `device="cuda"` with dynamic device inference from input tensors or platform configuration:

1. `rocm_aiter_mla_sparse.py`: use `q.device` or `q_fp8.device` (5 instances)
2. `rocm_aiter_fused_moe.py`: add a `device` parameter to `init_aiter_topK_meta_data`
3. `layer.py`: pass `current_platform.device_type` to the init function

This ensures tensors are created on the correct device in multi-GPU setups and improves ROCm compatibility.

Signed-off-by: c0de128 <kevin.mckay@outlook.com>
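The backward-compatible shape of change 2 can be sketched as below (hypothetical function name and parameters; the real `init_aiter_topK_meta_data()` lives in `rocm_aiter_fused_moe.py`): keeping `device="cuda"` as the default means existing call sites are unaffected, while new callers pass the device explicitly.

```python
import torch

def init_topk_meta_data_sketch(num_tokens: int, topk: int,
                               device: str = "cuda"):
    """Hypothetical sketch of adding a device parameter with a
    backward-compatible default: old callers that pass no device still
    get "cuda"; new callers supply the platform's device explicitly."""
    # Before the fix, these allocations used device="cuda" unconditionally.
    topk_ids = torch.zeros(num_tokens, topk,
                           dtype=torch.int32, device=device)
    topk_weights = torch.ones(num_tokens, topk,
                              dtype=torch.float32, device=device)
    return topk_ids, topk_weights
```

A default parameter is the lowest-risk way to thread a device through: the signature change is additive, so no existing caller breaks.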
Force-pushed from 9f73123 to d87dadc
/buildkite run
Closing this PR to reduce maintainer review burden. The fix is available in this branch if needed in the future. Thank you for your time!
Summary

Fix hardcoded `device="cuda"` in AITER MLA sparse attention and Fused MOE initialization code. This ensures tensors are created on the correct device in multi-GPU setups and improves ROCm compatibility.

Changes

1. `vllm/attention/ops/rocm_aiter_mla_sparse.py`

Replace 5 instances of `device="cuda"` with `device=q.device` or `device=q_fp8.device`:
- `q.device` in `fp8_mqa_logits_torch()`
- `q.device` in `fp8_paged_mqa_logits_torch()`
- `q_fp8.device` in `rocm_fp8_paged_mqa_logits()`

2. `vllm/model_executor/layers/fused_moe/rocm_aiter_fused_moe.py`

Add a `device` parameter to the `init_aiter_topK_meta_data()` function:
- Replace hardcoded `device="cuda"` with the `device` parameter
- The default of `device="cuda"` maintains backward compatibility

3. `vllm/model_executor/layers/fused_moe/layer.py`

Update the call site to pass `current_platform.device_type` to the init function.

Test Plan
cc @hongxiayang @tjtanaa
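The call-site change in `layer.py` amounts to threading the platform's device string into the init function. A rough sketch follows, using a stand-in for vLLM's `current_platform.device_type` (not imported here) and a hypothetical stand-in for the init function:

```python
import torch

# Stand-in for vllm.platforms.current_platform.device_type. On both
# NVIDIA and ROCm builds of PyTorch this resolves to "cuda" (assumption
# for this sketch), and to "cpu" when no GPU is present.
device_type = "cuda" if torch.cuda.is_available() else "cpu"

def init_meta_data_sketch(n: int, device: str = "cuda"):
    # Hypothetical stand-in for init_aiter_topK_meta_data().
    return torch.zeros(n, dtype=torch.int32, device=device)

# After the fix, the call site passes the platform device explicitly
# instead of relying on the hardcoded "cuda" default.
buf = init_meta_data_sketch(8, device=device_type)
```

Resolving the device once at the call site keeps the init function itself platform-agnostic.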
Note

Ensures tensors are allocated on the correct device (CUDA/ROCm) instead of being hardcoded to CUDA, improving multi-GPU correctness and ROCm compatibility.

- `vllm/v1/attention/ops/rocm_aiter_mla_sparse.py`: uses `q.device`/`q_fp8.device` for `torch.arange`, tensor fills, and logits buffers.
- `vllm/model_executor/layers/fused_moe/rocm_aiter_fused_moe.py`: adds a `device` param to `init_aiter_topK_meta_data()` and switches tensor allocations to `device=device`.
- `vllm/model_executor/layers/fused_moe/layer.py`: passes `current_platform.device_type` to `init_aiter_topK_meta_data()` for shared-expert TopK buffers.

Written by Cursor Bugbot for commit d87dadc.