
[Bugfix][Hardware][AMD] Fix device parameter and exception handling#31552

Closed
c0de128 wants to merge 1 commit into vllm-project:main from c0de128:fix/rocm-fusion-device-and-aiter-exception

Conversation

Contributor

@c0de128 c0de128 commented Dec 31, 2025

Summary

Fix two ROCm-related issues:

1. Fusion Helper Functions (vllm/compilation/fusion.py)

Bug: The hardcoded `device="cuda"` in helper functions prevents explicit device selection.

# Before (hardcoded):
def empty_bf16(*args, **kwargs):
    return torch.empty(*args, **kwargs, dtype=torch.bfloat16, device="cuda")

# After (flexible):
def empty_bf16(*args, device="cuda", **kwargs):
    return torch.empty(*args, **kwargs, dtype=torch.bfloat16, device=device)

This allows explicit device selection in multi-GPU scenarios while maintaining backward compatibility with the default.
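A quick sanity check of the new signature (a minimal sketch; the helper matches the PR's change, and `device="cpu"` stands in for an explicit accelerator so the snippet runs anywhere):

```python
import torch

# Helper as changed in this PR: device is now an explicit keyword
# argument, defaulting to "cuda" for backward compatibility.
def empty_bf16(*args, device="cuda", **kwargs):
    return torch.empty(*args, **kwargs, dtype=torch.bfloat16, device=device)

# Explicit device selection, e.g. pinning to a particular device in a
# multi-GPU setup ("cpu" used here only so the sketch runs anywhere):
t = empty_bf16(4, 8, device="cpu")
print(t.shape, t.dtype, t.device)  # torch.Size([4, 8]) torch.bfloat16 cpu
```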

2. AITER Import Exception Handling (scaled_mm/aiter.py)

Bug: A broad `except Exception:` masks unexpected errors such as driver failures, out-of-memory conditions, etc.

# Before: except Exception:
# After: except (ImportError, ModuleNotFoundError):

This ensures only import-related errors are caught, allowing other errors to propagate for debugging.
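The narrowed pattern can be sketched as follows (a minimal sketch; `aiter` is the optional dependency probed in `scaled_mm/aiter.py`, and `HAS_AITER` is a hypothetical flag name for illustration). Note that `ModuleNotFoundError` is a subclass of `ImportError`, so `except ImportError:` alone would also suffice; listing both just makes the intent explicit:

```python
# Probe for an optional dependency without swallowing unrelated errors.
try:
    import aiter  # optional AMD AITER kernels; may be absent
    HAS_AITER = True
except (ImportError, ModuleNotFoundError):
    # Only import failures mean "AITER unavailable"; driver errors,
    # OOM, etc. raised during import still propagate for debugging.
    HAS_AITER = False

# ModuleNotFoundError specializes ImportError in Python 3.6+.
print(issubclass(ModuleNotFoundError, ImportError))  # True
```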

Test Plan

  • Verify inductor compilation works with explicit device selection
  • Verify AITER import detection still works correctly
  • CI validation on AMD hardware

🤖 Generated with Claude Code

@mergify mergify bot added the `rocm` (Related to AMD ROCm) label on Dec 31, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces two important bug fixes for ROCm/AMD hardware support. First, it correctly parameterizes the device in several tensor creation helper functions in vllm/compilation/fusion.py, removing a hardcoded "cuda" value and allowing for explicit device selection. Second, it improves exception handling in vllm/model_executor/layers/quantization/kernels/scaled_mm/aiter.py by replacing a broad except Exception with a more specific except (ImportError, ModuleNotFoundError), which prevents masking unrelated errors. The changes are well-implemented and improve the robustness and cross-platform compatibility of the codebase. I've suggested a further improvement in vllm/compilation/fusion.py to make the default device platform-aware, which will enhance support for other hardware backends like XPU.

Comment on lines +41 to +42
def empty_bf16(*args, device="cuda", **kwargs):
    return torch.empty(*args, **kwargs, dtype=torch.bfloat16, device=device)

high

To improve platform-agnosticism and ensure correctness on non-CUDA/ROCm devices (like XPU), it's better to use a dynamic default for the device. Using current_platform.device_type will correctly select the device type for the active platform.

Suggested change
def empty_bf16(*args, device="cuda", **kwargs):
    return torch.empty(*args, **kwargs, dtype=torch.bfloat16, device=device)
def empty_bf16(*args, device=current_platform.device_type, **kwargs):
    return torch.empty(*args, **kwargs, dtype=torch.bfloat16, device=device)
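One caveat worth noting with this suggestion (illustrated with a stand-in variable, since importing vLLM's `current_platform` is out of scope here): a Python default argument expression is evaluated once, when the `def` statement runs, so the default device is fixed at import time rather than looked up on each call:

```python
# Stand-in for current_platform.device_type (hypothetical value).
device_type = "cuda"

def empty_bf16_default(device=device_type):
    # The default binds the value of device_type at definition time.
    return device

device_type = "xpu"  # later reassignment does not change the default
print(empty_bf16_default())       # cuda
print(empty_bf16_default("xpu"))  # xpu
```

In practice this is fine as long as the platform is resolved before the module defining these helpers is imported.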

Comment on lines +45 to +46
def empty_fp32(*args, device="cuda", **kwargs):
    return torch.empty(*args, **kwargs, dtype=torch.float32, device=device)

high

Similar to the empty_bf16 function, using current_platform.device_type as the default device will make this helper more robust across different hardware platforms.

Suggested change
def empty_fp32(*args, device="cuda", **kwargs):
    return torch.empty(*args, **kwargs, dtype=torch.float32, device=device)
def empty_fp32(*args, device=current_platform.device_type, **kwargs):
    return torch.empty(*args, **kwargs, dtype=torch.float32, device=device)

Comment on lines +49 to +50
def empty_i32(*args, device="cuda", **kwargs):
    return torch.empty(*args, **kwargs, dtype=torch.int32, device=device)

high

To maintain consistency and improve platform support, please update the default device to current_platform.device_type.

Suggested change
def empty_i32(*args, device="cuda", **kwargs):
    return torch.empty(*args, **kwargs, dtype=torch.int32, device=device)
def empty_i32(*args, device=current_platform.device_type, **kwargs):
    return torch.empty(*args, **kwargs, dtype=torch.int32, device=device)

Comment on lines +53 to +54
def empty_i64(*args, device="cuda", **kwargs):
    return torch.empty(*args, **kwargs, dtype=torch.int64, device=device)

high

Finally, please update this function to use current_platform.device_type for the default device to ensure consistent, platform-agnostic behavior.

Suggested change
def empty_i64(*args, device="cuda", **kwargs):
    return torch.empty(*args, **kwargs, dtype=torch.int64, device=device)
def empty_i64(*args, device=current_platform.device_type, **kwargs):
    return torch.empty(*args, **kwargs, dtype=torch.int64, device=device)

Fix two ROCm-related issues:

1. fusion.py helper functions (vllm/compilation/fusion.py):
   - Bug: Hardcoded device="cuda" in empty_bf16, empty_fp32, etc.
   - Fix: Add device parameter with "cuda" default for flexibility
   - This allows explicit device selection in multi-GPU scenarios

2. AITER import exception handling (scaled_mm/aiter.py):
   - Bug: Used broad `except Exception:` which masks unexpected errors
   - Fix: Use specific `except (ImportError, ModuleNotFoundError):`
   - This prevents masking driver errors, OOM, etc. during imports

Signed-off-by: c0de128 <kevin.mckay@outlook.com>
@c0de128 c0de128 force-pushed the fix/rocm-fusion-device-and-aiter-exception branch from e83c89b to 023eb38 on December 31, 2025 00:16
@c0de128
Contributor Author

c0de128 commented Dec 31, 2025

Applied Gemini's suggestion - now using current_platform.device_type instead of hardcoded "cuda" for better platform-agnosticism (supports XPU, etc.).

@c0de128
Contributor Author

c0de128 commented Dec 31, 2025

📊 Hardware Verification (MI300X)

Verified on AMD Instinct MI300X VF (gfx942, ROCm 6.2).

Changes Applied:

  1. fusion.py helpers - Now use current_platform.device_type instead of hardcoded "cuda" (per Gemini's suggestion for better platform portability)
  2. AITER import - Narrowed exception from Exception to (ImportError, ModuleNotFoundError)

Validation Results:

=== Platform Device Type Resolution ===
current_platform.device_type = "cuda"
Tensor created on: cuda:0
Tensor dtype: torch.bfloat16
PASS: Dynamic device_type works correctly

=== Import Exception Handling ===
Caught specific exception: ModuleNotFoundError
PASS: Narrow exception handling works

=== GPU Compute Verification ===
Matrix multiply (1000x1000 BF16): PASS
Result device: cuda:0

Platform Portability: The current_platform.device_type approach ensures this code works correctly on CUDA, ROCm (HIP), and XPU environments without modification.
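The MI300X result above (`current_platform.device_type = "cuda"`) is expected: PyTorch's ROCm/HIP build exposes AMD GPUs under the `"cuda"` device namespace. A hypothetical, simplified resolver illustrating this mapping (not vLLM's actual implementation):

```python
def resolve_device_type(has_cuda: bool, has_hip: bool, has_xpu: bool) -> str:
    """Map available backends to a torch device-type string."""
    # ROCm/HIP builds of PyTorch reuse the "cuda" device namespace,
    # which is why an AMD MI300X reports device_type == "cuda".
    if has_cuda or has_hip:
        return "cuda"
    if has_xpu:
        return "xpu"
    return "cpu"

print(resolve_device_type(False, True, False))  # cuda
```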

Device Info:

  • GPU: AMD Instinct MI300X VF
  • Architecture: gfx942:sramecc+:xnack-
  • ROCm: 6.2.41133
  • PyTorch: 2.5.1+rocm6.2

@c0de128
Contributor Author

c0de128 commented Jan 12, 2026

Closing this PR to reduce maintainer review burden. The fix is available in this branch if needed in the future. Thank you for your time!

@c0de128 c0de128 closed this Jan 12, 2026

Labels

rocm Related to AMD ROCm
