Merged
5 changes: 4 additions & 1 deletion vllm/attention/layer.py
@@ -357,8 +357,11 @@ def forward(

         if self.use_output:
             if output_shape is None:
+                # Handle both 2D [num_tokens, hidden] and
+                # 3D [num_tokens, heads, head_dim] query
+                num_tokens = query.shape[0]
                 output_shape = torch.Size(
-                    (*query.shape[:-1], self.num_heads * self.head_size_v)
+                    (num_tokens, self.num_heads * self.head_size_v)
                 )
             output_shape = output_shape if output_shape is not None else query.shape
             output = torch.empty(output_shape, dtype=output_dtype, device=query.device)
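The shape change in this hunk can be illustrated with plain tuples. This is a minimal, hypothetical sketch: `old_output_shape` and `new_output_shape` are illustrative names, not vLLM functions, and the head count and head size are assumed example values. The real code wraps the result in `torch.Size`.

```python
# Hypothetical helpers: compare the output-shape formula before and after
# this change, using plain tuples in place of torch.Size.

def old_output_shape(query_shape, num_heads, head_size_v):
    # Before: keep every leading dim of the query, replace only the last one.
    return (*query_shape[:-1], num_heads * head_size_v)


def new_output_shape(query_shape, num_heads, head_size_v):
    # After: always produce a 2D [num_tokens, num_heads * head_size_v] shape.
    num_tokens = query_shape[0]
    return (num_tokens, num_heads * head_size_v)


num_heads, head_size_v = 8, 64  # assumed example values

# 2D query [num_tokens, hidden]: both formulas agree.
print(old_output_shape((16, 512), num_heads, head_size_v))  # (16, 512)
print(new_output_shape((16, 512), num_heads, head_size_v))  # (16, 512)

# 3D query [num_tokens, heads, head_dim]: the old formula keeps the heads
# dimension and yields a 3D shape; the new one stays 2D.
print(old_output_shape((16, 8, 64), num_heads, head_size_v))  # (16, 8, 512)
print(new_output_shape((16, 8, 64), num_heads, head_size_v))  # (16, 512)
```

For a 2D query both formulas give the same shape; they diverge only when the query arrives as 3D.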
14 changes: 13 additions & 1 deletion vllm/model_executor/layers/quantization/fp8.py
@@ -180,7 +180,19 @@ def get_fp8_moe_backend(
             scope="local",
         )

-    if envs.VLLM_USE_DEEP_GEMM and moe_use_deep_gemm and block_quant:
+    # Determine if we should use DeepGEMM (top-level enable switch)
+    # - If explicitly set by user, respect their choice
+    # - If not platform supports DeepGEMM, disable it
+    # This helps avoid warning messages on unsupported platforms.
+    use_deep_gemm = envs.VLLM_USE_DEEP_GEMM
+    if not is_deep_gemm_supported():
+        use_deep_gemm = False
+        logger.info_once(
+            "DeepGEMM is disabled because the platform does not support it.",
+            scope="local",
+        )
Comment on lines +188 to +193

Contributor (priority: high):

The current logic for checking DeepGEMM support can produce a misleading log message. If a user explicitly disables DeepGEMM by setting VLLM_USE_DEEP_GEMM=0, is_deep_gemm_supported() will return False, causing the message "DeepGEMM is disabled because the platform does not support it" to be logged. This is inaccurate because the user disabled it, not the platform.

The check should only log a message if the user intended to use DeepGEMM, but it's not supported by the platform. I've suggested a change to correct this logic and make the log message more precise.

Suggested change
-    if not is_deep_gemm_supported():
-        use_deep_gemm = False
-        logger.info_once(
-            "DeepGEMM is disabled because the platform does not support it.",
-            scope="local",
-        )
+    if use_deep_gemm and not is_deep_gemm_supported():
+        use_deep_gemm = False
+        logger.info_once(
+            "DeepGEMM was requested but is disabled because the platform does not support it.",
+            scope="local",
+        )

Collaborator Author:

This is effectively the next check that is performed, and the message in the next if statement is the same as the proposed one. So I think this modification is unnecessary.

Collaborator:

These changes are unrelated to the intent of the PR; why did you add this?

Collaborator Author:

I can move it to a different PR if that's what you are asking. On ROCm, the message currently logged during a run says that DeepGEMM is requested but not found, which is not accurate because DeepGEMM is not a ROCm-supported feature. So I put together this short block that performs a more precise check first.

Collaborator:

got it, thanks 👍
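The disagreement in this thread comes down to which condition guards the log call. The sketch below models both versions with plain booleans. It is hypothetical: `pr_variant` and `suggested_variant` are illustrative names, and `supported` stands in for `is_deep_gemm_supported()`, assumed (following the review comment's premise) to be `False` whenever `VLLM_USE_DEEP_GEMM=0`.

```python
# Hypothetical model of the two checks discussed in the review thread.
# `supported` stands in for is_deep_gemm_supported(); per the review
# comment's premise, it is False whenever VLLM_USE_DEEP_GEMM=0.

PR_MSG = "DeepGEMM is disabled because the platform does not support it."
SUGGESTED_MSG = (
    "DeepGEMM was requested but is disabled because the platform "
    "does not support it."
)


def pr_variant(user_flag: bool, supported: bool) -> tuple[bool, list[str]]:
    # As in the PR: the support check runs regardless of the user's flag.
    logs: list[str] = []
    use_deep_gemm = user_flag
    if not supported:
        use_deep_gemm = False
        logs.append(PR_MSG)
    return use_deep_gemm, logs


def suggested_variant(user_flag: bool, supported: bool) -> tuple[bool, list[str]]:
    # Suggested: only log when DeepGEMM was actually requested.
    logs: list[str] = []
    use_deep_gemm = user_flag
    if use_deep_gemm and not supported:
        use_deep_gemm = False
        logs.append(SUGGESTED_MSG)
    return use_deep_gemm, logs


# User explicitly disables DeepGEMM (VLLM_USE_DEEP_GEMM=0):
print(pr_variant(user_flag=False, supported=False))
# (False, ['DeepGEMM is disabled because the platform does not support it.'])
print(suggested_variant(user_flag=False, supported=False))
# (False, [])
```

With the flag on and an unsupported platform, both versions disable DeepGEMM and log (with different wording); under this premise they differ only when the user has already turned DeepGEMM off.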


if use_deep_gemm and moe_use_deep_gemm and block_quant:
if not has_deep_gemm():
logger.warning_once(
"DeepGEMM backend requested but not available.", scope="local"
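The motivation given in the thread (a "requested but not available" warning on ROCm, where DeepGEMM is not supported at all) can be modeled by comparing the gating before and after this PR. This is a simplified, hypothetical sketch: the boolean parameters stand in for `envs.VLLM_USE_DEEP_GEMM`, `is_deep_gemm_supported()`, `has_deep_gemm()`, `moe_use_deep_gemm` and `block_quant`, they are treated as independent inputs, and only the log messages are modeled, not the backend selection that follows in `get_fp8_moe_backend`.

```python
# Hypothetical model of the DeepGEMM gating in get_fp8_moe_backend(),
# before and after this PR. Only the emitted log messages are modeled.

INFO_UNSUPPORTED = "DeepGEMM is disabled because the platform does not support it."
WARN_UNAVAILABLE = "DeepGEMM backend requested but not available."


def messages_before(env_flag, platform_supported, package_available,
                    moe_use_deep_gemm, block_quant):
    # Before: platform support was not checked here (platform_supported is
    # unused), so only the package-availability warning could fire.
    msgs = []
    if env_flag and moe_use_deep_gemm and block_quant:
        if not package_available:
            msgs.append(("warning", WARN_UNAVAILABLE))
    return msgs


def messages_after(env_flag, platform_supported, package_available,
                   moe_use_deep_gemm, block_quant):
    # After: an unsupported platform switches DeepGEMM off first, which
    # also suppresses the availability warning below.
    msgs = []
    use_deep_gemm = env_flag
    if not platform_supported:
        use_deep_gemm = False
        msgs.append(("info", INFO_UNSUPPORTED))
    if use_deep_gemm and moe_use_deep_gemm and block_quant:
        if not package_available:
            msgs.append(("warning", WARN_UNAVAILABLE))
    return msgs


# ROCm-like case (assumed): flag on, platform unsupported, package absent.
rocm = dict(env_flag=True, platform_supported=False, package_available=False,
            moe_use_deep_gemm=True, block_quant=True)
print(messages_before(**rocm))
# [('warning', 'DeepGEMM backend requested but not available.')]
print(messages_after(**rocm))
# [('info', 'DeepGEMM is disabled because the platform does not support it.')]
```

In this model, the ROCm-like case trades a warning that points at a missing package for an info message that names the actual cause, which matches the behavior the author describes.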