-
-
Notifications
You must be signed in to change notification settings - Fork 14.8k
[MoE] Fix output_shape calculation in Attention layer to handle 3D query inputs #31596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -180,7 +180,19 @@ def get_fp8_moe_backend( | |
| scope="local", | ||
| ) | ||
|
|
||
| if envs.VLLM_USE_DEEP_GEMM and moe_use_deep_gemm and block_quant: | ||
| # Determine if we should use DeepGEMM (top-level enable switch) | ||
| # - If explicitly set by user, respect their choice | ||
| # - If not platform supports DeepGEMM, disable it | ||
| # This helps avoid warning messages on unsupported platforms. | ||
| use_deep_gemm = envs.VLLM_USE_DEEP_GEMM | ||
| if not is_deep_gemm_supported(): | ||
| use_deep_gemm = False | ||
| logger.info_once( | ||
| "DeepGEMM is disabled because the platform does not support it.", | ||
| scope="local", | ||
| ) | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These changes are unrelated to the intent of the PR; why did you add this?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can move it to a different PR if that's what you are asking. On ROCm right now the message logged during a run is that DeepGemm is requested but not found, which is not that accurate because DeepGemm is not a ROCm supported feature. So I put together this short block that renders a more precise check first.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. got it, thanks 👍 |
||
|
|
||
| if use_deep_gemm and moe_use_deep_gemm and block_quant: | ||
| if not has_deep_gemm(): | ||
| logger.warning_once( | ||
| "DeepGEMM backend requested but not available.", scope="local" | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current logic for checking DeepGEMM support can produce a misleading log message. If a user explicitly disables DeepGEMM by setting
VLLM_USE_DEEP_GEMM=0,is_deep_gemm_supported()will returnFalse, causing the message "DeepGEMM is disabled because the platform does not support it" to be logged. This is inaccurate because the user disabled it, not the platform.The check should only log a message if the user intended to use DeepGEMM, but it's not supported by the platform. I've suggested a change to correct this logic and make the log message more precise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is effectively the next check that is performed. And the message in the next if statement is the same with the proposed one. So I think this modification is unnecessary.