[Bugfix] Fix DeepGEMM after #29546 #30267
Merged
yeqcharlotte merged 4 commits into vllm-project:main, Dec 9, 2025
Code Review
This pull request aims to fix an issue with DeepGEMM by using a configured flag self.use_deep_gemm_e8m0 instead of a hardcoded True for the use_ue8m0 parameter. While this change is a step in the right direction, it exposes a critical logic flaw where the DeepGEMM path can be taken even when E8M0 is not enabled, which will lead to an assertion failure. I've added a comment with details on how to address this.
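The concern in this review can be pictured with a small dispatch sketch. It is only an illustration under stated assumptions: `DeepGemmConfig`, `choose_quant_path`, and the path names are hypothetical and are not part of vLLM. It shows the packed-scale path being selected only when the E8M0 flag is set, so that a path asserting on E8M0 is never reached with the flag off.

```python
from dataclasses import dataclass


@dataclass
class DeepGemmConfig:
    # Hypothetical stand-in for the layer's settings; not a vLLM class.
    deep_gemm_supported: bool
    use_deep_gemm_e8m0: bool


def choose_quant_path(cfg: DeepGemmConfig) -> tuple[str, bool]:
    """Return (path_name, use_ue8m0) for one linear layer.

    The path names are illustrative labels, not real vLLM function names.
    """
    if not cfg.deep_gemm_supported:
        # DeepGEMM cannot be used at all: take some other GEMM path.
        return ("fallback", False)
    if cfg.use_deep_gemm_e8m0:
        # Packed scales are only requested when E8M0 is actually enabled.
        return ("deepgemm_packed_ue8m0", True)
    # DeepGEMM without E8M0: keep ordinary (unpacked) scales.
    return ("deepgemm_float_scales", False)
```

Passing the configured flag through, rather than a hardcoded `True`, keeps the `use_ue8m0` value and the chosen path consistent with each other.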
yewentao256
reviewed
Dec 8, 2025
f4a942c to 2edfcfc
Member
yewentao256
left a comment
LGTM, thanks for the work!
One small update before landing.
@@ -269,11 +270,14 @@ def _run_deepgemm(
        weight_scale: torch.Tensor,
    ) -> torch.Tensor:
        assert self.deepgemm_input_quant_op is not None
Suggested change
assert self.deepgemm_input_quant_op is not None
yewentao256
approved these changes
Dec 8, 2025
Collaborator
Author
This PR is causing Dynamo compilation issues, fixing in #30336
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request, Jan 21, 2026
Signed-off-by: zhewenli <zhewenli@meta.com>
Signed-off-by: Zhewen Li <zhewenli@meta.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Purpose
Fixed Hopper incompatibility after #29546:
The _run_deepgemm method was changed to always use per_token_group_quant_fp8_packed_for_deepgemm, which produces packed int32 scales, a format that is only supported on Blackwell.
Test Plan
Passed CI: https://buildkite.com/vllm/ci/builds/42493#019aff0f-8d6e-4499-907f-010fc5594fc7
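For readers unfamiliar with the "packed int32 scales" mentioned under Purpose, the following is a minimal pure-Python sketch of the idea, not the vLLM or DeepGEMM kernel. It assumes the usual E8M0 convention (an 8-bit exponent with bias 127 and no mantissa), assumes scales are rounded up to a power of two, and assumes little-endian byte order within the 32-bit word. The helper names `to_ue8m0` and `pack_ue8m0` are hypothetical.

```python
import math


def to_ue8m0(scale: float) -> int:
    """Round a positive scale up to a power of two; return its biased exponent.

    Assumes bias 127, with 255 left unused (reserved for NaN in E8M0).
    """
    if scale <= 0.0:
        raise ValueError("scale must be positive")
    mantissa, exponent = math.frexp(scale)  # scale == mantissa * 2**exponent
    # frexp returns mantissa in [0.5, 1); exactly 0.5 means an exact power of two.
    ceil_log2 = exponent - 1 if mantissa == 0.5 else exponent
    return max(0, min(254, ceil_log2 + 127))


def pack_ue8m0(exponents: list[int]) -> int:
    """Pack four 8-bit exponents into one signed 32-bit integer.

    Byte order is an assumption made for this sketch (little-endian).
    """
    if len(exponents) != 4:
        raise ValueError("expected exactly four exponents")
    return int.from_bytes(bytes(exponents), "little", signed=True)


# Usage: four per-group scales become a single int32-sized value.
packed = pack_ue8m0([to_ue8m0(s) for s in (1.0, 0.5, 3.0, 0.01)])
```

Because four exponent-only scales share one 32-bit word, a kernel that expects one float scale per group cannot read this layout, which is consistent with the incompatibility described above.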