
[Bugfix] Fix DeepGEMM after #29546 #30267

Merged
yeqcharlotte merged 4 commits into vllm-project:main from zhewenl:fix-ue8m0-hopper
Dec 9, 2025

Conversation

@zhewenl
Collaborator

@zhewenl zhewenl commented Dec 8, 2025

Purpose

Fixes a Hopper incompatibility introduced in #29546: the _run_deepgemm method was changed to always use
per_token_group_quant_fp8_packed_for_deepgemm, which produces packed int32 scales, a format that is only supported on Blackwell:


(Worker_DP1_TP0_EP2 pid=918) ERROR 12-07 23:48:29 [multiproc_executor.py:822]     return self.w8a8_block_fp8_linear.apply(
(Worker_DP1_TP0_EP2 pid=918) ERROR 12-07 23:48:29 [multiproc_executor.py:822]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP1_TP0_EP2 pid=918) ERROR 12-07 23:48:29 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/utils/fp8_utils.py", line 255, in apply
(Worker_DP1_TP0_EP2 pid=918) ERROR 12-07 23:48:29 [multiproc_executor.py:822]     output = self._run_deepgemm(input_2d, weight, weight_scale)
(Worker_DP1_TP0_EP2 pid=918) ERROR 12-07 23:48:29 [multiproc_executor.py:822]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP1_TP0_EP2 pid=918) ERROR 12-07 23:48:29 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/utils/fp8_utils.py", line 282, in _run_deepgemm
(Worker_DP1_TP0_EP2 pid=918) ERROR 12-07 23:48:29 [multiproc_executor.py:822]     torch.ops.vllm.fp8_gemm_nt_op(
(Worker_DP1_TP0_EP2 pid=918) ERROR 12-07 23:48:29 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1255, in __call__
(Worker_DP1_TP0_EP2 pid=918) ERROR 12-07 23:48:29 [multiproc_executor.py:822]     return self._op(*args, **kwargs)
(Worker_DP1_TP0_EP2 pid=918) ERROR 12-07 23:48:29 [multiproc_executor.py:822]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP1_TP0_EP2 pid=918) ERROR 12-07 23:48:29 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/utils/fp8_utils.py", line 170, in _fp8_gemm_nt_op
(Worker_DP1_TP0_EP2 pid=918) ERROR 12-07 23:48:29 [multiproc_executor.py:822]     fp8_gemm_nt(
(Worker_DP1_TP0_EP2 pid=918) ERROR 12-07 23:48:29 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/deep_gemm.py", line 186, in fp8_gemm_nt
(Worker_DP1_TP0_EP2 pid=918) ERROR 12-07 23:48:29 [multiproc_executor.py:822]     return _fp8_gemm_nt_impl(*args, disable_ue8m0_cast=not use_ue8m0, **kwargs)
(Worker_DP1_TP0_EP2 pid=918) ERROR 12-07 23:48:29 [multiproc_executor.py:822]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP1_TP0_EP2 pid=918) ERROR 12-07 23:48:29 [multiproc_executor.py:822] RuntimeError: Assertion error (csrc/apis/../jit_kernels/impls/../heuristics/../../utils/layout.hpp:49): sfa_dtype == torch::kFloat and sfb_dtype == torch::kFloat
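The assertion fires because DeepGEMM on Hopper expects plain float32 scale factors, while the packed quant op emits int32 UE8M0 scales. A minimal sketch of the kind of capability gate this implies (function and return names here are illustrative, not vLLM's actual API):

```python
def select_scale_format(sm_major: int, use_ue8m0: bool) -> str:
    """Pick the scale-factor format for the DeepGEMM input quant op.

    Hypothetical helper: packed UE8M0 (int32) scales are only supported on
    Blackwell (SM100+); on Hopper (SM90) DeepGEMM asserts that both scale
    tensors are float32, so the packed path must not be selected there.
    """
    if use_ue8m0 and sm_major >= 10:
        return "int32_packed_ue8m0"
    return "float32"


# Hopper must fall back to float32 scales even if UE8M0 was requested.
assert select_scale_format(9, use_ue8m0=True) == "float32"
assert select_scale_format(10, use_ue8m0=True) == "int32_packed_ue8m0"
```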

Test Plan

Passed CI: https://buildkite.com/vllm/ci/builds/42493#019aff0f-8d6e-4499-907f-010fc5594fc7

Signed-off-by: zhewenli <zhewenli@meta.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix an issue with DeepGEMM by using a configured flag self.use_deep_gemm_e8m0 instead of a hardcoded True for the use_ue8m0 parameter. While this change is a step in the right direction, it exposes a critical logic flaw where the DeepGEMM path can be taken even when E8M0 is not enabled, which will lead to an assertion failure. I've added a comment with details on how to address this.
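The flaw described above boils down to two flags that must agree: whether the kernel is invoked with use_ue8m0, and whether the input quant op emits packed scales. A sketch of such a consistency check (names are hypothetical, not the actual vLLM code):

```python
def should_take_deepgemm_path(use_deep_gemm_e8m0: bool,
                              quant_op_emits_packed_scales: bool) -> bool:
    """Hypothetical guard: only dispatch to DeepGEMM when the kernel's
    UE8M0 flag matches the scale format the quant op will produce.

    If the quant op emits packed int32 UE8M0 scales but the kernel is
    called with use_ue8m0=False (disable_ue8m0_cast=True), DeepGEMM
    asserts that scales are float32 and crashes.
    """
    return use_deep_gemm_e8m0 == quant_op_emits_packed_scales


assert should_take_deepgemm_path(True, True)       # packed scales + UE8M0: ok
assert not should_take_deepgemm_path(False, True)  # the crashing combination
assert should_take_deepgemm_path(False, False)     # float scales, no UE8M0: ok
```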

Signed-off-by: zhewenli <zhewenli@meta.com>
Signed-off-by: Zhewen Li <zhewenli@meta.com>
@zhewenl zhewenl changed the title [WIP] Fix DeepGEMM after #29646 [Bugfix] Fix DeepGEMM after #29546 Dec 8, 2025
@zhewenl zhewenl marked this pull request as ready for review December 8, 2025 19:13
Member

@yewentao256 yewentao256 left a comment


LGTM, thanks for the work!
One small update before landing:

@@ -269,11 +270,14 @@ def _run_deepgemm(
        weight_scale: torch.Tensor,
    ) -> torch.Tensor:
        assert self.deepgemm_input_quant_op is not None

Suggested change (delete this line):
    assert self.deepgemm_input_quant_op is not None
@yewentao256 yewentao256 added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 8, 2025
Signed-off-by: Zhewen Li <zhewenli@meta.com>
@yeqcharlotte yeqcharlotte enabled auto-merge (squash) December 9, 2025 00:57
@yeqcharlotte yeqcharlotte disabled auto-merge December 9, 2025 00:57
@yeqcharlotte yeqcharlotte enabled auto-merge (squash) December 9, 2025 00:57
@yeqcharlotte yeqcharlotte merged commit ae339b1 into vllm-project:main Dec 9, 2025
55 checks passed
@zhewenl zhewenl deleted the fix-ue8m0-hopper branch December 9, 2025 01:28
@zhewenl
Collaborator Author

zhewenl commented Dec 9, 2025

This PR is causing Dynamo compilation issues, fixing in #30336

dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
Signed-off-by: zhewenli <zhewenli@meta.com>
Signed-off-by: Zhewen Li <zhewenli@meta.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed

3 participants