[Bugfix] Fix fp8 DeepGemm compilation issues#30336
Conversation
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Code Review
This pull request aims to fix compilation issues for fp8 DeepGemm by refactoring how device capabilities are checked. While the changes in vllm/utils/deep_gemm.py are correct, a critical bug has been introduced in vllm/model_executor/layers/quantization/utils/fp8_utils.py. A line of code was moved out of an else block, which will cause incorrect quantization behavior. This needs to be addressed.
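The regression described above can be illustrated with a minimal, hypothetical sketch (the function and variable names below are illustrative, not the actual vLLM code): a statement that was meant to run only on the fallback path now runs unconditionally because it was moved out of the `else` block.

```python
# Hypothetical illustration of the reported bug pattern: a line that
# belonged inside an `else` block was moved out of it, so it now
# executes on every path and clobbers the quantized scale.

def quantize_correct(use_ue8m0: bool, scale: float) -> float:
    if use_ue8m0:
        scale = round(scale)   # UE8M0 path: snap to an integer scale
    else:
        scale = scale * 2.0    # fallback path only
    return scale


def quantize_buggy(use_ue8m0: bool, scale: float) -> float:
    if use_ue8m0:
        scale = round(scale)
    scale = scale * 2.0        # moved out of `else`: now always runs
    return scale
```

On the UE8M0 path the buggy version applies the fallback transformation on top of the correct one, which is exactly the "incorrect quantization behavior" the review flags.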
yewentao256 left a comment
Why can't `is_deepgemm_e8m0` be used? If it's because of the `@cache`, I'm thinking we should have something like #29038 instead of hardcoding `is_blackwell`.
@yewentao256 It's because of
Got it, please refactor the class
@yewentao256 Isn't Alternatively, before cleaning up this PR, I had implemented this kind of change: aea97d1, but it felt a bit superfluous to me
I think we don't actually need the static method; we can just refactor it into a normal function. CC @varun-sundar-rabindranath as well.
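The refactor being discussed can be sketched as follows (a hedged sketch, not the actual vLLM implementation; the class and attribute names are assumptions): instead of calling an `@cache`-decorated helper inside the compiled forward path, the capability checks are evaluated once at construction time and stored as plain booleans, which Dynamo can read without tracing through the cached call.

```python
from functools import cache


@cache
def is_deep_gemm_e8m0_used() -> bool:
    # Stand-in for the real helper; the @cache wrapper is the part that
    # torch.compile/Dynamo has trouble tracing through at runtime.
    return True


class Fp8LinearOp:  # hypothetical class name for illustration
    def __init__(self) -> None:
        # Evaluate the checks eagerly, once. Plain bool attributes are
        # safe to read inside a compiled region.
        self.use_deep_gemm_e8m0 = is_deep_gemm_e8m0_used()
        self.is_blackwell = False  # e.g. a one-time device-capability check

    def scale_fmt(self) -> str:
        # Hot path: only attribute reads, no calls to cached helpers.
        if self.use_deep_gemm_e8m0 and self.is_blackwell:
            return "ue8m0-packed"
        return "plain"
```

The design trade-off raised in the thread is that the hardcoded attributes duplicate logic the helper already owns, which is why a follow-up refactor is suggested.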
```diff
      weight_scale: torch.Tensor,
  ) -> torch.Tensor:
-     if DeepGemmQuantScaleFMT.from_oracle() == DeepGemmQuantScaleFMT.UE8M0:
+     if self.use_deep_gemm_e8m0 and self.is_blackwell:
```
Shouldn't e8m0 also be compatible with Hopper?
IIUC this is specifically for the case where e8m0 scales need to be packed, which is a Blackwell-only case
OK, let's land this first since it is a blocker for CI; I can do a follow-up PR to refactor the class later.
Signed-off-by: Ubuntu <mjtaheri68@gmail.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
`is_deep_gemm_e8m0_used()` and `current_platform.is_device_capability()` are not compatible with Dynamo, causing failed compilations. This PR intends to fix this problem.

Testing
Run inference on `Qwen/Qwen3-30B-A3B-FP8` (one of the models affected) with `VLLM_USE_DEEP_GEMM=1`.