
[Bugfix] Fix fp8 DeepGemm compilation issues #30336

Merged

mgoin merged 3 commits into vllm-project:main from neuralmagic:fix-deepgemm-fp8-compilation on Dec 10, 2025
Conversation

@ElizaWszola (Contributor) commented Dec 9, 2025

`is_deep_gemm_e8m0_used()` and `current_platform.is_device_capability()` are not compatible with Dynamo, causing failed compilations. This PR fixes that.

Testing

Run inference on `Qwen/Qwen3-30B-A3B-FP8` (one of the affected models) with `VLLM_USE_DEEP_GEMM=1`.

Signed-off-by: ElizaWszola <ewszola@redhat.com>

@gemini-code-assist (bot) left a comment

Code Review

This pull request aims to fix compilation issues for fp8 DeepGemm by refactoring how device capabilities are checked. While the changes in vllm/utils/deep_gemm.py are correct, a critical bug has been introduced in vllm/model_executor/layers/quantization/utils/fp8_utils.py. A line of code was moved out of an else block, which will cause incorrect quantization behavior. This needs to be addressed.

Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
@yewentao256 (Member) left a comment


Why can't `is_deepgemm_e8m0` be used? If it is because of the `@cache`, I am thinking we should have something like #29038 instead of hardcoding `is_blackwell`.

@ElizaWszola (Contributor, Author)

@yewentao256 It's because of `torch._dynamo.exc.Unsupported: can't handle functions not implemented in python`. This is the same error I ran into a lot originally when unwrapping `W8A8BlockFp8LinearOp`; it's why so many variables that depend on knowing the architecture are assigned in its constructor.

@yewentao256 (Member)

> @yewentao256 It's because of `torch._dynamo.exc.Unsupported: can't handle functions not implemented in python` - this is the error I had also run into a lot originally when unwrapping W8A8BlockFp8LinearOp, it's why so many variables that rely on knowing the architecture are assigned in its constructor

Got it. Please refactor the class `DeepGemmQuantScaleFMT` to cache the architecture value at initialization, then.
There are a lot of places using this class that may have a similar issue.

@ElizaWszola (Contributor, Author)

@yewentao256 Isn't from_oracle() a static method?

Alternatively, before cleaning up this PR, I had implemented this kind of change (aea97d1), but it felt a bit superfluous to me.

@yewentao256 (Member)

> @yewentao256 Isn't from_oracle() a static method?
>
> Alternatively, before cleaning up this PR, I had implemented this kind of changes: aea97d1 but this felt a bit superfluous to me

I think we don't actually need the static method; we can just refactor it into a normal function. CC @varun-sundar-rabindranath.

@mgoin added the bug, ready, ci-failure, and deepseek labels on Dec 9, 2025
```diff
     weight_scale: torch.Tensor,
 ) -> torch.Tensor:
-    if DeepGemmQuantScaleFMT.from_oracle() == DeepGemmQuantScaleFMT.UE8M0:
+    if self.use_deep_gemm_e8m0 and self.is_blackwell:
```
Contributor:

#26197

Shouldn't e8m0 also be compatible with Hopper?

Contributor:

cc @yewentao256 , thanks!

Member:

IIUC this is specifically for the case where e8m0 scales need to be packed, which is a Blackwell-only case.

Contributor:

Thank you @mgoin for the explanation!

@mgoin mgoin merged commit 2e7035d into vllm-project:main Dec 10, 2025
55 checks passed
@yewentao256 yewentao256 deleted the fix-deepgemm-fp8-compilation branch December 10, 2025 01:31
@yewentao256 (Member)

OK, let's land this first since it is a blocker for CI; I can open a follow-up PR to refactor the class later.

Majid-Taheri pushed a commit to Majid-Taheri/vllm that referenced this pull request Dec 23, 2025
Signed-off-by: Ubuntu <mjtaheri68@gmail.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

Labels

- bug: Something isn't working
- ci-failure: Issue about an unexpected test failure in CI
- deepseek: Related to DeepSeek models
- ready: ONLY add when PR is ready to merge / full CI is needed

Projects

Status: Done


5 participants