[Refactor] Refactor for `DeepGemmQuantScaleFMT` using cache by yewentao256 · Pull Request #30898 · vllm-project/vllm

yewentao256 · 2025-12-17T19:15:05Z

Purpose

We should use `DeepGemmQuantScaleFMT` instead of checks like `if self.use_deep_gemm_e8m0 and self.is_blackwell:`.
Update `DeepGemmQuantScaleFMT` to use a cache so that we no longer hit `torch._dynamo.exc.Unsupported: can't handle functions not implemented in python` in the future.

Signed-off-by: yewentao256 <zhyanwentao@126.com>

chatgpt-codex-connector · 2025-12-17T19:15:12Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

gemini-code-assist

Code Review

This pull request refactors the `DeepGemmQuantScaleFMT` logic to be `torch.compile`-friendly by introducing a caching mechanism. The changes are well-structured and improve maintainability by centralizing the logic.

However, I've found a critical issue: infinite recursion in the new implementation of `init_oracle_cache`. Please see my comment for details and a suggested fix. There is also a minor typo in the pull request title ('Refactpr').

vllm/utils/deep_gemm.py

refactor for DeepGemmQuantScaleFMT

6f8e44c

Signed-off-by: yewentao256 <zhyanwentao@126.com>

yewentao256 requested review from mgoin, pavanimajety, robertgshaw2-redhat and tlrmchlsmth as code owners December 17, 2025 19:15

yewentao256 changed the title ~~[Refactpr] Refactor for DeepGemmQuantScaleFMT~~ [Refactor] Refactor for DeepGemmQuantScaleFMT using cache Dec 17, 2025

gemini-code-assist bot reviewed Dec 17, 2025

vllm/utils/deep_gemm.py (outdated, resolved)

yewentao256 added 2 commits December 17, 2025 11:29

fix

5f8c2d9

Signed-off-by: yewentao256 <zhyanwentao@126.com>

fix

b6b9a01

Signed-off-by: yewentao256 <zhyanwentao@126.com>

yewentao256 added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) Dec 17, 2025

Merge branch 'main' into wentao-refactor-DeepGemmQuantScaleFMT

c6d50cc

mgoin approved these changes Dec 19, 2025

mgoin merged commit 3bd8335 into main Dec 19, 2025
56 checks passed

mgoin deleted the wentao-refactor-DeepGemmQuantScaleFMT branch December 19, 2025 20:50

yugong333 pushed a commit to yugong333/vllm that referenced this pull request Dec 22, 2025

[Refactor] Refactor for DeepGemmQuantScaleFMT using cache (vllm-pro…

bceee36

…ject#30898) Signed-off-by: yewentao256 <zhyanwentao@126.com>

Majid-Taheri pushed a commit to Majid-Taheri/vllm that referenced this pull request Dec 23, 2025

[Refactor] Refactor for DeepGemmQuantScaleFMT using cache (vllm-pro…

2092b5b

…ject#30898) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Ubuntu <mjtaheri68@gmail.com>

dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026

[Refactor] Refactor for DeepGemmQuantScaleFMT using cache (vllm-pro…

f71c10d

…ject#30898) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

[Refactor] Refactor for DeepGemmQuantScaleFMT using cache (vllm-pro…

bd692c3

…ject#30898) Signed-off-by: yewentao256 <zhyanwentao@126.com>

mgoin merged 4 commits into main from wentao-refactor-DeepGemmQuantScaleFMT

2 participants
