[Refactor] Refactor for DeepGemmQuantScaleFMT using cache#30898

Merged
mgoin merged 4 commits into main from wentao-refactor-DeepGemmQuantScaleFMT on Dec 19, 2025

@yewentao256 (Member) commented Dec 17, 2025

Purpose

A follow-up PR for #30336 (comment)

  1. Use `DeepGemmQuantScaleFMT` instead of the check `if self.use_deep_gemm_e8m0 and self.is_blackwell:`.
  2. Cache the `DeepGemmQuantScaleFMT` decision so that we no longer hit `torch._dynamo.exc.Unsupported: can't handle functions not implemented in python`.

Signed-off-by: yewentao256 <zhyanwentao@126.com>

@yewentao256 changed the title from "[Refactpr] Refactor for DeepGemmQuantScaleFMT" to "[Refactor] Refactor for DeepGemmQuantScaleFMT using cache" on Dec 17, 2025
@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request refactors the `DeepGemmQuantScaleFMT` logic to be `torch.compile`-friendly by introducing a caching mechanism. The changes are well-structured and improve maintainability by centralizing the logic.

However, I've found a critical infinite-recursion issue in the new implementation of `init_oracle_cache`. Please see my comment for details and a suggested fix. There is also a minor typo in the pull request title ('Refactpr').
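
The reviewer's inline comment is not reproduced on this page, so the exact cause is unknown. One common way this kind of recursion arises in a lazily initialized cache is sketched below, purely as an assumption: the initializer asks the getter for the value, and the getter calls the initializer when the cache is empty. All names here (`buggy_init`, `buggy_get`, `fixed_init`, `fixed_get`, `_compute`) are hypothetical and do not come from the PR.

```python
from typing import Optional

_cache: Optional[str] = None


def _compute() -> str:
    """Placeholder for the real decision logic."""
    return "ue8m0"


def buggy_init() -> None:
    global _cache
    if _cache is None:
        # Asks the getter, which calls back into this function: no base case.
        _cache = buggy_get()


def buggy_get() -> str:
    if _cache is None:
        buggy_init()
    return _cache


def fixed_init() -> None:
    global _cache
    if _cache is None:
        # Compute directly, so the initializer never depends on the getter.
        _cache = _compute()


def fixed_get() -> str:
    assert _cache is not None, "call fixed_init() first"
    return _cache
```

In the fixed variant the dependency runs in one direction only: the getter relies on the initializer having run, and the initializer relies only on the computation.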

Signed-off-by: yewentao256 <zhyanwentao@126.com>
@yewentao256 added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Dec 17, 2025
@mgoin merged commit 3bd8335 into main on Dec 19, 2025
56 checks passed
@mgoin deleted the wentao-refactor-DeepGemmQuantScaleFMT branch on December 19, 2025 20:50