[Feature] Refactor batch invariant fp8 DeepGEMM #27606
Conversation
Signed-off-by: yewentao256 <[email protected]>
Code Review
This pull request refactors the apply method in Fp8LinearMethod to simplify the logic for batch-invariant FP8 GEMM. The change reuses W8A8BlockFp8LinearOp for block-quantized weights, aligning the batch-invariant path with the non-batch-invariant path and replacing a dequantization fallback with a more performant quantized kernel path. While this is a good simplification, I've identified a potential issue related to the removal of a safety check for handling weight scales of square matrices, which could lead to incorrect behavior in some edge cases.
This pull request has merge conflicts that must be resolved before it can be merged.
This PR could also fix #28249
Purpose
Fixes #27127 (comment) @mgoin
We can reuse `W8A8BlockFp8LinearOp` from `fp8_utils` to simplify the logic.
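As a rough illustration of the reuse (class and function names here are assumptions; the real `W8A8BlockFp8LinearOp` is a fused quantized kernel, emulated below with a plain dequantize-then-matmul in NumPy):

```python
import numpy as np

class BlockFp8LinearOpSketch:
    """Toy stand-in for a block-quantized linear op: one scale per
    (block x block) tile of the weight. A real kernel would fuse the
    dequantization into the GEMM instead of materializing it."""

    def __init__(self, block: int):
        self.block = block

    def apply(self, x, w_q, w_s):
        n, k = w_q.shape
        # Expand per-tile scales to per-element, dequantize, then matmul.
        scale = np.repeat(np.repeat(w_s, self.block, axis=0)[:n, :],
                          self.block, axis=1)[:, :k]
        return x @ (w_q.astype(np.float32) * scale).T

def apply_linear(op, x, w_q, w_s):
    # After the refactor both the batch-invariant and the default path
    # dispatch through the same op, replacing the earlier
    # dequantize-then-torch.mm fallback with a single quantized path.
    return op.apply(x, w_q, w_s)
```

The point of the sketch is the single dispatch in `apply_linear`: batch invariance comes from always taking the same kernel path, rather than maintaining a separate dequantization branch.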