[OneDNN] add mxfp8, mxfp4 onednn gemm by zufangzhu · Pull Request #235 · vllm-project/vllm-xpu-kernels

zufangzhu · 2026-03-30T05:20:58Z

Cherry-pick #20.
Refine the oneDNN GEMM unit tests, since the quant API changed.

baodii

LGTM

Copilot

Pull request overview

Updates the oneDNN backend and XPU bindings to enable MXFP8/MXFP4 (FP8/FP4 with block scaling) GEMM paths, along with expanded test coverage.

Changes:

Bump oneDNN submodule to a commit that includes/aligns with MXFP8/MXFP4 GEMM support.
Add FP4 GEMM operator plumbing (C++ op, torch binding, Python wrapper) and new FP4 GEMM tests.
Extend FP8 GEMM tests and update FP8 matmul scaling-attribute handling for MXFP8 block scales.
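To make "block scaling" concrete, here is a minimal pure-Python sketch of a dequantize-then-matmul reference for MX-style operands. It is not the code from this PR: the helper names are hypothetical, the block size of 32 and the E8M0 decoding (value = 2^(byte - 127), with 0xFF reserved for NaN) are assumed from the OCP Microscaling format, and the elements are taken as already-decoded floats rather than raw FP8/FP4 bits.

```python
import math

BLOCK = 32  # assumed MX block size along the K dimension


def e8m0_to_float(byte):
    """Decode an E8M0 scale: 8-bit biased exponent, no sign, no mantissa."""
    if byte == 0xFF:
        return math.nan  # 0xFF is reserved for NaN in E8M0
    return 2.0 ** (byte - 127)


def dequant_row(elems, scales, block=BLOCK):
    """Multiply each run of `block` elements by its shared block scale."""
    assert len(elems) == len(scales) * block
    return [x * e8m0_to_float(scales[i // block]) for i, x in enumerate(elems)]


def ref_gemm(a_rows, a_scales, b_cols, b_scales):
    """Reference C = A @ B, with both operands dequantized first.

    a_rows: M rows of length K; b_cols: N columns of length K.
    Each row/column carries K // BLOCK E8M0 scale bytes.
    """
    a = [dequant_row(r, s) for r, s in zip(a_rows, a_scales)]
    b = [dequant_row(c, s) for c, s in zip(b_cols, b_scales)]
    return [[sum(x * y for x, y in zip(ar, bc)) for bc in b] for ar in a]


# One 1x32 row times one 32x1 column; scale bytes 128 and 126 decode to 2.0 and 0.5.
c = ref_gemm([[1.0] * 32], [[128]], [[0.5] * 32], [[126]])
print(c)
```

A reference of this shape is useful for comparing against a fused kernel's output within a tolerance; how the tests in this PR build their reference is not shown on this page.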

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.

File	Description
third_party/oneDNN	Updates oneDNN submodule commit to pick up MXFP8/MXFP4 GEMM support.
tests/test_fp8_gemm_onednn.py	Expands fp8 GEMM test matrices and adds an MXFP8 GEMM test.
tests/test_fp4_gemm_onednn.py	Adds coverage for MXFP4 GEMM including reference reconstruction.
tests/register_ops.py	Adds a Python-level wrapper for the new fp4_gemm op.
csrc/xpu/torch_bindings.cpp	Registers the fp4_gemm operator schema and XPU implementation.
csrc/xpu/ops.h	Extends API surface with fp4_gemm declaration and updates shape comment.
csrc/xpu/onednn/onednn_matmul.cpp	Implements fp4_gemm entry point and routes to oneDNN FP4 matmul.
csrc/xpu/onednn/onednn_ext.h	Adds oneDNN dtype mappings for e8m0 scales and FP4, plus joint dtype cases.
csrc/xpu/onednn/fp8_gemm_w8a8.h	Adds MXFP8 scale handling via e8m0 block-wise scales.
csrc/xpu/onednn/fp4_gemm_w4a4.h	New oneDNN FP4 matmul implementation using block-wise e8m0 scales.
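The "reference reconstruction" mentioned for the FP4 tests can be pictured as decoding packed 4-bit values back to floats before a plain matmul. The sketch below is an illustration under stated assumptions, not the test code from this PR: it assumes the FP4 element type is E2M1 (1 sign, 2 exponent, 1 mantissa bit), that two values are packed per byte with the low nibble first, and the function names are hypothetical.

```python
# Magnitudes of the eight non-negative E2M1 codes (0b000 .. 0b111).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def decode_e2m1(nibble):
    """Decode one 4-bit E2M1 code: bit 3 is the sign, bits 0-2 index the magnitude."""
    mag = E2M1_MAGNITUDES[nibble & 0x7]
    return -mag if nibble & 0x8 else mag


def unpack_fp4(packed):
    """Unpack bytes holding two FP4 codes each into a flat list of floats.

    Assumption: the low nibble is the earlier element. The real layout used
    by the kernel may differ.
    """
    out = []
    for byte in packed:
        out.append(decode_e2m1(byte & 0x0F))
        out.append(decode_e2m1(byte >> 4))
    return out


print(unpack_fp4(bytes([0x21, 0xF7])))  # [0.5, 1.0, 6.0, -6.0]
```

After unpacking, each block of values would still need to be multiplied by its E8M0 block scale before being compared with the kernel output.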

* add mxfp4 onednn gemm
* add ut for mx
* fix
* format with pre-commit
* thanks copilot

Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

zufangzhu force-pushed the zufang/uptream_onednn_mx branch 2 times, most recently from b655464 to 2ddc61e March 30, 2026 06:04

zufangzhu marked this pull request as ready for review April 2, 2026 06:56

Copilot AI review requested due to automatic review settings April 2, 2026 06:56

baodii approved these changes Apr 2, 2026

Copilot AI reviewed Apr 2, 2026

Copilot started reviewing on behalf of zufangzhu April 2, 2026 07:10

zufangzhu force-pushed the zufang/uptream_onednn_mx branch from ca13f77 to e33478e April 3, 2026 01:19

zufangzhu requested a review from jikunshang April 3, 2026 01:19

zufangzhu force-pushed the zufang/uptream_onednn_mx branch from 2f9930c to 2857c28 April 8, 2026 06:24

xinyu-intel approved these changes Apr 8, 2026

Yejing-Lai reviewed Apr 8, 2026

Comment thread tests/test_fp8_gemm_onednn.py Outdated

zufangzhu and others added 6 commits April 8, 2026 01:45

format

864ca6b

Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

refine onednn gemm ut

a1b7ccc

Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

skip scales check (vllm-project#256)

6e187f1

Signed-off-by: mayuyuace <qiming1.zhang@intel.com> Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

Support sycl impl relu2_no_mul for NVIDIA-Nemotron-3-Nano-30B-A3B-bf16 (vllm-project#232)

afbcb08

Signed-off-by: Qiao, Zhefeng <zhefeng.qiao@intel.com> Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

Update test_fp8_gemm_onednn.py

8f97d49

Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

zufangzhu force-pushed the zufang/uptream_onednn_mx branch from faa5c73 to 8f97d49 April 8, 2026 08:46

Merge branch 'main' into zufang/uptream_onednn_mx

2c74e74

jikunshang approved these changes Apr 8, 2026

jikunshang merged commit 6792890 into vllm-project:main Apr 9, 2026
8 checks passed

zufangzhu mentioned this pull request May 9, 2026

[OneDNN] upgrade onednn to 3.12 and add fp8 block gemm #173

Merged

jikunshang merged 7 commits into vllm-project:main from zufangzhu:zufang/uptream_onednn_mx

8 participants
