[OneDNN] add mxfp8, mxfp4 onednn gemm #235

Merged
jikunshang merged 7 commits into vllm-project:main from zufangzhu:zufang/uptream_onednn_mx on Apr 9, 2026
@zufangzhu (Collaborator) commented on Mar 30, 2026

  1. Cherry-pick #20.
  2. Refine the oneDNN GEMM unit tests, since the quantization API changed.

@zufangzhu force-pushed the zufang/uptream_onednn_mx branch 2 times, most recently from b655464 to 2ddc61e, on March 30, 2026 06:04
@zufangzhu marked this pull request as ready for review on April 2, 2026 06:56
Copilot AI review requested due to automatic review settings April 2, 2026 06:56
@baodii (Collaborator) left a comment

LGTM

Copilot AI (Contributor) left a comment

Pull request overview

Updates the oneDNN backend and XPU bindings to enable MXFP8/MXFP4 (FP8/FP4 with block scaling) GEMM paths, along with expanded test coverage.

Changes:

  • Bump oneDNN submodule to a commit that includes/aligns with MXFP8/MXFP4 GEMM support.
  • Add FP4 GEMM operator plumbing (C++ op, torch binding, Python wrapper) and new FP4 GEMM tests.
  • Extend FP8 GEMM tests and update FP8 matmul scaling-attribute handling for MXFP8 block scales.
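As a rough illustration of what "FP4 with block scaling" means, the sketch below decodes MXFP4 data in plain Python. It assumes the usual MX conventions: 4-bit E2M1 elements, one shared E8M0 power-of-two scale per block of 32 elements, and an exponent bias of 127. The helper names (`decode_e2m1`, `decode_e8m0`, `dequantize_mxfp4`) are hypothetical and are not part of this PR or of oneDNN. The sketch also ignores how two FP4 codes are packed into one byte.

```python
# Illustrative sketch only; these helpers are hypothetical and are not the
# kernels or bindings added by this PR.

# Magnitudes representable by E2M1, indexed by the low 3 bits of the code.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
MX_BLOCK_SIZE = 32  # number of elements sharing one scale (assumed MX default)


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code: bit 3 is the sign, bits 0-2 the magnitude."""
    if not 0 <= code <= 0xF:
        raise ValueError(f"E2M1 code out of range: {code}")
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0x7]


def decode_e8m0(byte: int) -> float:
    """Decode an E8M0 scale as 2 ** (byte - 127); 0xFF encodes NaN."""
    if not 0 <= byte <= 0xFF:
        raise ValueError(f"E8M0 byte out of range: {byte}")
    if byte == 0xFF:
        return float("nan")
    return 2.0 ** (byte - 127)


def dequantize_mxfp4(codes, scale_bytes, block_size=MX_BLOCK_SIZE):
    """Expand unpacked E2M1 codes to floats, one E8M0 scale per block."""
    if len(codes) != len(scale_bytes) * block_size:
        raise ValueError("expected exactly one scale per block of codes")
    values = []
    for block_index, scale_byte in enumerate(scale_bytes):
        scale = decode_e8m0(scale_byte)
        start = block_index * block_size
        for code in codes[start:start + block_size]:
            values.append(decode_e2m1(code) * scale)
    return values
```

A reference path of this kind is typically what a test compares the low-precision GEMM output against; the actual tests in this PR may reconstruct the reference differently.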

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| third_party/oneDNN | Updates oneDNN submodule commit to pick up MXFP8/MXFP4 GEMM support. |
| tests/test_fp8_gemm_onednn.py | Expands fp8 GEMM test matrices and adds an MXFP8 GEMM test. |
| tests/test_fp4_gemm_onednn.py | Adds coverage for MXFP4 GEMM including reference reconstruction. |
| tests/register_ops.py | Adds a Python-level wrapper for the new fp4_gemm op. |
| csrc/xpu/torch_bindings.cpp | Registers the fp4_gemm operator schema and XPU implementation. |
| csrc/xpu/ops.h | Extends API surface with fp4_gemm declaration and updates shape comment. |
| csrc/xpu/onednn/onednn_matmul.cpp | Implements fp4_gemm entry point and routes to oneDNN FP4 matmul. |
| csrc/xpu/onednn/onednn_ext.h | Adds oneDNN dtype mappings for e8m0 scales and FP4, plus joint dtype cases. |
| csrc/xpu/onednn/fp8_gemm_w8a8.h | Adds MXFP8 scale handling via e8m0 block-wise scales. |
| csrc/xpu/onednn/fp4_gemm_w4a4.h | New oneDNN FP4 matmul implementation using block-wise e8m0 scales. |
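For readers unfamiliar with block-wise scales in a GEMM, here is a minimal pure-Python reference, assuming both operands carry one scale per block of consecutive elements along the reduction (K) dimension and that B is supplied transposed. The function names, argument order, and scale layout are assumptions made for illustration; the layout expected by the `fp4_gemm` op in this PR is not shown on this page.

```python
# Illustrative sketch only; the scale layout and names are assumptions,
# not the interface of the fp4_gemm / fp8 GEMM ops in this PR.


def apply_block_scales(row, scales, block_size=32):
    """Multiply each run of `block_size` consecutive elements by its scale."""
    if len(row) != len(scales) * block_size:
        raise ValueError("expected exactly one scale per block")
    return [value * scales[i // block_size] for i, value in enumerate(row)]


def reference_block_scaled_gemm(a, a_scales, b_t, b_t_scales, block_size=32):
    """Compute C = dequant(A) @ dequant(B) with scales blocked along K.

    a:          M rows of length K (already decoded element values)
    a_scales:   M rows of length K // block_size
    b_t:        N rows of length K (B stored transposed)
    b_t_scales: N rows of length K // block_size
    """
    a_deq = [apply_block_scales(r, s, block_size) for r, s in zip(a, a_scales)]
    b_deq = [apply_block_scales(r, s, block_size) for r, s in zip(b_t, b_t_scales)]
    # Plain triple loop expressed as nested comprehensions: C[m][n] = sum_k A[m][k] * B[k][n].
    return [
        [sum(x * y for x, y in zip(a_row, b_row)) for b_row in b_deq]
        for a_row in a_deq
    ]
```

Dequantizing first and multiplying in full precision is slow but easy to check by hand, which is why it suits a test reference rather than a production path.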

Comment thread csrc/xpu/onednn/fp4_gemm_w4a4.h
Comment thread csrc/xpu/onednn/fp4_gemm_w4a4.h
Comment thread csrc/xpu/onednn/fp4_gemm_w4a4.h
Comment thread csrc/xpu/onednn/fp4_gemm_w4a4.h
Comment thread csrc/xpu/onednn/fp4_gemm_w4a4.h
Comment thread csrc/xpu/onednn/onednn_matmul.cpp
Comment thread csrc/xpu/onednn/onednn_ext.h
Comment thread tests/test_fp8_gemm_onednn.py Outdated
zufangzhu and others added 6 commits April 8, 2026 01:45
* add mxfp4 onednn gemm

Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

* add ut for mx

Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

* fix

Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

* format with pre-commit

Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

* thanks copilot

Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

---------

Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
vllm-project#232)

Signed-off-by: Qiao, Zhefeng <zhefeng.qiao@intel.com>
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
@zufangzhu force-pushed the zufang/uptream_onednn_mx branch from faa5c73 to 8f97d49 on April 8, 2026 08:46
@jikunshang merged commit 6792890 into vllm-project:main on Apr 9, 2026
8 checks passed