
cpu: rv64: gemm: add bf16 gemm SIMD optimization with RISC-V V Extension#3845

Closed
ryanle1017 wants to merge 1 commit into uxlfoundation:main from ryanle1017:feature/rvv-bf16-gemm-github

Conversation

@ryanle1017

Description

This PR introduces a SIMD-optimized bfloat16 GEMM kernel for the RISC-V 64-bit architecture, leveraging the RISC-V Vector (V) Extension. This work extends the foundational f32 GEMM implementation from PR #3785, enabling high-performance mixed-precision computations.
The primary motivation is to accelerate inference on emerging RISC-V platforms. As bfloat16 becomes a critical data type for modern deep learning models, offering significant memory bandwidth savings with a dynamic range comparable to f32, this optimized kernel fills a crucial performance gap.
This implementation focuses on the bf16:bf16:f32 data type combination (bfloat16 inputs, float32 accumulation and output), which is a common and numerically robust approach for mixed-precision GEMM.

Key Changes

  • RVV-Optimized BF16 Kernel: Added a new GEMM kernel (rvv_gemm_bf16bf16f32) specifically designed for bfloat16 inputs and float32 outputs/accumulation.
  • GEMM Dispatch Integration: The main GEMM dispatch logic in src/cpu/gemm/gemm.cpp is updated to route bf16bf16f32 requests to the new RVV kernel when running on a compatible RISC-V platform.
  • Platform Feature Detection: Extended platform::mayiuse_bf16() to correctly detect and enable bfloat16 support when RVV intrinsics are available for RISC-V.

Checklist

General

  • [x] Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • [x] Have you formatted the code using clang-format?

Performance improvements

  • [x] Have you submitted performance data that demonstrates performance improvements?

All performance data was measured on a
--TODO
The performance baseline is the default ref_gemm implementation in the oneDNN main branch. Below are some example performance benchmarks for different problem sizes.

Matmul Primitive Performance (--dt=bf16:bf16:f32)

New features

  • Have you published an RFC for the new feature?
  • Was the RFC approved?
  • Have you added relevant tests?

Bug fixes

  • Have you included information on how to reproduce the issue (either in a GitHub issue or in this PR)?
  • Have you added relevant regression tests?

RFC PR

  • Does RFC document follow the template?
  • Have you added a link to the rendered document?

Co-authored-by: Fei Zhang <zhangfei@iscas.ac.cn>
@ryanle1017 ryanle1017 force-pushed the feature/rvv-bf16-gemm-github branch from ac70101 to 7bd3fe8 Compare August 29, 2025 14:38
@ryanle1017 ryanle1017 closed this Aug 29, 2025
@ryanle1017 ryanle1017 deleted the feature/rvv-bf16-gemm-github branch August 29, 2025 14:53
@ryanle1017 ryanle1017 restored the feature/rvv-bf16-gemm-github branch August 29, 2025 15:07
