
cpu: rv64: gemm: Implemented variable loop unrolling for GEMM#4258

Merged
vpirogov merged 1 commit into uxlfoundation:main from xiazhuozhao:V-GEMM
Dec 2, 2025

Conversation

@xiazhuozhao
Contributor

@xiazhuozhao xiazhuozhao commented Oct 31, 2025

Description

This patch adds a variable-loop-unrolling GEMM implementation for the RISC-V platform, which significantly improves GEMM performance.

The change introduces logic to select an appropriate loop-unrolling kernel based on the L1 cache size. This substantially improves matmul performance on devices with a 64KB L1 cache, while performance on 32KB devices remains consistent with the #3785 implementation.

For context, the current RISC-V matmul implementation (#3784) is non-GEMM-based. While an efficient GEMM kernel was also implemented in #3785, the resulting GEMM-based matmul was not prioritized over #3784.

On a 64KB L1 cache device, this new implementation (#4258) achieves a significant average speedup of 23.69x over the current matmul (#3784) and 15.43x over the GEMM-based matmul from #3785.

Performance data (values in the table are average GFLOPS; test command: ./benchdnn --mode=p --matmul --batch=inputs/matmul/perf_matmul_training)
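For readers unfamiliar with the knob being tuned here, the n-direction unrolling can be sketched in scalar C++. This is a minimal illustration of the blocking structure only; the actual kernel uses RVV vector intrinsics, and the function name `gemm_n_unrolled` is hypothetical.

```cpp
#include <array>
#include <vector>

// Illustrative scalar sketch of unrolling the N loop of C += A * B by a
// compile-time factor UN. Not the real RVV kernel; it only shows the
// blocking structure whose unroll factor is being tuned.
template <int UN>
void gemm_n_unrolled(int M, int N, int K,
                     const std::vector<float> &A,  // M x K, row-major
                     const std::vector<float> &B,  // K x N, row-major
                     std::vector<float> &C) {      // M x N, row-major
    for (int i = 0; i < M; ++i) {
        int j = 0;
        for (; j + UN <= N; j += UN) {
            std::array<float, UN> acc {}; // UN accumulators kept in registers
            for (int k = 0; k < K; ++k)
                for (int u = 0; u < UN; ++u)
                    acc[u] += A[i * K + k] * B[k * N + j + u];
            for (int u = 0; u < UN; ++u) C[i * N + j + u] += acc[u];
        }
        for (; j < N; ++j) // remainder columns not covered by the unroll
            for (int k = 0; k < K; ++k)
                C[i * N + j] += A[i * K + k] * B[k * N + j];
    }
}
```

A larger UN reuses each loaded A element across more columns of B per pass, at the cost of more live accumulators, which is why the profitable factor depends on cache and register resources.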

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

Performance improvements

  • Have you submitted performance data that demonstrates performance improvements?

@xiazhuozhao xiazhuozhao force-pushed the V-GEMM branch 4 times, most recently from d5f6940 to 7637e27 Compare October 31, 2025 10:28
@xiazhuozhao xiazhuozhao marked this pull request as ready for review October 31, 2025 17:21
@xiazhuozhao xiazhuozhao requested a review from a team as a code owner October 31, 2025 17:21
Co-authored-by: Fei Zhang <zhangfei@iscas.ac.cn>
@vpirogov vpirogov merged commit d6107dd into uxlfoundation:main Dec 2, 2025
11 checks passed
@zhangjian29
Contributor

On a 64KB L1 cache device, this new implementation (#4258) achieves a significant average speedup of 23.69x over the current matmul (#3784) and 15.43x over the GEMM-based matmul from #3785.

Hi @xiazhuozhao ,

I tested your n-unrolling logic on a 2044 platform with a 64KB L1 cache on each core. It doesn't look like an n-unroll factor of 8 performs best on it.

| Batch Shape | N-Unroll-2 | N-Unroll-4 | N-Unroll-8 | N-Unroll-16 |
|---|---|---|---|---|
| shapes_converted_ip_inf_lb_wd | 207.592 | 204.823 | 214.517 | 257.341 |
| shapes_converted_ip_inf_lb_gmnt | 27.9092 | 28.639 | 29.4266 | 44.6874 |
| shapes_converted_ip_inf_lb_googlenet | 257.508 | 255.205 | 301.095 | 473.827 |
| shapes_converted_ip_inf_lb_resnet | 114.712 | 110.454 | 128.085 | 193.939 |
| shapes_transformer | 149.28 | 149.719 | 139.334 | 771.982 |

What do you think is going wrong in my tests?

