
cpu: riscv: matmul: add RVV row/col kernels with bias, ReLU post-op#3784

Merged
vpirogov merged 4 commits into uxlfoundation:main from krishnasai-mcw:main on Sep 3, 2025

Conversation

@krishnasai-mcw
Contributor

Description

This PR introduces an optimized MatMul (Matrix Multiplication) kernel for RISC-V. The implementation speeds up GEMM (General Matrix-Matrix Multiplication) operations by utilizing RVV intrinsics.
This initial version lays the groundwork for RVV acceleration in MatMul, focusing on common use cases while establishing a framework for future extensions.

Key Changes

  • RVV-Optimized Kernels: Added rvv_matmul_rowmajor and rvv_matmul_colmajor kernels to handle different weight memory layouts efficiently.
  • FP32 Data Type Support: The implementation is specialized for f32 data type operations.
  • Bias and ReLU Support: Integrates support for bias addition and fused ReLU post-operation, handled via the rvv_postops handler.
  • Layout-Specific Optimizations: The code selects between row-major and column-major weight layouts at runtime for optimal performance.
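To make the kernels' semantics concrete, here is a hedged scalar reference of what a single row-major matmul with optional bias and fused ReLU computes. This is an illustration only: `matmul_rowmajor_ref` is a hypothetical name, and the actual `rvv_matmul_rowmajor` kernel vectorizes the inner loops with RVV intrinsics rather than plain loops.

```cpp
#include <algorithm>
#include <cstddef>

// Scalar reference for an M x K (src) by K x N (weights) matmul with
// optional bias and fused ReLU, accumulating in f32. The real RVV kernel
// mirrors this arithmetic but processes the N dimension with vector
// registers (vsetvl / vfmacc-style intrinsics).
void matmul_rowmajor_ref(const float *src, const float *wei, const float *bias,
        float *dst, std::size_t M, std::size_t K, std::size_t N, bool relu) {
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n) {
            float acc = bias ? bias[n] : 0.0f; // f32 accumulation
            for (std::size_t k = 0; k < K; ++k)
                acc += src[m * K + k] * wei[k * N + n]; // row-major weights
            dst[m * N + n] = relu ? std::max(acc, 0.0f) : acc; // fused post-op
        }
}
```

For the column-major layout the weight access pattern becomes `wei[n * K + k]`, i.e. a contiguous column per output element, which is what motivates a separate `rvv_matmul_colmajor` kernel.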

Implementation Details and Constraints

The current implementation has the following characteristics:

  • Data Types:
    • Source, Weights, Destination, and Bias must all be f32.
    • The accumulation data type is also f32.
  • Memory Layouts:
    • Source and Destination tensors must be dense and in a plain (row-major) format (abx, abcx, etc.).
  • Weights tensors can be in either plain format (abx) or column-major format (ba, acb, abdc, ...).
  • Broadcasting:
    • Weights: For the leading (batch) dimensions of src and weights (all dims except the final two matrix dims), each weights dimension must either match the corresponding src dimension or be 1. In other words, weights may be broadcast across batch dims.
    • Bias: Bias dims are aligned to the trailing dimensions of dst; for each bias dimension it must be 1 or equal to the corresponding dst dimension (standard trailing-dims broadcasting).
    • Source / Destination: src and dst are expected to be dense, explicit tensors (no implicit broadcasting of src/dst batch dims). Weight broadcasting (leading dims == 1) and bias broadcasting (trailing dims == 1) are supported as described above.
  • Post-Ops:
    • Only a single, fused ReLU operation is supported as a post-op. No other post-ops are currently implemented.
    • Post-op application is optional.
  • Bias:
    • Bias is optional.
    • If present, it must be a dense f32 tensor.
    • Broadcasting is supported (same rules as described under Broadcasting above).
  • Parallelization:
    • The work is distributed across threads for parallel execution and improved CPU utilization, while the inner loops are RVV-vectorized.
  • Limitations:
    • Runtime dimensions or strides are not supported.
    • Only f32 is supported.
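The broadcasting constraints above can be expressed as small compatibility checks. The sketch below is illustrative only; the function names are hypothetical and do not correspond to the actual oneDNN descriptor-validation code.

```cpp
#include <cstdint>
#include <vector>

// Each leading (batch) dim of weights must equal the corresponding src
// batch dim or be 1 (broadcast). The final two dims are the matrix dims
// and are checked separately.
bool wei_batch_dims_ok(const std::vector<std::int64_t> &src_dims,
        const std::vector<std::int64_t> &wei_dims) {
    std::size_t nbatch = src_dims.size() - 2; // all dims except the last two
    for (std::size_t i = 0; i < nbatch; ++i)
        if (wei_dims[i] != src_dims[i] && wei_dims[i] != 1) return false;
    return true;
}

// Bias dims align to the trailing dims of dst: each must be 1 or equal to
// the corresponding dst dim (standard trailing-dims broadcasting).
bool bias_dims_ok(const std::vector<std::int64_t> &dst_dims,
        const std::vector<std::int64_t> &bias_dims) {
    std::size_t off = dst_dims.size() - bias_dims.size();
    for (std::size_t i = 0; i < bias_dims.size(); ++i)
        if (bias_dims[i] != dst_dims[off + i] && bias_dims[i] != 1)
            return false;
    return true;
}
```

For example, weights of shape {1, 2, 5, 7} are compatible with src of shape {4, 2, 5x?, ?} because the first batch dim broadcasts, while a non-1 mismatching batch dim is rejected.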

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

Performance improvements

  • Have you submitted performance data that demonstrates performance improvements?

Testing was performed using the RISC-V GNU toolchain version 14.2, and the functionality was validated under the QEMU RISCV64 emulator.
Calls to the implemented matmul can be found by searching for RISCV64GCV in

New features

  • [N/A] Have you published an RFC for the new feature?
  • [N/A] Was the RFC approved?
  • [N/A] Have you added relevant tests?

Bug fixes

  • [N/A] Have you included information on how to reproduce the issue (either in a github issue or in this PR)?
  • [N/A] Have you added relevant regression tests?

RFC PR

  • [N/A] Does RFC document follow the template?
  • [N/A] Have you added a link to the rendered document?

@krishnasai-mcw krishnasai-mcw requested a review from a team as a code owner August 19, 2025 04:21
@krishnasai-mcw
Contributor Author

Hi @dzarukin,

Thank you for the feedback! I have gone through your suggestions and applied the necessary fixes throughout the code. The changes should now align with your recommendations.

Also, could you please clarify what copyright headers should be added for the newly created files?

@vpirogov
Contributor

Also, could you please clarify what copyright headers should be added for the newly created files?

This project is licensed under Apache License 2.0, so new files should carry the corresponding license banner (see example here).

@krishnasai-mcw
Contributor Author

Hi @vpirogov,
Thank you for the clarification. I have updated the PR to include the appropriate Apache 2.0 license headers in all newly added files.

@vpirogov
Contributor

make test linters

