Add RVV (RISC-V Vector Extension) optimized convolution and pooling kernels for the NCHWc blocked format in MLAS #28411

Open
velonica0 wants to merge 1 commit into microsoft:main from velonica0:rvv_pr


velonica0 commented May 8, 2026

Description

New kernel files:

  • riscv64/sconv_depthwise_kernel_rvv.cpp — RVV-optimized 3x3 stride-1 depthwise convolution (NCHW format), replacing the MLAS_FLOAT32X4 generic vectorized version
  • riscv64/sconv_nchwc_kernel_rvv.cpp — 7 NCHWc kernels using vfloat32m4_t (LMUL=4, BlockSize=16):
    • Direct NCHW conv (MlasConvNchwFloatKernelRvv)
    • Direct NCHWc conv (MlasConvNchwcFloatKernelRvv)
    • Depthwise NCHWc conv (MlasConvDepthwiseFloatKernelRvv)
    • Pointwise NCHWc conv (MlasConvPointwiseFloatKernelRvv)
    • Max/AvgExcludePad/AvgIncludePad pooling
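To make the `vfloat32m4_t` / BlockSize=16 pairing concrete, here is a minimal sketch of the core operation a pointwise NCHWc kernel repeats: a fused multiply-add of one scalar input value against a 16-float filter block. It is not the code in this PR. `kNchwcBlock`, `BlockFma`, `PointwiseOnePixel`, and the filter layout (one 16-float block per input channel) are hypothetical names and assumptions chosen for illustration; the real MLAS kernel signatures and layouts differ. The RVV path uses the standard v1.0 intrinsics and falls back to a scalar loop when the vector extension is unavailable.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__riscv_v_intrinsic) && defined(__riscv_vector)
#include <riscv_vector.h>
#define SKETCH_HAS_RVV 1
#else
#define SKETCH_HAS_RVV 0
#endif

// NCHWc block size quoted in the PR description (hypothetical constant name).
constexpr std::size_t kNchwcBlock = 16;

// Hypothetical helper: acc[0..15] += input_value * filter[0..15].
inline void BlockFma(float* acc, float input_value, const float* filter) {
#if SKETCH_HAS_RVV
    // Strip-mined so it stays correct even if one vsetvl grants fewer than 16 lanes.
    for (std::size_t i = 0; i < kNchwcBlock;) {
        std::size_t vl = __riscv_vsetvl_e32m4(kNchwcBlock - i);
        vfloat32m4_t a = __riscv_vle32_v_f32m4(acc + i, vl);
        vfloat32m4_t f = __riscv_vle32_v_f32m4(filter + i, vl);
        a = __riscv_vfmacc_vf_f32m4(a, input_value, f, vl);  // a += input_value * f
        __riscv_vse32_v_f32m4(acc + i, a, vl);
        i += vl;
    }
#else
    // Scalar fallback with the same semantics.
    for (std::size_t i = 0; i < kNchwcBlock; ++i) {
        acc[i] += input_value * filter[i];
    }
#endif
}

// Hypothetical sketch of a pointwise (1x1) convolution for a single output pixel:
// output[0..15] = sum over ic of input[ic] * filter[ic * 16 + 0..15].
inline void PointwiseOnePixel(const float* input, const float* filter,
                              std::size_t input_channels, float* output) {
    assert(input != nullptr && filter != nullptr && output != nullptr);
    for (std::size_t i = 0; i < kNchwcBlock; ++i) {
        output[i] = 0.0f;
    }
    for (std::size_t ic = 0; ic < input_channels; ++ic) {
        BlockFma(output, input[ic], filter + ic * kNchwcBlock);
    }
}
```

With VLEN of at least 128 bits, a single `vsetvl` at e32/m4 covers all 16 lanes, so the loop in `BlockFma` runs once per block. A production kernel would keep the accumulator in registers across input channels rather than reloading it each iteration.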

Motivation and Context

Following #28261, this PR optimizes more MLAS kernels using the RISC-V Vector (RVV) extension.

Please note:

  • On the K3 (SpacemiT X60), VLEN=256. With LMUL=4 and e32, a vector register group can hold (256/32) * 4 = 32 floats, but the kernels request only 16 elements, so they use half of the available vector width.

  • The reason is that BlockSize=16 is baked into the NCHWc data layout across the whole framework (matching ARM64 NEON). Raising it to 32 would require a different NCHWc format, which is not a localized change.
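The arithmetic in the note above can be checked with a short sketch. The function and constant names are hypothetical; the values (VLEN=256, SEW=32, LMUL=4, BlockSize=16) are the ones quoted in this description, and the formula VLMAX = (VLEN / SEW) * LMUL is the standard RVV definition for integer LMUL.

```cpp
#include <cassert>
#include <cstddef>

// VLMAX for one register group: (VLEN / SEW) * LMUL, for integer LMUL >= 1.
constexpr std::size_t VlMax(std::size_t vlen_bits, std::size_t sew_bits, std::size_t lmul) {
    return (vlen_bits / sew_bits) * lmul;
}

// Values quoted in the PR description (constant names are hypothetical).
constexpr std::size_t kVlenBits = 256;  // VLEN reported for the benchmark board
constexpr std::size_t kSewBits = 32;    // e32: single-precision floats
constexpr std::size_t kLmul = 4;        // vfloat32m4_t
constexpr std::size_t kBlockSize = 16;  // NCHWc block size

constexpr std::size_t kVlMax = VlMax(kVlenBits, kSewBits, kLmul);
static_assert(kVlMax == 32, "a register group holds 32 floats at VLEN=256, e32, m4");
static_assert(kBlockSize * 2 == kVlMax, "a 16-float block fills half of the group");

// Simplified model of the vector length granted for a request of avl elements.
// For avl <= VLMAX the RVV spec requires vl == avl; for larger requests the
// spec leaves some freedom, so min() here is only an approximation.
constexpr std::size_t GrantedVl(std::size_t avl, std::size_t vlmax) {
    return avl < vlmax ? avl : vlmax;
}

// How many whole NCHWc blocks would fit in one register group.
inline std::size_t BlocksPerGroup(std::size_t vlmax, std::size_t block_size) {
    assert(block_size > 0 && vlmax % block_size == 0);
    return vlmax / block_size;
}
```

Under these assumptions `BlocksPerGroup(kVlMax, kBlockSize)` is 2 at VLEN=256, which is the unused headroom the note describes, and 1 at VLEN=128, where a 16-float block fills an m4 group exactly.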

Benchmark (SpacemiT K3, VLEN=256, 8-core)

All tests pass with zero numerical error.

| Kernel | Speedup (RVV vs scalar) |
| --- | --- |
| Direct NCHW Conv | 1.27–1.29x |
| Direct NCHWc Conv | 1.93–1.95x |
| Depthwise NCHWc Conv | 10.8–12.5x |
| Pointwise NCHWc Conv | 29.4–30.4x |
| Max Pooling | 12.5–20.0x |
| Avg Pooling (exclude pad) | 4.0–4.3x |
| Avg Pooling (include pad) | 5.5–5.8x |

@velonica0

Hi @hariharans29
Could you please take a look at this PR when you have a moment? I’d really appreciate your help.
