Performance improvement of transposed [SD]GEMV on A64FX and Neoverse V1. #4989

iha-taisei · 2024-11-26T02:59:39Z

I submitted Pull Request #4803 to enable SVE for [SD]GEMV on A64FX, but there is still room for performance improvement.
I'd therefore like to propose another patch that improves the performance of transposed [SD]GEMV on A64FX and Neoverse V1.

Mousius · 2024-11-26T12:02:49Z

Hi @iha-taisei,

It's always good to keep adding new optimized kernels 😸

How would this be different from https://github.com/OpenMathLib/OpenBLAS/blob/develop/kernel/arm64/gemv_t_sve.c ?

iha-taisei · 2024-12-02T10:47:45Z

Hi @Mousius,

As you can see above, I also applied loop unrolling.
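For readers unfamiliar with the technique, here is a minimal sketch of what loop unrolling means for a transposed GEMV (`y += alpha * A^T * x`). It is not the code from the patch: it uses plain scalar C rather than SVE intrinsics, assumes a column-major matrix with unit strides for `x` and `y`, and the function names `gemv_t_ref` and `gemv_t_unroll4` as well as the unroll factor of 4 are hypothetical choices made only for illustration.

```c
#include <stddef.h>

/* Hypothetical reference version: y[j] += alpha * dot(A[:, j], x).
 * A is m x n, column-major, with leading dimension lda. */
static void gemv_t_ref(size_t m, size_t n, double alpha,
                       const double *a, size_t lda,
                       const double *x, double *y)
{
    for (size_t j = 0; j < n; j++) {
        double acc = 0.0;
        for (size_t i = 0; i < m; i++)
            acc += a[j * lda + i] * x[i];
        y[j] += alpha * acc;
    }
}

/* Hypothetical unrolled version: four columns are processed per pass,
 * so each x[i] is loaded once and reused by four independent
 * accumulators. The unroll factor 4 is an assumption for this sketch. */
static void gemv_t_unroll4(size_t m, size_t n, double alpha,
                           const double *a, size_t lda,
                           const double *x, double *y)
{
    size_t j = 0;

    for (; j + 4 <= n; j += 4) {
        const double *a0 = a + (j + 0) * lda;
        const double *a1 = a + (j + 1) * lda;
        const double *a2 = a + (j + 2) * lda;
        const double *a3 = a + (j + 3) * lda;
        double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;

        for (size_t i = 0; i < m; i++) {
            const double xi = x[i]; /* shared by all four columns */
            acc0 += a0[i] * xi;
            acc1 += a1[i] * xi;
            acc2 += a2[i] * xi;
            acc3 += a3[i] * xi;
        }

        y[j + 0] += alpha * acc0;
        y[j + 1] += alpha * acc1;
        y[j + 2] += alpha * acc2;
        y[j + 3] += alpha * acc3;
    }

    /* Remainder: columns left over when n is not a multiple of 4. */
    for (; j < n; j++) {
        double acc = 0.0;
        for (size_t i = 0; i < m; i++)
            acc += a[j * lda + i] * x[i];
        y[j] += alpha * acc;
    }
}
```

The independent accumulators give the core several multiply-add chains to overlap, and each element of `x` is read once per group of columns instead of once per column. In an SVE kernel the same idea would presumably apply to vector registers instead of scalars, but how the actual patch does this is not shown in this thread.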

iha-taisei mentioned this issue Dec 2, 2024

Loop-unrolled transposed [SD]GEMV kernels for A64FX and Neoverse V1 #4996

Open
