Performance improvement of transposed [SD]GEMV on A64FX and Neoverse V1. #4989

Open
iha-taisei opened this issue Nov 26, 2024 · 2 comments

@iha-taisei

I submitted pull request #4803 to enable SVE for [SD]GEMV on A64FX, but there is still room for performance improvement.
I would therefore like to propose another patch that improves the performance of transposed [SD]GEMV on A64FX and Neoverse V1.

@Mousius
Contributor

Mousius commented Nov 26, 2024

Hi @iha-taisei,

It's always good to keep adding new optimized kernels 😸

How would this be different from https://github.com/OpenMathLib/OpenBLAS/blob/develop/kernel/arm64/gemv_t_sve.c ?

@iha-taisei
Author

Hi @Mousius,

As you can see above, I also applied loop unrolling.
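
For readers unfamiliar with the technique, here is a minimal sketch of what loop unrolling can look like in a transposed GEMV. It is plain scalar C, not the proposed patch and not the code in `gemv_t_sve.c`. The function names `gemv_t_ref` and `gemv_t_unroll4`, the unroll factor of 4, and the column-major layout with leading dimension `lda` are all assumptions made for illustration. The idea shown is that processing four columns per pass lets each load of `x[i]` feed four independent accumulators.

```c
#include <assert.h>
#include <stddef.h>

/* Reference transposed GEMV: y[j] += alpha * dot(A[:, j], x).
 * A is assumed column-major, m rows by n columns, leading dimension lda. */
static void gemv_t_ref(size_t m, size_t n, double alpha,
                       const double *a, size_t lda,
                       const double *x, double *y)
{
    assert(lda >= m);
    for (size_t j = 0; j < n; j++) {
        double acc = 0.0;
        for (size_t i = 0; i < m; i++)
            acc += a[j * lda + i] * x[i];
        y[j] += alpha * acc;
    }
}

/* Hypothetical variant unrolled over four columns at a time.
 * Each x[i] is loaded once and reused for four dot products. */
static void gemv_t_unroll4(size_t m, size_t n, double alpha,
                           const double *a, size_t lda,
                           const double *x, double *y)
{
    assert(lda >= m);
    size_t j = 0;
    for (; j + 4 <= n; j += 4) {
        const double *a0 = a + (j + 0) * lda;
        const double *a1 = a + (j + 1) * lda;
        const double *a2 = a + (j + 2) * lda;
        const double *a3 = a + (j + 3) * lda;
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        for (size_t i = 0; i < m; i++) {
            const double xi = x[i];
            s0 += a0[i] * xi;
            s1 += a1[i] * xi;
            s2 += a2[i] * xi;
            s3 += a3[i] * xi;
        }
        y[j + 0] += alpha * s0;
        y[j + 1] += alpha * s1;
        y[j + 2] += alpha * s2;
        y[j + 3] += alpha * s3;
    }
    /* Remainder columns when n is not a multiple of 4. */
    for (; j < n; j++) {
        double acc = 0.0;
        for (size_t i = 0; i < m; i++)
            acc += a[j * lda + i] * x[i];
        y[j] += alpha * acc;
    }
}
```

An SVE kernel would typically replace the inner loop over `i` with predicated vector loads and fused multiply-adds; that part is omitted here, so the sketch runs on any platform.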
