This repository has been archived by the owner on Apr 24, 2023. It is now read-only.

Rayon+SIMD support #6

Merged
merged 1 commit into from
Jul 18, 2022


LoganDark
Owner

@LoganDark LoganDark commented Jun 22, 2022

Combines #4 and #5

Closes #3

Using both at the same time now actually outperforms rayon on its own.

This is surprising.

Currently a lane count of 8 is the fastest on my machine, but this is tunable.

This is all thanks to imgref-iter's experimental SIMD iterators.
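To illustrate how thread-level parallelism and a tunable lane count can be combined, here is a minimal sketch using only the standard library. It is not the code from this PR: `std::thread::scope` stands in for rayon, a fixed-size accumulator array stands in for real SIMD vectors, and a per-row running sum stands in for the blur kernel. The names `prefix_sum_rows` and `process_band` are hypothetical.

```rust
use std::thread;

/// Running sum along each row of one band, with `LANES` rows advanced in
/// lockstep (one lane per row). Plain arrays are used here instead of SIMD
/// vector types, so this only shows the structure, not actual SIMD code.
fn process_band<const LANES: usize>(band: &mut [u32], width: usize) {
    if band.len() == LANES * width {
        // Full band: one accumulator per lane, all lanes step through x together.
        let mut acc = [0u32; LANES];
        for x in 0..width {
            for lane in 0..LANES {
                let i = lane * width + x;
                acc[lane] = acc[lane].wrapping_add(band[i]);
                band[i] = acc[lane];
            }
        }
    } else {
        // Remainder band with fewer than LANES rows: scalar fallback.
        for row in band.chunks_mut(width) {
            let mut acc = 0u32;
            for p in row.iter_mut() {
                acc = acc.wrapping_add(*p);
                *p = acc;
            }
        }
    }
}

/// Splits the image into one contiguous block of rows per worker thread
/// (a stand-in for rayon's work splitting, without work stealing), then
/// processes each block in bands of `LANES` rows.
fn prefix_sum_rows<const LANES: usize>(pixels: &mut [u32], width: usize, threads: usize) {
    assert!(LANES > 0 && width > 0 && threads > 0);
    assert_eq!(pixels.len() % width, 0);
    let height = pixels.len() / width;
    // Rows per thread, rounded up, and at least one row.
    let rows_per_thread = ((height + threads - 1) / threads).max(1);
    thread::scope(|s| {
        for part in pixels.chunks_mut(rows_per_thread * width) {
            s.spawn(move || {
                for band in part.chunks_mut(LANES * width) {
                    process_band::<LANES>(band, width);
                }
            });
        }
    });
}
```

Because `LANES` is a const generic, changing the lane count is a one-token change at the call site, for example `prefix_sum_rows::<8>(&mut pixels, width, 4)` versus `prefix_sum_rows::<32>(&mut pixels, width, 4)`. The result is the same for any lane count; only the speed may differ.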

@owenthewizard
Contributor

Awesome work!! 🙏

@LoganDark
Owner Author

LoganDark commented Jun 22, 2022

I don't foresee broadcast support being merged any time soon, and now that I'm using SIMD it's going to be harder to split things up manually (I'd have to re-implement the work stealing), so I'm going to mark this as ready for review.

I'm planning to merge this instead of #4 and #5, since it combines both of them for greater performance gains.

@LoganDark LoganDark marked this pull request as ready for review June 22, 2022 23:10
@owenthewizard
Contributor

In my testing (owenthewizard/i3lockr@fcd8ccf) rayon > simd+rayon > default > simd.

@LoganDark
Owner Author

> In my testing (owenthewizard/i3lockr@fcd8ccf) rayon > simd+rayon > default > simd.

Did you test with lanes other than 32? For me, 8 lanes is the fastest - lower or higher is slower. YMMV, of course, I'm on 10th gen Intel.
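For anyone wanting to compare lane counts on their own machine, a rough timing sweep could look like the sketch below. It is only an illustration under stated assumptions: `sum_lanes` is a hypothetical stand-in kernel (a lane-wise wrapping sum), not the blur from this PR, and `time_it` and `sweep` are made-up helper names. Since a const-generic lane count must be known at compile time, each candidate is listed explicitly.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Stand-in kernel: wrapping sum of `data`, accumulated in `LANES`
/// independent lanes and folded together at the end.
fn sum_lanes<const LANES: usize>(data: &[u32]) -> u32 {
    let mut acc = [0u32; LANES];
    let mut chunks = data.chunks_exact(LANES);
    for chunk in &mut chunks {
        for lane in 0..LANES {
            acc[lane] = acc[lane].wrapping_add(chunk[lane]);
        }
    }
    // Elements that did not fill a whole chunk are summed one by one.
    let mut total = chunks
        .remainder()
        .iter()
        .fold(0u32, |a, &b| a.wrapping_add(b));
    for a in acc.iter() {
        total = total.wrapping_add(*a);
    }
    total
}

/// Runs the kernel `reps` times and returns (lane count, elapsed time, result).
fn time_it<const LANES: usize>(data: &[u32], reps: u32) -> (usize, Duration, u32) {
    let start = Instant::now();
    let mut result = 0;
    for _ in 0..reps {
        // black_box keeps the optimizer from removing the repeated work.
        result = black_box(sum_lanes::<LANES>(black_box(data)));
    }
    (LANES, start.elapsed(), result)
}

/// Times a fixed list of lane counts; edit the list to try other values.
fn sweep(data: &[u32], reps: u32) -> Vec<(usize, Duration, u32)> {
    vec![
        time_it::<1>(data, reps),
        time_it::<4>(data, reps),
        time_it::<8>(data, reps),
        time_it::<16>(data, reps),
        time_it::<32>(data, reps),
    ]
}
```

Timings from a loop like this are noisy and depend on the CPU and build flags, so treat them as a rough guide and run them in release mode.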

@owenthewizard
Contributor

That's what I get for doing this at 5:00 A.M....
8 lanes is fastest for me as well (AMD Ryzen 5 2500U).
With 8 lanes rayon+simd is about the same as rayon.

@LoganDark
Owner Author

> That's what I get for doing this at 5:00 A.M....
> 8 lanes is fastest for me as well (AMD Ryzen 5 2500U).
> With 8 lanes rayon+simd is about the same as rayon.

Interesting, for me rayon+simd is about 5-10% faster. It's a very small speedup in all honesty though, so I'm still on the fence about actually merging in SIMD (I might just go back to the plain rayon branch).

Using the new SIMD iterators in `imgref-iter`, it's possible to
implement a SIMD version of stackblur. Right now SIMD doesn't offer
much of a benefit, but it is still a benefit nonetheless, and rayon
definitely offers a huge benefit.

This has been sitting around in PR hell for far too long, and I don't
like managing multiple disparate branches, so it's about time this gets
merged and released.
@LoganDark LoganDark merged commit de8f78b into master Jul 18, 2022
@LoganDark LoganDark deleted the rayon-simd branch July 18, 2022 13:29
Development

Successfully merging this pull request may close these issues.

Feature Request: Multithreading (rayon)