Add SSE2 fillers #2566
LGTM 👍
I can follow all the SIMD logic, and I see the same speedup locally with this PR over main as described in the PR comments. Makes a big difference! The only thing I noticed was one small indentation issue. I guess the auto-formatter didn't catch it because it was inside a preprocessor macro.
I like the comparisons against a single-color surface in the PR description, although I think they use the SDL blit, not the AVX2 blit, since the surfaces you create in the test script don't have alpha.
I was just talking to somebody about how to implement a "fade to white" effect, and I think with this merged the MULT fillers would officially be a more efficient way to do that than a white surface and set_alpha.
Anyways, the code looks good to me.
This PR is a continuation of #2382 and adds SSE2 implementations of all blend modes for the fillers. These should perform better on ARM and on older x86 CPUs that don't support the newer AVX2 instructions (PR for this: #2565).
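For readers new to SIMD fillers, here is a minimal sketch of an SSE2 filler for an additive blend mode. It assumes a contiguous buffer of 32-bit pixels with no row padding. `fill_blend_add_sse2` is a hypothetical name and not pygame-ce's actual function. The sketch also applies the saturating add to all four bytes, including alpha. A real filler may need to treat the alpha channel differently depending on the surface format.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: saturating-add a 32-bit color to every pixel.
 * Assumes tightly packed 32-bit pixels; alpha is treated like any other byte. */
static void fill_blend_add_sse2(uint32_t *pixels, size_t count, uint32_t color)
{
    /* Broadcast the fill color into all four 32-bit lanes. */
    const __m128i mm_color = _mm_set1_epi32((int)color);
    size_t i = 0;

    /* Main loop: one 128-bit register = 4 pixels per iteration. */
    for (; i + 4 <= count; i += 4) {
        __m128i px = _mm_loadu_si128((const __m128i *)(pixels + i));
        px = _mm_adds_epu8(px, mm_color); /* per-byte add, clamped at 255 */
        _mm_storeu_si128((__m128i *)(pixels + i), px);
    }

    /* Scalar tail for the remaining 0-3 pixels, same saturating math. */
    for (; i < count; i++) {
        uint32_t out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            uint32_t sum = ((pixels[i] >> shift) & 0xFFu) + ((color >> shift) & 0xFFu);
            if (sum > 255u) {
                sum = 255u;
            }
            out |= sum << shift;
        }
        pixels[i] = out;
    }
}
```

`_mm_adds_epu8` clamps each byte at 255, so the vector loop and the scalar tail give identical results. The tail only exists for pixel counts that aren't a multiple of four.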
Results:
From my testing, SSE2 is about 2X slower than AVX2 (which makes sense, since it works on half as many pixels at a time), but it is still a lot faster than the current single-pixel implementation and about as fast as the AVX2 blit with a cached color surface, so I'd say that's a win either way.
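A 128-bit SSE2 register holds four 32-bit pixels, while a 256-bit AVX2 register holds eight. That is consistent with the roughly 2X gap described above. A multiplicative filler adds one more detail: the per-channel products need 16-bit lanes, so each register of four pixels is unpacked into two halves of two pixels each. The sketch below illustrates this. `fill_blend_mult_sse2` is hypothetical. The `(d * s + 255) >> 8` rounding is an assumption modeled on pygame's scalar BLEND_MULT and is not taken from this PR.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: per-channel multiply of every pixel by a color,
 * using the assumed rounding out = (d * s + 255) >> 8 on all four bytes. */
static void fill_blend_mult_sse2(uint32_t *pixels, size_t count, uint32_t color)
{
    const __m128i zero = _mm_setzero_si128();
    const __m128i add255 = _mm_set1_epi16(255);
    /* Color widened to 16-bit lanes: two copies of its 4 channels. */
    const __m128i color16 = _mm_unpacklo_epi8(_mm_set1_epi32((int)color), zero);
    size_t i = 0;

    /* 4 pixels per iteration, processed as two 16-bit halves of 2 pixels. */
    for (; i + 4 <= count; i += 4) {
        __m128i px = _mm_loadu_si128((const __m128i *)(pixels + i));
        __m128i lo = _mm_unpacklo_epi8(px, zero); /* pixels 0-1 */
        __m128i hi = _mm_unpackhi_epi8(px, zero); /* pixels 2-3 */
        /* 255 * 255 + 255 = 65280 still fits in an unsigned 16-bit lane. */
        lo = _mm_srli_epi16(_mm_add_epi16(_mm_mullo_epi16(lo, color16), add255), 8);
        hi = _mm_srli_epi16(_mm_add_epi16(_mm_mullo_epi16(hi, color16), add255), 8);
        /* Results are 0..255, so the unsigned-saturating pack is lossless. */
        _mm_storeu_si128((__m128i *)(pixels + i), _mm_packus_epi16(lo, hi));
    }

    /* Scalar tail for the remaining 0-3 pixels. */
    for (; i < count; i++) {
        uint32_t out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            uint32_t d = (pixels[i] >> shift) & 0xFFu;
            uint32_t s = (color >> shift) & 0xFFu;
            out |= ((d * s + 255u) >> 8) << shift;
        }
        pixels[i] = out;
    }
}
```

With this rounding, multiplying by 0xFF leaves a channel unchanged and multiplying by 0 clears it.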
[Benchmark results: ON MAIN / WITH THIS PR]
Test Program: