Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SIMD functionality to the transform submodule (Attempt 2) #2421

Merged
merged 13 commits into from
Sep 8, 2023

Conversation

MyreMylar
Copy link
Member

Hopefully this will be a cleaner PR than the mess of the last one.

The basic idea is to make it fairly easy to add SIMD versions of transform functions to the transform submodule the same way we've been adding them to the blitters in the surface submodule.

The original messy PR is here: #2042

I've quickly tried to address some of the issues highlighted in the reviews of that PR here; so we shall see if it still passes the CI.

I have a follow-up PR to this that builds off it by adding a SIMD version of the current greyscale transform. I'll create that again if we can get this one working.

@MyreMylar MyreMylar requested a review from a team as a code owner August 26, 2023 21:55
@MyreMylar MyreMylar added the SIMD label Aug 27, 2023
@MyreMylar
Copy link
Member Author

MyreMylar commented Aug 28, 2023

It looks like I can't seem to share the body of pg_has_avx2() at the minute. I'm guessing it has something to do with the way avx2 is compiled into only the files 'simd_blitters_avx2' & 'simd_transform_avx2' - but why this only causes a failure in the 32bit builds and multiply blending I have no fully worked out idea, perhaps it is a difference between a compile time and runtime check somewhere causing it to head down a wrong path, there is definitely some 64bit only code which may explain the difference.

Anyway, I feel like we should avoid sharing anything that is AVX2 dependent in the simd_shared header and just stick to SSE2/neon or general shared stuff in there. I'll tinker a bit more to see if I can move more defines into there.

@itzpr3d4t0r
Copy link
Member

Could having a separate file where we compile the has_avx2 function with the AVX2 compile flag solve the issue?

@MyreMylar
Copy link
Member Author

Could having a separate file where we compile the has_avx2 function with the AVX2 compile flag solve the issue?

I'm trying that now, putting the 'simd_shared.c' file in both submodules (transform and surface)

@MyreMylar
Copy link
Member Author

MyreMylar commented Aug 28, 2023

So the problem with this approach of sharing a .c file across two submodules is that if one is linking while the other wants to start linking you get this:

simd_shared.c
    D:\a\pygame-ce\pygame-ce\src_c\simd_shared.c : fatal error C1083: Cannot open compiler generated file: 'D:\a\pygame-ce\pygame-ce\build\temp.win-amd64-cpython-312\Release\src_c\simd_shared.obj': Permission denied

I have no idea if there is any way round this other than going back to having the definition duplicated in separate file for each submodule. Unless anyone has any other ideas?

EDIT: Macros won't work either as both of these functions are filled with hash defines already.

@MyreMylar
Copy link
Member Author

The main things I've learned:

  1. You can share some complete functions across submodules via headers, as long as you are careful to only import that header once (otherwise you can get a function 'int pg_HasSSE_NEON()' already has a body type error).
  2. Some function bodies depend on being compiled with the -mavx2 or /arch:AVX2 compiler flag to work correctly, you can't apply this to a header file so you can't share the bodies of these functions in a function header.
  3. You can't share an object file between submodules without running into parallel race-conditions, linking permissions errors.

Anyway that's how you end up with the code as it is in this PR. I think I discovered most of this the first time I worked on this, but that was a while ago and the process wasn't well reasoned out or documented. I'm open to any suggestions to reduce some of this (fairly minor) code duplication, but it seems like the compiler really wants it to stay duplicated.

Copy link
Member

@Starbuck5 Starbuck5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for being so incremental with these PRs Myre, it makes it nice and easy to review.

Went through this it all seems reasonable, ran the test suite locally, put in some printfs to make sure SIMD was still getting dispatched as it was. All looks good.

!defined(SDL_DISABLE_IMMINTRIN_H) */
}
return 0;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I understand correctly, these methods are the duplication in code?
pg_has_avx2
pg_avx2_at_runtime_but_uncomplied

Might it help to define a simd_avx2_shared.h and put all avx2 related header stuff in there instead of putting it in simd_shared.h? Or would you have the same problems? (I'm no expert in header includes)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what i suggested, myre tried to do that, and 32 bit builds failed for unknown reasons.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well he tried to put the methods in own *.c files. If you suggested what I suggested then it was not clear to me.

My suggestion is to put all avx related stuff into its own *.h file. Not sure what benefit would be gained and its merged already.

Copy link
Contributor

@dr0id dr0id left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. I'm no expert in header includes, but reading your changes they seem to be reasonable.

@itzpr3d4t0r itzpr3d4t0r merged commit e822283 into main Sep 8, 2023
@itzpr3d4t0r itzpr3d4t0r added this to the 2.4.0 milestone Sep 8, 2023
@Starbuck5 Starbuck5 deleted the simd-transform-setup branch September 9, 2023 04:26
@MyreMylar
Copy link
Member Author

MyreMylar commented Sep 9, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants