Setup SIMD operations for the transform submodule #1974

Closed
MyreMylar opened this issue Mar 4, 2023 · 3 comments
Labels: Performance (related to the speed or resource usage of the project), SIMD, transform (pygame.transform)

Comments

@MyreMylar
Member

At the moment Pygame CE uses SIMD (SSE2, NEON and AVX2) to accelerate the blitting functions. In some cases, particularly with large surfaces, this can make an operation ten times faster than the same operation done with ordinary scalar (SISD) code. This is particularly exciting where the speed-up is large enough to enable new effects: a full-screen transform can then run every frame of the game loop (in real time), rather than being practical only as a one-off or on smaller surfaces.

The obvious next place to take this optimisation strategy is the transform submodule, which also operates on Surfaces pixel by pixel.

To get started, before adding SSE2 and AVX2 variants of each individual function, there needs to be a setup stage that adds the necessary architecture, similar to what we have already created for the surface submodule.

Places to look for guidance

  • line 74 of setup.py: avx2_filenames = ['simd_blitters_avx2']. New AVX2 files will need to be added to this list so that they are compiled with AVX2 enabled.
  • simd_blitters.h
  • simd_blitters_avx2.h
  • simd_blitters_sse2.h
  • alphablit.c
  • Setup.SDL2.in

I would begin by cloning this structure, with new files mirroring the _blitters group and additions to the other files where it makes sense.
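As a rough illustration of the kind of architecture being cloned, here is a minimal C sketch of runtime dispatch between a scalar routine and SIMD variants. Every name in it (simd_level, detect_simd_level, invert_rgb and its variants) is hypothetical and not taken from the pygame-ce sources, and the SSE2/AVX2 functions are placeholders that simply forward to the scalar code. In a real build each would presumably live in its own file compiled with the matching flags. It also assumes 32-bit pixels packed as 0xAARRGGBB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; none come from the pygame-ce sources. */
typedef enum { SIMD_NONE = 0, SIMD_SSE2 = 1, SIMD_AVX2 = 2 } simd_level;

/* Runtime CPU check. __builtin_cpu_supports is a GCC/Clang builtin for x86;
   on other compilers or architectures this sketch reports no SIMD. */
simd_level detect_simd_level(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx2"))
        return SIMD_AVX2;
    if (__builtin_cpu_supports("sse2"))
        return SIMD_SSE2;
#endif
    return SIMD_NONE;
}

/* Reference implementation: invert the colour channels and keep alpha,
   assuming pixels are packed as 0xAARRGGBB. */
void invert_rgb_scalar(uint32_t *pixels, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pixels[i] ^= 0x00FFFFFFu;
}

/* Placeholders: a real SSE2 or AVX2 body would go in a separate file built
   with the matching compiler flags. Here they forward to the scalar code. */
void invert_rgb_sse2(uint32_t *pixels, size_t n)
{
    invert_rgb_scalar(pixels, n);
}

void invert_rgb_avx2(uint32_t *pixels, size_t n)
{
    invert_rgb_scalar(pixels, n);
}

/* Public entry point: pick the widest variant the CPU supports. */
void invert_rgb(uint32_t *pixels, size_t n)
{
    assert(pixels != NULL || n == 0);
    switch (detect_simd_level()) {
        case SIMD_AVX2:
            invert_rgb_avx2(pixels, n);
            break;
        case SIMD_SSE2:
            invert_rgb_sse2(pixels, n);
            break;
        default:
            invert_rgb_scalar(pixels, n);
            break;
    }
}
```

The point of the shape is that callers only ever see invert_rgb, so a new SIMD variant can be added later without touching them.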

You could also test the setup with a simple fake transform operation, such as adding red (255, 0, 0) to every pixel in a surface, and check that it produces the same result in each of the three basic modes (no SIMD, SSE2 and AVX2).
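A fake "add red" operation of this kind could look roughly like the C sketch below. It is only an illustration under stated assumptions: pixels are taken to be 32-bit values packed as 0xAARRGGBB, the add saturates at 255 per channel, and all function names are hypothetical rather than pygame-ce API. Only the scalar and SSE2 paths are shown; an AVX2 path would follow the same pattern with 256-bit registers. The SSE2 intrinsics used (_mm_set1_epi32, _mm_loadu_si128, _mm_adds_epu8, _mm_storeu_si128) are standard ones from emmintrin.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Assumption: red (255, 0, 0) with alpha untouched, pixels as 0xAARRGGBB. */
#define ADD_COLOR 0x00FF0000u

/* Saturating 8-bit add, matching what _mm_adds_epu8 does per byte. */
uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* No-SIMD reference: add ADD_COLOR to every pixel, channel by channel. */
void add_red_scalar(uint32_t *pixels, size_t n)
{
    assert(pixels != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t p = pixels[i], out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            uint8_t c = sat_add_u8((uint8_t)(p >> shift),
                                   (uint8_t)(ADD_COLOR >> shift));
            out |= (uint32_t)c << shift;
        }
        pixels[i] = out;
    }
}

#if defined(__SSE2__)
/* SSE2 variant: four pixels per iteration, scalar code for the leftovers. */
void add_red_sse2(uint32_t *pixels, size_t n)
{
    const __m128i add = _mm_set1_epi32((int)ADD_COLOR);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i px = _mm_loadu_si128((const __m128i *)(pixels + i));
        px = _mm_adds_epu8(px, add);
        _mm_storeu_si128((__m128i *)(pixels + i), px);
    }
    add_red_scalar(pixels + i, n - i);
}
#endif

/* Returns 1 if every available variant matches the scalar reference. */
int add_red_variants_agree(void)
{
    const uint32_t input[7] = {0x00000000u, 0xFF102030u, 0x80FF00FFu,
                               0xFFFFFFFFu, 0x01010101u, 0x7F7F7F7Fu,
                               0x12345678u};
    uint32_t expected[7], actual[7];
    for (size_t i = 0; i < 7; i++)
        expected[i] = actual[i] = input[i];
    add_red_scalar(expected, 7);
#if defined(__SSE2__)
    add_red_sse2(actual, 7);
#else
    add_red_scalar(actual, 7);
#endif
    for (size_t i = 0; i < 7; i++)
        if (expected[i] != actual[i])
            return 0;
    return 1;
}
```

Using seven pixels in the check is deliberate: it exercises both the four-pixel vector loop and the scalar tail that handles the remainder.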

After the setup is done, I think the new greyscale transform would be the best option for a smooth conversion to SIMD.

Anyone should feel free to take this on.

@MyreMylar added the Performance, SIMD and transform labels on Mar 4, 2023
@MyreMylar added this to the 2.2 milestone on Mar 4, 2023
@Starbuck5
Member

The image module's to/from bytes functions are also good opportunities for SIMD. Right now they are written as though they use SSE4.2, but I believe the SSE4.2 path is never switched on, because it was added before we solved our SIMD build/runtime issues.

Another SIMD project could be to rewrite the transform.smoothscale filters to use intrinsics, so that we can easily compile them for NEON (using SSE2Neon) and get rid of the compiler-specific hardcoded assembly in the project. There's also some MMX assembly there that we could drop.

@MyreMylar self-assigned this on Mar 21, 2023
@Starbuck5 modified the milestones: 2.2, 2.3 on Mar 26, 2023
@Starbuck5 modified the milestones: 2.3, 2.3.1 on May 31, 2023
@Starbuck5 removed this from the 2.3.1 milestone on Jul 28, 2023
@Starbuck5
Member

This is partially handled in #2212 and #2213, but #2214 hasn't been resurrected by @MyreMylar yet.

Removing this from the 2.3.1 milestone.

@MyreMylar
Member Author

Linking to #2421, which closes this issue.
