Setup SIMD operations for the `transform` submodule #1974

MyreMylar · 2023-03-04T11:52:15Z

At the moment Pygame CE uses SIMD (SSE2, NEON and AVX2) to accelerate the blitting functions. In some cases, particularly with large surfaces, this can make an operation ten times faster than the same operation done with normal scalar (SISD) code. This is particularly exciting where the speedup is large enough to enable new effects: a full-screen transform operation can run every frame of the game loop (in real time) rather than as a one-off, or being practical only for smaller surfaces.

The obvious next place to take this optimisation strategy is to the transform submodule which also operates on Surfaces, pixel by pixel.

To get started, before adding SSE2 and AVX2 variants of each individual function, there needs to be a setup stage that adds the necessary architecture, similar to what we have already created for the surface submodule.

Places to look for guidance

Line 74 of setup.py: `avx2_filenames = ['simd_blitters_avx2']`. New AVX2 files will need adding to this list to be compiled with AVX2 enabled.
simd_blitters.h
simd_blitters_avx2.h
simd_blitters_sse2.h
alphablit.c
Setup.SDL2.in

I would begin by cloning this structure with new files for the _blitters group and additions to the others where it makes sense.

You could also test the setup with a simple fake transform operation like adding red - (255,0,0) to every pixel in a surface and testing it works the same in each of the three basic modes (no SIMD, SSE2 & AVX2).
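As a rough illustration of that fake "add red" check, here is a minimal C sketch that compares a scalar path against an SSE2 path on the same input. Everything in it is an assumption made for illustration: the function names (`add_red_scalar`, `add_red_sse2`, `add_red_paths_agree`) are hypothetical and not part of pygame, and the pixel layout is assumed to be 4 bytes per pixel in R, G, B, A memory order, whereas real surfaces would need their format checked. It is not the project's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#define HAVE_SSE2_SKETCH 1
#else
#define HAVE_SSE2_SKETCH 0
#endif

/* Assumption for this sketch: 4 bytes per pixel, memory order R, G, B, A. */

/* Scalar reference: add (255, 0, 0) to every pixel, saturating at 255. */
static void
add_red_scalar(uint8_t *pixels, size_t n_pixels)
{
    assert(pixels != NULL || n_pixels == 0);
    for (size_t i = 0; i < n_pixels; i++) {
        unsigned r = pixels[4 * i] + 255u;
        pixels[4 * i] = (uint8_t)(r > 255u ? 255u : r);
    }
}

#if HAVE_SSE2_SKETCH
/* SSE2 variant: 4 pixels per 128-bit register, saturating byte add. */
static void
add_red_sse2(uint8_t *pixels, size_t n_pixels)
{
    /* Each 32-bit lane is 0x000000FF; on little-endian x86 the lowest
     * address byte of the lane (R in the assumed layout) is 255. */
    const __m128i red = _mm_set1_epi32(0x000000FF);
    size_t i = 0;
    for (; i + 4 <= n_pixels; i += 4) {
        __m128i px = _mm_loadu_si128((const __m128i *)(pixels + 4 * i));
        px = _mm_adds_epu8(px, red);
        _mm_storeu_si128((__m128i *)(pixels + 4 * i), px);
    }
    /* Leftover pixels (fewer than 4) go through the scalar path. */
    add_red_scalar(pixels + 4 * i, n_pixels - i);
}
#endif

/* Returns 1 when the scalar and SIMD paths produce identical bytes.
 * Without SSE2 at compile time, the scalar path is compared to itself. */
static int
add_red_paths_agree(const uint8_t *src, size_t n_pixels)
{
    size_t n_bytes = 4 * n_pixels;
    uint8_t *a = malloc(n_bytes ? n_bytes : 1);
    uint8_t *b = malloc(n_bytes ? n_bytes : 1);
    int same = 0;
    if (a && b) {
        memcpy(a, src, n_bytes);
        memcpy(b, src, n_bytes);
        add_red_scalar(a, n_pixels);
#if HAVE_SSE2_SKETCH
        add_red_sse2(b, n_pixels);
#else
        add_red_scalar(b, n_pixels);
#endif
        same = memcmp(a, b, n_bytes) == 0;
    }
    free(a);
    free(b);
    return same;
}
```

An AVX2 variant would presumably follow the same shape with 8 pixels per 256-bit register, living in its own file so it can be compiled with AVX2 enabled. Checking a pixel count that is not a multiple of the register width (7, for example) exercises the scalar tail as well as the vector loop.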

After the setup is done, I think the new greyscale transform would be the best option for a smooth conversion to SIMD.

Anyone feel free to take this on


Starbuck5 · 2023-03-21T08:27:20Z

The image module's to/from bytes functions are also good opportunities for SIMD. Right now they look as though they use SSE4.2, but I believe the SSE4.2 path is never switched on, because it was added before we solved our SIMD build/runtime issues.

Another SIMD project could be to rewrite the transform.smoothscale filters to use intrinsics, so we can easily compile them on NEON (using SSE2Neon) and get rid of the compiler-specific hardcoded assembly in the project. There's also some MMX assembly there we could drop.

Starbuck5 · 2023-07-28T06:56:12Z

This is partially handled in #2212 and #2213, but #2214 hasn't been resurrected by @MyreMylar yet.

Removing this from the 2.3.1 milestone.

MyreMylar · 2023-09-30T08:15:50Z

Linking to #2421, which closes this issue.

MyreMylar added the Performance (Related to the speed or resource usage of the project), SIMD, and transform (pygame.transform) labels Mar 4, 2023

MyreMylar added this to the 2.2 milestone Mar 4, 2023

MyreMylar mentioned this issue Mar 21, 2023

Setup SIMD in the transform submodule #2042

Closed

MyreMylar self-assigned this Mar 21, 2023

Starbuck5 modified the milestones: 2.2, 2.3 Mar 26, 2023

Starbuck5 modified the milestones: 2.3, 2.3.1 May 31, 2023

Starbuck5 removed this from the 2.3.1 milestone Jul 28, 2023

MyreMylar closed this as completed Sep 30, 2023
