At the moment, Pygame CE uses SIMD (SSE2/NEON and AVX2) to accelerate the blitting functions. In some cases, particularly with large surfaces, this can make an operation as much as ten times faster than the same operation written as ordinary scalar (SISD) code. This is particularly exciting where the speed-up is large enough to make new effects practical: a full-screen transform can run every frame of the game loop (in real time), rather than only as a one-off or only on smaller surfaces.
The obvious next place to take this optimisation strategy is the transform submodule, which also operates on Surfaces pixel by pixel.
Before adding SSE2 and AVX2 variants of each individual function, there needs to be a setup stage that adds the necessary architecture, similar to what we have already created for the surface submodule.
Places to look for guidance:
setup.py, line 74: avx2_filenames = ['simd_blitters_avx2']. New AVX2 files will need adding to this list so that they are compiled with AVX2 enabled.
simd_blitters.h
simd_blitters_avx2.h
simd_blitters_sse2.h
alphablit.c
Setup.SDL2.in
I would begin by cloning this structure, with new transform files mirroring the simd_blitters group and additions to the other files where it makes sense.
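As a rough illustration of the kind of structure being cloned, here is a minimal sketch of picking a scalar, SSE2 or AVX2 variant at runtime. Every name in it (simd_level, detect_simd_level, pick_transform and the transform_* functions) is hypothetical and is not taken from the existing simd_blitters files. The detection relies on the GCC/Clang builtin __builtin_cpu_supports, and the SSE2 and AVX2 variants are placeholders that simply call the scalar code. In the real project they would live in their own files, with the AVX2 file compiled with AVX2 enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical levels matching the three basic modes: no SIMD, SSE2, AVX2. */
typedef enum { SIMD_NONE = 0, SIMD_SSE2 = 1, SIMD_AVX2 = 2 } simd_level;

/* Every variant of a transform shares one signature. */
typedef void (*transform_fn)(uint32_t *pixels, size_t count);

/* Best level the running CPU supports. Assumption: GCC or Clang on x86;
 * any other compiler or architecture falls back to the scalar path. */
simd_level detect_simd_level(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return SIMD_AVX2;
    if (__builtin_cpu_supports("sse2"))
        return SIMD_SSE2;
#endif
    return SIMD_NONE;
}

/* Placeholder transform: invert every channel of every pixel. */
void transform_scalar(uint32_t *pixels, size_t count)
{
    assert(pixels != NULL);
    for (size_t i = 0; i < count; i++)
        pixels[i] = ~pixels[i];
}

/* Placeholder: a real SSE2 variant would use SSE2 intrinsics here. */
void transform_sse2(uint32_t *pixels, size_t count)
{
    transform_scalar(pixels, count);
}

/* Placeholder: a real AVX2 variant would sit in a file built with AVX2 on. */
void transform_avx2(uint32_t *pixels, size_t count)
{
    transform_scalar(pixels, count);
}

/* Map a level to the matching variant, defaulting to scalar. */
transform_fn pick_transform(simd_level level)
{
    switch (level) {
        case SIMD_AVX2:
            return transform_avx2;
        case SIMD_SSE2:
            return transform_sse2;
        default:
            return transform_scalar;
    }
}
```

A caller would then run pick_transform(detect_simd_level())(pixels, count). Keeping one function signature for all variants means the calling code in the submodule does not change as new variants are added.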
You could also test the setup with a simple fake transform operation, such as adding red (255, 0, 0) to every pixel in a surface, and checking that it gives the same result in each of the three basic modes (no SIMD, SSE2 and AVX2).
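The fake "add red" operation could be sketched like this, with a scalar version, an SSE2 version and a check that both agree. This is an illustration only and makes several assumptions: pixels are 32 bits in 0xAARRGGBB order, the addition saturates at 255, and the function names are hypothetical. An AVX2 version would follow the same pattern with 256-bit registers and is left out so the sketch compiles without special flags. Where SSE2 is not available at compile time, the SSE2 function falls back to the scalar code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define EXAMPLE_HAVE_SSE2 1
#else
#define EXAMPLE_HAVE_SSE2 0
#endif

/* Scalar reference: add 255 to the red channel, saturating at 255.
 * Assumes a 0xAARRGGBB pixel layout. */
void add_red_scalar(uint32_t *pixels, size_t count)
{
    assert(pixels != NULL);
    for (size_t i = 0; i < count; i++) {
        uint32_t r = (pixels[i] >> 16) & 0xFFu;
        r += 255u;
        if (r > 255u)
            r = 255u;
        pixels[i] = (pixels[i] & 0xFF00FFFFu) | (r << 16);
    }
}

/* SSE2 version: four pixels per iteration with a saturating byte add. */
void add_red_sse2(uint32_t *pixels, size_t count)
{
    assert(pixels != NULL);
#if EXAMPLE_HAVE_SSE2
    const __m128i red = _mm_set1_epi32(0x00FF0000);
    size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        __m128i px = _mm_loadu_si128((const __m128i *)(pixels + i));
        px = _mm_adds_epu8(px, red); /* per-byte add, clamped to 255 */
        _mm_storeu_si128((__m128i *)(pixels + i), px);
    }
    /* Leftover pixels (fewer than four) go through the scalar path. */
    add_red_scalar(pixels + i, count - i);
#else
    add_red_scalar(pixels, count);
#endif
}

/* Run both versions on the same data; return 1 if the results match. */
int add_red_variants_agree(void)
{
    enum { N = 37 }; /* not a multiple of 4, so the tail loop is exercised */
    uint32_t a[N], b[N];
    for (size_t i = 0; i < N; i++)
        a[i] = (uint32_t)i * 2654435761u; /* arbitrary varied pixel values */
    memcpy(b, a, sizeof a);
    add_red_scalar(a, N);
    add_red_sse2(b, N);
    return memcmp(a, b, sizeof a) == 0;
}
```

Comparing each SIMD variant against the scalar version on a pixel count that is not a multiple of the register width also covers the leftover-pixel handling, which is an easy place for the variants to diverge.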
After the setup is done, I think the new greyscale transform would be the best candidate for a smooth conversion to SIMD.
Anyone should feel free to take this on.
The image module's to/from bytes functions are also good opportunities for SIMD. Right now they look as though they use SSE4.2, but I believe the SSE4.2 path is never switched on, because it was added before we solved the SIMD build/runtime issues.
Another SIMD project could be to rewrite the transform.smoothscale filters to use intrinsics, so that we can easily compile them for NEON (using sse2neon) and get rid of the compiler-specific hardcoded assembly in the project. There is also some MMX assembly there that we could drop.