tannergooding
Member

No description provided.

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 12, 2024
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@tannergooding tannergooding changed the title Improve codegen for Vector512.ExtractMostSignificatBits Improve codegen for ExtractMostSignificatBits Dec 12, 2024
@tannergooding
Member Author

CC. @dotnet/jit-contrib

This is a simple improvement to ExtractMostSignificantBits that uses the EVEX mask support for short/ushort. It gives some significant codegen improvements in FullOpts (-10,667 bytes), including in known impactful SIMD functions that support char. There are no throughput differences.

The diffs typically look like:

; Vector128 Example, saves 3 bytes
-       vpshufb  xmm0, xmm9, xmmword ptr [reloc @RWD00]
-       vpmovmskb ecx, xmm0
+       vpmovw2m k1, xmm9
+       kmovb    ecx, k1

; Vector256 Example, saves 9 bytes
-       vpshufb  ymm0, ymm0, ymmword ptr [reloc @RWD00]
-       vpermq   ymm0, ymm0, -40
-       vpmovmskb eax, xmm0
+       vpmovw2m k1, ymm0
+       kmovw    eax, k1
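
For readers less familiar with the two sequences, here is a minimal scalar sketch in C of what they compute for eight 16-bit lanes (the Vector128 case). It is a model, not the JIT's code: the function names are hypothetical, and the shuffle control (odd byte indices in the low half, zeroing in the high half) is an assumption, since the contents of RWD00 are not shown in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Models vpmovw2m + kmovb: bit i of the result is the sign bit of 16-bit lane i. */
static uint32_t msb_mask_direct(const uint16_t lanes[8]) {
    uint32_t mask = 0;
    for (int i = 0; i < 8; i++) {
        mask |= (uint32_t)(lanes[i] >> 15) << i;
    }
    return mask;
}

/* Models the older vpshufb + vpmovmskb sequence on the same input. */
static uint32_t msb_mask_shuffle(const uint16_t lanes[8]) {
    uint8_t bytes[16];
    uint8_t shuffled[16];

    /* Lay the lanes out as bytes in little-endian order, as on x86. */
    for (int i = 0; i < 8; i++) {
        bytes[2 * i] = (uint8_t)(lanes[i] & 0xFF);
        bytes[2 * i + 1] = (uint8_t)(lanes[i] >> 8);
    }

    /* ASSUMED shuffle control: gather the high byte of each lane into the
       low 8 bytes, and zero the upper 8 bytes (control byte with bit 7 set). */
    for (int i = 0; i < 16; i++) {
        uint8_t ctrl = (i < 8) ? (uint8_t)(2 * i + 1) : 0x80;
        shuffled[i] = (ctrl & 0x80) ? 0 : bytes[ctrl & 0x0F];
    }

    /* vpmovmskb: collect the top bit of every byte. */
    uint32_t mask = 0;
    for (int i = 0; i < 16; i++) {
        mask |= (uint32_t)(shuffled[i] >> 7) << i;
    }
    return mask;
}

/* Returns the mask after checking that both models agree on this input. */
static uint32_t msb_mask_checked(const uint16_t lanes[8]) {
    uint32_t direct = msb_mask_direct(lanes);
    assert(direct == msb_mask_shuffle(lanes));
    return direct;
}
```

Under that assumption both models return the same 8-bit mask, which is why the shuffle plus byte-mask pair can be replaced by a single word-to-mask move followed by a mask-to-register move.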

Additionally, it adds the minimal support to DecomposeLongs so that Vector512.ExtractMostSignificantBits can be used on x86 (32-bit).
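
As a conceptual sketch only (this is not the transformation DecomposeLongs performs, and the type and function names are hypothetical): a 512-bit vector of bytes has 64 lanes, so its mask needs 64 bits, and a 32-bit target has to carry that result as two 32-bit halves. In C, the idea looks roughly like this:

```c
#include <stdint.h>

/* Hypothetical pair type: a 64-bit mask held as two 32-bit halves. */
typedef struct {
    uint32_t lo; /* sign bits of bytes 0..31  */
    uint32_t hi; /* sign bits of bytes 32..63 */
} mask64_halves;

/* Builds the 64-lane mask using only 32-bit arithmetic. */
static mask64_halves msb_mask_512_bytes(const uint8_t bytes[64]) {
    mask64_halves m = {0, 0};
    for (int i = 0; i < 32; i++) {
        m.lo |= (uint32_t)(bytes[i] >> 7) << i;
        m.hi |= (uint32_t)(bytes[i + 32] >> 7) << i;
    }
    return m;
}

/* Rejoins the halves, only to show how they relate to the 64-bit value. */
static uint64_t join_halves(mask64_halves m) {
    return ((uint64_t)m.hi << 32) | m.lo;
}
```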

@tannergooding tannergooding marked this pull request as ready for review December 13, 2024 07:34
@tannergooding
Member Author

This should be ready for review now.

@BruceForstall BruceForstall changed the title Improve codegen for ExtractMostSignificatBits Improve codegen for ExtractMostSignificantBits Dec 14, 2024
@tannergooding tannergooding merged commit 3aa1ec5 into dotnet:main Dec 14, 2024
115 checks passed
@tannergooding tannergooding deleted the msk-ushort branch December 14, 2024 02:49
hez2010 pushed a commit to hez2010/runtime that referenced this pull request Dec 14, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jan 13, 2025