Ensure various scalar cross platform helper APIs are handled directly as intrinsic #80789

tannergooding · 2023-01-18T15:25:52Z

Much like with APIs directly exposed on Vector64/128/256/512, several of the APIs exposed on BitOperations are "cross platform helper APIs" and are used in various perf critical code paths.

However, unlike the vector APIs, the bit operation APIs were not handled directly as intrinsics. They only ever executed a software fallback that manually dispatched to the relevant underlying hardware intrinsics. This works well most of the time, but it reduces JIT throughput, leaves the APIs subject to inlining heuristics, and prevents them from participating in constant folding.

This PR updates those APIs to be directly imported as the relevant hardware intrinsic, when supported, and to more generally support constant folding.
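
As a rough analogy for what "participating in constant folding" buys, here is a minimal C++ sketch, not the JIT's code. `PopCountSoftware` is a hypothetical name for a portable software fallback; marking it `constexpr` lets the C++ compiler evaluate it for constant inputs, which is the kind of result the JIT can only produce for these APIs once it recognizes them at import time.

```cpp
#include <cstdint>

// Hypothetical portable fallback (SWAR population count); illustrative only.
constexpr std::uint32_t PopCountSoftware(std::uint32_t value)
{
    value = value - ((value >> 1) & 0x55555555u);                 // 2-bit partial sums
    value = (value & 0x33333333u) + ((value >> 2) & 0x33333333u); // 4-bit partial sums
    value = (value + (value >> 4)) & 0x0F0F0F0Fu;                 // 8-bit partial sums
    return (value * 0x01010101u) >> 24;                           // add the four bytes
}

// With a constant argument the whole call folds away at compile time.
static_assert(PopCountSoftware(0u) == 0u, "no bits set");
static_assert(PopCountSoftware(0xFFu) == 8u, "low byte set");
static_assert(PopCountSoftware(0x80000001u) == 2u, "top and bottom bit set");
```

When the input is not a constant, the same helper is what runs on hardware without a population-count instruction.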

ghost · 2023-01-18T15:26:03Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Author:	tannergooding
Assignees:	tannergooding
Labels:	`area-CodeGen-coreclr`
Milestone:	-

tannergooding · 2023-01-18T15:27:29Z

src/coreclr/jit/importercalls.cpp

+                // Signed needs to throw for negative inputs, so fallback to software impl
+                return nullptr;


We could handle this specially with a check for negative values, but it is a more complex change and so I decided to push it out to a later PR.
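
A minimal sketch of what "a check for negative values" could look like. The thread does not name the API this comment applies to, so this assumes a signed Log2-style helper; `Log2Signed` and the exception type are hypothetical, and this is not the JIT's or the library's implementation.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical signed Log2-style helper: reject negative inputs up front, then
// reuse the unsigned computation that a hardware intrinsic could replace.
inline int Log2Signed(std::int32_t value)
{
    if (value < 0)
    {
        // The signed API has to throw here, which a bare hardware
        // instruction cannot do on its own.
        throw std::out_of_range("value must be non-negative");
    }

    // OR-ing in the low bit maps an input of 0 to a result of 0
    // (an assumed convention for this sketch).
    std::uint32_t bits = static_cast<std::uint32_t>(value) | 1u;

    int result = 0;
    while (bits >>= 1)
    {
        ++result;
    }
    return result;
}
```

The extra branch is what makes the signed case more involved than the unsigned one: the importer would have to emit the check and the throw alongside the intrinsic.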

tannergooding · 2023-01-18T15:29:53Z

src/coreclr/jit/importercalls.cpp

+            if (compOpportunisticallyDependsOn(InstructionSet_POPCNT))
+            {
+                // Pop the value from the stack
+                impPopStack();
+
+                hwintrinsic = varTypeIsLong(retType) ? NI_POPCNT_X64_PopCount : NI_POPCNT_PopCount;
+                return gtNewScalarHWIntrinsicNode(retType, op1, hwintrinsic);
+            }


Since there isn't an "always available" instruction for x86/x64, we should probably import this as a GenTreeIntrinsic, much as happens for various Math APIs such as Sin and Cos.

Doing so would allow us to still perform post import constant folding and then transform this back into a GT_CALL during rationalization on older hardware.

However, given it is a more complex change I opted to push it out to a later PR.
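
The import-then-rationalize idea can be sketched with a toy IR. Everything below is hypothetical (`Node`, `NodeKind`, `FoldPopCount`, `Rationalize`) and only mirrors the shape of the proposal; it is not how the JIT represents trees.

```cpp
#include <bitset>
#include <cstdint>

// Toy node kinds standing in for the JIT's tree nodes (assumed names).
enum class NodeKind { Constant, Local, PopCountIntrinsic, HardwarePopCount, HelperCall };

struct Node
{
    NodeKind      kind;
    std::uint32_t value;   // only meaningful for Constant
    const Node*   operand; // only meaningful for the PopCount forms
};

// Post-import folding: an intrinsic node over a constant becomes a constant.
inline Node FoldPopCount(const Node& node)
{
    if (node.kind == NodeKind::PopCountIntrinsic && node.operand != nullptr &&
        node.operand->kind == NodeKind::Constant)
    {
        auto bits = static_cast<std::uint32_t>(std::bitset<32>(node.operand->value).count());
        return Node{NodeKind::Constant, bits, nullptr};
    }
    return node;
}

// Late step: whatever was not folded becomes either the hardware instruction
// or, on older hardware, a plain call to the software implementation.
inline Node Rationalize(const Node& node, bool hardwareHasPopCount)
{
    if (node.kind != NodeKind::PopCountIntrinsic)
    {
        return node;
    }
    NodeKind lowered = hardwareHasPopCount ? NodeKind::HardwarePopCount : NodeKind::HelperCall;
    return Node{lowered, 0, node.operand};
}
```

The point of the intermediate `PopCountIntrinsic` form is that folding stays possible even when the target has no suitable instruction; only the leftovers pay for a call.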

For Arm64, we could do the same or we could add basic SIMD constant folding support for PopCount, AddAcross, and ToScalar. CreateScalarNode will already generate a GT_CNS_VEC where applicable, including post import.
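
To make the Arm64 option concrete, the sketch below emulates in plain C++ the values a constant folder would have to compute for a scalar-to-vector, per-byte PopCount, AddAcross, ToScalar chain. The function names echo the operations named above, but the code is an assumption-laden emulation on an 8-byte array, not the JIT's implementation or the real intrinsics.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Bytes8 = std::array<std::uint8_t, 8>; // stand-in for a 64-bit vector of bytes

// Scalar -> vector: split the 64-bit value into its eight bytes.
inline Bytes8 CreateScalar(std::uint64_t value)
{
    Bytes8 lanes{};
    for (std::size_t i = 0; i < lanes.size(); ++i)
    {
        lanes[i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
    return lanes;
}

// Per-lane population count.
inline Bytes8 PopCount(const Bytes8& lanes)
{
    Bytes8 counts{};
    for (std::size_t i = 0; i < lanes.size(); ++i)
    {
        for (std::uint8_t bits = lanes[i]; bits != 0; bits &= static_cast<std::uint8_t>(bits - 1))
        {
            ++counts[i]; // clear the lowest set bit each iteration
        }
    }
    return counts;
}

// Horizontal add of all lanes; the result occupies lane 0 (at most 64 here).
inline Bytes8 AddAcross(const Bytes8& lanes)
{
    std::uint8_t sum = 0;
    for (std::uint8_t lane : lanes)
    {
        sum = static_cast<std::uint8_t>(sum + lane);
    }
    Bytes8 result{};
    result[0] = sum;
    return result;
}

// Vector -> scalar: read lane 0.
inline std::uint8_t ToScalar(const Bytes8& lanes)
{
    return lanes[0];
}

// Folding the whole chain for a constant input yields a single constant.
inline std::uint32_t FoldScalarPopCount(std::uint64_t constant)
{
    return ToScalar(AddAcross(PopCount(CreateScalar(constant))));
}
```

Each step takes constants to constants, which is why folding just these three operations would be enough to fold the whole sequence when the input is a constant vector.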

tannergooding · 2023-01-20T19:13:11Z

Not a perfect diff due to 40 missed contexts, but still good overall and showing some TP improvements + positive diffs.

Notably, there is a very small +0.01% TP regression for Arm64 minopts.

tannergooding · 2023-01-20T19:47:48Z

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

azure-pipelines · 2023-01-20T19:48:25Z

Azure Pipelines successfully started running 3 pipeline(s).

tannergooding · 2023-01-26T21:13:47Z

CC. @dotnet/jit-contrib, this is ready for review.

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Jan 18, 2023

ghost assigned tannergooding Jan 18, 2023

tannergooding force-pushed the fold-hwintrin branch 2 times, most recently from 46a98b0 to 20e1693 Compare January 18, 2023 19:00

build-analysis bot mentioned this pull request Jan 18, 2023

Tracking issue for CI build timeouts #76454

Closed

tannergooding force-pushed the fold-hwintrin branch 4 times, most recently from 392fc87 to cd5959a Compare January 19, 2023 00:00

runfoapp bot mentioned this pull request Jan 19, 2023

Infra improvements for Helix #68176

Closed

tannergooding force-pushed the fold-hwintrin branch 6 times, most recently from 37d0475 to 76b84c5 Compare January 19, 2023 23:20

Ensure various scalar cross platform helper APIs are handled directly…

a527d5a

… as intrinsic

tannergooding force-pushed the fold-hwintrin branch from 76b84c5 to a527d5a Compare January 20, 2023 01:13

Small refactoring to lookupNamedIntrinsic and impIntrinsic to improve TP

6483f2e

tannergooding force-pushed the fold-hwintrin branch from e3f2d19 to 6483f2e Compare January 20, 2023 14:36

tannergooding marked this pull request as ready for review January 20, 2023 19:11

build-analysis bot mentioned this pull request Jan 21, 2023

tracing/eventpipe/eventsourceerror/eventsourceerror/eventsourceerror failure #80666

Closed

Merge remote-tracking branch 'dotnet/main' into fold-hwintrin

4736513

BruceForstall approved these changes Jan 27, 2023

tannergooding merged commit 9ad75b4 into dotnet:main Jan 27, 2023

tannergooding deleted the fold-hwintrin branch January 27, 2023 20:48

This was referenced Jan 31, 2023

[Perf] Windows/x64: 31 Improvements on 1/27/2023 11:43:56 PM dotnet/perf-autofiling-issues#12331

Closed

[Perf] Windows/x64: 15 Improvements on 1/27/2023 11:43:56 PM dotnet/perf-autofiling-issues#12296

Closed

MichalStrehovsky mentioned this pull request Feb 1, 2023

Assertion failed 'value != 0' in 'System.Tests.DoubleTests_GenericMath:GetExponentShortestBitLengthTest()' during 'Do value numbering' #81460

Closed

ghost locked as resolved and limited conversation to collaborators Feb 27, 2023
