Improve the handling of ToScalar and GetElement(0) #86209

tannergooding · 2023-05-13T16:02:06Z

This should resolve one of the issues found in #86033 and does so by ensuring that ToScalar and GetElement are more equally handled.

In particular, we import GetElement(0) as ToScalar where feasible to avoid carrying around the unnecessary constant (minimally helping throughput). This also allows us to more trivially handle the special case semantics for getting the 0th element.
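
As a rough illustration of the import-time rewrite described above, here is a minimal C++17 sketch. The `Intrinsic` enum, the `Node` struct, and `ImportGetElement` are hypothetical stand-ins invented for this example; they are not the JIT's actual types or importer code.

```cpp
#include <cassert>
#include <optional>

// Hypothetical, simplified stand-ins for JIT intrinsic IDs and IR nodes.
enum class Intrinsic { ToScalar, GetElement };

struct Node
{
    Intrinsic          id;
    std::optional<int> constIndex; // set only for GetElement with a constant index
};

// Import GetElement(index): a constant index of 0 becomes ToScalar, so the
// node no longer carries the constant operand. Anything else stays GetElement.
Node ImportGetElement(std::optional<int> index)
{
    if (index.has_value() && *index == 0)
    {
        return Node{Intrinsic::ToScalar, std::nullopt};
    }
    return Node{Intrinsic::GetElement, index};
}
```

In this model, a non-constant index is represented by `std::nullopt` and is left as GetElement, which is one reading of "where feasible".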

Likewise, we ensure that ToScalar supports `long` on x86, where it needs to be lowered to AsUInt32() + GetElement(0) + GetElement(1), that it supports constant folding in value numbering (VN), and that it supports containment as part of a store.

This helps separate the two distinct considerations of the APIs and ensures we get the best possible codegen for each case.
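
To make the 32-bit x86 decomposition of a `long` ToScalar concrete, the following is a small sketch of the arithmetic involved. `Vec128U32`, `GetElement`, and `ToScalarLong` are hypothetical names that model the vector as four 32-bit lanes; the real lowering operates on JIT IR nodes, not on arrays.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a 128-bit vector viewed as four 32-bit lanes,
// i.e. what AsUInt32() would produce from a vector of 64-bit elements.
using Vec128U32 = std::array<uint32_t, 4>;

// Stand-in for GetElement(index) on the reinterpreted vector.
uint32_t GetElement(const Vec128U32& v, int index)
{
    return v[static_cast<std::size_t>(index)];
}

// ToScalar for a 64-bit element on a 32-bit target: read the low and high
// halves separately, then combine them (lane 0 holds the low 32 bits).
int64_t ToScalarLong(const Vec128U32& v)
{
    uint64_t lo = GetElement(v, 0);
    uint64_t hi = GetElement(v, 1);
    return static_cast<int64_t>((hi << 32) | lo);
}
```

For example, lanes `{0x89ABCDEF, 0x01234567, ...}` combine to `0x0123456789ABCDEF`, since lane 0 supplies the low half and lane 1 the high half.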

ghost · 2023-05-13T16:02:17Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	tannergooding
Assignees:	tannergooding
Labels:	`area-CodeGen-coreclr`
Milestone:	-

AndyAyersMS · 2023-05-15T15:42:09Z

cc @dotnet/jit-contrib -- need a reviewer

tannergooding · 2023-05-15T15:43:16Z

There are some bad diffs I'm investigating here. I think I'm missing a bit of handling somewhere.

tannergooding · 2023-05-15T17:11:28Z

> There are some bad diffs I'm investigating here. I think I'm missing a bit of handling somewhere.

Should be fixed now. I had put the ToScalar VN handling in FunBinary when it should've been in FunUnary. Killing constant folding support for the most prevalent GetElement operation was not good 😆
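
The arity mix-up can be pictured with a small sketch. `FoldUnary` and `FoldBinary` below are hypothetical helpers written for this example (C++17); they only mimic the idea of dispatching constant folding by operand count and are not the JIT's EvalHWIntrinsicFunUnary or FunBinary code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical intrinsic IDs and a constant 4 x int32 vector.
enum class Intrinsic { ToScalar, GetElement };
using ConstVec = std::array<int32_t, 4>;

// Folding for one-operand intrinsics. ToScalar takes only the vector,
// so this is where its folding has to live.
std::optional<int32_t> FoldUnary(Intrinsic id, const ConstVec& vec)
{
    if (id == Intrinsic::ToScalar)
    {
        return vec[0];
    }
    return std::nullopt; // not foldable here
}

// Folding for two-operand intrinsics. A ToScalar case placed here would
// never be reached, because ToScalar nodes never arrive with two operands.
std::optional<int32_t> FoldBinary(Intrinsic id, const ConstVec& vec, int32_t index)
{
    if (id == Intrinsic::GetElement && index >= 0 &&
        index < static_cast<int32_t>(vec.size()))
    {
        return vec[static_cast<std::size_t>(index)];
    }
    return std::nullopt; // not foldable here
}
```

Once GetElement(0) is imported as ToScalar, the unary path handles the most common element access, which is why misplacing the case in the binary path cost so much folding.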

kunalspathak · 2023-05-15T17:54:05Z

src/coreclr/jit/simdashwintrinsic.cpp

            break;
        }

-        case NI_VectorT128_ToScalar:


I believe this is just for TP (throughput) reasons, right?

It's unnecessary since we're handling decomposition for ToScalar vs GetElement(0) (since it has to be lowered to GetElement(0), GetElement(1))

kunalspathak · 2023-05-15T17:58:03Z

src/coreclr/jit/lowerxarch.cpp

    GenTree* op2 = node->Op(2);

+    if (op2->IsIntegralConst(0))
+    {


Don't we need a similar change for Arm64?

No. Arm64 doesn't have as many considerations since it can't really do containment, so it's already doing all the right things as far as I saw.

kunalspathak · 2023-05-15T18:02:15Z

src/coreclr/jit/lowerxarch.cpp

            {
                MakeSrcContained(node, src);
-
-                if (intrinsicId == NI_Vector128_GetElement)


Not sure if I follow this change.

Previously we were favoring containing the store over containing the load, which leads to larger and slightly slower codegen.

This changed it to favor preserving the original containment (the load) so that we could do the better thing.

I did need to keep part of the handling to ensure regOptional remained handled and cleared, since we want to prefer the contained store over a non-contained store + a spilled local.

tannergooding · 2023-05-15T21:47:28Z

Diffs are better now.

We see -2868 bytes on Arm64, largely from constant folding.

We see -1711 bytes on x64 with a couple small regressions. These largely look to be due to different CSE decisions thanks to the different VN or register selection and most often present as movaps reg, reg being inserted somewhere it didn't exist previously.

tannergooding · 2023-05-16T18:25:01Z

Any other questions or feedback on this, @kunalspathak?

kunalspathak · 2023-05-16T18:39:39Z

src/coreclr/jit/lowerxarch.cpp

+        bool foundUse     = BlockRange().TryGetUse(node, &use);
+        bool fromUnsigned = false;
+
+        GenTreeCast* cast = comp->gtNewCastNode(TYP_INT, node, fromUnsigned, simdBaseType);


fromUnsigned is false here?

Yes. pextrb/pextrw already zero-extend into the upper bits, so we only need to insert a cast for sign-extension (e.g. when the source type is TYP_BYTE or TYP_SHORT).

There's a short comment a couple of lines up that explains why it's just for those and not for all small types.
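
The reasoning about zero-extension versus sign-extension can be checked with a small model. The functions below are hypothetical and only simulate the register contents; they do not use the actual pextrb instruction or any JIT API.

```cpp
#include <cassert>
#include <cstdint>

// Models what a byte extract leaves in a 32-bit register according to the
// explanation above: the byte value with all upper bits cleared.
uint32_t ExtractByteZeroExtended(uint8_t lane)
{
    return lane;
}

// Unsigned byte element: the zero-extended value is already correct,
// so no extra cast is needed.
uint32_t ToScalarUnsignedByte(uint8_t lane)
{
    return ExtractByteZeroExtended(lane);
}

// Signed byte element: the upper bits must replicate bit 7, so an extra
// sign-extending step (the inserted cast) is required after the extract.
int32_t ToScalarSignedByte(uint8_t lane)
{
    uint32_t raw = ExtractByteZeroExtended(lane);
    // Flip the sign bit, then subtract its weight: maps 0..255 to -128..127.
    return static_cast<int32_t>(raw ^ 0x80u) - 0x80;
}
```

With an input byte of `0xFF`, the unsigned path yields 255 directly, while the signed path needs the extra step to yield -1. The same argument applies to 16-bit elements.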

kunalspathak

LGTM

Improve the handling of ToScalar and GetElement(0)

3f6c507

ghost assigned tannergooding May 13, 2023

ghost added the `area-CodeGen-coreclr` label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) May 13, 2023

tannergooding mentioned this pull request May 13, 2023

Regressions in System.Numerics.Tests.Perf_Vector3 #86033

Closed

Fix build failure

c70115f

tannergooding added 2 commits May 15, 2023 09:21

Merge remote-tracking branch 'dotnet/main' into getelement-0

44fe69a

Ensure ToScalar handling is in EvalHWIntrinsicFunUnary, not FunBinary

a9c205b

kunalspathak reviewed May 15, 2023

kunalspathak reviewed May 15, 2023

runfoapp bot mentioned this pull request May 15, 2023

Infra improvements for Helix #68176

Closed

Ensure we don't regress codegen for small types

e4e7258

kunalspathak reviewed May 16, 2023

kunalspathak approved these changes May 16, 2023

tannergooding merged commit 81d039e into dotnet:main May 16, 2023

tannergooding deleted the getelement-0 branch May 16, 2023 18:48

This was referenced May 30, 2023

[Perf] Windows/x64: 47 Improvements on 5/16/2023 11:15:26 PM dotnet/perf-autofiling-issues#18215

Closed

[Perf] Windows/x86: 41 Improvements on 5/16/2023 11:15:26 PM dotnet/perf-autofiling-issues#18156

Closed

ghost locked as resolved and limited conversation to collaborators Jun 15, 2023
