-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Improve the handling of ToScalar and GetElement(0) #86209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsThis should resolve one of the issues found in #86033 and does so by ensuring that In particular, we import Likewise, we ensure that This helps separate the two distinct considerations of the APIs and ensures we get the best possible codegen for each case.
|
|
cc @dotnet/jit-contrib -- need a reviewer |
|
There are some bad diffs I'm investigating here. I think I'm missing a bit of handling somewhere. |
Should be fixed now. I had put the |
| break; | ||
| } | ||
|
|
||
| case NI_VectorT128_ToScalar: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This I believe is just for TP reason, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its unnecessary since we're handling decomposition for ToScalar vs GetElement(0) (since it has to be lowered to GetElement(0), GetElement(1))
| GenTree* op2 = node->Op(2); | ||
|
|
||
| if (op2->IsIntegralConst(0)) | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need similar change for arm64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Arm64 doesn't have as many considerations since it can't really do containment and so its already doing all the right things as far as I saw
| { | ||
| MakeSrcContained(node, src); | ||
|
|
||
| if (intrinsicId == NI_Vector128_GetElement) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I follow this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously we were favoring containing the store over containing the load, which leads to larger and slightly slower codegen.
This changed it to favor preserving the original containment (the load) so that we could do the better thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did need to keep part of the handling to ensure regOptional remained handled and cleared. Since we want to prefer the contained store over a non-contained store + a spilled local.
|
Diffs are better now. We see We see |
|
Any other questions or feedback on this @kunalspathak ? |
| bool foundUse = BlockRange().TryGetUse(node, &use); | ||
| bool fromUnsigned = false; | ||
|
|
||
| GenTreeCast* cast = comp->gtNewCastNode(TYP_INT, node, fromUnsigned, simdBaseType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fromUnsigned is false here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. pextrb/pextrw already zero-extend to the upper bits. So we only need to insert a cast for sign-extension (e.g. when the source type is TYP_BYTE or TYP_SHORT)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's short comment a couple lines up that explains why its just for those and not for all small types.
kunalspathak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This should resolve one of the issues found in #86033 and does so by ensuring that
ToScalarandGetElementare more equally handled.In particular, we import
GetElement(0)asToScalarwhere feasible to avoid carrying around the unnecessary constant (minimally helping throughput). This also allows us to more trivially handle the special case semantics for getting the 0th element.Likewise, we ensure that
ToScalarsupports thelongon x86 where it needs to be lowered toAsUInt32()+GetElement(0)+GetElement(1), that it supports constant folding in VN, and that it supports containment as part of a store.This helps separate the two distinct considerations of the APIs and ensures we get the best possible codegen for each case.