-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Let lerp lowering incorporate a final cast. #6480
Conversation
This lets it save a few instructions on x86 and arm. cast(UInt(16), lerp(some_u8s)) produces the following, before and after this PR Before: x86: vmovdqu (%r15,%r13), %xmm4 vpmovzxbw -2(%r15,%r13), %ymm5 vpxor %xmm0, %xmm4, %xmm6 vpmovzxbw %xmm6, %ymm6 vpmovzxbw -1(%r15,%r13), %ymm7 vpmullw %ymm6, %ymm5, %ymm5 vpmovzxbw %xmm4, %ymm4 vpmullw %ymm4, %ymm7, %ymm4 vpaddw %ymm4, %ymm5, %ymm4 vpaddw %ymm1, %ymm4, %ymm4 vpmulhuw %ymm2, %ymm4, %ymm4 vpsrlw $7, %ymm4, %ymm4 vpand %ymm3, %ymm4, %ymm4 vmovdqu %ymm4, (%rbx,%r13,2) addq $16, %r13 decq %r10 jne .LBB0_10 arm: ldr q0, [x17] ldur q2, [x17, #-1] ldur q1, [x17, #-2] subs x0, x0, #1 // =1 mvn v3.16b, v0.16b umull v4.8h, v2.8b, v0.8b umull2 v0.8h, v2.16b, v0.16b umlal v4.8h, v1.8b, v3.8b umlal2 v0.8h, v1.16b, v3.16b urshr v1.8h, v4.8h, #8 urshr v2.8h, v0.8h, #8 raddhn v1.8b, v1.8h, v4.8h raddhn v0.8b, v2.8h, v0.8h ushll v0.8h, v0.8b, #0 ushll v1.8h, v1.8b, #0 add x17, x17, #16 // =16 stp q1, q0, [x18, #-16] add x18, x18, #32 // =32 b.ne .LBB0_10 After: x86: vpmovzxbw -2(%r15,%r13), %ymm3 vmovdqu (%r15,%r13), %xmm4 vpxor %xmm0, %xmm4, %xmm5 vpmovzxbw %xmm5, %ymm5 vpmullw %ymm5, %ymm3, %ymm3 vpmovzxbw -1(%r15,%r13), %ymm5 vpmovzxbw %xmm4, %ymm4 vpmullw %ymm4, %ymm5, %ymm4 vpaddw %ymm4, %ymm3, %ymm3 vpaddw %ymm1, %ymm3, %ymm3 vpmulhuw %ymm2, %ymm3, %ymm3 vpsrlw $7, %ymm3, %ymm3 vmovdqu %ymm3, (%rbp,%r13,2) addq $16, %r13 decq %r10 jne .LBB0_10 arm: ldr q0, [x17] ldur q2, [x17, #-1] ldur q1, [x17, #-2] subs x0, x0, #1 // =1 mvn v3.16b, v0.16b umull v4.8h, v2.8b, v0.8b umull2 v0.8h, v2.16b, v0.16b umlal v4.8h, v1.8b, v3.8b umlal2 v0.8h, v1.16b, v3.16b ursra v4.8h, v4.8h, #8 ursra v0.8h, v0.8h, #8 urshr v1.8h, v4.8h, #8 urshr v0.8h, v0.8h, #8 add x17, x17, #16 // =16 stp q1, q0, [x18, #-16] add x18, x18, #32 // =32 b.ne .LBB0_10 So on X86 we skip a pointless and instruction, and on ARM we get a rounding add and shift right instead of a rounding narrowing add shift right followed by a widen.
OSX-x86 bot is down for repairs, probably OK to ignore for this |
Looks all green, ok to land? |
I could have sworn I saw a real test failure yesterday. Let me run the fuzzer test I added overnight to see if I can find one before we merge. |
Running overnight found failures. Do not merge. |
There was a "bug" in the test, but really a difference with master and raises a question about the definition. What should lerp return when the weight is a float and it's greater than one? Is it an undefined value? With this PR we sometimes manage to successfully extrapolate in some cases where we used to wrap. I could make it wrap, as it did before, fairly easily. I think @zvookin originally wrote this lerping code, so tagging him for an opinion on what the behavior should be. I imagine the issue is that on some shader targets the native lerp has UB if the weight is outside [0, 1]? |
I think for widening lerps, extrapolation makes complete sense, and I'd be very surprised with the wrapping behavior. Hopefully that doesn't conflict with some native lerp implementations, perhaps we shouldn't use the native lerp on such platforms if so. |
Specifically it's exprs like ``cast<uint16_t>(lerp(some_u8, some_u8, 1.2f))``` That does not look like it should be able to produce values outside the range of a uint8, but it does with this PR. |
Oh, yeah, that is weird...
…On Thu, Dec 9, 2021 at 3:56 PM Andrew Adams ***@***.***> wrote:
Specifically it's exprs like ``cast<uint16_t>(lerp(some_u8, some_u8,
1.2f))```
That does not look like it should be able to produce values outside the
range of a uint8, but it does with this PR.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#6480 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABZFZDBLLLXQGT5KPQH6GGTUQE6ZXANCNFSM5JSH7VOQ>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
I think even if lerp returns an undefined value for out of range weights, it should produce a uint8 output, even if there's a surrounding cast. I'll change it. |
I'd know why I saw this PR, but what about a hint for the lerp in terms of behavior? Like: |
This lets it save a few instructions on x86 and arm, and probably other CPU targets.
cast(UInt(16), lerp(some_u8s)) produces the following, before and after
this PR
Before:
x86:
arm:
After:
x86:
arm:
So on X86 we skip a pointless and instruction, and on ARM we get a
rounding add and shift right instead of a rounding narrowing add shift
right followed by a widen.