Alternative output for i32x4_relaxed_trunc.wast tests #140
(assert_return (invoke "i32x4.relaxed_trunc_f32x4_u"
                       (v128.const f32x4 nan -nan nan:0x444444 -nan:0x444444))
               ;; nans -> 0 or UINT32_MAX
               (either (v128.const i32x4 0 0 0 0)
                       (v128.const i32x4 0xffffffff 0xffffffff 0xffffffff 0xffffffff)))
                       (v128.const i32x4 0xffffffff 0xffffffff 0xffffffff 0xffffffff)
                       (v128.const i32x4 0x80000000 0x80000000 0x80000000 0x80000000)))
0x80000000 comes from VCVTTPS2UDQ?
I think it would be nice to add a comment here saying which cases lead to which results.
I should have documented it more properly in the first place.
I thought about that, but I cannot refer to the algorithm/recipe by name, or to an issue that defines these. I'll just refer to SM and V8 as the source.
Actually, it is different on V8: it looks like c0000000, c0000000, c4444400, c4444400. SM has 0x80000000s.
@@ -73,7 +75,8 @@
                       (v128.const f64x2 -1.0 4294967296.0))
               ;; out of range -> saturate or UINT32_MAX
               (either (v128.const i32x4 0 0xffffffff 0 0)
                       (v128.const i32x4 0xffffffff 0xffffffff 0 0)))
                       (v128.const i32x4 0xffffffff 0xffffffff 0 0)
                       (v128.const i32x4 0xfffffffe 0 0 0)))
How do we get 0xfffffffe? This isn't valid; it should only be UINT32_MAX or saturate (based on #21).
IIRC, when I was stepping in the debugger, 0xfffffffe comes from addpd after -1.0 + 4503599627370496.0.
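The arithmetic behind that value can be checked with a small scalar model. This is only a sketch: it assumes each lane is rounded toward zero, added to 2^52 (4503599627370496.0), and then reduced to the low 32 bits of the resulting double, which is how the "magic number" conversion trick usually works. The function name `low32_after_magic_add` is made up for illustration and the model only handles finite inputs.

```python
import math
import struct

# 2**52: adding this to a small non-negative integer-valued double places
# the integer in the low mantissa bits of the result.
MAGIC = 4503599627370496.0


def low32_after_magic_add(x: float) -> int:
    """Model one f64 lane: truncate, add 2**52, keep the low 32 bits.

    Hypothetical helper for illustration; finite inputs only.
    """
    truncated = float(math.trunc(x))          # round toward zero
    total = truncated + MAGIC                 # the addpd step
    bits = struct.unpack("<Q", struct.pack("<d", total))[0]
    return bits & 0xFFFFFFFF                  # low 32 bits of the double


if __name__ == "__main__":
    for value in (-1.0, 4294967296.0, 3.7):
        print(value, hex(low32_after_magic_add(value)))
```

For -1.0 the sum is 2^52 - 1, which falls below 2^52 and so is stored with a smaller exponent; its low 32 mantissa bits are 0xfffffffe. For 4294967296.0 the low 32 bits are 0, which matches the `0xfffffffe 0` lanes in the added alternative.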
Is the addpd from implementing this relaxed instruction using the simd128 one? (i32x4.trunc_sat_f64x2_s_zero)
V8 (turbofan?) generates:
0x9923509f796 16 c4e37909c00b vroundpd xmm0,xmm0,0xb
0x9923509f79c 1c 49ba90e7fa2001000000 REX.W movq r10,0x120fae790
0x9923509f7a6 26 c4c1795802 vaddpd xmm0,xmm0,[r10]
0x9923509f7ab 2b c4c178c6c788 vshufps xmm0,xmm0,xmm15,0x88
(notice that liftoff generates trunc_sat_f64x2_s_zero code, not relaxed)
I guess that implementation comes from WebAssembly/simd#383, "x86/x86-64 processors with AVX instruction set".
That codegen should return 0. If it doesn't, then I think there may be a bug in the generated code.
Using the trunc_sat instruction to implement the relaxed instruction is always correct (for all the relaxed instructions). This is so that in deterministic mode, we can fall back to SIMD128 instructions.
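As a reference point for that fallback, here is a scalar sketch of one lane of an unsigned saturating truncation. It assumes the SIMD128 trunc_sat behavior: NaN becomes 0 and out-of-range values clamp to 0 or UINT32_MAX. The name `trunc_sat_u32` is illustrative and not taken from any engine.

```python
import math

UINT32_MAX = 0xFFFFFFFF


def trunc_sat_u32(x: float) -> int:
    """Scalar model of one lane of an unsigned saturating truncation.

    Illustrative helper: NaN -> 0, below range -> 0, above range -> UINT32_MAX.
    """
    if math.isnan(x):
        return 0
    if x <= 0.0:
        return 0                 # negatives (and -inf) saturate to 0
    if x >= 4294967296.0:
        return UINT32_MAX        # 2**32 and above (and +inf) saturate
    return int(x)                # in range: truncate toward zero


if __name__ == "__main__":
    for value in (-1.0, 4294967296.0, float("nan"), 3.9):
        print(value, hex(trunc_sat_u32(value)))
```

With the inputs from the test above, -1.0 and 4294967296.0 map to 0 and 0xffffffff, which is the first alternative in the `either` list.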
c8f02b4 to e7a2158
I wonder if it would be easier to (formally?) define the algorithms and then use that as a base.
More analysis: V8 and SpiderMonkey use somewhat similar algorithms for i32x4.relaxed_trunc_f32x4_u:
i32x4.relaxed_trunc_f32x4_u (v8):
i32x4.relaxed_trunc_f32x4_u (sm):
The i32x4.relaxed_trunc_f64x2_u_zero algorithms are pretty much identical:
i32x4.relaxed_trunc_f64x2_u_zero (v8):
i32x4.relaxed_trunc_f64x2_u_zero (sm):
The
Additional variants for i32x4.relaxed_trunc_f32x4_u and i32x4.relaxed_trunc_f64x2_u_zero based on algorithms implemented by SpiderMonkey and v8.
e7a2158 to 0f40696
For i32x4.relaxed_trunc_f64x2_u_zero, are we missing some instructions? Marat's suggested codegen is:
I don't see xorpd, vmaxpd, or vminpd in your analysis. The vmaxpd should get rid of the
Oh yes, thanks for the pointer. I was looking at the saturated one.
The algorithm used in V8 is wrong; it should fall back to SIMD trunc + saturate semantics. So really, the implementation is the same pre-AVX512F. Edit: I had an offline discussion with Marat, and he prefers to allow these constants, as long as they don't open up more fingerprinting.
Let's wait for #144 (spec changes) to land, then we can merge this.