Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[x86 & wasm] Split up double saturating-narrows from i32 #7280

Merged
merged 7 commits into from
Jan 20, 2023

Conversation

rootjalex
Copy link
Member

We can get much better codegen for double saturating narrows from i32 on x86 and wasm. HVX and ARM backends both already do this. Also added tests to simd_op_check

Fixes #7069

Here's an example from x86 (wasm is similar):

Var x("x");
ImageParam in(Int(32), 1);
Func f("f");

f(x) = u8_sat(in(x));
f.vectorize(x, 32);

Target x86("x86-64-linux-avx-avx2-fma-sse41");
f.compile_to_assembly("vpack.asm", f.infer_arguments(), x86);

Previously:

	vpminsd	-96(%rax,%rcx,4), %ymm0, %ymm2
	vpminsd	-64(%rax,%rcx,4), %ymm0, %ymm3
	vpminsd	-32(%rax,%rcx,4), %ymm0, %ymm4
	vpminsd	(%rax,%rcx,4), %ymm0, %ymm5
	vpmaxsd	%ymm1, %ymm2, %ymm2
	vpmaxsd	%ymm1, %ymm3, %ymm3
	vpackusdw	%ymm3, %ymm2, %ymm2
	vpmaxsd	%ymm1, %ymm4, %ymm3
	vpmaxsd	%ymm1, %ymm5, %ymm4
	vpackusdw	%ymm4, %ymm3, %ymm3
	vpermq	$216, %ymm3, %ymm3              # ymm3 = ymm3[0,2,1,3]
	vpermq	$216, %ymm2, %ymm2              # ymm2 = ymm2[0,2,1,3]
	vpackuswb	%ymm3, %ymm2, %ymm2
	vpermq	$216, %ymm2, %ymm2              # ymm2 = ymm2[0,2,1,3]
	vmovdqu	%ymm2, (%r8,%rcx)

Now:

	vmovdqu	-96(%rax,%rcx,4), %ymm0
	vmovdqu	-32(%rax,%rcx,4), %ymm1
	vpackssdw	-64(%rax,%rcx,4), %ymm0, %ymm0
	vpackssdw	(%rax,%rcx,4), %ymm1, %ymm1
	vpermq	$216, %ymm0, %ymm0              # ymm0 = ymm0[0,2,1,3]
	vpermq	$216, %ymm1, %ymm1              # ymm1 = ymm1[0,2,1,3]
	vpackuswb	%ymm1, %ymm0, %ymm0
	vpermq	$216, %ymm0, %ymm0              # ymm0 = ymm0[0,2,1,3]
	vmovdqu	%ymm0, (%r8,%rcx)

@steven-johnson
Copy link
Contributor

https://buildbot.halide-lang.org/master/#/builders/42/builds/693 looks like perhaps a real failure

Copy link
Contributor

@steven-johnson steven-johnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rootjalex
Copy link
Member Author

@steven-johnson I see the same failure on #7279 , so I don't think either PR is responsible for the failure
https://buildbot.halide-lang.org/master/#/builders/42/builds/692

@steven-johnson
Copy link
Contributor

steven-johnson commented Jan 17, 2023

@steven-johnson I see the same failure on #7279 , so I don't think either PR is responsible for the failure https://buildbot.halide-lang.org/master/#/builders/42/builds/692

great, it's probably another LLVM injection :-/

Let me try to confirm that first

@steven-johnson
Copy link
Contributor

The predicated-load failure isn't happening for me locally with top-of-tree LLVM, so maybe it's a temporary flake; I'm forcing rebuilds on the x64 bots to see if it recurs

@rootjalex
Copy link
Member Author

Only failing test appears unrelated. @steven-johnson think it’s good to go?

@steven-johnson
Copy link
Contributor

Failure is vectorized_gpu_allocation, which I've never seen before as a flake or even an ordinary failure, so let me retry it just a bit first.

@steven-johnson
Copy link
Contributor

The failure is now in our old friend, correctness_atomics, aka "Mr. Flaky", so I think we're good to go

@rootjalex rootjalex merged commit bafd60f into main Jan 20, 2023
@rootjalex rootjalex deleted the rootjalex/x86-double-sat branch January 20, 2023 18:03
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
* better x86 double sat-cast + add test

* fix wasm too + test

Co-authored-by: Steven Johnson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[x86] Codegen should split up double saturating-narrows
3 participants