Commit 5c367b8
committed
[libclc] Optimize generic CLC fmin/fmax
The CLC fmin/fmax builtins now use clang's
__builtin_elementwise_(min|max) which helps us generate
llvm.(min|max)num intrinsics directly. These intrinsics select the
non-NAN input over the NAN input, which adheres to the OpenCL
specification. Note that the OpenCL specification doesn't require
support for sNAN, so returning qNAN over sNAN is acceptable. Note also
that the intrinsics don't differentiate between -0.0 and +0.0; this does
not appear to be required - going by the OpenCL CTS, at least.
These intrinsics maintain the vector types, as opposed to scalarizing,
which was previously happening. This commit therefore helps to optimize
codegen for those targets.1 parent 4609b6a commit 5c367b8
2 files changed
+8
-50
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
6 | 6 | | |
7 | 7 | | |
8 | 8 | | |
9 | | - | |
10 | 9 | | |
11 | | - | |
12 | 10 | | |
13 | | - | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
14 | 14 | | |
15 | | - | |
16 | | - | |
17 | | - | |
18 | | - | |
19 | | - | |
20 | | - | |
21 | | - | |
22 | | - | |
23 | | - | |
24 | | - | |
25 | | - | |
26 | | - | |
27 | | - | |
28 | | - | |
29 | | - | |
30 | | - | |
31 | | - | |
32 | | - | |
33 | | - | |
34 | | - | |
35 | | - | |
36 | | - | |
| 15 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
6 | 6 | | |
7 | 7 | | |
8 | 8 | | |
9 | | - | |
10 | 9 | | |
11 | | - | |
12 | 10 | | |
13 | | - | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
14 | 14 | | |
15 | | - | |
16 | | - | |
17 | | - | |
18 | | - | |
19 | | - | |
20 | | - | |
21 | | - | |
22 | | - | |
23 | | - | |
24 | | - | |
25 | | - | |
26 | | - | |
27 | | - | |
28 | | - | |
29 | | - | |
30 | | - | |
31 | | - | |
32 | | - | |
33 | | - | |
34 | | - | |
35 | | - | |
36 | | - | |
| 15 | + | |
0 commit comments