Commit a4ee0e3
AArch64: support bf16 to sf extensions [PR121853]
It looks like during the upstreaming of BF16 we didn't implement the extend
optab for it.
As a result we go through soft-float emulation, which results in a massive
performance drop in projects using BF16.
As an example, for
float convert(__bf16 value) {
return (float)value;
}
we generate:
convert(__bf16):
stp x29, x30, [sp, -16]!
mov x29, sp
bl __extendbfsf2
ldp x29, x30, [sp], 16
ret
and after this patch
convert:
movi v31.4s, 0
ext v0.16b, v31.16b, v0.16b, #14
ret
We generate an ext with movi because this has the same latency as a shift but
twice the throughput. The zero vector is zero latency, so in real workloads
this codegen is much better than using shifts.
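To see why the ext form produces the widened value, here is a minimal portable C model of the byte selection done by `EXT Vd.16b, Vn.16b, Vm.16b, #imm`. It is only a sketch: the names `ext_16b` and `ext_widen_bf16` are hypothetical helpers for illustration, not part of the patch, and the model assumes the BF16 value sits in the low 16 bits of v0 with byte 0 as the least significant byte.

```c
#include <stdint.h>
#include <string.h>

/* Model of EXT Vd.16b, Vn.16b, Vm.16b, #imm (imm must be 0..15):
   take 16 consecutive bytes, starting at byte imm, from the 32-byte
   concatenation that has Vn in the low half and Vm in the high half. */
static void ext_16b(uint8_t vd[16], const uint8_t vn[16],
                    const uint8_t vm[16], unsigned imm)
{
    uint8_t cat[32];
    memcpy(cat, vn, 16);       /* bytes 0..15  come from Vn */
    memcpy(cat + 16, vm, 16);  /* bytes 16..31 come from Vm */
    memcpy(vd, cat + imm, 16);
}

/* Hypothetical helper: mimic "movi v31.4s, 0; ext v0.16b, v31.16b,
   v0.16b, #14" and return the low 32-bit lane of the result. */
static uint32_t ext_widen_bf16(uint16_t bits)
{
    uint8_t zero[16] = {0};    /* v31 after movi */
    uint8_t v0[16] = {0};
    uint8_t res[16];

    v0[0] = (uint8_t)(bits & 0xff);  /* BF16 in the low 16 bits of v0 */
    v0[1] = (uint8_t)(bits >> 8);

    ext_16b(res, zero, v0, 14);

    /* Reassemble lane 0: bytes 0..1 are zero, bytes 2..3 hold the BF16. */
    return (uint32_t)res[0] | ((uint32_t)res[1] << 8)
         | ((uint32_t)res[2] << 16) | ((uint32_t)res[3] << 24);
}
```

With an immediate of 14, the two lowest result bytes come from the top of the zero vector and the next two from the bottom of v0, so the low lane equals the BF16 bit pattern moved up by 16 bits.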
As a reminder, BF16 -> FP32 is just shifting left 16 bits.
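As a rough illustration of that reminder, the conversion can be written in portable C on the raw bit pattern. This is a sketch, not the code the compiler emits: `bf16_bits_to_float` is a hypothetical name, and the input is passed as a `uint16_t` so the example does not depend on `__bf16` support.

```c
#include <stdint.h>
#include <string.h>

/* Widen a raw bfloat16 bit pattern to float. The 16 BF16 bits become
   the top half of the binary32 encoding; the low 16 bits are zero. */
static float bf16_bits_to_float(uint16_t bits)
{
    uint32_t wide = (uint32_t)bits << 16;
    float out;
    memcpy(&out, &wide, sizeof out);  /* bit copy, avoids aliasing issues */
    return out;
}
```

For example, the BF16 pattern 0x3F80 widens to 0x3F800000, which is 1.0f.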
The expand pattern has to rely on generating multiple subregs due to a
restriction that subregs can't change floating point size and type at the same
time.
I've tried alternative approaches like using the EXT as SF mode, but the
paradoxical subreg of BF -> SF isn't allowed and using an extend doesn't work
because extend is what we're defining.
gcc/ChangeLog:
PR target/121853
* config/aarch64/aarch64-simd.md (extendbfsf2): New.
gcc/testsuite/ChangeLog:
PR target/121853
* gcc.target/aarch64/pr121853_1.c: New test.
* gcc.target/aarch64/pr121853_2.c: New test.
(cherry picked from commit 58ee207)
1 parent 21866f2, commit a4ee0e3
File tree
3 files changed, +102 -0 lines changed
- gcc
  - config/aarch64
  - testsuite/gcc.target/aarch64