[SystemZ] Add support for half (fp16) #109164
base: main
Conversation
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-clang @llvm/pr-subscribers-backend-systemz

Author: Jonas Paulsson (JonPsson1)

Changes: Make sure that fp16<=>float conversions are expanded to libcalls and that 16-bit fp values can be loaded and stored properly via GPRs. With this patch the Half IR Type used in operations should be handled correctly with the help of pre-existing ISD node expansions.

Patch in progress...

Notes:
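On the clang side, enabling HasFloat16 means SystemZ should now accept the _Float16 source type and lower its arithmetic through the conversions configured below. A minimal illustration of the intended use (an assumed example, not a test from this patch):

// Assumed example: with HasFloat16 set for SystemZ, clang should accept
// _Float16; the fadd is soft-promoted to float and rounded back to half
// using the half<->float conversion libcalls set up in SystemZISelLowering.cpp.
_Float16 add_halves(_Float16 a, _Float16 b) {
  return a + b;
}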
Full diff: https://github.com/llvm/llvm-project/pull/109164.diff

3 Files Affected:
diff --git a/clang/lib/Basic/Targets/SystemZ.h b/clang/lib/Basic/Targets/SystemZ.h
index f05ea473017bec..6566b63d4587ee 100644
--- a/clang/lib/Basic/Targets/SystemZ.h
+++ b/clang/lib/Basic/Targets/SystemZ.h
@@ -91,11 +91,20 @@ class LLVM_LIBRARY_VISIBILITY SystemZTargetInfo : public TargetInfo {
"-v128:64-a:8:16-n32:64");
}
MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 128;
+
+ HasLegalHalfType = false; // Default=false
+ HalfArgsAndReturns = false; // Default=false
+ HasFloat16 = true; // Default=false
+
HasStrictFP = true;
}
unsigned getMinGlobalAlign(uint64_t Size, bool HasNonWeakDef) const override;
+ bool useFP16ConversionIntrinsics() const override {
+ return false;
+ }
+
void getTargetDefines(const LangOptions &Opts,
MacroBuilder &Builder) const override;
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index 582a8c139b2937..fd3dcebba1eca7 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -704,6 +704,13 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::BITCAST, MVT::f32, Custom);
}
+ // Expand FP16 <=> FP32 conversions to libcalls and handle FP16 loads and
+ // stores in GPRs.
+ setOperationAction(ISD::FP16_TO_FP, MVT::f32, Expand);
+ setOperationAction(ISD::FP_TO_FP16, MVT::f32, Expand);
+ setLoadExtAction(ISD::EXTLOAD, MVT::f32, MVT::f16, Expand);
+ setTruncStoreAction(MVT::f32, MVT::f16, Expand);
+
// VASTART and VACOPY need to deal with the SystemZ-specific varargs
// structure, but VAEND is a no-op.
setOperationAction(ISD::VASTART, MVT::Other, Custom);
diff --git a/llvm/test/CodeGen/SystemZ/fp-half.ll b/llvm/test/CodeGen/SystemZ/fp-half.ll
new file mode 100644
index 00000000000000..393ba2f620ff6e
--- /dev/null
+++ b/llvm/test/CodeGen/SystemZ/fp-half.ll
@@ -0,0 +1,100 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z10 | FileCheck %s
+;
+; Tests for FP16 (Half).
+
+; A function where everything is done in Half.
+define void @fun0(ptr %Op0, ptr %Op1, ptr %Dst) {
+; CHECK-LABEL: fun0:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: stmg %r12, %r15, 96(%r15)
+; CHECK-NEXT: .cfi_offset %r12, -64
+; CHECK-NEXT: .cfi_offset %r13, -56
+; CHECK-NEXT: .cfi_offset %r14, -48
+; CHECK-NEXT: .cfi_offset %r15, -40
+; CHECK-NEXT: aghi %r15, -168
+; CHECK-NEXT: .cfi_def_cfa_offset 328
+; CHECK-NEXT: std %f8, 160(%r15) # 8-byte Folded Spill
+; CHECK-NEXT: .cfi_offset %f8, -168
+; CHECK-NEXT: llgh %r2, 0(%r2)
+; CHECK-NEXT: lgr %r13, %r4
+; CHECK-NEXT: lgr %r12, %r3
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: llgh %r2, 0(%r12)
+; CHECK-NEXT: ler %f8, %f0
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: aebr %f0, %f8
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: sth %r2, 0(%r13)
+; CHECK-NEXT: ld %f8, 160(%r15) # 8-byte Folded Reload
+; CHECK-NEXT: lmg %r12, %r15, 264(%r15)
+; CHECK-NEXT: br %r14
+entry:
+ %0 = load half, ptr %Op0, align 2
+ %1 = load half, ptr %Op1, align 2
+ %add = fadd half %0, %1
+ store half %add, ptr %Dst, align 2
+ ret void
+}
+
+; A function where Half values are loaded and extended to float and then
+; operated on.
+define void @fun1(ptr %Op0, ptr %Op1, ptr %Dst) {
+; CHECK-LABEL: fun1:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: stmg %r12, %r15, 96(%r15)
+; CHECK-NEXT: .cfi_offset %r12, -64
+; CHECK-NEXT: .cfi_offset %r13, -56
+; CHECK-NEXT: .cfi_offset %r14, -48
+; CHECK-NEXT: .cfi_offset %r15, -40
+; CHECK-NEXT: aghi %r15, -168
+; CHECK-NEXT: .cfi_def_cfa_offset 328
+; CHECK-NEXT: std %f8, 160(%r15) # 8-byte Folded Spill
+; CHECK-NEXT: .cfi_offset %f8, -168
+; CHECK-NEXT: llgh %r2, 0(%r2)
+; CHECK-NEXT: lgr %r13, %r4
+; CHECK-NEXT: lgr %r12, %r3
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: llgh %r2, 0(%r12)
+; CHECK-NEXT: ler %f8, %f0
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: aebr %f0, %f8
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: sth %r2, 0(%r13)
+; CHECK-NEXT: ld %f8, 160(%r15) # 8-byte Folded Reload
+; CHECK-NEXT: lmg %r12, %r15, 264(%r15)
+; CHECK-NEXT: br %r14
+entry:
+ %0 = load half, ptr %Op0, align 2
+ %ext = fpext half %0 to float
+ %1 = load half, ptr %Op1, align 2
+ %ext1 = fpext half %1 to float
+ %add = fadd float %ext, %ext1
+ %res = fptrunc float %add to half
+ store half %res, ptr %Dst, align 2
+ ret void
+}
+
+; Test case with a Half incoming argument.
+define zeroext i1 @fun2(half noundef %f) {
+; CHECK-LABEL: fun2:
+; CHECK: # %bb.0: # %start
+; CHECK-NEXT: stmg %r14, %r15, 112(%r15)
+; CHECK-NEXT: .cfi_offset %r14, -48
+; CHECK-NEXT: .cfi_offset %r15, -40
+; CHECK-NEXT: aghi %r15, -160
+; CHECK-NEXT: .cfi_def_cfa_offset 320
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: larl %r1, .LCPI2_0
+; CHECK-NEXT: deb %f0, 0(%r1)
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: risbg %r2, %r2, 63, 191, 49
+; CHECK-NEXT: lmg %r14, %r15, 272(%r15)
+; CHECK-NEXT: br %r14
+start:
+ %self = fdiv half %f, 0xHC700
+ %_4 = bitcast half %self to i16
+ %_0 = icmp slt i16 %_4, 0
+ ret i1 %_0
+}
✅ With the latest revision this PR passed the C/C++ code formatter.

Note that you need to also have softPromoteHalfType return true to get correct legalization for half operations.
Thanks for pointing that out - patch updated.
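For context, the hook mentioned above is a TargetLowering virtual; a minimal sketch of the override, with its placement in SystemZISelLowering.h assumed rather than taken from the patch:

// Opt in to "soft promotion" of f16: keep f16 values in their 16-bit
// representation, but perform the arithmetic in f32 and round back after
// each operation.
bool softPromoteHalfType() const override { return true; }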
I think we should define and implement a proper ABI for the half type as well.
Patch updated after some progress... With this version, the fp16 values are passed to the conversion functions as integer, which seems to be the default. It is however a bit tricky to do this and at the same time pass half values in FP registers. At this point, one thing I wonder is whether it would be better to pass FP16 values to the conversion functions as _Float16 instead? It seems this may be possible to change in the configuration, judging by COMPILER_RT_HAS_FLOAT16 / compiler-rt/lib/builtins/extendhfsf2.c / fp_extend.h... Not really sure if those conversion functions are supposed to be built and used only for soft-promotion of fp16, or if there are any external implications, for instance GCC compatibility. Any other comments also welcome...
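For reference, a sketch of the two helper-function conventions being weighed here; the signatures are assumptions based on the usual libgcc/compiler-rt declarations, not something this patch defines:

// The "gnu" helpers pass the half value as its raw 16-bit representation:
extern "C" float __gnu_h2f_ieee(unsigned short a);
extern "C" unsigned short __gnu_f2h_ieee(float a);

// compiler-rt's extend/trunc helpers take or return either the 16-bit
// representation or a real _Float16, depending on whether the runtime was
// built with COMPILER_RT_HAS_FLOAT16:
extern "C" float __extendhfsf2(unsigned short a);   // or: float __extendhfsf2(_Float16 a);
extern "C" unsigned short __truncsfhf2(float a);    // or: _Float16 __truncsfhf2(float a);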
My understanding is that in GCC's

From your first two sentences it sounds like @uweigand mentioned figuring out an ABI for

A quick check seems to show that GCC 13 does not support

Note that there are some common issues with these conversions, would probably be good to test against them if possible #97981 #97975.
From what I can see in the libgcc sources, I never see
Yes, we're working on that. What we're planning to do is to have
Yes, we'd have to add those. I don't think we want
Thanks for pointing this out!
I think this is accurate, libgcc just appears to (reasonably) not provide any f16-related symbols on platforms where GCC doesn't support For that reason we just always provide the symbols in rust's compiler-builtins (though we let LLVM figure out that
That is great news, especially considering how problematic the target-feature-dependent ABI on x86-32 has been.
Patch reworked:
(twoaddr-kill.mir test updated as the hard-coded register class enum value for GRH32BitRegClass has changed.) Still some more points to go over, but it seems to be working fairly well at this point.
Patch improved further:
Not a full review, but some general comments inline.
setOperationAction(ISD::FCOS, VT, Expand);
setOperationAction(ISD::FSINCOS, VT, Expand);
setOperationAction(ISD::FREM, VT, Expand);
setOperationAction(ISD::FPOW, VT, Expand);
Shouldn't these be Promote just like all the other f16 operations? Expand triggers a libcall, which doesn't match the excess-precision setting - also, we actually don't have f16 libcalls in libm ...
ok, if there are no f16 libcalls it works to have them be promoted.
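A minimal sketch of what the promotion discussed here could look like, assuming it goes next to the other f16 operation actions in the SystemZTargetLowering constructor (opcode list and placement are illustrative only):

// Promote these f16 operations to f32 instead of expanding them to f16
// libcalls that don't exist in libm.
setOperationAction(ISD::FSINCOS, MVT::f16, Promote);
setOperationAction(ISD::FREM, MVT::f16, Promote);
setOperationAction(ISD::FPOW, MVT::f16, Promote);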
Just crosslinking that there is an effort to add f16 libcalls #95250 but I have no clue what the plan is as far as lowering to them.
Updated per review.
Note on compiler-rt: not sure how to build llvm conversion functions and link them (have not tried this yet), but added the mapping in RuntimeLibcalls.cpp.
if (TT.isSystemZ()) {
  setLibcallName(RTLIB::FPROUND_F32_F16, "__truncsfhf2");
  setLibcallName(RTLIB::FPEXT_F16_F32, "__extendhfsf2");
}
Why do these names need to be set, aren't these the default?
you may be right - as I wrote above I have not really tried this before. Would you happen to know how to build and link these?
Hm, I see they default to the __gnu_ functions in this file. Some targets (wasm, hexagon) manually set it to __extendhfsf2 and __truncsfhf2 in *ISelLowering.cpp, but why do targets like x86 correctly lower to these as well without an override either in this file or in ISelLowering?
Regarding how to build and link, they are in compiler-rt if that can be built:

COMPILER_RT_ABI NOINLINE dst_t __truncsfhf2(float a) {
__trunc and __extend are what you want to emit here, I'm just not sure what exactly this file needs to do because it seems like HasLegalHalfType controls __extend/__trunc vs. __gnu_ lowering somehow #109164 (comment).
Improved handling to utilize vector instructions when present. New VR16 regclass, but v8f16 not legal. It might make sense to have it as a legal type and e.g. do VL;VST when moving vectors in memory, and also set all vector ops to "Expand". Not sure how trivial that change would be, given some special handling of vector nodes, so not done as of now: only the scalar f16 is legal.

Seems to work fine to add "16" versions for loads, stores and lzer/lcdfr in case of vector support. Without vector support, it might make sense to have load/store pseudos with an extra GPR def operand, so that these loads/stores can be expanded as a PostRA pseudo. Then it would only need handling in one place, but OTOH having a second explicit def operand is also undesired, maybe.

f16 immediates handled like f32:
Should fp16 inline asm operands also be supported at this point?
Thanks!
Agreed. I don't think we need vector f16 at this point.
I don't think we need to spend much effort optimizing for pre-z13 machines at this point.
The sign-operations are the only instructions we even could semantically use with f16, right? We certainly could do so, but I agree it's probably not important.
Good point. I think so, yes. Also, looks like the clang-format check is complaining a bit ...
Experiment with soft-promotion in FP regs (not working). Try to make f16 legal instead
Atomic loads/stores, spill/reload, tests for __fp16 and half vectors.
strict f16 with tests.
Review
Make use of vector facility if present.
This wouldn't be an optimization, rather just handling the shift + insert sequences needed for older machines in one place, instead of, as now, in both lowerLoadF16() and loadRegFromStackSlot() (and similarly for stores).
Fixed. Inline-assembly support for half with tests added. Implemented bitcast i16<->f16 with vector support but letting it fall back to store+load bitcasting for older archs. This type of bitcasting is used with inline-assembly.
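As a rough illustration of the inline-assembly support referred to above, something like the following should now be expressible; the _Float16 type and the "f" constraint are assumptions for the example, not taken from the new tests:

// Hypothetical example: a half value routed through an FP-register operand.
_Float16 pass_through_fpr(_Float16 x) {
  asm("" : "+f"(x)); // empty asm body; only forces x into an FP register
  return x;
}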
Make sure that fp16<=>float conversions are expanded to libcalls and that 16-bit fp values can be loaded and stored properly via GPRs. With this patch the Half IR Type used in operations should be handled correctly with the help of pre-existing ISD node expansions.
Patch in progress...
Fixes #50374