-
Notifications
You must be signed in to change notification settings - Fork 15k
AMDGPU: Use ELF mangling in data layout #163011
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-lld @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) ChangesCloses #95219 Full diff: https://github.com/llvm/llvm-project/pull/163011.diff 6 Files Affected:
diff --git a/llvm/lib/TargetParser/TargetDataLayout.cpp b/llvm/lib/TargetParser/TargetDataLayout.cpp
index cea246e9527bd..950bb2bc557b4 100644
--- a/llvm/lib/TargetParser/TargetDataLayout.cpp
+++ b/llvm/lib/TargetParser/TargetDataLayout.cpp
@@ -258,7 +258,7 @@ static std::string computePowerDataLayout(const Triple &T) {
static std::string computeAMDDataLayout(const Triple &TT) {
if (TT.getArch() == Triple::r600) {
// 32-bit pointers.
- return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128"
+ return "e-m:e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128"
"-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1";
}
@@ -268,7 +268,7 @@ static std::string computeAMDDataLayout(const Triple &TT) {
// (address space 7), and 128-bit non-integral buffer resourcees (address
// space 8) which cannot be non-trivilally accessed by LLVM memory operations
// like getelementptr.
- return "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32"
+ return "e-m:e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32"
"-p7:160:256:256:32-p8:128:128:128:48-p9:192:256:256:32-i64:64-"
"v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-"
"v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9";
diff --git a/llvm/test/CodeGen/AMDGPU/global-constant.ll b/llvm/test/CodeGen/AMDGPU/global-constant.ll
index 866d3a1e30890..b04602aff8e6a 100644
--- a/llvm/test/CodeGen/AMDGPU/global-constant.ll
+++ b/llvm/test/CodeGen/AMDGPU/global-constant.ll
@@ -12,21 +12,21 @@
; Non-R600 OSes use relocations.
; GCN-DEFAULT: s_getpc_b64 s[[[PC0_LO:[0-9]+]]:[[PC0_HI:[0-9]+]]]
-; GCN-DEFAULT: s_add_u32 s{{[0-9]+}}, s[[PC0_LO]], private1@rel32@lo+4
-; GCN-DEFAULT: s_addc_u32 s{{[0-9]+}}, s[[PC0_HI]], private1@rel32@hi+12
+; GCN-DEFAULT: s_add_u32 s{{[0-9]+}}, s[[PC0_LO]], .Lprivate1@rel32@lo+4
+; GCN-DEFAULT: s_addc_u32 s{{[0-9]+}}, s[[PC0_HI]], .Lprivate1@rel32@hi+12
; GCN-DEFAULT: s_getpc_b64 s[[[PC1_LO:[0-9]+]]:[[PC1_HI:[0-9]+]]]
-; GCN-DEFAULT: s_add_u32 s{{[0-9]+}}, s[[PC1_LO]], private2@rel32@lo+4
-; GCN-DEFAULT: s_addc_u32 s{{[0-9]+}}, s[[PC1_HI]], private2@rel32@hi+12
+; GCN-DEFAULT: s_add_u32 s{{[0-9]+}}, s[[PC1_LO]], .Lprivate2@rel32@lo+4
+; GCN-DEFAULT: s_addc_u32 s{{[0-9]+}}, s[[PC1_HI]], .Lprivate2@rel32@hi+12
; MESA uses absolute relocations.
-; GCN-MESA: s_add_u32 s2, private1@abs32@lo, s4
-; GCN-MESA: s_addc_u32 s3, private1@abs32@hi, s5
+; GCN-MESA: s_add_u32 s2, .Lprivate1@abs32@lo, s4
+; GCN-MESA: s_addc_u32 s3, .Lprivate1@abs32@hi, s5
; PAL uses absolute relocations.
-; GCN-PAL: s_add_u32 s2, private1@abs32@lo, s4
-; GCN-PAL: s_addc_u32 s3, private1@abs32@hi, s5
-; GCN-PAL: s_add_u32 s4, private2@abs32@lo, s4
-; GCN-PAL: s_addc_u32 s5, private2@abs32@hi, s5
+; GCN-PAL: s_add_u32 s2, .Lprivate1@abs32@lo, s4
+; GCN-PAL: s_addc_u32 s3, .Lprivate1@abs32@hi, s5
+; GCN-PAL: s_add_u32 s4, .Lprivate2@abs32@lo, s4
+; GCN-PAL: s_addc_u32 s5, .Lprivate2@abs32@hi, s5
; R600-LABEL: private_test
define amdgpu_kernel void @private_test(i32 %index, ptr addrspace(1) %out) {
diff --git a/llvm/test/CodeGen/AMDGPU/global-variable-relocs.ll b/llvm/test/CodeGen/AMDGPU/global-variable-relocs.ll
index b8cfcbf2d2665..6d55e79edbef6 100644
--- a/llvm/test/CodeGen/AMDGPU/global-variable-relocs.ll
+++ b/llvm/test/CodeGen/AMDGPU/global-variable-relocs.ll
@@ -14,8 +14,8 @@
; CHECK-LABEL: private_test:
; CHECK: s_getpc_b64 s[[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]]
-; CHECK: s_add_u32 s[[ADDR_LO:[0-9]+]], s[[PC_LO]], private@rel32@lo+8
-; CHECK: s_addc_u32 s[[ADDR_HI:[0-9]+]], s[[PC_HI]], private@rel32@hi+16
+; CHECK: s_add_u32 s[[ADDR_LO:[0-9]+]], s[[PC_LO]], .Lprivate@rel32@lo+8
+; CHECK: s_addc_u32 s[[ADDR_HI:[0-9]+]], s[[PC_HI]], .Lprivate@rel32@hi+16
; CHECK: s_load_dword s{{[0-9]+}}, s[[[ADDR_LO]]:[[ADDR_HI]]]
define amdgpu_kernel void @private_test(ptr addrspace(1) %out) {
%ptr = getelementptr [256 x i32], ptr addrspace(1) @private, i32 0, i32 1
@@ -153,7 +153,7 @@ define amdgpu_kernel void @external_w_init_test(ptr addrspace(1) %out) {
ret void
}
-; CHECK: .local private
+; CHECK: .local .Lprivate
; CHECK: .local internal
; CHECK: .weak linkonce
; CHECK: .weak weak
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.memcpy.ll b/llvm/test/CodeGen/AMDGPU/llvm.memcpy.ll
index 63e9eef3297a1..66b795876d70e 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.memcpy.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.memcpy.ll
@@ -315,7 +315,7 @@ define amdgpu_kernel void @test_small_memcpy_i64_global_to_global_align16(ptr ad
; FUNC-LABEL: {{^}}test_memcpy_const_string_align4:
; SI: s_getpc_b64
-; SI: s_add_u32 s{{[0-9]+}}, s{{[0-9]+}}, hello.align4@rel32@lo+4
+; SI: s_add_u32 s{{[0-9]+}}, s{{[0-9]+}}, .Lhello.align4@rel32@lo+4
; SI: s_addc_u32
; SI-DAG: s_load_dwordx8
; SI-DAG: s_load_dwordx2
diff --git a/llvm/test/CodeGen/AMDGPU/naked-fn-with-frame-pointer.ll b/llvm/test/CodeGen/AMDGPU/naked-fn-with-frame-pointer.ll
index 5ff2d82c1464f..2509497bbcde7 100644
--- a/llvm/test/CodeGen/AMDGPU/naked-fn-with-frame-pointer.ll
+++ b/llvm/test/CodeGen/AMDGPU/naked-fn-with-frame-pointer.ll
@@ -5,8 +5,8 @@ declare dso_local void @main()
define dso_local void @naked() naked "frame-pointer"="all" {
; CHECK-LABEL: naked:
-; CHECK: naked$local:
-; CHECK-NEXT: .type naked$local,@function
+; CHECK: .Lnaked$local:
+; CHECK-NEXT: .type .Lnaked$local,@function
; CHECK-NEXT: ; %bb.0:
; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; CHECK-NEXT: s_getpc_b64 s[16:17]
@@ -19,8 +19,8 @@ define dso_local void @naked() naked "frame-pointer"="all" {
define dso_local void @normal() "frame-pointer"="all" {
; CHECK-LABEL: normal:
-; CHECK: normal$local:
-; CHECK-NEXT: .type normal$local,@function
+; CHECK: .Lnormal$local:
+; CHECK-NEXT: .type .Lnormal$local,@function
; CHECK-NEXT: ; %bb.0:
; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; CHECK-NEXT: s_mov_b32 s16, s33
diff --git a/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.generated.expected b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.generated.expected
index 429bee4195fa9..a8c2531117f42 100644
--- a/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.generated.expected
+++ b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.generated.expected
@@ -65,8 +65,8 @@ define dso_local i32 @main() #0 {
attributes #0 = { noredzone nounwind ssp uwtable "frame-pointer"="all" }
; CHECK-LABEL: check_boundaries:
-; CHECK: check_boundaries$local:
-; CHECK-NEXT: .type check_boundaries$local,@function
+; CHECK: .Lcheck_boundaries$local:
+; CHECK-NEXT: .type .Lcheck_boundaries$local,@function
; CHECK-NEXT: .cfi_startproc
; CHECK-NEXT: ; %bb.0:
; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
@@ -107,8 +107,8 @@ attributes #0 = { noredzone nounwind ssp uwtable "frame-pointer"="all" }
; CHECK-NEXT: s_setpc_b64 s[30:31]
;
; CHECK-LABEL: main:
-; CHECK: main$local:
-; CHECK-NEXT: .type main$local,@function
+; CHECK: .Lmain$local:
+; CHECK-NEXT: .type .Lmain$local,@function
; CHECK-NEXT: .cfi_startproc
; CHECK-NEXT: ; %bb.0:
; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
|
Clang still has the data layouts hard-coded in lib/Basic/Targets. In an ideal world we'd get rid of that first... some tests also include the data layout, I think that could be removed. |
d4aca48
to
0a83907
Compare
0a83907
to
a23bc0d
Compare
@rnk seems to have gotten as far as moving it to TargetParser, but clang isn't using it yet.
Not sure if lld is one of the tools that auto-fill the datalayout from the triple. Also it appears we're missing a proper error handling for the datalayout mismatch, these tests are hitting an assertion |
a23bc0d
to
22eaf7d
Compare
LGTM once all tests pass
Yeah, that's particularly annoying when working with a release build of LLVM. Is there a use case for specifying a data layout in a module that doesn't match the implied data layout from the triple? |
22eaf7d
to
fc32794
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has the unfortunate byproduct of making every HIP/OpenMP compilation emit like 50 lines of warnings about conflicting data layouts with the ROCm Device libs.
We probably need an exception or something in the IR linker to avoid printing warnings for a case like this. I think we do the exact same thing for the CUDA device library IR. |
We should just delete the hardcoded datalayouts out of device libs (really we should delete the IR there altogether, but the datalayout is unnecessary) |
It will take quite some time for those changes to land and get propagated from ROCm and it's unacceptable for a trivial OpenMP / HIP program to emit 50+ lines of warnings for six months. |
Summary: The changes in llvm#163011 caused all ELF platforms to default to ELF mangling. We want to auto upgrade this for linking in new programs to old ones.
Summary: The changes in llvm#163011 caused all ELF platforms to default to ELF mangling. We want to auto upgrade this for linking in new programs to old ones. contains
Summary: The changes in #163011 caused all ELF platforms to default to ELF mangling. We want to auto upgrade this for linking in new programs to old ones.
…3644) Summary: The changes in llvm/llvm-project#163011 caused all ELF platforms to default to ELF mangling. We want to auto upgrade this for linking in new programs to old ones.
Closes #95219