[PPC][AIX] Save/restore r31 when using base pointer #100182

Merged on Aug 7, 2024 (5 commits)

@syzaara syzaara commented Jul 23, 2024

When the base pointer r30 is used to hold the stack pointer, r30 is spilled in the prologue. On AIX, registers are saved from highest to lowest, so r31 also needs to be saved.

Fixes #96411

…epilogue

When the base pointer r30 is used to hold the stack pointer, r30 is spilled
in the prologue. On AIX registers are saved from highest to lowest, so r31
also needs to be saved. Setting needsFP to true on AIX when the base pointer
is used allows r31 to also be saved and restored.
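
The rule in the description can be sketched as a small standalone C++ model. This is not LLVM code: `GPRSet`, `isContiguousThroughR31`, and `extendThroughR31` are hypothetical names, and the sketch assumes (as the description implies) that AIX expects the saved GPRs to form one contiguous run ending at r31.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical model: bit i set means GPR r<i> is saved in the prologue.
using GPRSet = std::bitset<32>;

// True when the saved registers are {rN, ..., r31} for some N, or when
// nothing is saved at all.
bool isContiguousThroughR31(const GPRSet &Saved) {
  bool SeenSaved = false;
  for (unsigned Reg = 0; Reg < 32; ++Reg) {
    if (Saved.test(Reg))
      SeenSaved = true;
    else if (SeenSaved)
      return false; // an unsaved register sits above a saved one
  }
  return true;
}

// Widens a saved set so it covers everything from its lowest member to r31.
GPRSet extendThroughR31(GPRSet Saved) {
  bool SeenSaved = false;
  for (unsigned Reg = 0; Reg < 32; ++Reg) {
    SeenSaved = SeenSaved || Saved.test(Reg);
    if (SeenSaved)
      Saved.set(Reg);
  }
  assert(isContiguousThroughR31(Saved) && "run must end at r31");
  return Saved;
}
```

Under this model, a set containing only r30 fails the check, and extending it adds r31, which is the situation this PR addresses.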

llvmbot commented Jul 23, 2024

@llvm/pr-subscribers-backend-powerpc

Author: Zaara Syeda (syzaara)

Changes

Patch is 31.64 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/100182.diff

3 Files Affected:

  • (modified) llvm/lib/Target/PowerPC/PPCFrameLowering.cpp (+3)
  • (modified) llvm/test/CodeGen/PowerPC/aix-base-pointer.ll (+10-3)
  • (modified) llvm/test/CodeGen/PowerPC/ppc64-rop-protection-aix.ll (+236-200)
diff --git a/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp b/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
index 1963582ce6863..11332dbd8147c 100644
--- a/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
@@ -376,6 +376,9 @@ bool PPCFrameLowering::needsFP(const MachineFunction &MF) const {
   if (MF.getFunction().hasFnAttribute(Attribute::Naked))
     return false;
 
+  if (Subtarget.isAIXABI() && Subtarget.getRegisterInfo()->hasBasePointer(MF))
+    return true;
+
   return MF.getTarget().Options.DisableFramePointerElim(MF) ||
          MFI.hasVarSizedObjects() || MFI.hasStackMap() || MFI.hasPatchPoint() ||
          MF.exposesReturnsTwice() ||
diff --git a/llvm/test/CodeGen/PowerPC/aix-base-pointer.ll b/llvm/test/CodeGen/PowerPC/aix-base-pointer.ll
index ab222d770360c..cc4f0ee92c5dc 100644
--- a/llvm/test/CodeGen/PowerPC/aix-base-pointer.ll
+++ b/llvm/test/CodeGen/PowerPC/aix-base-pointer.ll
@@ -6,8 +6,9 @@
 
 ; Use an overaligned buffer to force base-pointer usage. Test verifies:
 ; - base pointer register (r30) is saved/defined/restored.
+; - frame pointer register (r31) is saved/defined/restored.
 ; - stack frame is allocated with correct alignment.
-; - Address of %AlignedBuffer is calculated based off offset from the stack
+; - Address of %AlignedBuffer is calculated based off offset from the frame
 ;   pointer.
 
 define float @caller(float %f) {
@@ -19,23 +20,29 @@ define float @caller(float %f) {
 declare void @callee(ptr)
 
 ; 32BIT-LABEL: .caller:
+; 32BIT:         stw 31, -12(1)
 ; 32BIT:         stw 30, -16(1)
 ; 32BIT:         mr 30, 1
 ; 32BIT:         clrlwi  0, 1, 27
 ; 32BIT:         subfic 0, 0, -224
 ; 32BIT:         stwux 1, 1, 0
-; 32BIT:         addi 3, 1, 64
+; 32BIT:         mr 31, 1
+; 32BIT:         addi 3, 31, 64
 ; 32BIT:         bl .callee
 ; 32BIT:         mr 1, 30
+; 32BIT:         lwz 31, -12(1)
 ; 32BIT:         lwz 30, -16(1)
 
 ; 64BIT-LABEL: .caller:
+; 64BIT:         std 31, -16(1)
 ; 64BIT:         std 30, -24(1)
 ; 64BIT:         mr 30, 1
 ; 64BIT:         clrldi  0, 1, 59
 ; 64BIT:         subfic 0, 0, -288
 ; 64BIT:         stdux 1, 1, 0
-; 64BIT:         addi 3, 1, 128
+; 64BIT:         mr 31, 1
+; 64BIT:         addi 3, 31, 128
 ; 64BIT:         bl .callee
 ; 64BIT:         mr 1, 30
+; 64BIT:         ld 31, -16(1)
 ; 64BIT:         ld 30, -24(1)
diff --git a/llvm/test/CodeGen/PowerPC/ppc64-rop-protection-aix.ll b/llvm/test/CodeGen/PowerPC/ppc64-rop-protection-aix.ll
index 8955835f41ea6..318b6d2fc6aa3 100644
--- a/llvm/test/CodeGen/PowerPC/ppc64-rop-protection-aix.ll
+++ b/llvm/test/CodeGen/PowerPC/ppc64-rop-protection-aix.ll
@@ -2297,510 +2297,546 @@ define dso_local zeroext i32 @aligned(ptr nocapture readonly %in) #0 {
 ; BE-P10-LABEL: aligned:
 ; BE-P10:       # %bb.0: # %entry
 ; BE-P10-NEXT:    mflr r0
+; BE-P10-NEXT:    std r31, -8(r1)
 ; BE-P10-NEXT:    std r30, -16(r1)
 ; BE-P10-NEXT:    lis r12, -1
 ; BE-P10-NEXT:    mr r30, r1
 ; BE-P10-NEXT:    std r0, 16(r1)
-; BE-P10-NEXT:    hashst r0, -24(r1)
+; BE-P10-NEXT:    hashst r0, -32(r1)
 ; BE-P10-NEXT:    clrldi r0, r1, 49
 ; BE-P10-NEXT:    subc r0, r12, r0
 ; BE-P10-NEXT:    stdux r1, r1, r0
-; BE-P10-NEXT:    std r31, -8(r30) # 8-byte Folded Spill
-; BE-P10-NEXT:    mr r31, r3
+; BE-P10-NEXT:    std r29, -24(r30) # 8-byte Folded Spill
+; BE-P10-NEXT:    mr r29, r3
 ; BE-P10-NEXT:    lwz r3, 4(r3)
 ; BE-P10-NEXT:    lis r4, 0
-; BE-P10-NEXT:    addi r5, r1, 32764
-; BE-P10-NEXT:    ori r4, r4, 65508
-; BE-P10-NEXT:    stwx r3, r1, r4
-; BE-P10-NEXT:    lwz r3, 12(r31)
+; BE-P10-NEXT:    mr r31, r1
+; BE-P10-NEXT:    ori r4, r4, 65500
+; BE-P10-NEXT:    stwx r3, r31, r4
+; BE-P10-NEXT:    lwz r3, 12(r29)
 ; BE-P10-NEXT:    lis r4, 0
 ; BE-P10-NEXT:    ori r4, r4, 32768
-; BE-P10-NEXT:    stwx r3, r1, r4
-; BE-P10-NEXT:    lwz r3, 20(r31)
+; BE-P10-NEXT:    stwx r3, r31, r4
+; BE-P10-NEXT:    lwz r3, 20(r29)
 ; BE-P10-NEXT:    lis r4, 0
-; BE-P10-NEXT:    ori r4, r4, 65508
-; BE-P10-NEXT:    add r4, r1, r4
-; BE-P10-NEXT:    stw r3, 32764(r1)
+; BE-P10-NEXT:    ori r4, r4, 65500
+; BE-P10-NEXT:    stw r3, 32764(r31)
 ; BE-P10-NEXT:    lis r3, 0
 ; BE-P10-NEXT:    ori r3, r3, 32768
-; BE-P10-NEXT:    add r3, r1, r3
+; BE-P10-NEXT:    add r3, r31, r3
+; BE-P10-NEXT:    add r4, r31, r4
+; BE-P10-NEXT:    addi r5, r31, 32764
 ; BE-P10-NEXT:    bl .callee3[PR]
 ; BE-P10-NEXT:    nop
-; BE-P10-NEXT:    lwz r4, 16(r31)
-; BE-P10-NEXT:    ld r31, -8(r30) # 8-byte Folded Reload
+; BE-P10-NEXT:    lwz r4, 16(r29)
+; BE-P10-NEXT:    ld r29, -24(r30) # 8-byte Folded Reload
 ; BE-P10-NEXT:    add r3, r4, r3
 ; BE-P10-NEXT:    clrldi r3, r3, 32
 ; BE-P10-NEXT:    mr r1, r30
 ; BE-P10-NEXT:    ld r0, 16(r1)
-; BE-P10-NEXT:    ld r30, -16(r1)
+; BE-P10-NEXT:    ld r31, -8(r1)
 ; BE-P10-NEXT:    mtlr r0
-; BE-P10-NEXT:    hashchk r0, -24(r1)
+; BE-P10-NEXT:    ld r30, -16(r1)
+; BE-P10-NEXT:    hashchk r0, -32(r1)
 ; BE-P10-NEXT:    blr
 ;
 ; BE-P9-LABEL: aligned:
 ; BE-P9:       # %bb.0: # %entry
 ; BE-P9-NEXT:    mflr r0
-; BE-P9-NEXT:    std r30, -16(r1)
+; BE-P9-NEXT:    std r31, -8(r1)
 ; BE-P9-NEXT:    lis r12, -1
+; BE-P9-NEXT:    std r30, -16(r1)
 ; BE-P9-NEXT:    mr r30, r1
 ; BE-P9-NEXT:    std r0, 16(r1)
-; BE-P9-NEXT:    hashst r0, -24(r1)
+; BE-P9-NEXT:    hashst r0, -32(r1)
 ; BE-P9-NEXT:    clrldi r0, r1, 49
 ; BE-P9-NEXT:    subc r0, r12, r0
 ; BE-P9-NEXT:    stdux r1, r1, r0
-; BE-P9-NEXT:    std r31, -8(r30) # 8-byte Folded Spill
-; BE-P9-NEXT:    mr r31, r3
+; BE-P9-NEXT:    std r29, -24(r30) # 8-byte Folded Spill
+; BE-P9-NEXT:    mr r29, r3
 ; BE-P9-NEXT:    lwz r3, 4(r3)
 ; BE-P9-NEXT:    lis r4, 0
-; BE-P9-NEXT:    addi r5, r1, 32764
-; BE-P9-NEXT:    ori r4, r4, 65508
-; BE-P9-NEXT:    stwx r3, r1, r4
-; BE-P9-NEXT:    lwz r3, 12(r31)
+; BE-P9-NEXT:    mr r31, r1
+; BE-P9-NEXT:    ori r4, r4, 65500
+; BE-P9-NEXT:    addi r5, r31, 32764
+; BE-P9-NEXT:    stwx r3, r31, r4
+; BE-P9-NEXT:    lwz r3, 12(r29)
 ; BE-P9-NEXT:    lis r4, 0
 ; BE-P9-NEXT:    ori r4, r4, 32768
-; BE-P9-NEXT:    stwx r3, r1, r4
-; BE-P9-NEXT:    lwz r3, 20(r31)
+; BE-P9-NEXT:    stwx r3, r31, r4
+; BE-P9-NEXT:    lwz r3, 20(r29)
 ; BE-P9-NEXT:    lis r4, 0
-; BE-P9-NEXT:    ori r4, r4, 65508
-; BE-P9-NEXT:    stw r3, 32764(r1)
+; BE-P9-NEXT:    ori r4, r4, 65500
+; BE-P9-NEXT:    stw r3, 32764(r31)
 ; BE-P9-NEXT:    lis r3, 0
-; BE-P9-NEXT:    add r4, r1, r4
+; BE-P9-NEXT:    add r4, r31, r4
 ; BE-P9-NEXT:    ori r3, r3, 32768
-; BE-P9-NEXT:    add r3, r1, r3
+; BE-P9-NEXT:    add r3, r31, r3
 ; BE-P9-NEXT:    bl .callee3[PR]
 ; BE-P9-NEXT:    nop
-; BE-P9-NEXT:    lwz r4, 16(r31)
-; BE-P9-NEXT:    ld r31, -8(r30) # 8-byte Folded Reload
+; BE-P9-NEXT:    lwz r4, 16(r29)
+; BE-P9-NEXT:    ld r29, -24(r30) # 8-byte Folded Reload
 ; BE-P9-NEXT:    add r3, r4, r3
 ; BE-P9-NEXT:    clrldi r3, r3, 32
 ; BE-P9-NEXT:    mr r1, r30
 ; BE-P9-NEXT:    ld r0, 16(r1)
+; BE-P9-NEXT:    ld r31, -8(r1)
 ; BE-P9-NEXT:    ld r30, -16(r1)
 ; BE-P9-NEXT:    mtlr r0
-; BE-P9-NEXT:    hashchk r0, -24(r1)
+; BE-P9-NEXT:    hashchk r0, -32(r1)
 ; BE-P9-NEXT:    blr
 ;
 ; BE-P8-LABEL: aligned:
 ; BE-P8:       # %bb.0: # %entry
 ; BE-P8-NEXT:    mflr r0
+; BE-P8-NEXT:    std r31, -8(r1)
 ; BE-P8-NEXT:    std r30, -16(r1)
 ; BE-P8-NEXT:    lis r12, -1
 ; BE-P8-NEXT:    mr r30, r1
 ; BE-P8-NEXT:    std r0, 16(r1)
-; BE-P8-NEXT:    hashst r0, -24(r1)
+; BE-P8-NEXT:    hashst r0, -32(r1)
 ; BE-P8-NEXT:    clrldi r0, r1, 49
 ; BE-P8-NEXT:    subc r0, r12, r0
 ; BE-P8-NEXT:    stdux r1, r1, r0
 ; BE-P8-NEXT:    lis r4, 0
-; BE-P8-NEXT:    std r31, -8(r30) # 8-byte Folded Spill
-; BE-P8-NEXT:    mr r31, r3
+; BE-P8-NEXT:    std r29, -24(r30) # 8-byte Folded Spill
+; BE-P8-NEXT:    mr r29, r3
 ; BE-P8-NEXT:    lwz r3, 4(r3)
-; BE-P8-NEXT:    addi r5, r1, 32764
-; BE-P8-NEXT:    ori r4, r4, 65508
-; BE-P8-NEXT:    stwx r3, r1, r4
+; BE-P8-NEXT:    mr r31, r1
+; BE-P8-NEXT:    ori r4, r4, 65500
+; BE-P8-NEXT:    addi r5, r31, 32764
+; BE-P8-NEXT:    stwx r3, r31, r4
 ; BE-P8-NEXT:    lis r4, 0
-; BE-P8-NEXT:    lwz r3, 12(r31)
+; BE-P8-NEXT:    lwz r3, 12(r29)
 ; BE-P8-NEXT:    ori r4, r4, 32768
-; BE-P8-NEXT:    stwx r3, r1, r4
-; BE-P8-NEXT:    lwz r3, 20(r31)
+; BE-P8-NEXT:    stwx r3, r31, r4
+; BE-P8-NEXT:    lwz r3, 20(r29)
 ; BE-P8-NEXT:    lis r4, 0
-; BE-P8-NEXT:    ori r4, r4, 65508
-; BE-P8-NEXT:    stw r3, 32764(r1)
+; BE-P8-NEXT:    ori r4, r4, 65500
+; BE-P8-NEXT:    stw r3, 32764(r31)
 ; BE-P8-NEXT:    lis r3, 0
-; BE-P8-NEXT:    add r4, r1, r4
+; BE-P8-NEXT:    add r4, r31, r4
 ; BE-P8-NEXT:    ori r3, r3, 32768
-; BE-P8-NEXT:    add r3, r1, r3
+; BE-P8-NEXT:    add r3, r31, r3
 ; BE-P8-NEXT:    bl .callee3[PR]
 ; BE-P8-NEXT:    nop
-; BE-P8-NEXT:    lwz r4, 16(r31)
-; BE-P8-NEXT:    ld r31, -8(r30) # 8-byte Folded Reload
+; BE-P8-NEXT:    lwz r4, 16(r29)
+; BE-P8-NEXT:    ld r29, -24(r30) # 8-byte Folded Reload
 ; BE-P8-NEXT:    add r3, r4, r3
 ; BE-P8-NEXT:    clrldi r3, r3, 32
 ; BE-P8-NEXT:    mr r1, r30
 ; BE-P8-NEXT:    ld r0, 16(r1)
+; BE-P8-NEXT:    ld r31, -8(r1)
 ; BE-P8-NEXT:    ld r30, -16(r1)
-; BE-P8-NEXT:    hashchk r0, -24(r1)
+; BE-P8-NEXT:    hashchk r0, -32(r1)
 ; BE-P8-NEXT:    mtlr r0
 ; BE-P8-NEXT:    blr
 ;
 ; BE-32BIT-P10-LABEL: aligned:
 ; BE-32BIT-P10:       # %bb.0: # %entry
 ; BE-32BIT-P10-NEXT:    mflr r0
+; BE-32BIT-P10-NEXT:    stw r31, -4(r1)
 ; BE-32BIT-P10-NEXT:    stw r30, -8(r1)
 ; BE-32BIT-P10-NEXT:    lis r12, -1
 ; BE-32BIT-P10-NEXT:    mr r30, r1
 ; BE-32BIT-P10-NEXT:    stw r0, 8(r1)
-; BE-32BIT-P10-NEXT:    hashst r0, -16(r1)
+; BE-32BIT-P10-NEXT:    hashst r0, -24(r1)
 ; BE-32BIT-P10-NEXT:    clrlwi r0, r1, 17
 ; BE-32BIT-P10-NEXT:    subc r0, r12, r0
 ; BE-32BIT-P10-NEXT:    stwux r1, r1, r0
-; BE-32BIT-P10-NEXT:    stw r31, -4(r30) # 4-byte Folded Spill
-; BE-32BIT-P10-NEXT:    mr r31, r3
+; BE-32BIT-P10-NEXT:    stw r29, -12(r30) # 4-byte Folded Spill
+; BE-32BIT-P10-NEXT:    mr r29, r3
 ; BE-32BIT-P10-NEXT:    lwz r3, 4(r3)
 ; BE-32BIT-P10-NEXT:    lis r4, 0
-; BE-32BIT-P10-NEXT:    addi r5, r1, 32764
-; BE-32BIT-P10-NEXT:    ori r4, r4, 65516
-; BE-32BIT-P10-NEXT:    stwx r3, r1, r4
-; BE-32BIT-P10-NEXT:    lwz r3, 12(r31)
+; BE-32BIT-P10-NEXT:    mr r31, r1
+; BE-32BIT-P10-NEXT:    ori r4, r4, 65508
+; BE-32BIT-P10-NEXT:    stwx r3, r31, r4
+; BE-32BIT-P10-NEXT:    lwz r3, 12(r29)
 ; BE-32BIT-P10-NEXT:    lis r4, 0
 ; BE-32BIT-P10-NEXT:    ori r4, r4, 32768
-; BE-32BIT-P10-NEXT:    stwx r3, r1, r4
-; BE-32BIT-P10-NEXT:    lwz r3, 20(r31)
+; BE-32BIT-P10-NEXT:    stwx r3, r31, r4
+; BE-32BIT-P10-NEXT:    lwz r3, 20(r29)
 ; BE-32BIT-P10-NEXT:    lis r4, 0
-; BE-32BIT-P10-NEXT:    ori r4, r4, 65516
-; BE-32BIT-P10-NEXT:    add r4, r1, r4
-; BE-32BIT-P10-NEXT:    stw r3, 32764(r1)
+; BE-32BIT-P10-NEXT:    ori r4, r4, 65508
+; BE-32BIT-P10-NEXT:    stw r3, 32764(r31)
 ; BE-32BIT-P10-NEXT:    lis r3, 0
 ; BE-32BIT-P10-NEXT:    ori r3, r3, 32768
-; BE-32BIT-P10-NEXT:    add r3, r1, r3
+; BE-32BIT-P10-NEXT:    add r3, r31, r3
+; BE-32BIT-P10-NEXT:    add r4, r31, r4
+; BE-32BIT-P10-NEXT:    addi r5, r31, 32764
 ; BE-32BIT-P10-NEXT:    bl .callee3[PR]
 ; BE-32BIT-P10-NEXT:    nop
-; BE-32BIT-P10-NEXT:    lwz r4, 16(r31)
-; BE-32BIT-P10-NEXT:    lwz r31, -4(r30) # 4-byte Folded Reload
+; BE-32BIT-P10-NEXT:    lwz r4, 16(r29)
+; BE-32BIT-P10-NEXT:    lwz r29, -12(r30) # 4-byte Folded Reload
 ; BE-32BIT-P10-NEXT:    add r3, r4, r3
 ; BE-32BIT-P10-NEXT:    mr r1, r30
 ; BE-32BIT-P10-NEXT:    lwz r0, 8(r1)
-; BE-32BIT-P10-NEXT:    lwz r30, -8(r1)
+; BE-32BIT-P10-NEXT:    lwz r31, -4(r1)
 ; BE-32BIT-P10-NEXT:    mtlr r0
-; BE-32BIT-P10-NEXT:    hashchk r0, -16(r1)
+; BE-32BIT-P10-NEXT:    lwz r30, -8(r1)
+; BE-32BIT-P10-NEXT:    hashchk r0, -24(r1)
 ; BE-32BIT-P10-NEXT:    blr
 ;
 ; BE-32BIT-P9-LABEL: aligned:
 ; BE-32BIT-P9:       # %bb.0: # %entry
 ; BE-32BIT-P9-NEXT:    mflr r0
-; BE-32BIT-P9-NEXT:    stw r30, -8(r1)
+; BE-32BIT-P9-NEXT:    stw r31, -4(r1)
 ; BE-32BIT-P9-NEXT:    lis r12, -1
+; BE-32BIT-P9-NEXT:    stw r30, -8(r1)
 ; BE-32BIT-P9-NEXT:    mr r30, r1
 ; BE-32BIT-P9-NEXT:    stw r0, 8(r1)
-; BE-32BIT-P9-NEXT:    hashst r0, -16(r1)
+; BE-32BIT-P9-NEXT:    hashst r0, -24(r1)
 ; BE-32BIT-P9-NEXT:    clrlwi r0, r1, 17
 ; BE-32BIT-P9-NEXT:    subc r0, r12, r0
 ; BE-32BIT-P9-NEXT:    stwux r1, r1, r0
-; BE-32BIT-P9-NEXT:    stw r31, -4(r30) # 4-byte Folded Spill
-; BE-32BIT-P9-NEXT:    mr r31, r3
+; BE-32BIT-P9-NEXT:    stw r29, -12(r30) # 4-byte Folded Spill
+; BE-32BIT-P9-NEXT:    mr r29, r3
 ; BE-32BIT-P9-NEXT:    lwz r3, 4(r3)
 ; BE-32BIT-P9-NEXT:    lis r4, 0
-; BE-32BIT-P9-NEXT:    addi r5, r1, 32764
-; BE-32BIT-P9-NEXT:    ori r4, r4, 65516
-; BE-32BIT-P9-NEXT:    stwx r3, r1, r4
-; BE-32BIT-P9-NEXT:    lwz r3, 12(r31)
+; BE-32BIT-P9-NEXT:    mr r31, r1
+; BE-32BIT-P9-NEXT:    ori r4, r4, 65508
+; BE-32BIT-P9-NEXT:    addi r5, r31, 32764
+; BE-32BIT-P9-NEXT:    stwx r3, r31, r4
+; BE-32BIT-P9-NEXT:    lwz r3, 12(r29)
 ; BE-32BIT-P9-NEXT:    lis r4, 0
 ; BE-32BIT-P9-NEXT:    ori r4, r4, 32768
-; BE-32BIT-P9-NEXT:    stwx r3, r1, r4
-; BE-32BIT-P9-NEXT:    lwz r3, 20(r31)
+; BE-32BIT-P9-NEXT:    stwx r3, r31, r4
+; BE-32BIT-P9-NEXT:    lwz r3, 20(r29)
 ; BE-32BIT-P9-NEXT:    lis r4, 0
-; BE-32BIT-P9-NEXT:    ori r4, r4, 65516
-; BE-32BIT-P9-NEXT:    stw r3, 32764(r1)
+; BE-32BIT-P9-NEXT:    ori r4, r4, 65508
+; BE-32BIT-P9-NEXT:    stw r3, 32764(r31)
 ; BE-32BIT-P9-NEXT:    lis r3, 0
-; BE-32BIT-P9-NEXT:    add r4, r1, r4
+; BE-32BIT-P9-NEXT:    add r4, r31, r4
 ; BE-32BIT-P9-NEXT:    ori r3, r3, 32768
-; BE-32BIT-P9-NEXT:    add r3, r1, r3
+; BE-32BIT-P9-NEXT:    add r3, r31, r3
 ; BE-32BIT-P9-NEXT:    bl .callee3[PR]
 ; BE-32BIT-P9-NEXT:    nop
-; BE-32BIT-P9-NEXT:    lwz r4, 16(r31)
-; BE-32BIT-P9-NEXT:    lwz r31, -4(r30) # 4-byte Folded Reload
+; BE-32BIT-P9-NEXT:    lwz r4, 16(r29)
+; BE-32BIT-P9-NEXT:    lwz r29, -12(r30) # 4-byte Folded Reload
 ; BE-32BIT-P9-NEXT:    add r3, r4, r3
 ; BE-32BIT-P9-NEXT:    mr r1, r30
 ; BE-32BIT-P9-NEXT:    lwz r0, 8(r1)
+; BE-32BIT-P9-NEXT:    lwz r31, -4(r1)
 ; BE-32BIT-P9-NEXT:    lwz r30, -8(r1)
 ; BE-32BIT-P9-NEXT:    mtlr r0
-; BE-32BIT-P9-NEXT:    hashchk r0, -16(r1)
+; BE-32BIT-P9-NEXT:    hashchk r0, -24(r1)
 ; BE-32BIT-P9-NEXT:    blr
 ;
 ; BE-32BIT-P8-LABEL: aligned:
 ; BE-32BIT-P8:       # %bb.0: # %entry
 ; BE-32BIT-P8-NEXT:    mflr r0
+; BE-32BIT-P8-NEXT:    stw r31, -4(r1)
 ; BE-32BIT-P8-NEXT:    stw r30, -8(r1)
 ; BE-32BIT-P8-NEXT:    lis r12, -1
 ; BE-32BIT-P8-NEXT:    mr r30, r1
 ; BE-32BIT-P8-NEXT:    stw r0, 8(r1)
-; BE-32BIT-P8-NEXT:    hashst r0, -16(r1)
+; BE-32BIT-P8-NEXT:    hashst r0, -24(r1)
 ; BE-32BIT-P8-NEXT:    clrlwi r0, r1, 17
 ; BE-32BIT-P8-NEXT:    subc r0, r12, r0
 ; BE-32BIT-P8-NEXT:    stwux r1, r1, r0
 ; BE-32BIT-P8-NEXT:    lis r4, 0
-; BE-32BIT-P8-NEXT:    stw r31, -4(r30) # 4-byte Folded Spill
-; BE-32BIT-P8-NEXT:    mr r31, r3
+; BE-32BIT-P8-NEXT:    stw r29, -12(r30) # 4-byte Folded Spill
+; BE-32BIT-P8-NEXT:    mr r29, r3
 ; BE-32BIT-P8-NEXT:    lwz r3, 4(r3)
-; BE-32BIT-P8-NEXT:    addi r5, r1, 32764
-; BE-32BIT-P8-NEXT:    ori r4, r4, 65516
-; BE-32BIT-P8-NEXT:    stwx r3, r1, r4
+; BE-32BIT-P8-NEXT:    mr r31, r1
+; BE-32BIT-P8-NEXT:    ori r4, r4, 65508
+; BE-32BIT-P8-NEXT:    addi r5, r31, 32764
+; BE-32BIT-P8-NEXT:    stwx r3, r31, r4
 ; BE-32BIT-P8-NEXT:    lis r4, 0
-; BE-32BIT-P8-NEXT:    lwz r3, 12(r31)
+; BE-32BIT-P8-NEXT:    lwz r3, 12(r29)
 ; BE-32BIT-P8-NEXT:    ori r4, r4, 32768
-; BE-32BIT-P8-NEXT:    stwx r3, r1, r4
-; BE-32BIT-P8-NEXT:    lwz r3, 20(r31)
+; BE-32BIT-P8-NEXT:    stwx r3, r31, r4
+; BE-32BIT-P8-NEXT:    lwz r3, 20(r29)
 ; BE-32BIT-P8-NEXT:    lis r4, 0
-; BE-32BIT-P8-NEXT:    ori r4, r4, 65516
-; BE-32BIT-P8-NEXT:    stw r3, 32764(r1)
+; BE-32BIT-P8-NEXT:    ori r4, r4, 65508
+; BE-32BIT-P8-NEXT:    stw r3, 32764(r31)
 ; BE-32BIT-P8-NEXT:    lis r3, 0
-; BE-32BIT-P8-NEXT:    add r4, r1, r4
+; BE-32BIT-P8-NEXT:    add r4, r31, r4
 ; BE-32BIT-P8-NEXT:    ori r3, r3, 32768
-; BE-32BIT-P8-NEXT:    add r3, r1, r3
+; BE-32BIT-P8-NEXT:    add r3, r31, r3
 ; BE-32BIT-P8-NEXT:    bl .callee3[PR]
 ; BE-32BIT-P8-NEXT:    nop
-; BE-32BIT-P8-NEXT:    lwz r4, 16(r31)
-; BE-32BIT-P8-NEXT:    lwz r31, -4(r30) # 4-byte Folded Reload
+; BE-32BIT-P8-NEXT:    lwz r4, 16(r29)
+; BE-32BIT-P8-NEXT:    lwz r29, -12(r30) # 4-byte Folded Reload
 ; BE-32BIT-P8-NEXT:    add r3, r4, r3
 ; BE-32BIT-P8-NEXT:    mr r1, r30
 ; BE-32BIT-P8-NEXT:    lwz r0, 8(r1)
+; BE-32BIT-P8-NEXT:    lwz r31, -4(r1)
 ; BE-32BIT-P8-NEXT:    lwz r30, -8(r1)
-; BE-32BIT-P8-NEXT:    hashchk r0, -16(r1)
+; BE-32BIT-P8-NEXT:    hashchk r0, -24(r1)
 ; BE-32BIT-P8-NEXT:    mtlr r0
 ; BE-32BIT-P8-NEXT:    blr
 ;
 ; BE-P10-PRIV-LABEL: aligned:
 ; BE-P10-PRIV:       # %bb.0: # %entry
 ; BE-P10-PRIV-NEXT:    mflr r0
+; BE-P10-PRIV-NEXT:    std r31, -8(r1)
 ; BE-P10-PRIV-NEXT:    std r30, -16(r1)
 ; BE-P10-PRIV-NEXT:    lis r12, -1
 ; BE-P10-PRIV-NEXT:    mr r30, r1
 ; BE-P10-PRIV-NEXT:    std r0, 16(r1)
-; BE-P10-PRIV-NEXT:    hashstp r0, -24(r1)
+; BE-P10-PRIV-NEXT:    hashstp r0, -32(r1)
 ; BE-P10-PRIV-NEXT:    clrldi r0, r1, 49
 ; BE-P10-PRIV-NEXT:    subc r0, r12, r0
 ; BE-P10-PRIV-NEXT:    stdux r1, r1, r0
-; BE-P10-PRIV-NEXT:    std r31, -8(r30) # 8-byte Folded Spill
-; BE-P10-PRIV-NEXT:    mr r31, r3
+; BE-P10-PRIV-NEXT:    std r29, -24(r30) # 8-byte Folded Spill
+; BE-P10-PRIV-NEXT:    mr r29, r3
 ; BE-P10-PRIV-NEXT:    lwz r3, 4(r3)
 ; BE-P10-PRIV-NEXT:    lis r4, 0
-; BE-P10-PRIV-NEXT:    addi r5, r1, 32764
-; BE-P10-PRIV-NEXT:    ori r4, r4, 65508
-; BE-P10-PRIV-NEXT:    stwx r3, r1, r4
-; BE-P10-PRIV-NEXT:    lwz r3, 12(r31)
+; BE-P10-PRIV-NEXT:    mr r31, r1
+; BE-P10-PRIV-NEXT:    ori r4, r4, 65500
+; BE-P10-PRIV-NEXT:    stwx r3, r31, r4
+; BE-P10-PRIV-NEXT:    lwz r3, 12(r29)
 ; BE-P10-PRIV-NEXT:    lis r4, 0
 ; BE-P10-PRIV-NEXT:    ori r4, r4, 32768
-; BE-P10-PRIV-NEXT:    stwx r3, r1, r4
-; BE-P10-PRIV-NEXT:    lwz r3, 20(r31)
+; BE-P10-PRIV-NEXT:    stwx r3, r31, r4
+; BE-P10-PRIV-NEXT:    lwz r3, 20(r29)
 ; BE-P10-PRIV-NEXT:    lis r4, 0
-; BE-P10-PRIV-NEXT:    ori r4, r4, 65508
-; BE-P10-PRIV-NEXT:    add r4, r1, r4
-; BE-P10-PRIV-NEXT:    stw r3, 32764(r1)
+; BE-P10-PRIV-NEXT:    ori r4, r4, 65500
+; BE-P10-PRIV-NEXT:    stw r3, 32764(r31)
 ; BE-P10-PRIV-NEXT:    lis r3, 0
 ; BE-P10-PRIV-NEXT:    ori r3, r3, 32768
-; BE-P10-PRIV-NEXT:    add r3, r1, r3
+; BE-P10-PRIV-NEXT:    add r3, r31, r3
+; BE-P10-PRIV-NEXT:    add r4, r31, r4
+; BE-P10-PRIV-NEXT:    addi r5, r31, 32764
 ; BE-P10-PRIV-NEXT:    bl .callee3[PR]
 ; BE-P10-PRIV-NEXT:    nop
-; BE-P10-PRIV-NEXT:    lwz r4, 16(r31)
-; BE-P10-PRIV-NEXT:    ld r31, -8(r30) # 8-byte Folded Reload
+; BE-P10-PRIV-NEXT:    lwz r4, 16(r29)
+; BE-P10-PRIV-NEXT:    ld r29, -24(r30) # 8-byte Folded Reload
 ; BE-P10-PRIV-NEXT:    add r3, r4, r3
 ; BE-P10-PRIV-NEXT:    clrldi r3, r3, 32
 ; BE-P10-PRIV-NEXT:    mr r1, r30
 ; BE-P10-PRIV-NEXT:    ld r0, 16(r1)
-; BE-P10-PRIV-NEXT:    ld r30, -16(r1)
+; BE-P10-PRIV-NEXT:    ld r31, -8(r1)
 ; BE-P10-PRIV-NEXT:    mtlr r0
-; BE-P10-PRIV-NEXT:    hashchkp r0, -24(r1)
+; BE-P10-PRIV-NEXT:    ld r30, -16(r1)
+; BE-P10-PRIV-NEXT:    hashchkp r0, -32(r1)
 ; BE-P10-PRIV-NEXT:    blr
 ;
 ; BE-P9-PRIV-LABEL: aligned:
 ; BE-P9-PRIV:       # %bb.0: # %entry
 ; BE-P9-PRIV-NEXT:    mflr r0
-; BE-P9-PRIV-NEXT:    std r30, -16(r1)
+; BE-P9-PRIV-NEXT:    std r31, -8(r1)
 ; BE-P9-PRIV-NEXT:    lis r12, -1
+; BE-P9-PRIV-NEXT:    std r30, -16(r1)
 ; BE-P9-PRIV-NEXT:    mr r30, r1
 ; BE-P9-PRIV-NEXT:    std r0, 16(r1)
-; BE-P9-PRIV-NEXT:    hashstp r0, -24(r1)
+; BE-P9-PRIV-NEXT:    hashstp r0, -32(r1)
 ; BE-P9-PRIV-NEXT:    clrldi r0, r1, 49
 ; BE-P9-PRIV-NEXT:    subc r0, r12, r0
 ; BE-P9-PRIV-NEXT:    stdux r1, r1, r0
-; BE-P9-PRIV-NEXT:    std r31, -8(r30) # 8-byte Folded Spill
-; BE-P9-PRIV-NEXT:    mr r31, r3
+; BE-P9-PRIV-NEXT:    std r29, -24(r30) # 8-byte Folded Spill
+; BE-P9-PRIV-NEXT:    mr r29, r3
 ; BE-P9-PRIV-NEXT:    lwz r3, 4(r3)
 ; BE-P9-PRIV-NEXT:    lis r4, 0
-; BE-P9-PRIV-NEXT:    addi r5, r1, 32764
-; BE-P9-PRIV-NEXT:    ori r4, r4, 65508
-; BE-P9-PRIV-NEXT:    stwx r3, r1, r4
-; BE-P9-PRIV-NEXT:    lwz r3, 12(r31)
+; BE-P9-PRIV-NEXT:    mr r31, r1
+; BE-P9-PRIV-NEXT:    ori r4, r4, 65500
+; BE-P9-PRIV-NEXT:    addi r5, r31, 32764
+; BE-P9-PRIV-NEXT:    stwx r3, r31, r4
+; BE-P9-PRIV-NEXT:    lwz r3, 12(r29)
 ; BE-P9-PRIV-NEXT:    lis r4, 0
 ; BE-P9-PRIV-NEXT:    ori r4, r4, 32768
-; BE-P9-PRIV-NEXT:    stwx r3, r1, r4
-; BE-P9-PRIV-NEXT:    lwz r3, 20(r31)
+; BE-P9-PR...
[truncated]

github-actions bot commented Jul 29, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@syzaara syzaara changed the title [PPC][AIX] Set needsFP to true when base pointer is used in prologue/… [PPC][AIX] Save/restore r31 in prolog/epilog when using base pointer Jul 29, 2024

@chenzheng1030 chenzheng1030 left a comment

LGTM. Thanks very much for fixing this.


@mandlebug mandlebug left a comment

I know this PR has already been approved, but would it be possible to instead update the initialization of HasFP to include Subtarget.isAIXABI() && RegInfo->hasBasePointer(MF), and to modify the needsFP helper function to take RegInfo and Subtarget as arguments, with the RegInfo->hasBasePointer(MF) && Subtarget.isAIXABI() check in its body? My worry is that during refactoring or updating we could add a new check that misses the AIX condition. Having the check in the initialization and in the helper body removes that possibility.

@chenzheng1030

> I know this PR has already been approved, but would it be possible to instead update the initialization of HasFP to include Subtarget.isAIXABI() && RegInfo->hasBasePointer(MF), and to modify the needsFP helper function to take RegInfo and Subtarget as arguments, with the RegInfo->hasBasePointer(MF) && Subtarget.isAIXABI() check in its body? My worry is that during refactoring or updating we could add a new check that misses the AIX condition. Having the check in the initialization and in the helper body removes that possibility.

Thanks for taking a look. More discussion is always good : )

@syzaara did exactly what you recommend in her first commit; see e5fd2aa.

I suggested that we not back up and restore r31 by treating it as the frame pointer. In this case we don't actually need a frame pointer; we only need to back up and restore r31. Treating r31 as the frame pointer currently has at least two drawbacks:

  • a redundant copy from r1 to r31 (the frame pointer needs to be initialized with r1).
  • r31 is not allocatable; see getReservedRegs().

I agree the new change is hard to maintain, so I took another look. Backing up and restoring r31 through the emitPrologue()/emitEpilogue() path requires changes in many places and is hard to maintain. Can we instead back up and restore r31 through the normal CSR (callee-saved register) backup/restore path, like this:

diff --git a/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp b/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
index 1963582ce686..2f787e5c731d 100644
--- a/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
@@ -2025,8 +2025,18 @@ void PPCFrameLowering::determineCalleeSaves(MachineFunction &MF,
   // code. Same goes for the base pointer and the PIC base register.
   if (needsFP(MF))
     SavedRegs.reset(isPPC64 ? PPC::X31 : PPC::R31);
-  if (RegInfo->hasBasePointer(MF))
+  if (RegInfo->hasBasePointer(MF)) {
     SavedRegs.reset(RegInfo->getBaseRegister(MF));
+    // On AIX, when BaseRegister(R30) is used, need to spill r31 too to match
+    // AIX trackback table requirement.
+    if (!needsFP(MF) && !SavedRegs.test(isPPC64 ? PPC::X31 : PPC::R31) &&
+        Subtarget.isAIXABI()) {
+      assert(
+          (RegInfo->getBaseRegister(MF) == (isPPC64 ? PPC::X30 : PPC::R30)) &&
+          "Invalid base register on AIX!");
+      SavedRegs.set(isPPC64 ? PPC::X31 : PPC::R31);
+    }
+  }

(This patch is not well tested!)

Sorry, @syzaara, for the change of approach...

What do you guys think?
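
The callee-saved-register approach proposed in the diff above can be modeled in isolation as follows. This is a minimal sketch, not the LLVM implementation: `FrameInfo` and `adjustSavedRegs` are hypothetical stand-ins for the `MachineFunction` queries and for the relevant part of `determineCalleeSaves`, and a plain `std::bitset` stands in for `SavedRegs`.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical stand-ins for LLVM state; none of these are real LLVM types.
struct FrameInfo {
  bool IsAIX;          // models Subtarget.isAIXABI()
  bool NeedsFP;        // models needsFP(MF)
  bool HasBasePointer; // models RegInfo->hasBasePointer(MF)
  unsigned BaseReg;    // models RegInfo->getBaseRegister(MF)
};

constexpr unsigned R30 = 30, R31 = 31;

// A set bit means the register goes through the generic callee-saved
// spill/reload path; a cleared bit means that path does not handle it.
void adjustSavedRegs(const FrameInfo &FI, std::bitset<32> &SavedRegs) {
  // The frame pointer is saved and restored by prologue/epilogue code.
  if (FI.NeedsFP)
    SavedRegs.reset(R31);
  if (FI.HasBasePointer) {
    // The same holds for the base pointer.
    SavedRegs.reset(FI.BaseReg);
    // On AIX, r31 has to be saved whenever r30 is. When r31 is not the
    // frame pointer, hand it to the generic callee-saved path instead.
    if (!FI.NeedsFP && !SavedRegs.test(R31) && FI.IsAIX) {
      assert(FI.BaseReg == R30 && "Invalid base register on AIX!");
      SavedRegs.set(R31);
    }
  }
}
```

In this model r31 is only marked as saved. It is never reserved or initialized from r1, which corresponds to the two drawbacks listed earlier in the comment for the frame-pointer approach.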

@syzaara syzaara changed the title [PPC][AIX] Save/restore r31 in prolog/epilog when using base pointer [PPC][AIX] Save/restore r31 when using base pointer Aug 1, 2024

@chenzheng1030 chenzheng1030 left a comment

Still LGTM. Thanks very much.

SavedRegs.reset(RegInfo->getBaseRegister(MF));
// On AIX, when BaseRegister(R30) is used, need to spill r31 too to match
nit: it would be good to spell R30 and r31 consistently. My bad : )

@mandlebug

Thanks for the insight Zheng - I never considered that we would be wasting an allocatable register. I think the new direction alleviates my concerns as well.

@syzaara syzaara merged commit d07f106 into llvm:main Aug 7, 2024
5 of 7 checks passed
TIFitis pushed a commit that referenced this pull request Aug 8, 2024
When the base pointer r30 is used to hold the stack pointer, r30 is
spilled in the prologue. On AIX registers are saved from highest to
lowest, so r31 also needs to be saved.

Fixes #96411
syzaara commented Aug 13, 2024

/cherry-pick 953d1f1

llvmbot commented Aug 13, 2024

/cherry-pick 953d1f1

Error: Command failed due to missing milestone.

@syzaara syzaara added this to the LLVM 19.X Release milestone Aug 13, 2024
syzaara commented Aug 13, 2024

/cherry-pick 953d1f1

llvmbot commented Aug 13, 2024

Failed to cherry-pick: 953d1f1

https://github.com/llvm/llvm-project/actions/runs/10372003179

Please manually backport the fix and push it to your github fork. Once this is done, please create a pull request

syzaara commented Aug 13, 2024

/cherry-pick d07f106

llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Aug 13, 2024
When the base pointer r30 is used to hold the stack pointer, r30 is
spilled in the prologue. On AIX registers are saved from highest to
lowest, so r31 also needs to be saved.

Fixes llvm#96411

(cherry picked from commit d07f106)
llvmbot commented Aug 13, 2024

/pull-request #103301

tru pushed a commit to llvmbot/llvm-project that referenced this pull request Aug 15, 2024
When the base pointer r30 is used to hold the stack pointer, r30 is
spilled in the prologue. On AIX registers are saved from highest to
lowest, so r31 also needs to be saved.

Fixes llvm#96411

(cherry picked from commit d07f106)
kstoimenov pushed a commit to kstoimenov/llvm-project that referenced this pull request Aug 15, 2024
When the base pointer r30 is used to hold the stack pointer, r30 is
spilled in the prologue. On AIX registers are saved from highest to
lowest, so r31 also needs to be saved.

Fixes llvm#96411
Successfully merging this pull request may close these issues.

[PPC][AIX] Local variable needing higher alignment shows r31 not saved with r30 as required by ABI