[DA] runtime predicates for delinearization bounds checks #170713
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis

Author: Sebastian Pop (sebpop)

Changes

When compile-time checks fail, rely on runtime SCEV predicates instead of failing delinearization entirely. This allows delinearization to succeed in more cases where compile-time proofs are not possible, enabling more precise dependence analysis under runtime assumptions.

Patch is 84.18 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/170713.diff

29 Files Affected:
diff --git a/llvm/include/llvm/Analysis/Delinearization.h b/llvm/include/llvm/Analysis/Delinearization.h
index 8fb30925b1ba7..7346128c0b510 100644
--- a/llvm/include/llvm/Analysis/Delinearization.h
+++ b/llvm/include/llvm/Analysis/Delinearization.h
@@ -26,6 +26,7 @@ class GetElementPtrInst;
class Instruction;
class ScalarEvolution;
class SCEV;
+class SCEVPredicate;
/// Compute the array dimensions Sizes from the set of Terms extracted from
/// the memory access function of this SCEVAddRecExpr (second step of
@@ -144,11 +145,13 @@ bool delinearizeFixedSizeArray(ScalarEvolution &SE, const SCEV *Expr,
/// Check that each subscript in \p Subscripts is within the corresponding size
/// in \p Sizes. For the outermost dimension, the subscript being negative is
/// allowed. If \p Ptr is not nullptr, it may be used to get information from
-/// the IR pointer value, which may help in the validation.
-bool validateDelinearizationResult(ScalarEvolution &SE,
- ArrayRef<const SCEV *> Sizes,
- ArrayRef<const SCEV *> Subscripts,
- const Value *Ptr = nullptr);
+/// the IR pointer value, which may help in the validation. If \p Assume is not
+/// nullptr and a compile-time check fails, runtime predicates are added to
+/// \p Assume instead of returning false.
+bool validateDelinearizationResult(
+ ScalarEvolution &SE, ArrayRef<const SCEV *> Sizes,
+ ArrayRef<const SCEV *> Subscripts, const Value *Ptr = nullptr,
+ SmallVectorImpl<const SCEVPredicate *> *Assume = nullptr);
/// Gathers the individual index expressions from a GEP instruction.
///
diff --git a/llvm/include/llvm/Analysis/DependenceAnalysis.h b/llvm/include/llvm/Analysis/DependenceAnalysis.h
index 6dec24fc9f104..80095e91fcc6b 100644
--- a/llvm/include/llvm/Analysis/DependenceAnalysis.h
+++ b/llvm/include/llvm/Analysis/DependenceAnalysis.h
@@ -754,7 +754,8 @@ class DependenceInfo {
/// Given a linear access function, tries to recover subscripts
/// for each dimension of the array element access.
bool tryDelinearize(Instruction *Src, Instruction *Dst,
- SmallVectorImpl<Subscript> &Pair);
+ SmallVectorImpl<Subscript> &Pair,
+ SmallVectorImpl<const SCEVPredicate *> &Assume);
/// Tries to delinearize \p Src and \p Dst access functions for a fixed size
/// multi-dimensional array. Calls delinearizeFixedSizeArray() to delinearize
@@ -762,7 +763,8 @@ class DependenceInfo {
bool tryDelinearizeFixedSize(Instruction *Src, Instruction *Dst,
const SCEV *SrcAccessFn, const SCEV *DstAccessFn,
SmallVectorImpl<const SCEV *> &SrcSubscripts,
- SmallVectorImpl<const SCEV *> &DstSubscripts);
+ SmallVectorImpl<const SCEV *> &DstSubscripts,
+ SmallVectorImpl<const SCEVPredicate *> &Assume);
/// Tries to delinearize access function for a multi-dimensional array with
/// symbolic runtime sizes.
@@ -771,7 +773,8 @@ class DependenceInfo {
tryDelinearizeParametricSize(Instruction *Src, Instruction *Dst,
const SCEV *SrcAccessFn, const SCEV *DstAccessFn,
SmallVectorImpl<const SCEV *> &SrcSubscripts,
- SmallVectorImpl<const SCEV *> &DstSubscripts);
+ SmallVectorImpl<const SCEV *> &DstSubscripts,
+ SmallVectorImpl<const SCEVPredicate *> &Assume);
/// checkSubscript - Helper function for checkSrcSubscript and
/// checkDstSubscript to avoid duplicate code
diff --git a/llvm/lib/Analysis/Delinearization.cpp b/llvm/lib/Analysis/Delinearization.cpp
index 7bf83ccf9c172..68928c62ab569 100644
--- a/llvm/lib/Analysis/Delinearization.cpp
+++ b/llvm/lib/Analysis/Delinearization.cpp
@@ -753,24 +753,34 @@ static bool isKnownLessThan(ScalarEvolution *SE, const SCEV *S,
return SE->isKnownNegative(LimitedBound);
}
-bool llvm::validateDelinearizationResult(ScalarEvolution &SE,
- ArrayRef<const SCEV *> Sizes,
- ArrayRef<const SCEV *> Subscripts,
- const Value *Ptr) {
+bool llvm::validateDelinearizationResult(
+ ScalarEvolution &SE, ArrayRef<const SCEV *> Sizes,
+ ArrayRef<const SCEV *> Subscripts, const Value *Ptr,
+ SmallVectorImpl<const SCEVPredicate *> *Assume) {
// Sizes and Subscripts are as follows:
- //
// Sizes: [UNK][S_2]...[S_n]
// Subscripts: [I_1][I_2]...[I_n]
//
// where the size of the outermost dimension is unknown (UNK).
+ // Unify types of two SCEVs to the wider type.
+ auto UnifyTypes =
+ [&](const SCEV *&A,
+ const SCEV *&B) -> std::pair<const SCEV *, const SCEV *> {
+ Type *WiderType = SE.getWiderType(A->getType(), B->getType());
+ return {SE.getNoopOrSignExtend(A, WiderType),
+ SE.getNoopOrSignExtend(B, WiderType)};
+ };
+
auto AddOverflow = [&](const SCEV *A, const SCEV *B) -> const SCEV * {
+ std::tie(A, B) = UnifyTypes(A, B);
if (!SE.willNotOverflow(Instruction::Add, /*IsSigned=*/true, A, B))
return nullptr;
return SE.getAddExpr(A, B);
};
auto MulOverflow = [&](const SCEV *A, const SCEV *B) -> const SCEV * {
+ std::tie(A, B) = UnifyTypes(A, B);
if (!SE.willNotOverflow(Instruction::Mul, /*IsSigned=*/true, A, B))
return nullptr;
return SE.getMulExpr(A, B);
@@ -780,10 +790,28 @@ bool llvm::validateDelinearizationResult(ScalarEvolution &SE,
for (size_t I = 1; I < Sizes.size(); ++I) {
const SCEV *Size = Sizes[I - 1];
const SCEV *Subscript = Subscripts[I];
- if (!isKnownNonNegative(&SE, Subscript, Ptr))
- return false;
- if (!isKnownLessThan(&SE, Subscript, Size))
- return false;
+
+ // Check Subscript >= 0.
+ if (!isKnownNonNegative(&SE, Subscript, Ptr)) {
+ if (!Assume)
+ return false;
+ const SCEVPredicate *Pred = SE.getComparePredicate(
+ ICmpInst::ICMP_SGE, Subscript, SE.getZero(Subscript->getType()));
+ Assume->push_back(Pred);
+ }
+
+ // Check Subscript < Size.
+ if (!isKnownLessThan(&SE, Subscript, Size)) {
+ if (!Assume)
+ return false;
+ // Need to unify types before creating the predicate.
+ Type *WiderType = SE.getWiderType(Subscript->getType(), Size->getType());
+ const SCEV *SubscriptExt = SE.getNoopOrSignExtend(Subscript, WiderType);
+ const SCEV *SizeExt = SE.getNoopOrSignExtend(Size, WiderType);
+ const SCEVPredicate *Pred =
+ SE.getComparePredicate(ICmpInst::ICMP_SLT, SubscriptExt, SizeExt);
+ Assume->push_back(Pred);
+ }
}
// The offset computation is as follows:
diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp b/llvm/lib/Analysis/DependenceAnalysis.cpp
index 9b9c80a9b3266..858cbafdc3a0a 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -3176,8 +3176,9 @@ const SCEV *DependenceInfo::getUpperBound(BoundInfo *Bound) const {
/// source and destination array references are recurrences on a nested loop,
/// this function flattens the nested recurrences into separate recurrences
/// for each loop level.
-bool DependenceInfo::tryDelinearize(Instruction *Src, Instruction *Dst,
- SmallVectorImpl<Subscript> &Pair) {
+bool DependenceInfo::tryDelinearize(
+ Instruction *Src, Instruction *Dst, SmallVectorImpl<Subscript> &Pair,
+ SmallVectorImpl<const SCEVPredicate *> &Assume) {
assert(isLoadOrStore(Src) && "instruction is not load or store");
assert(isLoadOrStore(Dst) && "instruction is not load or store");
Value *SrcPtr = getLoadStorePointerOperand(Src);
@@ -3197,9 +3198,9 @@ bool DependenceInfo::tryDelinearize(Instruction *Src, Instruction *Dst,
SmallVector<const SCEV *, 4> SrcSubscripts, DstSubscripts;
if (!tryDelinearizeFixedSize(Src, Dst, SrcAccessFn, DstAccessFn,
- SrcSubscripts, DstSubscripts) &&
+ SrcSubscripts, DstSubscripts, Assume) &&
!tryDelinearizeParametricSize(Src, Dst, SrcAccessFn, DstAccessFn,
- SrcSubscripts, DstSubscripts))
+ SrcSubscripts, DstSubscripts, Assume))
return false;
assert(isLoopInvariant(SrcBase, SrcLoop) &&
@@ -3245,7 +3246,8 @@ bool DependenceInfo::tryDelinearize(Instruction *Src, Instruction *Dst,
bool DependenceInfo::tryDelinearizeFixedSize(
Instruction *Src, Instruction *Dst, const SCEV *SrcAccessFn,
const SCEV *DstAccessFn, SmallVectorImpl<const SCEV *> &SrcSubscripts,
- SmallVectorImpl<const SCEV *> &DstSubscripts) {
+ SmallVectorImpl<const SCEV *> &DstSubscripts,
+ SmallVectorImpl<const SCEVPredicate *> &Assume) {
LLVM_DEBUG({
const SCEVUnknown *SrcBase =
dyn_cast<SCEVUnknown>(SE->getPointerBase(SrcAccessFn));
@@ -3285,10 +3287,12 @@ bool DependenceInfo::tryDelinearizeFixedSize(
// dimensions. For example some C language usage/interpretation make it
// impossible to verify this at compile-time. As such we can only delinearize
// iff the subscripts are positive and are less than the range of the
- // dimension.
+ // dimension. If compile-time checks fail, add runtime predicates.
if (!DisableDelinearizationChecks) {
- if (!validateDelinearizationResult(*SE, SrcSizes, SrcSubscripts, SrcPtr) ||
- !validateDelinearizationResult(*SE, DstSizes, DstSubscripts, DstPtr)) {
+ if (!validateDelinearizationResult(*SE, SrcSizes, SrcSubscripts, SrcPtr,
+ &Assume) ||
+ !validateDelinearizationResult(*SE, DstSizes, DstSubscripts, DstPtr,
+ &Assume)) {
SrcSubscripts.clear();
DstSubscripts.clear();
return false;
@@ -3305,7 +3309,8 @@ bool DependenceInfo::tryDelinearizeFixedSize(
bool DependenceInfo::tryDelinearizeParametricSize(
Instruction *Src, Instruction *Dst, const SCEV *SrcAccessFn,
const SCEV *DstAccessFn, SmallVectorImpl<const SCEV *> &SrcSubscripts,
- SmallVectorImpl<const SCEV *> &DstSubscripts) {
+ SmallVectorImpl<const SCEV *> &DstSubscripts,
+ SmallVectorImpl<const SCEVPredicate *> &Assume) {
Value *SrcPtr = getLoadStorePointerOperand(Src);
Value *DstPtr = getLoadStorePointerOperand(Dst);
@@ -3346,15 +3351,13 @@ bool DependenceInfo::tryDelinearizeParametricSize(
SrcSubscripts.size() != DstSubscripts.size())
return false;
- // Statically check that the array bounds are in-range. The first subscript we
- // don't have a size for and it cannot overflow into another subscript, so is
- // always safe. The others need to be 0 <= subscript[i] < bound, for both src
- // and dst.
- // FIXME: It may be better to record these sizes and add them as constraints
- // to the dependency checks.
+ // Check that the array bounds are in-range. If compile-time checks fail,
+ // add runtime predicates.
if (!DisableDelinearizationChecks)
- if (!validateDelinearizationResult(*SE, Sizes, SrcSubscripts, SrcPtr) ||
- !validateDelinearizationResult(*SE, Sizes, DstSubscripts, DstPtr))
+ if (!validateDelinearizationResult(*SE, Sizes, SrcSubscripts, SrcPtr,
+ &Assume) ||
+ !validateDelinearizationResult(*SE, Sizes, DstSubscripts, DstPtr,
+ &Assume))
return false;
return true;
@@ -3507,7 +3510,7 @@ DependenceInfo::depends(Instruction *Src, Instruction *Dst,
SCEVUnionPredicate(Assume, *SE));
if (Delinearize) {
- if (tryDelinearize(Src, Dst, Pair)) {
+ if (tryDelinearize(Src, Dst, Pair, Assume)) {
LLVM_DEBUG(dbgs() << " delinearized\n");
Pairs = Pair.size();
}
diff --git a/llvm/test/Analysis/DependenceAnalysis/Banerjee.ll b/llvm/test/Analysis/DependenceAnalysis/Banerjee.ll
index 6dde8844c6040..6e75887db06d4 100644
--- a/llvm/test/Analysis/DependenceAnalysis/Banerjee.ll
+++ b/llvm/test/Analysis/DependenceAnalysis/Banerjee.ll
@@ -46,9 +46,14 @@ define void @banerjee0(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
;
; DELIN-LABEL: 'banerjee0'
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
-; DELIN-NEXT: da analyze - none!
+; DELIN-NEXT: da analyze - consistent output [0 0]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
+; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: %0 = load i64, ptr %arrayidx6, align 8
-; DELIN-NEXT: da analyze - flow [<= <>]!
+; DELIN-NEXT: da analyze - consistent flow [0 1]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: %0 = load i64, ptr %arrayidx6, align 8 --> Dst: %0 = load i64, ptr %arrayidx6, align 8
@@ -132,12 +137,18 @@ define void @banerjee1(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
; DELIN-LABEL: 'banerjee1'
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
; DELIN-NEXT: da analyze - output [* *]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: %2 = load i64, ptr %arrayidx6, align 8
; DELIN-NEXT: da analyze - flow [* <>]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 %2, ptr %B.addr.12, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: %2 = load i64, ptr %arrayidx6, align 8 --> Dst: %2 = load i64, ptr %arrayidx6, align 8
; DELIN-NEXT: da analyze - input [* *]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {0,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: %2 = load i64, ptr %arrayidx6, align 8 --> Dst: store i64 %2, ptr %B.addr.12, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: store i64 %2, ptr %B.addr.12, align 8 --> Dst: store i64 %2, ptr %B.addr.12, align 8
@@ -320,11 +331,16 @@ define void @banerjee3(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
; DELIN-NEXT: da analyze - none!
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: %0 = load i64, ptr %arrayidx7, align 8
-; DELIN-NEXT: da analyze - flow [> >]!
+; DELIN-NEXT: da analyze - consistent flow [-9 -9]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {9,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: %0 = load i64, ptr %arrayidx7, align 8 --> Dst: %0 = load i64, ptr %arrayidx7, align 8
-; DELIN-NEXT: da analyze - none!
+; DELIN-NEXT: da analyze - consistent input [0 0]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {9,+,1}<nuw><nsw><%for.body3> slt) 10
+; DELIN-NEXT: Compare predicate: {9,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: %0 = load i64, ptr %arrayidx7, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: store i64 %0, ptr %B.addr.11, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
@@ -490,11 +506,16 @@ define void @banerjee5(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
; DELIN-NEXT: da analyze - none!
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: %0 = load i64, ptr %arrayidx6, align 8
-; DELIN-NEXT: da analyze - flow [< <]!
+; DELIN-NEXT: da analyze - consistent flow [9 9]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {-9,+,1}<nsw><%for.body3> sge) 0
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: %0 = load i64, ptr %arrayidx6, align 8 --> Dst: %0 = load i64, ptr %arrayidx6, align 8
-; DELIN-NEXT: da analyze - none!
+; DELIN-NEXT: da analyze - consistent input [0 0]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {-9,+,1}<nsw><%for.body3> sge) 0
+; DELIN-NEXT: Compare predicate: {-9,+,1}<nsw><%for.body3> sge) 0
; DELIN-NEXT: Src: %0 = load i64, ptr %arrayidx6, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: store i64 %0, ptr %B.addr.11, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
@@ -575,11 +596,16 @@ define void @banerjee6(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
; DELIN-NEXT: da analyze - none!
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: %0 = load i64, ptr %arrayidx7, align 8
-; DELIN-NEXT: da analyze - flow [=> <>]!
+; DELIN-NEXT: da analyze - consistent flow [0 -9]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {9,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: %0 = load i64, ptr %arrayidx7, align 8 --> Dst: %0 = load i64, ptr %arrayidx7, align 8
-; DELIN-NEXT: da analyze - none!
+; DELIN-NEXT: da analyze - consistent input [0 0]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {9,+,1}<nuw><nsw><%for.body3> slt) 10
+; DELIN-NEXT: Compare predicate: {9,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: %0 = load i64, ptr %arrayidx7, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: store i64 %0, ptr %B.addr.11, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
@@ -745,11 +771,16 @@ define void @banerjee8(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
; DELIN-NEXT: da analyze - none!
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: %0 = load i64, ptr %arrayidx7, align 8
-; DELIN-NEXT: da analyze - flow [> <>]!
+; DELIN-NEXT: da analyze - consistent flow [-1 -1]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: %0 = load i64, ptr %arrayidx7, align 8 --> Dst: %0 = load i64, ptr %arrayidx7, align 8
-; DELIN-NEXT: da analyze - none!
+; DELIN-NEXT: da analyze - consistent input [0 0]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
+; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: %0 = load i64, ptr %arrayidx7, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: store i64 %0, ptr %B.addr.11, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
@@ -828,9 +859,14 @@ define void @banerjee9(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
;
; DELIN-LABEL: 'banerjee9'
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
-; DELIN-NEXT: da analyze - output [* *]!
+; DELIN-NEXT: da analyze - consistent output [0 0]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELI...
[truncated]
When compile-time checks fail, rely on runtime SCEV predicates, instead of failing delinearization entirely. This allows delinearization to succeed in more cases where compile-time proofs are not possible, enabling more precise dependence analysis under runtime assumptions.
When compile-time overflow checks (for Prod, Min, and Max offset computations) fail, add runtime SCEV predicates using the equality-based overflow detection pattern: (sext A) op (sext B) == sext(A op B). This allows delinearization to succeed in more cases where compile-time proofs are not possible, enabling more precise dependence analysis under runtime assumptions. This extends the runtime predicate support from PR llvm#170713 to also cover the overflow validation checks added in PR llvm#169902.
🐧 Linux x64 Test Results

Failed Tests (click on a test name to see its output):
- LLVM.Analysis/DDG/basic-loopnest.ll
- LLVM.Transforms/LICM/lnicm.ll

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the
kasuga-fj left a comment
High-level thought: I think the runtime predicate feature should be removed for now, and re-introduced after functionality like BatchDA is implemented. The current implementation looks a bit ugly to me, since each time depends is invoked, the same predicates keep getting added. Probably a better design would be to use a single set of assumptions throughout the analysis.
; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
I think assumptions are added too aggressively. For example, these two predicates obviously don't hold. Probably we should not insert a predicate if we know it to be false at compile time.
The redundant predicates happen because we validate both source and destination subscripts separately.
Not duplication. {1,+,1}<nuw><nsw><%for.body3> slt) 10 will never be satisfied because the BTC of for.body3 is 10. I think such a predicate should not be added.
Refactor the overflow check runtime predicate generation to reuse the same pattern used by ScalarEvolution::willNotOverflow(). Instead of duplicating the ext(LHS op RHS) == ext(LHS) op ext(RHS) pattern in Delinearization.cpp, add a new getNoOverflowPredicate() method to ScalarEvolution that returns the predicate (or nullptr if no-overflow is already provable at compile time). This addresses review feedback to avoid code duplication between willNotOverflow() and the runtime predicate generation in delinearization.
When validating both source and destination subscripts, the same predicate can be generated multiple times (e.g., when both access A[i][j]). Add a helper AddPredicate() that checks if a predicate already exists before adding it to avoid duplicates.
I know how "batch delinearization" works: collect all data references in the current function, sort by base pointers, and perform delinearization on all arrays accessing the same base pointer. "BatchDA" is different, and I believe the loop optimizations would need to bound or focus DA to a given region containing loops.
Just as a high-level question: do we want to introduce more runtime predicates at this point in time? Given that there is a lot of active ongoing work on correctness issues in DA/delinearization, it may be better to delay this until later, just to make it easier to do further code changes without also having to maintain the predicates along the way. I've been out for a while and am not up to date on the DA/delinearization work, but IIRC these runtime predicates are not yet used anywhere, right? They only appear in the analysis output for now? I expect that actually making use of them will be quite challenging, because of the need to trade off the value of the transform against the probability of passing the runtime check and the code-size increase of performing loop versioning. Having a lot of runtime checks implemented will probably make this harder than introducing them gradually and evaluating which of them are actually valuable in practice. On the other hand, I could see value in doing this now if it helps with testing somehow, e.g. because it allows us to easily test situations in DA that would otherwise be hard to test because of delinearization failures. Is something like this the case?
Welcome back @nikic. :-) I think the above is a fair assessment of the situation. I would like to add a few things, though. First of all, there have been a lot of discussions about two different approaches to dealing with wrapping behaviour. This spans different tickets and many weeks, and it is difficult to catch up on. The following is my attempt to briefly summarise the situation. @kasuga-fj found a problem with monotonicity as it is currently defined/implemented. At different program scopes, monotonicity may or may not hold due to conditions or loop guards. He is looking at defining "iteration domains" where monotonicity holds. Another school of thought, pursued by @amehsan, says that monotonicity is a special case of wrapping (or the lack thereof), so monotonicity as a concept may not be necessary if the dependence tests can be adapted to be accurate for these cases. Both are prototyping their approaches, which will then allow us to look at this and see what the way forward is. Besides this, we also have the runtime predicates. The way I look at this is as follows:
But I agree that we haven't shown how to use these runtime predicates when it comes to actually performing a transformation. What I propose is the following:
So I will add the regression test first, which I will do early this week, and after that start working on using the runtime predicates in interchange.
I am not planning to comment on this PR, but I thought to share how I think about the discussion between @kasuga-fj and myself. The original concern of @kasuga-fj (for example stated here: #162281 (comment)) was that "each dependence test assumes monotonicity over the entire iteration space". Now he agrees that we can prove the correctness of dependence tests without assuming monotonicity over the entire iteration space. So for one of the two tests in strong SIV, he has dropped this requirement from his implementation. For the other test in strong SIV, and for symbolic RDIV, where he still checks this requirement in his code, the reason is not correctness. His point of view at this time is that using the assumption of "monotonicity over the entire iteration space" results in simpler code and a simpler proof. So at this point the views are closer together. My objection to the concept of "monotonicity" is a somewhat different issue.
I don't think we can prove correctness without monotonicity over the entire iteration space. As for SIV (non-MIV), that condition means both of the following hold:
What I agree with is that some part of the code doesn't use an exact BTC. That said, this is off-topic for this PR, so I don't intend to continue it further here.
The decision to perform versioning is with the loop transform passes. DA is supposed to only provide a minimal list of predicates under which the DA result holds. Currently, when the info is missing, the result of DA is just "don't know", at which point the LNO has no choice but to give up. Fewer runtime predicates will be generated with more info sent from the front ends.
Correct, both Polly and the vectorizer use runtime predicates to check for aliasing, dependences, and possible values of parameters.
Since I hadn't answered the questions, let me answer them here.
Yes, that's correct.
At least for this PR, this is not the case. This PR tries to add predicates to make the validation of delinearization succeed, but we already have an option for that. Basically, I agree with postponing the runtime predicates functionality. Even if it appears orthogonal to other DA work, I think that generally increasing code complexity can make that work more difficult. Also, it's unclear which predicates we should use in runtime checks. At least we need to think about the following: