Conversation

@sebpop
Contributor

@sebpop sebpop commented Dec 4, 2025

When compile-time checks fail, rely on runtime SCEV predicates, instead of failing delinearization entirely. This allows delinearization to succeed in more cases where compile-time proofs are not possible, enabling more precise dependence analysis under runtime assumptions.

@llvmbot llvmbot added the llvm:analysis and llvm:transforms labels Dec 4, 2025
@llvmbot
Member

llvmbot commented Dec 4, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Sebastian Pop (sebpop)

Changes

When compile-time checks fail, rely on runtime SCEV predicates, instead of failing delinearization entirely. This allows delinearization to succeed in more cases where compile-time proofs are not possible, enabling more precise dependence analysis under runtime assumptions.


Patch is 84.18 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/170713.diff

29 Files Affected:

  • (modified) llvm/include/llvm/Analysis/Delinearization.h (+8-5)
  • (modified) llvm/include/llvm/Analysis/DependenceAnalysis.h (+6-3)
  • (modified) llvm/lib/Analysis/Delinearization.cpp (+37-9)
  • (modified) llvm/lib/Analysis/DependenceAnalysis.cpp (+21-18)
  • (modified) llvm/test/Analysis/DependenceAnalysis/Banerjee.ll (+58-14)
  • (modified) llvm/test/Analysis/DependenceAnalysis/Constraints.ll (+28-7)
  • (modified) llvm/test/Analysis/DependenceAnalysis/DADelin.ll (+32-1)
  • (modified) llvm/test/Analysis/DependenceAnalysis/DifferentOffsets.ll (+6)
  • (modified) llvm/test/Analysis/DependenceAnalysis/ExactRDIV.ll (+16-4)
  • (modified) llvm/test/Analysis/DependenceAnalysis/GCD.ll (+57-9)
  • (modified) llvm/test/Analysis/DependenceAnalysis/Invariant.ll (+1-2)
  • (modified) llvm/test/Analysis/DependenceAnalysis/MismatchingNestLevels.ll (+4)
  • (modified) llvm/test/Analysis/DependenceAnalysis/NonCanonicalizedSubscript.ll (+5-2)
  • (modified) llvm/test/Analysis/DependenceAnalysis/Preliminary.ll (+9)
  • (modified) llvm/test/Analysis/DependenceAnalysis/PreliminaryNoValidityCheckFixedSize.ll (+9)
  • (modified) llvm/test/Analysis/DependenceAnalysis/Propagating.ll (+36-10)
  • (modified) llvm/test/Analysis/DependenceAnalysis/SimpleSIVNoValidityCheck.ll (+27)
  • (modified) llvm/test/Analysis/DependenceAnalysis/StrongSIV.ll (+3)
  • (modified) llvm/test/Analysis/DependenceAnalysis/SymbolicSIV.ll (+14)
  • (modified) llvm/test/Analysis/DependenceAnalysis/WeakCrossingSIV.ll (+4)
  • (modified) llvm/test/Analysis/DependenceAnalysis/WeakZeroDstSIV.ll (+1)
  • (modified) llvm/test/Analysis/DependenceAnalysis/WeakZeroSrcSIV.ll (+1)
  • (modified) llvm/test/Analysis/DependenceAnalysis/becount-couldnotcompute.ll (+1)
  • (modified) llvm/test/Analysis/DependenceAnalysis/compute-absolute-value.ll (+4)
  • (modified) llvm/test/Analysis/DependenceAnalysis/gcd-miv-overflow.ll (+7)
  • (modified) llvm/test/Analysis/DependenceAnalysis/monotonicity-cast.ll (+2)
  • (modified) llvm/test/Analysis/DependenceAnalysis/monotonicity-no-wrap-flags.ll (+2)
  • (modified) llvm/test/Analysis/DependenceAnalysis/zero-coefficient.ll (+1)
  • (modified) llvm/test/Transforms/LoopInterchange/loop-interchange-optimization-remarks.ll (+7-7)
diff --git a/llvm/include/llvm/Analysis/Delinearization.h b/llvm/include/llvm/Analysis/Delinearization.h
index 8fb30925b1ba7..7346128c0b510 100644
--- a/llvm/include/llvm/Analysis/Delinearization.h
+++ b/llvm/include/llvm/Analysis/Delinearization.h
@@ -26,6 +26,7 @@ class GetElementPtrInst;
 class Instruction;
 class ScalarEvolution;
 class SCEV;
+class SCEVPredicate;
 
 /// Compute the array dimensions Sizes from the set of Terms extracted from
 /// the memory access function of this SCEVAddRecExpr (second step of
@@ -144,11 +145,13 @@ bool delinearizeFixedSizeArray(ScalarEvolution &SE, const SCEV *Expr,
 /// Check that each subscript in \p Subscripts is within the corresponding size
 /// in \p Sizes. For the outermost dimension, the subscript being negative is
 /// allowed. If \p Ptr is not nullptr, it may be used to get information from
-/// the IR pointer value, which may help in the validation.
-bool validateDelinearizationResult(ScalarEvolution &SE,
-                                   ArrayRef<const SCEV *> Sizes,
-                                   ArrayRef<const SCEV *> Subscripts,
-                                   const Value *Ptr = nullptr);
+/// the IR pointer value, which may help in the validation. If \p Assume is not
+/// nullptr and a compile-time check fails, runtime predicates are added to
+/// \p Assume instead of returning false.
+bool validateDelinearizationResult(
+    ScalarEvolution &SE, ArrayRef<const SCEV *> Sizes,
+    ArrayRef<const SCEV *> Subscripts, const Value *Ptr = nullptr,
+    SmallVectorImpl<const SCEVPredicate *> *Assume = nullptr);
 
 /// Gathers the individual index expressions from a GEP instruction.
 ///
diff --git a/llvm/include/llvm/Analysis/DependenceAnalysis.h b/llvm/include/llvm/Analysis/DependenceAnalysis.h
index 6dec24fc9f104..80095e91fcc6b 100644
--- a/llvm/include/llvm/Analysis/DependenceAnalysis.h
+++ b/llvm/include/llvm/Analysis/DependenceAnalysis.h
@@ -754,7 +754,8 @@ class DependenceInfo {
   /// Given a linear access function, tries to recover subscripts
   /// for each dimension of the array element access.
   bool tryDelinearize(Instruction *Src, Instruction *Dst,
-                      SmallVectorImpl<Subscript> &Pair);
+                      SmallVectorImpl<Subscript> &Pair,
+                      SmallVectorImpl<const SCEVPredicate *> &Assume);
 
   /// Tries to delinearize \p Src and \p Dst access functions for a fixed size
   /// multi-dimensional array. Calls delinearizeFixedSizeArray() to delinearize
@@ -762,7 +763,8 @@ class DependenceInfo {
   bool tryDelinearizeFixedSize(Instruction *Src, Instruction *Dst,
                                const SCEV *SrcAccessFn, const SCEV *DstAccessFn,
                                SmallVectorImpl<const SCEV *> &SrcSubscripts,
-                               SmallVectorImpl<const SCEV *> &DstSubscripts);
+                               SmallVectorImpl<const SCEV *> &DstSubscripts,
+                               SmallVectorImpl<const SCEVPredicate *> &Assume);
 
   /// Tries to delinearize access function for a multi-dimensional array with
   /// symbolic runtime sizes.
@@ -771,7 +773,8 @@ class DependenceInfo {
   tryDelinearizeParametricSize(Instruction *Src, Instruction *Dst,
                                const SCEV *SrcAccessFn, const SCEV *DstAccessFn,
                                SmallVectorImpl<const SCEV *> &SrcSubscripts,
-                               SmallVectorImpl<const SCEV *> &DstSubscripts);
+                               SmallVectorImpl<const SCEV *> &DstSubscripts,
+                               SmallVectorImpl<const SCEVPredicate *> &Assume);
 
   /// checkSubscript - Helper function for checkSrcSubscript and
   /// checkDstSubscript to avoid duplicate code
diff --git a/llvm/lib/Analysis/Delinearization.cpp b/llvm/lib/Analysis/Delinearization.cpp
index 7bf83ccf9c172..68928c62ab569 100644
--- a/llvm/lib/Analysis/Delinearization.cpp
+++ b/llvm/lib/Analysis/Delinearization.cpp
@@ -753,24 +753,34 @@ static bool isKnownLessThan(ScalarEvolution *SE, const SCEV *S,
   return SE->isKnownNegative(LimitedBound);
 }
 
-bool llvm::validateDelinearizationResult(ScalarEvolution &SE,
-                                         ArrayRef<const SCEV *> Sizes,
-                                         ArrayRef<const SCEV *> Subscripts,
-                                         const Value *Ptr) {
+bool llvm::validateDelinearizationResult(
+    ScalarEvolution &SE, ArrayRef<const SCEV *> Sizes,
+    ArrayRef<const SCEV *> Subscripts, const Value *Ptr,
+    SmallVectorImpl<const SCEVPredicate *> *Assume) {
   // Sizes and Subscripts are as follows:
-  //
   //   Sizes:      [UNK][S_2]...[S_n]
   //   Subscripts: [I_1][I_2]...[I_n]
   //
   // where the size of the outermost dimension is unknown (UNK).
 
+  // Unify types of two SCEVs to the wider type.
+  auto UnifyTypes =
+      [&](const SCEV *&A,
+          const SCEV *&B) -> std::pair<const SCEV *, const SCEV *> {
+    Type *WiderType = SE.getWiderType(A->getType(), B->getType());
+    return {SE.getNoopOrSignExtend(A, WiderType),
+            SE.getNoopOrSignExtend(B, WiderType)};
+  };
+
   auto AddOverflow = [&](const SCEV *A, const SCEV *B) -> const SCEV * {
+    std::tie(A, B) = UnifyTypes(A, B);
     if (!SE.willNotOverflow(Instruction::Add, /*IsSigned=*/true, A, B))
       return nullptr;
     return SE.getAddExpr(A, B);
   };
 
   auto MulOverflow = [&](const SCEV *A, const SCEV *B) -> const SCEV * {
+    std::tie(A, B) = UnifyTypes(A, B);
     if (!SE.willNotOverflow(Instruction::Mul, /*IsSigned=*/true, A, B))
       return nullptr;
     return SE.getMulExpr(A, B);
@@ -780,10 +790,28 @@ bool llvm::validateDelinearizationResult(ScalarEvolution &SE,
   for (size_t I = 1; I < Sizes.size(); ++I) {
     const SCEV *Size = Sizes[I - 1];
     const SCEV *Subscript = Subscripts[I];
-    if (!isKnownNonNegative(&SE, Subscript, Ptr))
-      return false;
-    if (!isKnownLessThan(&SE, Subscript, Size))
-      return false;
+
+    // Check Subscript >= 0.
+    if (!isKnownNonNegative(&SE, Subscript, Ptr)) {
+      if (!Assume)
+        return false;
+      const SCEVPredicate *Pred = SE.getComparePredicate(
+          ICmpInst::ICMP_SGE, Subscript, SE.getZero(Subscript->getType()));
+      Assume->push_back(Pred);
+    }
+
+    // Check Subscript < Size.
+    if (!isKnownLessThan(&SE, Subscript, Size)) {
+      if (!Assume)
+        return false;
+      // Need to unify types before creating the predicate.
+      Type *WiderType = SE.getWiderType(Subscript->getType(), Size->getType());
+      const SCEV *SubscriptExt = SE.getNoopOrSignExtend(Subscript, WiderType);
+      const SCEV *SizeExt = SE.getNoopOrSignExtend(Size, WiderType);
+      const SCEVPredicate *Pred =
+          SE.getComparePredicate(ICmpInst::ICMP_SLT, SubscriptExt, SizeExt);
+      Assume->push_back(Pred);
+    }
   }
 
   // The offset computation is as follows:
diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp b/llvm/lib/Analysis/DependenceAnalysis.cpp
index 9b9c80a9b3266..858cbafdc3a0a 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -3176,8 +3176,9 @@ const SCEV *DependenceInfo::getUpperBound(BoundInfo *Bound) const {
 /// source and destination array references are recurrences on a nested loop,
 /// this function flattens the nested recurrences into separate recurrences
 /// for each loop level.
-bool DependenceInfo::tryDelinearize(Instruction *Src, Instruction *Dst,
-                                    SmallVectorImpl<Subscript> &Pair) {
+bool DependenceInfo::tryDelinearize(
+    Instruction *Src, Instruction *Dst, SmallVectorImpl<Subscript> &Pair,
+    SmallVectorImpl<const SCEVPredicate *> &Assume) {
   assert(isLoadOrStore(Src) && "instruction is not load or store");
   assert(isLoadOrStore(Dst) && "instruction is not load or store");
   Value *SrcPtr = getLoadStorePointerOperand(Src);
@@ -3197,9 +3198,9 @@ bool DependenceInfo::tryDelinearize(Instruction *Src, Instruction *Dst,
   SmallVector<const SCEV *, 4> SrcSubscripts, DstSubscripts;
 
   if (!tryDelinearizeFixedSize(Src, Dst, SrcAccessFn, DstAccessFn,
-                               SrcSubscripts, DstSubscripts) &&
+                               SrcSubscripts, DstSubscripts, Assume) &&
       !tryDelinearizeParametricSize(Src, Dst, SrcAccessFn, DstAccessFn,
-                                    SrcSubscripts, DstSubscripts))
+                                    SrcSubscripts, DstSubscripts, Assume))
     return false;
 
   assert(isLoopInvariant(SrcBase, SrcLoop) &&
@@ -3245,7 +3246,8 @@ bool DependenceInfo::tryDelinearize(Instruction *Src, Instruction *Dst,
 bool DependenceInfo::tryDelinearizeFixedSize(
     Instruction *Src, Instruction *Dst, const SCEV *SrcAccessFn,
     const SCEV *DstAccessFn, SmallVectorImpl<const SCEV *> &SrcSubscripts,
-    SmallVectorImpl<const SCEV *> &DstSubscripts) {
+    SmallVectorImpl<const SCEV *> &DstSubscripts,
+    SmallVectorImpl<const SCEVPredicate *> &Assume) {
   LLVM_DEBUG({
     const SCEVUnknown *SrcBase =
         dyn_cast<SCEVUnknown>(SE->getPointerBase(SrcAccessFn));
@@ -3285,10 +3287,12 @@ bool DependenceInfo::tryDelinearizeFixedSize(
   // dimensions. For example some C language usage/interpretation make it
   // impossible to verify this at compile-time. As such we can only delinearize
   // iff the subscripts are positive and are less than the range of the
-  // dimension.
+  // dimension. If compile-time checks fail, add runtime predicates.
   if (!DisableDelinearizationChecks) {
-    if (!validateDelinearizationResult(*SE, SrcSizes, SrcSubscripts, SrcPtr) ||
-        !validateDelinearizationResult(*SE, DstSizes, DstSubscripts, DstPtr)) {
+    if (!validateDelinearizationResult(*SE, SrcSizes, SrcSubscripts, SrcPtr,
+                                       &Assume) ||
+        !validateDelinearizationResult(*SE, DstSizes, DstSubscripts, DstPtr,
+                                       &Assume)) {
       SrcSubscripts.clear();
       DstSubscripts.clear();
       return false;
@@ -3305,7 +3309,8 @@ bool DependenceInfo::tryDelinearizeFixedSize(
 bool DependenceInfo::tryDelinearizeParametricSize(
     Instruction *Src, Instruction *Dst, const SCEV *SrcAccessFn,
     const SCEV *DstAccessFn, SmallVectorImpl<const SCEV *> &SrcSubscripts,
-    SmallVectorImpl<const SCEV *> &DstSubscripts) {
+    SmallVectorImpl<const SCEV *> &DstSubscripts,
+    SmallVectorImpl<const SCEVPredicate *> &Assume) {
 
   Value *SrcPtr = getLoadStorePointerOperand(Src);
   Value *DstPtr = getLoadStorePointerOperand(Dst);
@@ -3346,15 +3351,13 @@ bool DependenceInfo::tryDelinearizeParametricSize(
       SrcSubscripts.size() != DstSubscripts.size())
     return false;
 
-  // Statically check that the array bounds are in-range. The first subscript we
-  // don't have a size for and it cannot overflow into another subscript, so is
-  // always safe. The others need to be 0 <= subscript[i] < bound, for both src
-  // and dst.
-  // FIXME: It may be better to record these sizes and add them as constraints
-  // to the dependency checks.
+  // Check that the array bounds are in-range. If compile-time checks fail,
+  // add runtime predicates.
   if (!DisableDelinearizationChecks)
-    if (!validateDelinearizationResult(*SE, Sizes, SrcSubscripts, SrcPtr) ||
-        !validateDelinearizationResult(*SE, Sizes, DstSubscripts, DstPtr))
+    if (!validateDelinearizationResult(*SE, Sizes, SrcSubscripts, SrcPtr,
+                                       &Assume) ||
+        !validateDelinearizationResult(*SE, Sizes, DstSubscripts, DstPtr,
+                                       &Assume))
       return false;
 
   return true;
@@ -3507,7 +3510,7 @@ DependenceInfo::depends(Instruction *Src, Instruction *Dst,
                                           SCEVUnionPredicate(Assume, *SE));
 
   if (Delinearize) {
-    if (tryDelinearize(Src, Dst, Pair)) {
+    if (tryDelinearize(Src, Dst, Pair, Assume)) {
       LLVM_DEBUG(dbgs() << "    delinearized\n");
       Pairs = Pair.size();
     }
diff --git a/llvm/test/Analysis/DependenceAnalysis/Banerjee.ll b/llvm/test/Analysis/DependenceAnalysis/Banerjee.ll
index 6dde8844c6040..6e75887db06d4 100644
--- a/llvm/test/Analysis/DependenceAnalysis/Banerjee.ll
+++ b/llvm/test/Analysis/DependenceAnalysis/Banerjee.ll
@@ -46,9 +46,14 @@ define void @banerjee0(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
 ;
 ; DELIN-LABEL: 'banerjee0'
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
-; DELIN-NEXT:    da analyze - none!
+; DELIN-NEXT:    da analyze - consistent output [0 0]!
+; DELIN-NEXT:    Runtime Assumptions:
+; DELIN-NEXT:    Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
+; DELIN-NEXT:    Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: %0 = load i64, ptr %arrayidx6, align 8
-; DELIN-NEXT:    da analyze - flow [<= <>]!
+; DELIN-NEXT:    da analyze - consistent flow [0 1]!
+; DELIN-NEXT:    Runtime Assumptions:
+; DELIN-NEXT:    Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
 ; DELIN-NEXT:    da analyze - confused!
 ; DELIN-NEXT:  Src: %0 = load i64, ptr %arrayidx6, align 8 --> Dst: %0 = load i64, ptr %arrayidx6, align 8
@@ -132,12 +137,18 @@ define void @banerjee1(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
 ; DELIN-LABEL: 'banerjee1'
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
 ; DELIN-NEXT:    da analyze - output [* *]!
+; DELIN-NEXT:    Runtime Assumptions:
+; DELIN-NEXT:    Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: %2 = load i64, ptr %arrayidx6, align 8
 ; DELIN-NEXT:    da analyze - flow [* <>]!
+; DELIN-NEXT:    Runtime Assumptions:
+; DELIN-NEXT:    Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 %2, ptr %B.addr.12, align 8
 ; DELIN-NEXT:    da analyze - confused!
 ; DELIN-NEXT:  Src: %2 = load i64, ptr %arrayidx6, align 8 --> Dst: %2 = load i64, ptr %arrayidx6, align 8
 ; DELIN-NEXT:    da analyze - input [* *]!
+; DELIN-NEXT:    Runtime Assumptions:
+; DELIN-NEXT:    Compare predicate: {0,+,1}<nuw><nsw><%for.body3> slt) 10
 ; DELIN-NEXT:  Src: %2 = load i64, ptr %arrayidx6, align 8 --> Dst: store i64 %2, ptr %B.addr.12, align 8
 ; DELIN-NEXT:    da analyze - confused!
 ; DELIN-NEXT:  Src: store i64 %2, ptr %B.addr.12, align 8 --> Dst: store i64 %2, ptr %B.addr.12, align 8
@@ -320,11 +331,16 @@ define void @banerjee3(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
 ; DELIN-NEXT:    da analyze - none!
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: %0 = load i64, ptr %arrayidx7, align 8
-; DELIN-NEXT:    da analyze - flow [> >]!
+; DELIN-NEXT:    da analyze - consistent flow [-9 -9]!
+; DELIN-NEXT:    Runtime Assumptions:
+; DELIN-NEXT:    Compare predicate: {9,+,1}<nuw><nsw><%for.body3> slt) 10
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
 ; DELIN-NEXT:    da analyze - confused!
 ; DELIN-NEXT:  Src: %0 = load i64, ptr %arrayidx7, align 8 --> Dst: %0 = load i64, ptr %arrayidx7, align 8
-; DELIN-NEXT:    da analyze - none!
+; DELIN-NEXT:    da analyze - consistent input [0 0]!
+; DELIN-NEXT:    Runtime Assumptions:
+; DELIN-NEXT:    Compare predicate: {9,+,1}<nuw><nsw><%for.body3> slt) 10
+; DELIN-NEXT:    Compare predicate: {9,+,1}<nuw><nsw><%for.body3> slt) 10
 ; DELIN-NEXT:  Src: %0 = load i64, ptr %arrayidx7, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
 ; DELIN-NEXT:    da analyze - confused!
 ; DELIN-NEXT:  Src: store i64 %0, ptr %B.addr.11, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
@@ -490,11 +506,16 @@ define void @banerjee5(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
 ; DELIN-NEXT:    da analyze - none!
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: %0 = load i64, ptr %arrayidx6, align 8
-; DELIN-NEXT:    da analyze - flow [< <]!
+; DELIN-NEXT:    da analyze - consistent flow [9 9]!
+; DELIN-NEXT:    Runtime Assumptions:
+; DELIN-NEXT:    Compare predicate: {-9,+,1}<nsw><%for.body3> sge) 0
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
 ; DELIN-NEXT:    da analyze - confused!
 ; DELIN-NEXT:  Src: %0 = load i64, ptr %arrayidx6, align 8 --> Dst: %0 = load i64, ptr %arrayidx6, align 8
-; DELIN-NEXT:    da analyze - none!
+; DELIN-NEXT:    da analyze - consistent input [0 0]!
+; DELIN-NEXT:    Runtime Assumptions:
+; DELIN-NEXT:    Compare predicate: {-9,+,1}<nsw><%for.body3> sge) 0
+; DELIN-NEXT:    Compare predicate: {-9,+,1}<nsw><%for.body3> sge) 0
 ; DELIN-NEXT:  Src: %0 = load i64, ptr %arrayidx6, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
 ; DELIN-NEXT:    da analyze - confused!
 ; DELIN-NEXT:  Src: store i64 %0, ptr %B.addr.11, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
@@ -575,11 +596,16 @@ define void @banerjee6(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
 ; DELIN-NEXT:    da analyze - none!
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: %0 = load i64, ptr %arrayidx7, align 8
-; DELIN-NEXT:    da analyze - flow [=> <>]!
+; DELIN-NEXT:    da analyze - consistent flow [0 -9]!
+; DELIN-NEXT:    Runtime Assumptions:
+; DELIN-NEXT:    Compare predicate: {9,+,1}<nuw><nsw><%for.body3> slt) 10
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
 ; DELIN-NEXT:    da analyze - confused!
 ; DELIN-NEXT:  Src: %0 = load i64, ptr %arrayidx7, align 8 --> Dst: %0 = load i64, ptr %arrayidx7, align 8
-; DELIN-NEXT:    da analyze - none!
+; DELIN-NEXT:    da analyze - consistent input [0 0]!
+; DELIN-NEXT:    Runtime Assumptions:
+; DELIN-NEXT:    Compare predicate: {9,+,1}<nuw><nsw><%for.body3> slt) 10
+; DELIN-NEXT:    Compare predicate: {9,+,1}<nuw><nsw><%for.body3> slt) 10
 ; DELIN-NEXT:  Src: %0 = load i64, ptr %arrayidx7, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
 ; DELIN-NEXT:    da analyze - confused!
 ; DELIN-NEXT:  Src: store i64 %0, ptr %B.addr.11, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
@@ -745,11 +771,16 @@ define void @banerjee8(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
 ; DELIN-NEXT:    da analyze - none!
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: %0 = load i64, ptr %arrayidx7, align 8
-; DELIN-NEXT:    da analyze - flow [> <>]!
+; DELIN-NEXT:    da analyze - consistent flow [-1 -1]!
+; DELIN-NEXT:    Runtime Assumptions:
+; DELIN-NEXT:    Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
 ; DELIN-NEXT:    da analyze - confused!
 ; DELIN-NEXT:  Src: %0 = load i64, ptr %arrayidx7, align 8 --> Dst: %0 = load i64, ptr %arrayidx7, align 8
-; DELIN-NEXT:    da analyze - none!
+; DELIN-NEXT:    da analyze - consistent input [0 0]!
+; DELIN-NEXT:    Runtime Assumptions:
+; DELIN-NEXT:    Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
+; DELIN-NEXT:    Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
 ; DELIN-NEXT:  Src: %0 = load i64, ptr %arrayidx7, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
 ; DELIN-NEXT:    da analyze - confused!
 ; DELIN-NEXT:  Src: store i64 %0, ptr %B.addr.11, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
@@ -828,9 +859,14 @@ define void @banerjee9(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
 ;
 ; DELIN-LABEL: 'banerjee9'
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
-; DELIN-NEXT:    da analyze - output [* *]!
+; DELIN-NEXT:    da analyze - consistent output [0 0]!
+; DELIN-NEXT:    Runtime Assumptions:
+; DELI...
[truncated]

When compile-time checks fail, rely on runtime SCEV predicates, instead of
failing delinearization entirely.  This allows delinearization to succeed in
more cases where compile-time proofs are not possible, enabling more precise
dependence analysis under runtime assumptions.

When compile-time overflow checks (for Prod, Min, and Max offset
computations) fail, add runtime SCEV predicates using the equality-based
overflow detection pattern: (sext A) op (sext B) == sext(A op B). This
allows delinearization to succeed in more cases where compile-time proofs
are not possible, enabling more precise dependence analysis under runtime
assumptions.

This extends the runtime predicate support from PR llvm#170713 to also
cover the overflow validation checks added in PR llvm#169902.
@sebpop sebpop force-pushed the 4-runtime-predicates branch from acfe574 to 240badd Compare December 5, 2025 16:05
@github-actions

github-actions bot commented Dec 5, 2025

🐧 Linux x64 Test Results

  • 166848 tests passed
  • 2911 tests skipped
  • 2 tests failed

Failed Tests


LLVM

LLVM.Analysis/DDG/basic-loopnest.ll
Exit Code: 0

Command Output (stdout):
--
# RUN: at line 1
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt < /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Analysis/DDG/basic-loopnest.ll -disable-output "-passes=print<ddg>" 2>&1 | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Analysis/DDG/basic-loopnest.ll
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt -disable-output '-passes=print<ddg>'
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Analysis/DDG/basic-loopnest.ll
# note: command had no output on stdout or stderr

--

LLVM.Transforms/LICM/lnicm.ll
Exit Code: 0

Command Output (stdout):
--
# RUN: at line 2
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt -aa-pipeline=basic-aa -passes='loop(loop-interchange)' -cache-line-size=64 -S /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LICM/lnicm.ll | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LICM/lnicm.ll --check-prefixes INTC
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt -aa-pipeline=basic-aa '-passes=loop(loop-interchange)' -cache-line-size=64 -S /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LICM/lnicm.ll
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LICM/lnicm.ll --check-prefixes INTC
# note: command had no output on stdout or stderr
# RUN: at line 3
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt -aa-pipeline=basic-aa -passes='loop-mssa(lnicm),loop(loop-interchange)' -cache-line-size=64 -S /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LICM/lnicm.ll | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LICM/lnicm.ll --check-prefixes LNICM
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt -aa-pipeline=basic-aa '-passes=loop-mssa(lnicm),loop(loop-interchange)' -cache-line-size=64 -S /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LICM/lnicm.ll
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LICM/lnicm.ll --check-prefixes LNICM
# note: command had no output on stdout or stderr
# RUN: at line 4
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt -aa-pipeline=basic-aa -passes='loop-mssa(licm),loop(loop-interchange)' -cache-line-size=64 -S /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LICM/lnicm.ll | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LICM/lnicm.ll --check-prefixes LICM
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt -aa-pipeline=basic-aa '-passes=loop-mssa(licm),loop(loop-interchange)' -cache-line-size=64 -S /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LICM/lnicm.ll
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LICM/lnicm.ll --check-prefixes LICM
# note: command had no output on stdout or stderr

--

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the infrastructure label.

Contributor

@kasuga-fj kasuga-fj left a comment

High-level thought: I think the runtime predicate feature should be removed for now and re-introduced after functionality like BatchDA is implemented. The current implementation looks a bit ugly to me, since each time depends is invoked, the same predicates keep getting added. Probably a better design would be to use a single set of assumptions throughout the analysis.

Comment on lines 51 to 52
; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
Contributor

I think assumptions are added too aggressively. For example, these two predicates obviously don't hold. We probably should not insert a predicate that we know to be false at compile time.

Contributor Author

The redundant predicates happen because we validate both source and destination subscripts separately.

Contributor

It's not about duplication: {1,+,1}<nuw><nsw><%for.body3> slt) 10 will never be satisfied because the BTC of for.body3 is 10. I think such a predicate should not be added.
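
A minimal sketch of the check suggested here, with a hypothetical helper name (this is not code from the PR): refuse to record a predicate that ScalarEvolution can already disprove at compile time.

```cpp
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Hypothetical helper (not part of this PR): only record the runtime
// predicate if SCEV cannot already prove its negation.
static bool addPredicateIfNotKnownFalse(
    ScalarEvolution &SE, SmallVectorImpl<const SCEVPredicate *> &Assume,
    ICmpInst::Predicate Pred, const SCEV *LHS, const SCEV *RHS) {
  // If the inverse comparison is provably true, the predicate can never be
  // satisfied at runtime, so recording it would only pessimize the result.
  if (SE.isKnownPredicate(ICmpInst::getInversePredicate(Pred), LHS, RHS))
    return false;
  Assume.push_back(SE.getComparePredicate(Pred, LHS, RHS));
  return true;
}
```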

Refactor the overflow check runtime predicate generation to reuse the
same pattern used by ScalarEvolution::willNotOverflow(). Instead of
duplicating the ext(LHS op RHS) == ext(LHS) op ext(RHS) pattern in
Delinearization.cpp, add a new getNoOverflowPredicate() method to
ScalarEvolution that returns the predicate (or nullptr if no-overflow
is already provable at compile time).

This addresses review feedback to avoid code duplication between
willNotOverflow() and the runtime predicate generation in delinearization.
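
Roughly, that could look like the sketch below; the name and signature here are assumed, not the upstream API. It reuses the willNotOverflow() compile-time check first and only builds the equality predicate when that proof fails.

```cpp
// Sketch of the idea behind getNoOverflowPredicate(): return nullptr when
// no-overflow is already provable, otherwise a predicate encoding
//   sext(A op B) == (sext A) op (sext B)   in a type twice as wide.
static const SCEVPredicate *
getNoOverflowPredicateSketch(ScalarEvolution &SE, Instruction::BinaryOps Op,
                             const SCEV *A, const SCEV *B) {
  assert(A->getType() == B->getType() && "operands must have the same type");
  if (SE.willNotOverflow(Op, /*IsSigned=*/true, A, B))
    return nullptr; // Compile-time proof succeeded; no runtime check needed.

  unsigned BitWidth = A->getType()->getScalarSizeInBits();
  Type *WideTy = IntegerType::get(SE.getContext(), 2 * BitWidth);
  const SCEV *WideA = SE.getSignExtendExpr(A, WideTy);
  const SCEV *WideB = SE.getSignExtendExpr(B, WideTy);
  const SCEV *Narrow, *Wide;
  if (Op == Instruction::Add) {
    Narrow = SE.getAddExpr(A, B);
    Wide = SE.getAddExpr(WideA, WideB);
  } else { // Instruction::Mul
    Narrow = SE.getMulExpr(A, B);
    Wide = SE.getMulExpr(WideA, WideB);
  }
  // The narrow operation is overflow-free exactly when both sides agree.
  return SE.getEqualPredicate(SE.getSignExtendExpr(Narrow, WideTy), Wide);
}
```
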
@sebpop sebpop requested a review from nikic as a code owner December 5, 2025 21:22
When validating both source and destination subscripts, the same
predicate can be generated multiple times (e.g., when both access
A[i][j]). Add a helper AddPredicate() that checks if a predicate
already exists before adding it to avoid duplicates.
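
Assuming ScalarEvolution uniques its predicate objects (so identical predicates compare equal by pointer), the helper described above can be very small; this is a sketch, not the PR's exact code.

```cpp
#include "llvm/ADT/STLExtras.h"

// Sketch of AddPredicate(): skip the push_back when an identical predicate
// is already recorded, e.g. after validating both Src and Dst subscripts.
static void addPredicate(SmallVectorImpl<const SCEVPredicate *> &Assume,
                         const SCEVPredicate *Pred) {
  if (!llvm::is_contained(Assume, Pred))
    Assume.push_back(Pred);
}
```
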
@sebpop
Contributor Author

sebpop commented Dec 5, 2025

functionality like BatchDA is implemented

I know how "batch delinearization" works: collect all data references in the current function, sort by base pointers, and perform delinearization on all arrays accessing the same base pointer.

"batchDA" is different, and I believe the loop optimizations would need to bound or focus DA to a given region containing loops.
DA is currently not bound to a given loop nest or set of loops.
In contrast, Polly is bound to a SCoP, which is a single-entry/single-exit region of the CFG.

@nikic
Contributor

nikic commented Dec 6, 2025

Just as a high-level question: Do we want to introduce more runtime predicates at this point in time? Given that there is a lot of active ongoing work on correctness issues in DA/Delinearization, it may be better to delay this until a later time, just to make it easier to do further code changes, without also having to maintain the predicates along the way.

I've been out for a while and am not up to date on the DA/Delinearization work, but IIRC these runtime predicates are not yet used anywhere, right? They only appear in the analysis output for now? I expect that actually making use of them will be quite challenging, because of the need to trade off the value of the transform with the probability of passing the runtime check and the code size increase of performing loop versioning. Having a lot of runtime checks implemented will probably make this harder than introducing them gradually and evaluating which of them are actually valuable in practice.

On the other hand, I could see value in doing this now if this helps with testing somehow, e.g. because it allows us to easily test situations in DA that would otherwise be hard to test because of delinearization failures. Is something like this the case?

@sjoerdmeijer
Collaborator

sjoerdmeijer commented Dec 7, 2025

Just as a high-level question: Do we want to introduce more runtime predicates at this point in time? Given that there is a lot of active ongoing work on correctness issues in DA/Delinearization, it may be better to delay this until a later time, just to make it easier to do further code changes, without also having to maintain the predicates along the way.

I've been out for a while and am not up to date on the DA/Delinearization work, but IIRC these runtime predicates are not yet used anywhere, right? They only appear in the analysis output for now? I expect that actually making use of them will be quite challenging, because of the need to trade off the value of the transform with the probability of passing the runtime check and the code size increase of performing loop versioning. Having a lot of runtime checks implemented will probably make this harder than introducing them gradually and evaluating which of them are actually valuable in practice.

On the other hand, I could see value in doing this now if this helps with testing somehow, e.g. because it allows us to easily test situations in DA that would otherwise be hard to test because of delinearization failures. Is something like this the case?

Welcome back @nikic . :-)

I think the above is a fair assessment of the situation. I would like to add a few things, though.

First of all, there has been a lot of discussion about two different approaches to dealing with wrapping behaviour. This spans different tickets and many weeks, and it is difficult to catch up on. The following is my attempt to briefly summarise the situation. @kasuga-fj found a problem with monotonicity as it is currently defined/implemented. At different program scopes, monotonicity may or may not hold due to conditions or loop guards. He is looking at defining "iteration domains" where monotonicity holds. Another school of thought, pursued by @amehsan, is that monotonicity is a special case of wrapping (or the lack thereof), so monotonicity as a concept may not be necessary if the dependence tests can be adapted to be accurate for these cases. Both are prototyping their approach, which will allow us to look at the results and see what the way forward is.

Besides this, we also have the runtime predicates. The way I look at this is as follows:

  • this is mostly orthogonal to the work mentioned above, because when dependence tests can't prove (in)dependence, we can proceed with dependence checks under these assumptions and so handle a larger class of applications. And for now, yes, it also allows us to continue the work when delinearization fails.
  • runtime predicates aren't a new concept; my understanding is that Polly also uses them. Maybe @Meinersbur can confirm and elaborate.
  • The implementation to add runtime checks isn't very difficult; I doubt it will really get in the way of other things.

But I agree that we haven't shown how to use these runtime predicates when it comes to actually performing a transformation. What I propose is the following:

  • I will create an IR reproducer of our proxy workload and motivating example and create a merge request to add this as a loop-interchange regression test,
  • This allows us to see what is necessary to get it interchanged, what the dependence checks are doing, and what transforming it using runtime predicates could look like.

So I will add the regression test first, early this week, and after that start working on using the runtime predicates in interchange.

@amehsan
Contributor

amehsan commented Dec 8, 2025

First of all, there has been a lot of discussion about two different approaches to dealing with wrapping behaviour. This spans different tickets and many weeks, and it is difficult to catch up on. The following is my attempt to briefly summarise the situation. @kasuga-fj found a problem with monotonicity as it is currently defined/implemented. At different program scopes, monotonicity may or may not hold due to conditions or loop guards. He is looking at defining "iteration domains" where monotonicity holds. Another school of thought, pursued by @amehsan, is that monotonicity is a special case of wrapping (or the lack thereof), so monotonicity as a concept may not be necessary if the dependence tests can be adapted to be accurate for these cases. Both are prototyping their approach, which will allow us to look at the results and see what the way forward is.

I am not planning to comment on this PR, but I thought I would share how I see the discussion between myself and @kasuga-fj.

The original concern of @kasuga-fj (stated, for example, here: #162281 (comment)) was that "each dependence test assumes monotonicity over the entire iteration space".

Now he agrees that we can prove the correctness of dependence tests without assuming monotonicity over the entire iteration space. So for one of the two tests in Strong SIV, he has dropped this requirement from his implementation.

For the other test in Strong SIV, and for Symbolic RDIV, where he does check this requirement in his code, the reason is not correctness. His view at this time is that using the assumption of "monotonicity over the entire iteration space" results in simpler code and a simpler proof.

So at this point the views are closer together. My objection to the concept of "monotonicity" is a somewhat different issue.

@kasuga-fj
Contributor

Now he agrees that we can prove the correctness of dependence tests without assuming monotonicity over the entire iteration space.

I don't think we can prove the correctness without monotonicity over the entire iteration space. As for SIV (non-MIV), that condition means both of the following hold:

  • The addrec has nsw
  • The exact BTC is computable

What I agree with is that some part of the code doesn't use an exact BTC.
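
Read literally, those two conditions amount to something like the following sketch (an illustration only, not code from this PR):

```cpp
// An SIV subscript {Start,+,Step}<%L> is treated as monotone over the whole
// iteration space only if the addrec is <nsw> and the exact backedge-taken
// count of %L is computable (needs llvm/Analysis/ScalarEvolutionExpressions.h).
static bool isMonotoneOverIterationSpace(ScalarEvolution &SE,
                                         const SCEVAddRecExpr *AR) {
  bool HasNSW = AR->getNoWrapFlags(SCEV::FlagNSW) == SCEV::FlagNSW;
  const SCEV *BTC = SE.getBackedgeTakenCount(AR->getLoop());
  return HasNSW && !isa<SCEVCouldNotCompute>(BTC);
}
```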

That said, this is off-topic for this PR, so I don't intend to continue it further here.

@sebpop
Contributor Author

sebpop commented Dec 9, 2025

I expect that actually making use of them will be quite challenging, because of the need to trade off the value of the transform with the probability of passing the runtime check and the code size increase of performing loop versioning

The decision to perform versioning rests with the loop transform passes. DA is only supposed to provide a minimal list of predicates under which the DA result holds. Currently, when the info is missing, the result of DA is just "don't know", at which point the LNO has no choice but to give up.

Fewer runtime predicates will be generated with more info sent from the front-ends.

  • runtime predicates aren't a new concept; my understanding is that Polly also uses them.

Correct, both Polly and the vectorizer use runtime predicates to check for alias, dependences, and possible values of parameters.
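
As an illustration of what consuming these predicates could look like on the transform side (assumed usage, not part of this PR): the recorded assumptions can be folded into a SCEVUnionPredicate and expanded to an i1 guard with SCEVExpander, which a pass could then branch on to select the versioned loop.

```cpp
#include "llvm/Transforms/Utils/ScalarEvolutionExpander.h"

// Expand the DA runtime assumptions into a single i1 condition at InsertPt;
// a transform would branch on this condition to pick the versioned loop.
static Value *emitRuntimeGuard(ArrayRef<const SCEVPredicate *> Assume,
                               ScalarEvolution &SE, const DataLayout &DL,
                               Instruction *InsertPt) {
  SCEVUnionPredicate Union(Assume, SE); // Conjunction of all assumptions.
  SCEVExpander Expander(SE, DL, "da.rtcheck");
  return Expander.expandCodeForPredicate(&Union, InsertPt);
}
```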

@kasuga-fj
Contributor

Since I hadn't answered the questions, let me answer them here.

IIRC these runtime predicates are not yet used anywhere, right? They only appear in the analysis output for now?

Yes, that's correct.

On the other hand, I could see value in doing this now if this helps with testing somehow, e.g. because it allows us to easily test situations in DA that would otherwise be hard to test because of delinearization failures. Is something like this the case?

At least for this PR, this is not the case. This PR tries to add predicates to make the validation of delinearization succeed, but we already have the option -da-disable-delinearization-checks to skip the validation. For testing purposes, this option is sufficient. Also, I don't think adding runtime predicates would make DA testing easier in general, not just for delinearization.

Basically, I agree with postponing the runtime predicates functionality. Even if it appears orthogonal to the other DA work, I think that generally increasing code complexity can make that work more difficult. Also, it's unclear which predicates we should use in runtime checks. At least we need to think about the following:

  • Of course, it's better to have a small number of predicates to increase the possibility of passing them.
  • It's not necessarily required to remove the dependency completely. For example, eliminating specific directions from the dependency (e.g., all directions * -> only positive direction <) would be sufficient for some loop transformations.
  • Fixing the correctness issues will degrade the analysis precision. That may change which runtime predicates are needed to enable the motivating transformations.
  • With the current design, almost every loop transformation that uses DA will likely generate the same set of runtime predicates. This could easily lead to a code-size explosion. I think we need some mechanism to cache/share the results of DA, but it will be quite challenging.
