[DA] runtime predicates for delinearization bounds checks #170713
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis

Author: Sebastian Pop (sebpop)

Changes

When compile-time checks fail, rely on runtime SCEV predicates instead of failing delinearization entirely. This allows delinearization to succeed in more cases where compile-time proofs are not possible, enabling more precise dependence analysis under runtime assumptions.

Patch is 84.18 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/170713.diff

29 Files Affected:
diff --git a/llvm/include/llvm/Analysis/Delinearization.h b/llvm/include/llvm/Analysis/Delinearization.h
index 8fb30925b1ba7..7346128c0b510 100644
--- a/llvm/include/llvm/Analysis/Delinearization.h
+++ b/llvm/include/llvm/Analysis/Delinearization.h
@@ -26,6 +26,7 @@ class GetElementPtrInst;
class Instruction;
class ScalarEvolution;
class SCEV;
+class SCEVPredicate;
/// Compute the array dimensions Sizes from the set of Terms extracted from
/// the memory access function of this SCEVAddRecExpr (second step of
@@ -144,11 +145,13 @@ bool delinearizeFixedSizeArray(ScalarEvolution &SE, const SCEV *Expr,
/// Check that each subscript in \p Subscripts is within the corresponding size
/// in \p Sizes. For the outermost dimension, the subscript being negative is
/// allowed. If \p Ptr is not nullptr, it may be used to get information from
-/// the IR pointer value, which may help in the validation.
-bool validateDelinearizationResult(ScalarEvolution &SE,
- ArrayRef<const SCEV *> Sizes,
- ArrayRef<const SCEV *> Subscripts,
- const Value *Ptr = nullptr);
+/// the IR pointer value, which may help in the validation. If \p Assume is not
+/// nullptr and a compile-time check fails, runtime predicates are added to
+/// \p Assume instead of returning false.
+bool validateDelinearizationResult(
+ ScalarEvolution &SE, ArrayRef<const SCEV *> Sizes,
+ ArrayRef<const SCEV *> Subscripts, const Value *Ptr = nullptr,
+ SmallVectorImpl<const SCEVPredicate *> *Assume = nullptr);
/// Gathers the individual index expressions from a GEP instruction.
///
diff --git a/llvm/include/llvm/Analysis/DependenceAnalysis.h b/llvm/include/llvm/Analysis/DependenceAnalysis.h
index 6dec24fc9f104..80095e91fcc6b 100644
--- a/llvm/include/llvm/Analysis/DependenceAnalysis.h
+++ b/llvm/include/llvm/Analysis/DependenceAnalysis.h
@@ -754,7 +754,8 @@ class DependenceInfo {
/// Given a linear access function, tries to recover subscripts
/// for each dimension of the array element access.
bool tryDelinearize(Instruction *Src, Instruction *Dst,
- SmallVectorImpl<Subscript> &Pair);
+ SmallVectorImpl<Subscript> &Pair,
+ SmallVectorImpl<const SCEVPredicate *> &Assume);
/// Tries to delinearize \p Src and \p Dst access functions for a fixed size
/// multi-dimensional array. Calls delinearizeFixedSizeArray() to delinearize
@@ -762,7 +763,8 @@ class DependenceInfo {
bool tryDelinearizeFixedSize(Instruction *Src, Instruction *Dst,
const SCEV *SrcAccessFn, const SCEV *DstAccessFn,
SmallVectorImpl<const SCEV *> &SrcSubscripts,
- SmallVectorImpl<const SCEV *> &DstSubscripts);
+ SmallVectorImpl<const SCEV *> &DstSubscripts,
+ SmallVectorImpl<const SCEVPredicate *> &Assume);
/// Tries to delinearize access function for a multi-dimensional array with
/// symbolic runtime sizes.
@@ -771,7 +773,8 @@ class DependenceInfo {
tryDelinearizeParametricSize(Instruction *Src, Instruction *Dst,
const SCEV *SrcAccessFn, const SCEV *DstAccessFn,
SmallVectorImpl<const SCEV *> &SrcSubscripts,
- SmallVectorImpl<const SCEV *> &DstSubscripts);
+ SmallVectorImpl<const SCEV *> &DstSubscripts,
+ SmallVectorImpl<const SCEVPredicate *> &Assume);
/// checkSubscript - Helper function for checkSrcSubscript and
/// checkDstSubscript to avoid duplicate code
diff --git a/llvm/lib/Analysis/Delinearization.cpp b/llvm/lib/Analysis/Delinearization.cpp
index 7bf83ccf9c172..68928c62ab569 100644
--- a/llvm/lib/Analysis/Delinearization.cpp
+++ b/llvm/lib/Analysis/Delinearization.cpp
@@ -753,24 +753,34 @@ static bool isKnownLessThan(ScalarEvolution *SE, const SCEV *S,
return SE->isKnownNegative(LimitedBound);
}
-bool llvm::validateDelinearizationResult(ScalarEvolution &SE,
- ArrayRef<const SCEV *> Sizes,
- ArrayRef<const SCEV *> Subscripts,
- const Value *Ptr) {
+bool llvm::validateDelinearizationResult(
+ ScalarEvolution &SE, ArrayRef<const SCEV *> Sizes,
+ ArrayRef<const SCEV *> Subscripts, const Value *Ptr,
+ SmallVectorImpl<const SCEVPredicate *> *Assume) {
// Sizes and Subscripts are as follows:
- //
// Sizes: [UNK][S_2]...[S_n]
// Subscripts: [I_1][I_2]...[I_n]
//
// where the size of the outermost dimension is unknown (UNK).
+ // Unify types of two SCEVs to the wider type.
+ auto UnifyTypes =
+ [&](const SCEV *&A,
+ const SCEV *&B) -> std::pair<const SCEV *, const SCEV *> {
+ Type *WiderType = SE.getWiderType(A->getType(), B->getType());
+ return {SE.getNoopOrSignExtend(A, WiderType),
+ SE.getNoopOrSignExtend(B, WiderType)};
+ };
+
auto AddOverflow = [&](const SCEV *A, const SCEV *B) -> const SCEV * {
+ std::tie(A, B) = UnifyTypes(A, B);
if (!SE.willNotOverflow(Instruction::Add, /*IsSigned=*/true, A, B))
return nullptr;
return SE.getAddExpr(A, B);
};
auto MulOverflow = [&](const SCEV *A, const SCEV *B) -> const SCEV * {
+ std::tie(A, B) = UnifyTypes(A, B);
if (!SE.willNotOverflow(Instruction::Mul, /*IsSigned=*/true, A, B))
return nullptr;
return SE.getMulExpr(A, B);
@@ -780,10 +790,28 @@ bool llvm::validateDelinearizationResult(ScalarEvolution &SE,
for (size_t I = 1; I < Sizes.size(); ++I) {
const SCEV *Size = Sizes[I - 1];
const SCEV *Subscript = Subscripts[I];
- if (!isKnownNonNegative(&SE, Subscript, Ptr))
- return false;
- if (!isKnownLessThan(&SE, Subscript, Size))
- return false;
+
+ // Check Subscript >= 0.
+ if (!isKnownNonNegative(&SE, Subscript, Ptr)) {
+ if (!Assume)
+ return false;
+ const SCEVPredicate *Pred = SE.getComparePredicate(
+ ICmpInst::ICMP_SGE, Subscript, SE.getZero(Subscript->getType()));
+ Assume->push_back(Pred);
+ }
+
+ // Check Subscript < Size.
+ if (!isKnownLessThan(&SE, Subscript, Size)) {
+ if (!Assume)
+ return false;
+ // Need to unify types before creating the predicate.
+ Type *WiderType = SE.getWiderType(Subscript->getType(), Size->getType());
+ const SCEV *SubscriptExt = SE.getNoopOrSignExtend(Subscript, WiderType);
+ const SCEV *SizeExt = SE.getNoopOrSignExtend(Size, WiderType);
+ const SCEVPredicate *Pred =
+ SE.getComparePredicate(ICmpInst::ICMP_SLT, SubscriptExt, SizeExt);
+ Assume->push_back(Pred);
+ }
}
// The offset computation is as follows:
diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp b/llvm/lib/Analysis/DependenceAnalysis.cpp
index 9b9c80a9b3266..858cbafdc3a0a 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -3176,8 +3176,9 @@ const SCEV *DependenceInfo::getUpperBound(BoundInfo *Bound) const {
/// source and destination array references are recurrences on a nested loop,
/// this function flattens the nested recurrences into separate recurrences
/// for each loop level.
-bool DependenceInfo::tryDelinearize(Instruction *Src, Instruction *Dst,
- SmallVectorImpl<Subscript> &Pair) {
+bool DependenceInfo::tryDelinearize(
+ Instruction *Src, Instruction *Dst, SmallVectorImpl<Subscript> &Pair,
+ SmallVectorImpl<const SCEVPredicate *> &Assume) {
assert(isLoadOrStore(Src) && "instruction is not load or store");
assert(isLoadOrStore(Dst) && "instruction is not load or store");
Value *SrcPtr = getLoadStorePointerOperand(Src);
@@ -3197,9 +3198,9 @@ bool DependenceInfo::tryDelinearize(Instruction *Src, Instruction *Dst,
SmallVector<const SCEV *, 4> SrcSubscripts, DstSubscripts;
if (!tryDelinearizeFixedSize(Src, Dst, SrcAccessFn, DstAccessFn,
- SrcSubscripts, DstSubscripts) &&
+ SrcSubscripts, DstSubscripts, Assume) &&
!tryDelinearizeParametricSize(Src, Dst, SrcAccessFn, DstAccessFn,
- SrcSubscripts, DstSubscripts))
+ SrcSubscripts, DstSubscripts, Assume))
return false;
assert(isLoopInvariant(SrcBase, SrcLoop) &&
@@ -3245,7 +3246,8 @@ bool DependenceInfo::tryDelinearize(Instruction *Src, Instruction *Dst,
bool DependenceInfo::tryDelinearizeFixedSize(
Instruction *Src, Instruction *Dst, const SCEV *SrcAccessFn,
const SCEV *DstAccessFn, SmallVectorImpl<const SCEV *> &SrcSubscripts,
- SmallVectorImpl<const SCEV *> &DstSubscripts) {
+ SmallVectorImpl<const SCEV *> &DstSubscripts,
+ SmallVectorImpl<const SCEVPredicate *> &Assume) {
LLVM_DEBUG({
const SCEVUnknown *SrcBase =
dyn_cast<SCEVUnknown>(SE->getPointerBase(SrcAccessFn));
@@ -3285,10 +3287,12 @@ bool DependenceInfo::tryDelinearizeFixedSize(
// dimensions. For example some C language usage/interpretation make it
// impossible to verify this at compile-time. As such we can only delinearize
// iff the subscripts are positive and are less than the range of the
- // dimension.
+ // dimension. If compile-time checks fail, add runtime predicates.
if (!DisableDelinearizationChecks) {
- if (!validateDelinearizationResult(*SE, SrcSizes, SrcSubscripts, SrcPtr) ||
- !validateDelinearizationResult(*SE, DstSizes, DstSubscripts, DstPtr)) {
+ if (!validateDelinearizationResult(*SE, SrcSizes, SrcSubscripts, SrcPtr,
+ &Assume) ||
+ !validateDelinearizationResult(*SE, DstSizes, DstSubscripts, DstPtr,
+ &Assume)) {
SrcSubscripts.clear();
DstSubscripts.clear();
return false;
@@ -3305,7 +3309,8 @@ bool DependenceInfo::tryDelinearizeFixedSize(
bool DependenceInfo::tryDelinearizeParametricSize(
Instruction *Src, Instruction *Dst, const SCEV *SrcAccessFn,
const SCEV *DstAccessFn, SmallVectorImpl<const SCEV *> &SrcSubscripts,
- SmallVectorImpl<const SCEV *> &DstSubscripts) {
+ SmallVectorImpl<const SCEV *> &DstSubscripts,
+ SmallVectorImpl<const SCEVPredicate *> &Assume) {
Value *SrcPtr = getLoadStorePointerOperand(Src);
Value *DstPtr = getLoadStorePointerOperand(Dst);
@@ -3346,15 +3351,13 @@ bool DependenceInfo::tryDelinearizeParametricSize(
SrcSubscripts.size() != DstSubscripts.size())
return false;
- // Statically check that the array bounds are in-range. The first subscript we
- // don't have a size for and it cannot overflow into another subscript, so is
- // always safe. The others need to be 0 <= subscript[i] < bound, for both src
- // and dst.
- // FIXME: It may be better to record these sizes and add them as constraints
- // to the dependency checks.
+ // Check that the array bounds are in-range. If compile-time checks fail,
+ // add runtime predicates.
if (!DisableDelinearizationChecks)
- if (!validateDelinearizationResult(*SE, Sizes, SrcSubscripts, SrcPtr) ||
- !validateDelinearizationResult(*SE, Sizes, DstSubscripts, DstPtr))
+ if (!validateDelinearizationResult(*SE, Sizes, SrcSubscripts, SrcPtr,
+ &Assume) ||
+ !validateDelinearizationResult(*SE, Sizes, DstSubscripts, DstPtr,
+ &Assume))
return false;
return true;
@@ -3507,7 +3510,7 @@ DependenceInfo::depends(Instruction *Src, Instruction *Dst,
SCEVUnionPredicate(Assume, *SE));
if (Delinearize) {
- if (tryDelinearize(Src, Dst, Pair)) {
+ if (tryDelinearize(Src, Dst, Pair, Assume)) {
LLVM_DEBUG(dbgs() << " delinearized\n");
Pairs = Pair.size();
}
diff --git a/llvm/test/Analysis/DependenceAnalysis/Banerjee.ll b/llvm/test/Analysis/DependenceAnalysis/Banerjee.ll
index 6dde8844c6040..6e75887db06d4 100644
--- a/llvm/test/Analysis/DependenceAnalysis/Banerjee.ll
+++ b/llvm/test/Analysis/DependenceAnalysis/Banerjee.ll
@@ -46,9 +46,14 @@ define void @banerjee0(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
;
; DELIN-LABEL: 'banerjee0'
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
-; DELIN-NEXT: da analyze - none!
+; DELIN-NEXT: da analyze - consistent output [0 0]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
+; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: %0 = load i64, ptr %arrayidx6, align 8
-; DELIN-NEXT: da analyze - flow [<= <>]!
+; DELIN-NEXT: da analyze - consistent flow [0 1]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: %0 = load i64, ptr %arrayidx6, align 8 --> Dst: %0 = load i64, ptr %arrayidx6, align 8
@@ -132,12 +137,18 @@ define void @banerjee1(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
; DELIN-LABEL: 'banerjee1'
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
; DELIN-NEXT: da analyze - output [* *]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: %2 = load i64, ptr %arrayidx6, align 8
; DELIN-NEXT: da analyze - flow [* <>]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 %2, ptr %B.addr.12, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: %2 = load i64, ptr %arrayidx6, align 8 --> Dst: %2 = load i64, ptr %arrayidx6, align 8
; DELIN-NEXT: da analyze - input [* *]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {0,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: %2 = load i64, ptr %arrayidx6, align 8 --> Dst: store i64 %2, ptr %B.addr.12, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: store i64 %2, ptr %B.addr.12, align 8 --> Dst: store i64 %2, ptr %B.addr.12, align 8
@@ -320,11 +331,16 @@ define void @banerjee3(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
; DELIN-NEXT: da analyze - none!
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: %0 = load i64, ptr %arrayidx7, align 8
-; DELIN-NEXT: da analyze - flow [> >]!
+; DELIN-NEXT: da analyze - consistent flow [-9 -9]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {9,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: %0 = load i64, ptr %arrayidx7, align 8 --> Dst: %0 = load i64, ptr %arrayidx7, align 8
-; DELIN-NEXT: da analyze - none!
+; DELIN-NEXT: da analyze - consistent input [0 0]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {9,+,1}<nuw><nsw><%for.body3> slt) 10
+; DELIN-NEXT: Compare predicate: {9,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: %0 = load i64, ptr %arrayidx7, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: store i64 %0, ptr %B.addr.11, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
@@ -490,11 +506,16 @@ define void @banerjee5(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
; DELIN-NEXT: da analyze - none!
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: %0 = load i64, ptr %arrayidx6, align 8
-; DELIN-NEXT: da analyze - flow [< <]!
+; DELIN-NEXT: da analyze - consistent flow [9 9]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {-9,+,1}<nsw><%for.body3> sge) 0
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: %0 = load i64, ptr %arrayidx6, align 8 --> Dst: %0 = load i64, ptr %arrayidx6, align 8
-; DELIN-NEXT: da analyze - none!
+; DELIN-NEXT: da analyze - consistent input [0 0]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {-9,+,1}<nsw><%for.body3> sge) 0
+; DELIN-NEXT: Compare predicate: {-9,+,1}<nsw><%for.body3> sge) 0
; DELIN-NEXT: Src: %0 = load i64, ptr %arrayidx6, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: store i64 %0, ptr %B.addr.11, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
@@ -575,11 +596,16 @@ define void @banerjee6(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
; DELIN-NEXT: da analyze - none!
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: %0 = load i64, ptr %arrayidx7, align 8
-; DELIN-NEXT: da analyze - flow [=> <>]!
+; DELIN-NEXT: da analyze - consistent flow [0 -9]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {9,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: %0 = load i64, ptr %arrayidx7, align 8 --> Dst: %0 = load i64, ptr %arrayidx7, align 8
-; DELIN-NEXT: da analyze - none!
+; DELIN-NEXT: da analyze - consistent input [0 0]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {9,+,1}<nuw><nsw><%for.body3> slt) 10
+; DELIN-NEXT: Compare predicate: {9,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: %0 = load i64, ptr %arrayidx7, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: store i64 %0, ptr %B.addr.11, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
@@ -745,11 +771,16 @@ define void @banerjee8(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
; DELIN-NEXT: da analyze - none!
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: %0 = load i64, ptr %arrayidx7, align 8
-; DELIN-NEXT: da analyze - flow [> <>]!
+; DELIN-NEXT: da analyze - consistent flow [-1 -1]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: %0 = load i64, ptr %arrayidx7, align 8 --> Dst: %0 = load i64, ptr %arrayidx7, align 8
-; DELIN-NEXT: da analyze - none!
+; DELIN-NEXT: da analyze - consistent input [0 0]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
+; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Src: %0 = load i64, ptr %arrayidx7, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
; DELIN-NEXT: da analyze - confused!
; DELIN-NEXT: Src: store i64 %0, ptr %B.addr.11, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
@@ -828,9 +859,14 @@ define void @banerjee9(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
;
; DELIN-LABEL: 'banerjee9'
; DELIN-NEXT: Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
-; DELIN-NEXT: da analyze - output [* *]!
+; DELIN-NEXT: da analyze - consistent output [0 0]!
+; DELIN-NEXT: Runtime Assumptions:
+; DELI...
[truncated]
When compile-time checks fail, rely on runtime SCEV predicates, instead of failing delinearization entirely. This allows delinearization to succeed in more cases where compile-time proofs are not possible, enabling more precise dependence analysis under runtime assumptions.
When compile-time overflow checks (for Prod, Min, and Max offset computations) fail, add runtime SCEV predicates using the equality-based overflow detection pattern: (sext A) op (sext B) == sext(A op B). This allows delinearization to succeed in more cases where compile-time proofs are not possible, enabling more precise dependence analysis under runtime assumptions. This extends the runtime predicate support from PR llvm#170713 to also cover the overflow validation checks added in PR llvm#169902.
🐧 Linux x64 Test Results

Failed Tests (click on a test name to see its output):
- LLVM.Analysis/DDG/basic-loopnest.ll
- LLVM.Transforms/LICM/lnicm.ll

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the
kasuga-fj left a comment
High-level thought: I think the runtime predicate feature should be removed for now, and re-introduced after functionality like BatchDA is implemented. The current implementation looks a bit ugly to me, since each time depends is invoked, the same predicates keep getting added. Probably a better design would be to use a single set of assumptions throughout the analysis.
; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
; DELIN-NEXT: Compare predicate: {1,+,1}<nuw><nsw><%for.body3> slt) 10
I think assumptions are added too aggressively. For example, these two predicates obviously don't hold. Probably we should not insert a predicate if we know it to be false at compile time.
The redundant predicates happen because we validate both source and destination subscripts separately.
Not duplication. {1,+,1}<nuw><nsw><%for.body3> slt) 10 will never be satisfied because the BTC of for.body3 is 10. I think such a predicate should not be added.
Refactor the overflow check runtime predicate generation to reuse the same pattern used by ScalarEvolution::willNotOverflow(). Instead of duplicating the ext(LHS op RHS) == ext(LHS) op ext(RHS) pattern in Delinearization.cpp, add a new getNoOverflowPredicate() method to ScalarEvolution that returns the predicate (or nullptr if no-overflow is already provable at compile time). This addresses review feedback to avoid code duplication between willNotOverflow() and the runtime predicate generation in delinearization.
When validating both source and destination subscripts, the same predicate can be generated multiple times (e.g., when both access A[i][j]). Add a helper AddPredicate() that checks if a predicate already exists before adding it to avoid duplicates.
I know how "batch delinearization" works: collect all data references in the current function, sort by base pointers, and perform delinearization on all arrays accessing the same base pointer. "BatchDA" is different, and I believe the loop optimizations would need to bound or focus DA to a given region containing loops.
Just as a high-level question: do we want to introduce more runtime predicates at this point in time? Given that there is a lot of active ongoing work on correctness issues in DA/delinearization, it may be better to delay this until later, just to make it easier to do further code changes without also having to maintain the predicates along the way. I've been out for a while and am not up to date on the DA/delinearization work, but IIRC these runtime predicates are not yet used anywhere, right? They only appear in the analysis output for now? I expect that actually making use of them will be quite challenging, because of the need to trade off the value of the transform against the probability of passing the runtime check and the code-size increase of performing loop versioning. Having a lot of runtime checks implemented will probably make this harder than introducing them gradually and evaluating which of them are actually valuable in practice. On the other hand, I could see value in doing this now if it helps with testing somehow, e.g. because it allows us to easily test situations in DA that would otherwise be hard to test because of delinearization failures. Is something like this the case?
Welcome back @nikic. :-) I think the above is a fair assessment of the situation. I would like to add a few things, though. First of all, there have been a lot of discussions about two different approaches to dealing with wrapping behaviour. This spans different tickets and many weeks, and it is difficult to catch up on. The following is my attempt to briefly summarise the situation. @kasuga-fj found a problem with monotonicity as it is currently defined/implemented. At different program scopes, monotonicity may or may not hold due to conditions or loop guards. He is looking at defining "iteration domains" where monotonicity holds. Another school of thought, pursued by @amehsan, says that monotonicity is a special case of wrapping (or the lack thereof), so monotonicity as a concept may not be necessary if the dependence tests can be adapted to be accurate for these cases. Both are prototyping their approaches, which will then allow us to look at this and see what the way forward is. Besides this, we also have the runtime predicates. The way I look at this is as follows:
But I agree that we haven't shown how to use these runtime predicates when it comes to actually performing a transformation. What I propose is the following:
So I will add the regression test first, which I will do early this week, and after that start working on using the runtime predicates in interchange.
I am not planning to comment on this PR, but I thought to share how I think about the discussion between @kasuga-fj and myself. The original concern of @kasuga-fj (for example stated here: #162281 (comment)) was that "each dependence test assumes monotonicity over the entire iteration space". Now he agrees that we can prove the correctness of dependence tests without assuming monotonicity over the entire iteration space. So for one of the two tests in strong SIV, he has dropped this requirement from his implementation. For the other test in strong SIV, and for symbolic RDIV, where he still checks this requirement in his code, the reason is not correctness. His point of view at this time is that using the assumption of "monotonicity over the entire iteration space" results in simpler code and a simpler proof. So at this point the views are closer together. My objection to the concept of "monotonicity" is a somewhat different issue.
I don't think we can prove correctness without monotonicity over the entire iteration space. As for SIV (non-MIV), that condition means both of the following hold:
What I agree with is that some part of the code doesn't use an exact BTC. That said, this is off-topic for this PR, so I don't intend to continue it further here.
The decision to perform versioning is with the loop transform passes. DA is supposed to only provide a minimal list of predicates under which the DA result holds. Currently, when the info is missing, the result of DA is just "don't know", at which point the LNO has no choice but to give up. Fewer runtime predicates will be generated with more info sent from the front ends.
Correct, both Polly and the vectorizer use runtime predicates to check for aliasing, dependences, and possible values of parameters.
Since I hadn't answered the questions, let me answer them here.
Yes, that's correct.
At least for this PR, this is not the case. This PR tries to add predicates to make the validation of delinearization succeed, but we already have an option for that. Basically, I agree with postponing the runtime predicates functionality. Even if it appears orthogonal to other DA work, I think that generally increasing code complexity can make that work more difficult. Also, it's unclear which predicates we should use in runtime checks. At least we need to think about the following: