[VPlan] Don't fold UDiv in replicate regions. #175460
The UDiv fold added in d12e993 (llvm#174581) is currently also applied inside replicate regions, so we may end up with VPInstructions in replicate regions, which is currently not supported. Fixes llvm#175295.
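For context, the fold in question rewrites an unsigned division by a power-of-two constant as a logical shift right by the base-2 logarithm of that constant. The following is a minimal standalone sketch of that equivalence, not LLVM code: the helpers `isPowerOf2`, `exactLogBase2` and `udivMatchesLShr` are hypothetical names chosen for illustration (they only mirror the `APInt` calls visible in the diff), and the check is exhaustive over 8-bit values only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the LLVM API): true if V is a non-zero power of two.
constexpr bool isPowerOf2(uint8_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Hypothetical helper (not the LLVM API): log2 of a power-of-two value.
constexpr unsigned exactLogBase2(uint8_t V) {
  unsigned Log = 0;
  while (V > 1) {
    V >>= 1;
    ++Log;
  }
  return Log;
}

// Returns true if X / D equals X >> log2(D) for every 8-bit unsigned X.
// D must be a power of two, which is the precondition of the fold.
bool udivMatchesLShr(uint8_t D) {
  assert(isPowerOf2(D) && "fold only applies to power-of-two divisors");
  const unsigned Shift = exactLogBase2(D);
  for (unsigned X = 0; X <= 255; ++X) {
    const uint8_t Val = static_cast<uint8_t>(X);
    if (Val / D != (Val >> Shift))
      return false;
  }
  return true;
}
```

Calling `udivMatchesLShr(1)` corresponds to the `udiv i8 %or, 1` case in the first test below (a shift by 0), and `udivMatchesLShr(2)` to the `udiv i8 %l.2, 2` case in the second (a shift by 1). The arithmetic equivalence itself is not in question; the PR only restricts where the rewrite is applied.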
|
@llvm/pr-subscribers-vectorizers
Author: Florian Hahn (fhahn)
Changes: The UDiv fold added in d12e993 (#174581) is currently also applied inside replicate regions, so we may end up with VPInstructions in replicate regions, which is currently not supported. Fixes #175295.
Full diff: https://github.com/llvm/llvm-project/pull/175460.diff
2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index a430f13f0c9c0..19c66e1efb956 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1352,7 +1352,12 @@ static void simplifyRecipe(VPSingleDefRecipe *Def, VPTypeAnalysis &TypeInfo) {
{A, Plan->getConstantInt(APC->getBitWidth(), APC->exactLogBase2())},
*cast<VPRecipeWithIRFlags>(Def), Def->getDebugLoc()));
- if (match(Def, m_UDiv(m_VPValue(A), m_APInt(APC))) && APC->isPowerOf2())
+ // Don't convert udiv to lshr inside a replicate region, as VPInstructions are
+ // not allowed in them.
+ const VPRegionBlock *ParentRegion = Def->getParent()->getParent();
+ bool IsInReplicateRegion = ParentRegion && ParentRegion->isReplicator();
+ if (!IsInReplicateRegion && match(Def, m_UDiv(m_VPValue(A), m_APInt(APC))) &&
+ APC->isPowerOf2())
return Def->replaceAllUsesWith(Builder.createNaryOp(
Instruction::LShr,
{A, Plan->getConstantInt(APC->getBitWidth(), APC->exactLogBase2())}, {},
diff --git a/llvm/test/Transforms/LoopVectorize/X86/predicated-udiv.ll b/llvm/test/Transforms/LoopVectorize/X86/predicated-udiv.ll
new file mode 100644
index 0000000000000..45f211b9b5284
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/X86/predicated-udiv.ll
@@ -0,0 +1,310 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 5
+; RUN: opt -p loop-vectorize -force-vector-width=4 -force-vector-interleave=2 -S %s | FileCheck %s
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+; Test case for https://github.com/llvm/llvm-project/issues/175295.
+define i32 @simplify_udiv_1_in_replicate_region(i8 %arg, ptr %src) {
+; CHECK-LABEL: define i32 @simplify_udiv_1_in_replicate_region(
+; CHECK-SAME: i8 [[ARG:%.*]], ptr [[SRC:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: br label %[[VECTOR_PH:.*]]
+; CHECK: [[VECTOR_PH]]:
+; CHECK-NEXT: [[TMP0:%.*]] = or i8 [[ARG]], 1
+; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i8> poison, i8 [[TMP0]], i64 0
+; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT]], <4 x i8> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP1:%.*]] = lshr <4 x i8> [[BROADCAST_SPLAT]], zeroinitializer
+; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
+; CHECK: [[VECTOR_BODY]]:
+; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i32 [[INDEX]]
+; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, ptr [[TMP2]], i64 4
+; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i8>, ptr [[TMP3]], align 1
+; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <4 x i8> [[WIDE_LOAD]], zeroinitializer
+; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP4]], <4 x i8> zeroinitializer, <4 x i8> [[TMP1]]
+; CHECK-NEXT: [[TMP5:%.*]] = icmp ne <4 x i8> [[PREDPHI]], zeroinitializer
+; CHECK-NEXT: [[TMP6:%.*]] = zext <4 x i1> [[TMP5]] to <4 x i32>
+; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
+; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i32 [[INDEX_NEXT]], 16
+; CHECK-NEXT: br i1 [[TMP7]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK: [[MIDDLE_BLOCK]]:
+; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[TMP6]], i32 3
+; CHECK-NEXT: br label %[[SCALAR_PH:.*]]
+; CHECK: [[SCALAR_PH]]:
+; CHECK-NEXT: br label %[[LOOP:.*]]
+; CHECK: [[LOOP]]:
+; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 16, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LATCH:.*]] ]
+; CHECK-NEXT: [[RECUR:%.*]] = phi i32 [ [[VECTOR_RECUR_EXTRACT]], %[[SCALAR_PH]] ], [ [[ZEXT:%.*]], %[[LATCH]] ]
+; CHECK-NEXT: [[GEP_SRC:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i32 [[IV]]
+; CHECK-NEXT: [[L:%.*]] = load i8, ptr [[GEP_SRC]], align 1
+; CHECK-NEXT: [[C:%.*]] = icmp eq i8 [[L]], 0
+; CHECK-NEXT: br i1 [[C]], label %[[LATCH]], label %[[THEN:.*]]
+; CHECK: [[THEN]]:
+; CHECK-NEXT: [[OR:%.*]] = or i8 [[ARG]], 1
+; CHECK-NEXT: [[UDIV:%.*]] = udiv i8 [[OR]], 1
+; CHECK-NEXT: br label %[[LATCH]]
+; CHECK: [[LATCH]]:
+; CHECK-NEXT: [[PHI:%.*]] = phi i8 [ [[UDIV]], %[[THEN]] ], [ 0, %[[LOOP]] ]
+; CHECK-NEXT: [[CMP:%.*]] = icmp ne i8 [[PHI]], 0
+; CHECK-NEXT: [[ZEXT]] = zext i1 [[CMP]] to i32
+; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1
+; CHECK-NEXT: [[EC:%.*]] = icmp eq i32 [[IV]], 18
+; CHECK-NEXT: br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK: [[EXIT]]:
+; CHECK-NEXT: ret i32 0
+;
+entry:
+ br label %loop
+
+loop:
+ %iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]
+ %recur = phi i32 [ 0, %entry ], [ %zext, %latch ]
+ %gep.src = getelementptr inbounds i8, ptr %src, i32 %iv
+ %l = load i8, ptr %gep.src
+ %c = icmp eq i8 %l, 0
+ br i1 %c, label %latch, label %then
+
+then:
+ %or = or i8 %arg, 1
+ %udiv = udiv i8 %or, 1
+ br label %latch
+
+latch:
+ %phi = phi i8 [ %udiv, %then ], [ 0, %loop ]
+ %cmp = icmp ne i8 %phi, 0
+ %zext = zext i1 %cmp to i32
+ %iv.next = add i32 %iv, 1
+ %ec = icmp eq i32 %iv, 18
+ br i1 %ec, label %exit, label %loop
+
+exit:
+ ret i32 0
+}
+
+define i32 @simplify_udiv_4_in_replicate_region2(i8 %arg, ptr noalias %src, ptr %dst) {
+; CHECK-LABEL: define i32 @simplify_udiv_4_in_replicate_region2(
+; CHECK-SAME: i8 [[ARG:%.*]], ptr noalias [[SRC:%.*]], ptr [[DST:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: br label %[[VECTOR_PH:.*]]
+; CHECK: [[VECTOR_PH]]:
+; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
+; CHECK: [[VECTOR_BODY]]:
+; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_STORE_CONTINUE29:.*]] ]
+; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
+; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 1
+; CHECK-NEXT: [[TMP2:%.*]] = add i32 [[INDEX]], 2
+; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[INDEX]], 3
+; CHECK-NEXT: [[TMP4:%.*]] = add i32 [[INDEX]], 4
+; CHECK-NEXT: [[TMP5:%.*]] = add i32 [[INDEX]], 5
+; CHECK-NEXT: [[TMP6:%.*]] = add i32 [[INDEX]], 6
+; CHECK-NEXT: [[TMP7:%.*]] = add i32 [[INDEX]], 7
+; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i32 [[TMP0]]
+; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i8, ptr [[TMP8]], i64 4
+; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i8>, ptr [[TMP8]], align 1
+; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <4 x i8>, ptr [[TMP9]], align 1
+; CHECK-NEXT: [[TMP10:%.*]] = icmp ne <4 x i8> [[WIDE_LOAD]], zeroinitializer
+; CHECK-NEXT: [[TMP11:%.*]] = icmp ne <4 x i8> [[WIDE_LOAD1]], zeroinitializer
+; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[TMP0]]
+; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[TMP1]]
+; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[TMP2]]
+; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[TMP3]]
+; CHECK-NEXT: [[TMP16:%.*]] = insertelement <4 x ptr> poison, ptr [[TMP12]], i32 0
+; CHECK-NEXT: [[TMP17:%.*]] = insertelement <4 x ptr> [[TMP16]], ptr [[TMP13]], i32 1
+; CHECK-NEXT: [[TMP18:%.*]] = insertelement <4 x ptr> [[TMP17]], ptr [[TMP14]], i32 2
+; CHECK-NEXT: [[TMP19:%.*]] = insertelement <4 x ptr> [[TMP18]], ptr [[TMP15]], i32 3
+; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[TMP4]]
+; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[TMP5]]
+; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[TMP6]]
+; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[TMP7]]
+; CHECK-NEXT: [[TMP24:%.*]] = insertelement <4 x ptr> poison, ptr [[TMP20]], i32 0
+; CHECK-NEXT: [[TMP25:%.*]] = insertelement <4 x ptr> [[TMP24]], ptr [[TMP21]], i32 1
+; CHECK-NEXT: [[TMP26:%.*]] = insertelement <4 x ptr> [[TMP25]], ptr [[TMP22]], i32 2
+; CHECK-NEXT: [[TMP27:%.*]] = insertelement <4 x ptr> [[TMP26]], ptr [[TMP23]], i32 3
+; CHECK-NEXT: [[TMP28:%.*]] = extractelement <4 x i1> [[TMP10]], i32 0
+; CHECK-NEXT: br i1 [[TMP28]], label %[[PRED_LOAD_IF:.*]], label %[[PRED_LOAD_CONTINUE:.*]]
+; CHECK: [[PRED_LOAD_IF]]:
+; CHECK-NEXT: [[TMP29:%.*]] = load i8, ptr [[TMP12]], align 1
+; CHECK-NEXT: [[TMP30:%.*]] = insertelement <4 x i8> poison, i8 [[TMP29]], i32 0
+; CHECK-NEXT: br label %[[PRED_LOAD_CONTINUE]]
+; CHECK: [[PRED_LOAD_CONTINUE]]:
+; CHECK-NEXT: [[TMP31:%.*]] = phi <4 x i8> [ poison, %[[VECTOR_BODY]] ], [ [[TMP30]], %[[PRED_LOAD_IF]] ]
+; CHECK-NEXT: [[TMP32:%.*]] = extractelement <4 x i1> [[TMP10]], i32 1
+; CHECK-NEXT: br i1 [[TMP32]], label %[[PRED_LOAD_IF2:.*]], label %[[PRED_LOAD_CONTINUE3:.*]]
+; CHECK: [[PRED_LOAD_IF2]]:
+; CHECK-NEXT: [[TMP33:%.*]] = load i8, ptr [[TMP13]], align 1
+; CHECK-NEXT: [[TMP34:%.*]] = insertelement <4 x i8> [[TMP31]], i8 [[TMP33]], i32 1
+; CHECK-NEXT: br label %[[PRED_LOAD_CONTINUE3]]
+; CHECK: [[PRED_LOAD_CONTINUE3]]:
+; CHECK-NEXT: [[TMP35:%.*]] = phi <4 x i8> [ [[TMP31]], %[[PRED_LOAD_CONTINUE]] ], [ [[TMP34]], %[[PRED_LOAD_IF2]] ]
+; CHECK-NEXT: [[TMP36:%.*]] = extractelement <4 x i1> [[TMP10]], i32 2
+; CHECK-NEXT: br i1 [[TMP36]], label %[[PRED_LOAD_IF4:.*]], label %[[PRED_LOAD_CONTINUE5:.*]]
+; CHECK: [[PRED_LOAD_IF4]]:
+; CHECK-NEXT: [[TMP37:%.*]] = load i8, ptr [[TMP14]], align 1
+; CHECK-NEXT: [[TMP38:%.*]] = insertelement <4 x i8> [[TMP35]], i8 [[TMP37]], i32 2
+; CHECK-NEXT: br label %[[PRED_LOAD_CONTINUE5]]
+; CHECK: [[PRED_LOAD_CONTINUE5]]:
+; CHECK-NEXT: [[TMP39:%.*]] = phi <4 x i8> [ [[TMP35]], %[[PRED_LOAD_CONTINUE3]] ], [ [[TMP38]], %[[PRED_LOAD_IF4]] ]
+; CHECK-NEXT: [[TMP40:%.*]] = extractelement <4 x i1> [[TMP10]], i32 3
+; CHECK-NEXT: br i1 [[TMP40]], label %[[PRED_LOAD_IF6:.*]], label %[[PRED_LOAD_CONTINUE7:.*]]
+; CHECK: [[PRED_LOAD_IF6]]:
+; CHECK-NEXT: [[TMP41:%.*]] = load i8, ptr [[TMP15]], align 1
+; CHECK-NEXT: [[TMP42:%.*]] = insertelement <4 x i8> [[TMP39]], i8 [[TMP41]], i32 3
+; CHECK-NEXT: br label %[[PRED_LOAD_CONTINUE7]]
+; CHECK: [[PRED_LOAD_CONTINUE7]]:
+; CHECK-NEXT: [[TMP43:%.*]] = phi <4 x i8> [ [[TMP39]], %[[PRED_LOAD_CONTINUE5]] ], [ [[TMP42]], %[[PRED_LOAD_IF6]] ]
+; CHECK-NEXT: [[TMP44:%.*]] = extractelement <4 x i1> [[TMP11]], i32 0
+; CHECK-NEXT: br i1 [[TMP44]], label %[[PRED_LOAD_IF8:.*]], label %[[PRED_LOAD_CONTINUE9:.*]]
+; CHECK: [[PRED_LOAD_IF8]]:
+; CHECK-NEXT: [[TMP45:%.*]] = load i8, ptr [[TMP20]], align 1
+; CHECK-NEXT: [[TMP46:%.*]] = insertelement <4 x i8> poison, i8 [[TMP45]], i32 0
+; CHECK-NEXT: br label %[[PRED_LOAD_CONTINUE9]]
+; CHECK: [[PRED_LOAD_CONTINUE9]]:
+; CHECK-NEXT: [[TMP47:%.*]] = phi <4 x i8> [ poison, %[[PRED_LOAD_CONTINUE7]] ], [ [[TMP46]], %[[PRED_LOAD_IF8]] ]
+; CHECK-NEXT: [[TMP48:%.*]] = extractelement <4 x i1> [[TMP11]], i32 1
+; CHECK-NEXT: br i1 [[TMP48]], label %[[PRED_LOAD_IF10:.*]], label %[[PRED_LOAD_CONTINUE11:.*]]
+; CHECK: [[PRED_LOAD_IF10]]:
+; CHECK-NEXT: [[TMP49:%.*]] = load i8, ptr [[TMP21]], align 1
+; CHECK-NEXT: [[TMP50:%.*]] = insertelement <4 x i8> [[TMP47]], i8 [[TMP49]], i32 1
+; CHECK-NEXT: br label %[[PRED_LOAD_CONTINUE11]]
+; CHECK: [[PRED_LOAD_CONTINUE11]]:
+; CHECK-NEXT: [[TMP51:%.*]] = phi <4 x i8> [ [[TMP47]], %[[PRED_LOAD_CONTINUE9]] ], [ [[TMP50]], %[[PRED_LOAD_IF10]] ]
+; CHECK-NEXT: [[TMP52:%.*]] = extractelement <4 x i1> [[TMP11]], i32 2
+; CHECK-NEXT: br i1 [[TMP52]], label %[[PRED_LOAD_IF12:.*]], label %[[PRED_LOAD_CONTINUE13:.*]]
+; CHECK: [[PRED_LOAD_IF12]]:
+; CHECK-NEXT: [[TMP53:%.*]] = load i8, ptr [[TMP22]], align 1
+; CHECK-NEXT: [[TMP54:%.*]] = insertelement <4 x i8> [[TMP51]], i8 [[TMP53]], i32 2
+; CHECK-NEXT: br label %[[PRED_LOAD_CONTINUE13]]
+; CHECK: [[PRED_LOAD_CONTINUE13]]:
+; CHECK-NEXT: [[TMP55:%.*]] = phi <4 x i8> [ [[TMP51]], %[[PRED_LOAD_CONTINUE11]] ], [ [[TMP54]], %[[PRED_LOAD_IF12]] ]
+; CHECK-NEXT: [[TMP56:%.*]] = extractelement <4 x i1> [[TMP11]], i32 3
+; CHECK-NEXT: br i1 [[TMP56]], label %[[PRED_LOAD_IF14:.*]], label %[[PRED_LOAD_CONTINUE15:.*]]
+; CHECK: [[PRED_LOAD_IF14]]:
+; CHECK-NEXT: [[TMP57:%.*]] = load i8, ptr [[TMP23]], align 1
+; CHECK-NEXT: [[TMP58:%.*]] = insertelement <4 x i8> [[TMP55]], i8 [[TMP57]], i32 3
+; CHECK-NEXT: br label %[[PRED_LOAD_CONTINUE15]]
+; CHECK: [[PRED_LOAD_CONTINUE15]]:
+; CHECK-NEXT: [[TMP59:%.*]] = phi <4 x i8> [ [[TMP55]], %[[PRED_LOAD_CONTINUE13]] ], [ [[TMP58]], %[[PRED_LOAD_IF14]] ]
+; CHECK-NEXT: [[TMP60:%.*]] = lshr <4 x i8> [[TMP43]], splat (i8 1)
+; CHECK-NEXT: [[TMP61:%.*]] = lshr <4 x i8> [[TMP59]], splat (i8 1)
+; CHECK-NEXT: [[TMP62:%.*]] = extractelement <4 x i1> [[TMP10]], i32 0
+; CHECK-NEXT: br i1 [[TMP62]], label %[[PRED_STORE_IF:.*]], label %[[PRED_STORE_CONTINUE:.*]]
+; CHECK: [[PRED_STORE_IF]]:
+; CHECK-NEXT: [[TMP63:%.*]] = extractelement <4 x i8> [[TMP60]], i32 0
+; CHECK-NEXT: store i8 [[TMP63]], ptr [[TMP12]], align 1
+; CHECK-NEXT: br label %[[PRED_STORE_CONTINUE]]
+; CHECK: [[PRED_STORE_CONTINUE]]:
+; CHECK-NEXT: [[TMP64:%.*]] = extractelement <4 x i1> [[TMP10]], i32 1
+; CHECK-NEXT: br i1 [[TMP64]], label %[[PRED_STORE_IF16:.*]], label %[[PRED_STORE_CONTINUE17:.*]]
+; CHECK: [[PRED_STORE_IF16]]:
+; CHECK-NEXT: [[TMP65:%.*]] = extractelement <4 x i8> [[TMP60]], i32 1
+; CHECK-NEXT: store i8 [[TMP65]], ptr [[TMP13]], align 1
+; CHECK-NEXT: br label %[[PRED_STORE_CONTINUE17]]
+; CHECK: [[PRED_STORE_CONTINUE17]]:
+; CHECK-NEXT: [[TMP66:%.*]] = extractelement <4 x i1> [[TMP10]], i32 2
+; CHECK-NEXT: br i1 [[TMP66]], label %[[PRED_STORE_IF18:.*]], label %[[PRED_STORE_CONTINUE19:.*]]
+; CHECK: [[PRED_STORE_IF18]]:
+; CHECK-NEXT: [[TMP67:%.*]] = extractelement <4 x i8> [[TMP60]], i32 2
+; CHECK-NEXT: store i8 [[TMP67]], ptr [[TMP14]], align 1
+; CHECK-NEXT: br label %[[PRED_STORE_CONTINUE19]]
+; CHECK: [[PRED_STORE_CONTINUE19]]:
+; CHECK-NEXT: [[TMP68:%.*]] = extractelement <4 x i1> [[TMP10]], i32 3
+; CHECK-NEXT: br i1 [[TMP68]], label %[[PRED_STORE_IF20:.*]], label %[[PRED_STORE_CONTINUE21:.*]]
+; CHECK: [[PRED_STORE_IF20]]:
+; CHECK-NEXT: [[TMP69:%.*]] = extractelement <4 x i8> [[TMP60]], i32 3
+; CHECK-NEXT: store i8 [[TMP69]], ptr [[TMP15]], align 1
+; CHECK-NEXT: br label %[[PRED_STORE_CONTINUE21]]
+; CHECK: [[PRED_STORE_CONTINUE21]]:
+; CHECK-NEXT: [[TMP70:%.*]] = extractelement <4 x i1> [[TMP11]], i32 0
+; CHECK-NEXT: br i1 [[TMP70]], label %[[PRED_STORE_IF22:.*]], label %[[PRED_STORE_CONTINUE23:.*]]
+; CHECK: [[PRED_STORE_IF22]]:
+; CHECK-NEXT: [[TMP71:%.*]] = extractelement <4 x i8> [[TMP61]], i32 0
+; CHECK-NEXT: store i8 [[TMP71]], ptr [[TMP20]], align 1
+; CHECK-NEXT: br label %[[PRED_STORE_CONTINUE23]]
+; CHECK: [[PRED_STORE_CONTINUE23]]:
+; CHECK-NEXT: [[TMP72:%.*]] = extractelement <4 x i1> [[TMP11]], i32 1
+; CHECK-NEXT: br i1 [[TMP72]], label %[[PRED_STORE_IF24:.*]], label %[[PRED_STORE_CONTINUE25:.*]]
+; CHECK: [[PRED_STORE_IF24]]:
+; CHECK-NEXT: [[TMP73:%.*]] = extractelement <4 x i8> [[TMP61]], i32 1
+; CHECK-NEXT: store i8 [[TMP73]], ptr [[TMP21]], align 1
+; CHECK-NEXT: br label %[[PRED_STORE_CONTINUE25]]
+; CHECK: [[PRED_STORE_CONTINUE25]]:
+; CHECK-NEXT: [[TMP74:%.*]] = extractelement <4 x i1> [[TMP11]], i32 2
+; CHECK-NEXT: br i1 [[TMP74]], label %[[PRED_STORE_IF26:.*]], label %[[PRED_STORE_CONTINUE27:.*]]
+; CHECK: [[PRED_STORE_IF26]]:
+; CHECK-NEXT: [[TMP75:%.*]] = extractelement <4 x i8> [[TMP61]], i32 2
+; CHECK-NEXT: store i8 [[TMP75]], ptr [[TMP22]], align 1
+; CHECK-NEXT: br label %[[PRED_STORE_CONTINUE27]]
+; CHECK: [[PRED_STORE_CONTINUE27]]:
+; CHECK-NEXT: [[TMP76:%.*]] = extractelement <4 x i1> [[TMP11]], i32 3
+; CHECK-NEXT: br i1 [[TMP76]], label %[[PRED_STORE_IF28:.*]], label %[[PRED_STORE_CONTINUE29]]
+; CHECK: [[PRED_STORE_IF28]]:
+; CHECK-NEXT: [[TMP77:%.*]] = extractelement <4 x i8> [[TMP61]], i32 3
+; CHECK-NEXT: store i8 [[TMP77]], ptr [[TMP23]], align 1
+; CHECK-NEXT: br label %[[PRED_STORE_CONTINUE29]]
+; CHECK: [[PRED_STORE_CONTINUE29]]:
+; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP11]], <4 x i8> [[TMP61]], <4 x i8> zeroinitializer
+; CHECK-NEXT: [[TMP78:%.*]] = icmp ne <4 x i8> [[PREDPHI]], zeroinitializer
+; CHECK-NEXT: [[TMP79:%.*]] = zext <4 x i1> [[TMP78]] to <4 x i32>
+; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
+; CHECK-NEXT: [[TMP80:%.*]] = icmp eq i32 [[INDEX_NEXT]], 16
+; CHECK-NEXT: br i1 [[TMP80]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK: [[MIDDLE_BLOCK]]:
+; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[TMP79]], i32 3
+; CHECK-NEXT: br label %[[SCALAR_PH:.*]]
+; CHECK: [[SCALAR_PH]]:
+; CHECK-NEXT: br label %[[LOOP:.*]]
+; CHECK: [[LOOP]]:
+; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 16, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LATCH:.*]] ]
+; CHECK-NEXT: [[RECUR:%.*]] = phi i32 [ [[VECTOR_RECUR_EXTRACT]], %[[SCALAR_PH]] ], [ [[ZEXT:%.*]], %[[LATCH]] ]
+; CHECK-NEXT: [[GEP_SRC:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i32 [[IV]]
+; CHECK-NEXT: [[L:%.*]] = load i8, ptr [[GEP_SRC]], align 1
+; CHECK-NEXT: [[C:%.*]] = icmp eq i8 [[L]], 0
+; CHECK-NEXT: br i1 [[C]], label %[[LATCH]], label %[[THEN:.*]]
+; CHECK: [[THEN]]:
+; CHECK-NEXT: [[OR:%.*]] = or i8 [[ARG]], 1
+; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[IV]]
+; CHECK-NEXT: [[L_2:%.*]] = load i8, ptr [[GEP]], align 1
+; CHECK-NEXT: [[UDIV:%.*]] = udiv i8 [[L_2]], 2
+; CHECK-NEXT: store i8 [[UDIV]], ptr [[GEP]], align 1
+; CHECK-NEXT: br label %[[LATCH]]
+; CHECK: [[LATCH]]:
+; CHECK-NEXT: [[PHI:%.*]] = phi i8 [ [[UDIV]], %[[THEN]] ], [ 0, %[[LOOP]] ]
+; CHECK-NEXT: [[CMP:%.*]] = icmp ne i8 [[PHI]], 0
+; CHECK-NEXT: [[ZEXT]] = zext i1 [[CMP]] to i32
+; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1
+; CHECK-NEXT: [[EC:%.*]] = icmp eq i32 [[IV]], 18
+; CHECK-NEXT: br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK: [[EXIT]]:
+; CHECK-NEXT: ret i32 0
+;
+entry:
+ br label %loop
+
+loop:
+ %iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]
+ %recur = phi i32 [ 0, %entry ], [ %zext, %latch ]
+ %gep.src = getelementptr inbounds i8, ptr %src, i32 %iv
+ %l = load i8, ptr %gep.src
+ %c = icmp eq i8 %l, 0
+ br i1 %c, label %latch, label %then
+
+then:
+ %or = or i8 %arg, 1
+ %gep = getelementptr inbounds i8, ptr %dst, i32 %iv
+ %l.2 = load i8, ptr %gep
+ %udiv = udiv i8 %l.2, 2
+ store i8 %udiv, ptr %gep
+ br label %latch
+
+latch:
+ %phi = phi i8 [ %udiv, %then ], [ 0, %loop ]
+ %cmp = icmp ne i8 %phi, 0
+ %zext = zext i1 %cmp to i32
+ %iv.next = add i32 %iv, 1
+ %ec = icmp eq i32 %iv, 18
+ br i1 %ec, label %exit, label %loop
+
+exit:
+ ret i32 0
+}
+; CHECK-NEXT: store i8 [[TMP71]], ptr [[TMP20]], align 1
+; CHECK-NEXT: br label %[[PRED_STORE_CONTINUE23]]
+; CHECK: [[PRED_STORE_CONTINUE23]]:
+; CHECK-NEXT: [[TMP72:%.*]] = extractelement <4 x i1> [[TMP11]], i32 1
+; CHECK-NEXT: br i1 [[TMP72]], label %[[PRED_STORE_IF24:.*]], label %[[PRED_STORE_CONTINUE25:.*]]
+; CHECK: [[PRED_STORE_IF24]]:
+; CHECK-NEXT: [[TMP73:%.*]] = extractelement <4 x i8> [[TMP61]], i32 1
+; CHECK-NEXT: store i8 [[TMP73]], ptr [[TMP21]], align 1
+; CHECK-NEXT: br label %[[PRED_STORE_CONTINUE25]]
+; CHECK: [[PRED_STORE_CONTINUE25]]:
+; CHECK-NEXT: [[TMP74:%.*]] = extractelement <4 x i1> [[TMP11]], i32 2
+; CHECK-NEXT: br i1 [[TMP74]], label %[[PRED_STORE_IF26:.*]], label %[[PRED_STORE_CONTINUE27:.*]]
+; CHECK: [[PRED_STORE_IF26]]:
+; CHECK-NEXT: [[TMP75:%.*]] = extractelement <4 x i8> [[TMP61]], i32 2
+; CHECK-NEXT: store i8 [[TMP75]], ptr [[TMP22]], align 1
+; CHECK-NEXT: br label %[[PRED_STORE_CONTINUE27]]
+; CHECK: [[PRED_STORE_CONTINUE27]]:
+; CHECK-NEXT: [[TMP76:%.*]] = extractelement <4 x i1> [[TMP11]], i32 3
+; CHECK-NEXT: br i1 [[TMP76]], label %[[PRED_STORE_IF28:.*]], label %[[PRED_STORE_CONTINUE29]]
+; CHECK: [[PRED_STORE_IF28]]:
+; CHECK-NEXT: [[TMP77:%.*]] = extractelement <4 x i8> [[TMP61]], i32 3
+; CHECK-NEXT: store i8 [[TMP77]], ptr [[TMP23]], align 1
+; CHECK-NEXT: br label %[[PRED_STORE_CONTINUE29]]
+; CHECK: [[PRED_STORE_CONTINUE29]]:
+; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP11]], <4 x i8> [[TMP61]], <4 x i8> zeroinitializer
+; CHECK-NEXT: [[TMP78:%.*]] = icmp ne <4 x i8> [[PREDPHI]], zeroinitializer
+; CHECK-NEXT: [[TMP79:%.*]] = zext <4 x i1> [[TMP78]] to <4 x i32>
+; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
+; CHECK-NEXT: [[TMP80:%.*]] = icmp eq i32 [[INDEX_NEXT]], 16
+; CHECK-NEXT: br i1 [[TMP80]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK: [[MIDDLE_BLOCK]]:
+; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[TMP79]], i32 3
+; CHECK-NEXT: br label %[[SCALAR_PH:.*]]
+; CHECK: [[SCALAR_PH]]:
+; CHECK-NEXT: br label %[[LOOP:.*]]
+; CHECK: [[LOOP]]:
+; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 16, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LATCH:.*]] ]
+; CHECK-NEXT: [[RECUR:%.*]] = phi i32 [ [[VECTOR_RECUR_EXTRACT]], %[[SCALAR_PH]] ], [ [[ZEXT:%.*]], %[[LATCH]] ]
+; CHECK-NEXT: [[GEP_SRC:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i32 [[IV]]
+; CHECK-NEXT: [[L:%.*]] = load i8, ptr [[GEP_SRC]], align 1
+; CHECK-NEXT: [[C:%.*]] = icmp eq i8 [[L]], 0
+; CHECK-NEXT: br i1 [[C]], label %[[LATCH]], label %[[THEN:.*]]
+; CHECK: [[THEN]]:
+; CHECK-NEXT: [[OR:%.*]] = or i8 [[ARG]], 1
+; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[IV]]
+; CHECK-NEXT: [[L_2:%.*]] = load i8, ptr [[GEP]], align 1
+; CHECK-NEXT: [[UDIV:%.*]] = udiv i8 [[L_2]], 2
+; CHECK-NEXT: store i8 [[UDIV]], ptr [[GEP]], align 1
+; CHECK-NEXT: br label %[[LATCH]]
+; CHECK: [[LATCH]]:
+; CHECK-NEXT: [[PHI:%.*]] = phi i8 [ [[UDIV]], %[[THEN]] ], [ 0, %[[LOOP]] ]
+; CHECK-NEXT: [[CMP:%.*]] = icmp ne i8 [[PHI]], 0
+; CHECK-NEXT: [[ZEXT]] = zext i1 [[CMP]] to i32
+; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1
+; CHECK-NEXT: [[EC:%.*]] = icmp eq i32 [[IV]], 18
+; CHECK-NEXT: br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK: [[EXIT]]:
+; CHECK-NEXT: ret i32 0
+;
+entry:
+ br label %loop
+
+loop:
+ %iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]
+ %recur = phi i32 [ 0, %entry ], [ %zext, %latch ]
+ %gep.src = getelementptr inbounds i8, ptr %src, i32 %iv
+ %l = load i8, ptr %gep.src
+ %c = icmp eq i8 %l, 0
+ br i1 %c, label %latch, label %then
+
+then:
+ %or = or i8 %arg, 1
+ %gep = getelementptr inbounds i8, ptr %dst, i32 %iv
+ %l.2 = load i8, ptr %gep
+ %udiv = udiv i8 %l.2, 2
+ store i8 %udiv, ptr %gep
+ br label %latch
+
+latch:
+ %phi = phi i8 [ %udiv, %then ], [ 0, %loop ]
+ %cmp = icmp ne i8 %phi, 0
+ %zext = zext i1 %cmp to i32
+ %iv.next = add i32 %iv, 1
+ %ec = icmp eq i32 %iv, 18
+ br i1 %ec, label %exit, label %loop
+
+exit:
+ ret i32 0
+}
artagnon
left a comment
Hm, I'd have never guessed: why are VPInstructions not supported in VPReplicateRegions? I understand that this is probably a design flaw, but I'm interested in some design history?
LGTM, thanks!
@@ -0,0 +1,310 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 5
Suggested change:
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 5
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --filter-out-after "^scalar.ph" --version 5
fhahn
left a comment
> Hm, I'd have never guessed: why are VPInstructions not supported in VPReplicateRegions? I understand that this is probably a design flaw, but I'm interested in some design history?
Replicate regions are a bit of a legacy construct; they have been coupled with replicate (and scalar-iv-steps) recipes since the initial VPlan introduction. Lots of things have changed since then, but not yet the handling of replicate regions. Down the line, the plan is to explicitly dissolve replicate regions as well (#170212), eventually removing the implicit VPTransformState::Lane completely. After that, it should be possible to support VPInstructions in replicate regions as well.
The UDiv fold added in d12e993 (#174581) is currently also applied to replicate regions, which means we may end up with VPInstructions in replicate regions, which is currently not supported. Fixes llvm/llvm-project#175295. PR: llvm/llvm-project#175460 (cherry picked from commit 8f18252)
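For readers unfamiliar with the fold, the arithmetic behind it can be sketched in plain C++ without any LLVM headers. This is only an illustration under stated assumptions: `udivToLShrAmount`, `checkFoldForAllI8`, and the `inReplicateRegion` flag are hypothetical names, not LLVM API. The flag stands in for the `ParentRegion->isReplicator()` check in the patch. The sketch shows that an unsigned divide by a power of two equals a logical shift right by its exact log2, which is why `udiv i8 %l.2, 2` appears as `lshr ... splat (i8 1)` in the vector body.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper, not part of LLVM: returns the shift amount if
// `x udiv divisor` may be rewritten as `x lshr amount`, or std::nullopt
// if the rewrite must be skipped.
std::optional<unsigned> udivToLShrAmount(uint8_t divisor,
                                         bool inReplicateRegion) {
  // Mirror the guard from the patch: never fold inside a replicate region.
  if (inReplicateRegion)
    return std::nullopt;
  // Only non-zero powers of two have an exact log2.
  if (divisor == 0 || (divisor & (divisor - 1)) != 0)
    return std::nullopt;
  unsigned amount = 0;
  while ((divisor >> amount) != 1)
    ++amount;
  return amount;
}

// Exhaustively checks x / d == x >> log2(d) for every i8 value x and
// every power-of-two divisor d that fits in i8.
bool checkFoldForAllI8() {
  for (unsigned d = 1; d < 256; d <<= 1) {
    std::optional<unsigned> amount =
        udivToLShrAmount(static_cast<uint8_t>(d), /*inReplicateRegion=*/false);
    if (!amount)
      return false;
    for (unsigned x = 0; x < 256; ++x)
      if (x / d != (x >> *amount))
        return false;
  }
  return true;
}
```

With these assumptions, a divisor of 2 yields a shift amount of 1 outside a replicate region, and no fold at all when the flag is set, so the original `udiv` is kept there.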