[VPlan] Connect (MemRuntime|SCEV)Check blocks as VPlan transform (NFC). #143879
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-risc-v
Author: Florian Hahn (fhahn)
Changes
Connect the SCEV and memory runtime check blocks directly in VPlan as VPIRBasicBlocks, removing ILV::emitSCEVChecks and ILV::emitMemRuntimeChecks.
The new logic is currently split between two places. LoopVectorizationPlanner::addRuntimeChecks collects a list of {Condition, CheckBlock} pairs, asserts some preconditions, and emits remarks if needed. VPlanTransforms::connectCheckBlocks then adds that list of checks to the VPlan.
Patch is 22.20 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/143879.diff
6 Files Affected:
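The collection half of this split can be sketched with toy types. This is a minimal sketch, assuming hypothetical stand-ins (`ToyValue`, `ToyBlock`, `ToyRTChecks`, `collectChecks` are not LLVM classes): it models how the patched `emitSCEVChecks`/`emitMemRuntimeChecks` hand back a `{Condition, Block}` pair at most once, and how `addRuntimeChecks` keeps only the checks that exist, SCEV checks first. The real `emitSCEVChecks` also treats a constant-zero condition as "no check"; that case, the assertions, and the remarks are left out here.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for llvm::Value and llvm::BasicBlock.
struct ToyValue { std::string Name; };
struct ToyBlock { std::string Name; };

using Check = std::pair<ToyValue *, ToyBlock *>;

// Toy model of GeneratedRTChecks after the patch: the emit* methods no longer
// rewire the IR CFG; they only hand out {Condition, Block} once.
class ToyRTChecks {
  ToyValue *SCEVCheckCond;
  ToyBlock *SCEVCheckBlock;
  ToyValue *MemRuntimeCheckCond;
  ToyBlock *MemCheckBlock;
  bool AddedAnyChecks = false;

public:
  ToyRTChecks(ToyValue *SC, ToyBlock *SB, ToyValue *MC, ToyBlock *MB)
      : SCEVCheckCond(SC), SCEVCheckBlock(SB), MemRuntimeCheckCond(MC),
        MemCheckBlock(MB) {}

  Check emitSCEVChecks() {
    if (!SCEVCheckCond)
      return {nullptr, nullptr};
    assert(SCEVCheckBlock && "a condition implies a check block");
    ToyValue *Cond = SCEVCheckCond;
    SCEVCheckCond = nullptr; // Mark as used: later calls return no check.
    AddedAnyChecks = true;
    return {Cond, SCEVCheckBlock};
  }

  Check emitMemRuntimeChecks() {
    if (!MemRuntimeCheckCond)
      return {nullptr, nullptr};
    assert(MemCheckBlock && "a condition implies a check block");
    ToyValue *Cond = MemRuntimeCheckCond;
    MemRuntimeCheckCond = nullptr; // Mark as used.
    AddedAnyChecks = true;
    return {Cond, MemCheckBlock};
  }

  bool hasChecks() const { return AddedAnyChecks; }
};

// Same shape as addRuntimeChecks: collect only existing checks, SCEV first.
std::vector<Check> collectChecks(ToyRTChecks &RT) {
  std::vector<Check> Checks;
  auto [SCEVCond, SCEVBlock] = RT.emitSCEVChecks();
  if (SCEVBlock)
    Checks.emplace_back(SCEVCond, SCEVBlock);
  auto [MemCond, MemBlock] = RT.emitMemRuntimeChecks();
  if (MemBlock)
    Checks.emplace_back(MemCond, MemBlock);
  return Checks;
}
```

With only a memory check present, `collectChecks` returns a single pair for the memory check block; calling it again returns an empty list because each condition is handed out once.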
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
index 70f541d64b305..aae7c9a0075d1 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -28,6 +28,10 @@
#include "llvm/ADT/SmallSet.h"
#include "llvm/Support/InstructionCost.h"
+namespace {
+class GeneratedRTChecks;
+}
+
namespace llvm {
class LoopInfo;
@@ -548,6 +552,9 @@ class LoopVectorizationPlanner {
VPRecipeBuilder &RecipeBuilder,
ElementCount MinVF);
+ /// Add the runtime checks from \p RTChecks to \p Plan.
+ void addRuntimeChecks(VPlan &Plan, GeneratedRTChecks &RTChecks) const;
+
#ifndef NDEBUG
/// \return The most profitable vectorization factor for the available VPlans
/// and the cost of that VF.
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 474f856d20461..50ee102b92b53 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -399,12 +399,6 @@ static cl::opt<bool> EnableEarlyExitVectorization(
cl::desc(
"Enable vectorization of early exit loops with uncountable exits."));
-// Likelyhood of bypassing the vectorized loop because assumptions about SCEV
-// variables not overflowing do not hold. See `emitSCEVChecks`.
-static constexpr uint32_t SCEVCheckBypassWeights[] = {1, 127};
-// Likelyhood of bypassing the vectorized loop because pointers overlap. See
-// `emitMemRuntimeChecks`.
-static constexpr uint32_t MemCheckBypassWeights[] = {1, 127};
// Likelyhood of bypassing the vectorized loop because there are zero trips left
// after prolog. See `emitIterationCountCheck`.
static constexpr uint32_t MinItersBypassWeights[] = {1, 127};
@@ -534,16 +528,6 @@ class InnerLoopVectorizer {
/// it overflows.
void emitIterationCountCheck(BasicBlock *Bypass);
- /// Emit a bypass check to see if all of the SCEV assumptions we've
- /// had to make are correct. Returns the block containing the checks or
- /// nullptr if no checks have been added.
- BasicBlock *emitSCEVChecks(BasicBlock *Bypass);
-
- /// Emit bypass checks to check any memory assumptions we may have made.
- /// Returns the block containing the checks or nullptr if no checks have been
- /// added.
- BasicBlock *emitMemRuntimeChecks(BasicBlock *Bypass);
-
/// Emit basic blocks (prefixed with \p Prefix) for the iteration check,
/// vector loop preheader, middle block and scalar preheader.
void createVectorLoopSkeleton(StringRef Prefix);
@@ -1782,7 +1766,6 @@ class GeneratedRTChecks {
SCEVExpander MemCheckExp;
bool CostTooHigh = false;
- const bool AddBranchWeights;
Loop *OuterLoop = nullptr;
@@ -1794,11 +1777,10 @@ class GeneratedRTChecks {
public:
GeneratedRTChecks(PredicatedScalarEvolution &PSE, DominatorTree *DT,
LoopInfo *LI, TargetTransformInfo *TTI,
- const DataLayout &DL, bool AddBranchWeights,
- TTI::TargetCostKind CostKind)
+ const DataLayout &DL, TTI::TargetCostKind CostKind)
: DT(DT), LI(LI), TTI(TTI), SCEVExp(*PSE.getSE(), DL, "scev.check"),
- MemCheckExp(*PSE.getSE(), DL, "scev.check"),
- AddBranchWeights(AddBranchWeights), PSE(PSE), CostKind(CostKind) {}
+ MemCheckExp(*PSE.getSE(), DL, "scev.check"), PSE(PSE),
+ CostKind(CostKind) {}
/// Generate runtime checks in SCEVCheckBlock and MemCheckBlock, so we can
/// accurately estimate the cost of the runtime checks. The blocks are
@@ -2016,61 +1998,35 @@ class GeneratedRTChecks {
/// Adds the generated SCEVCheckBlock before \p LoopVectorPreHeader and
/// adjusts the branches to branch to the vector preheader or \p Bypass,
/// depending on the generated condition.
- BasicBlock *emitSCEVChecks(BasicBlock *Bypass,
- BasicBlock *LoopVectorPreHeader) {
+ std::pair<Value *, BasicBlock *> emitSCEVChecks() {
using namespace llvm::PatternMatch;
if (!SCEVCheckCond || match(SCEVCheckCond, m_ZeroInt()))
- return nullptr;
+ return {nullptr, nullptr};
- auto *Pred = LoopVectorPreHeader->getSinglePredecessor();
- BranchInst::Create(LoopVectorPreHeader, SCEVCheckBlock);
-
- SCEVCheckBlock->getTerminator()->eraseFromParent();
- SCEVCheckBlock->moveBefore(LoopVectorPreHeader);
- Pred->getTerminator()->replaceSuccessorWith(LoopVectorPreHeader,
- SCEVCheckBlock);
-
- BranchInst &BI =
- *BranchInst::Create(Bypass, LoopVectorPreHeader, SCEVCheckCond);
- if (AddBranchWeights)
- setBranchWeights(BI, SCEVCheckBypassWeights, /*IsExpected=*/false);
- ReplaceInstWithInst(SCEVCheckBlock->getTerminator(), &BI);
- // Mark the check as used, to prevent it from being removed during cleanup.
+ Value *Cond = SCEVCheckCond;
SCEVCheckCond = nullptr;
AddedAnyChecks = true;
- return SCEVCheckBlock;
+ return {Cond, SCEVCheckBlock};
}
/// Adds the generated MemCheckBlock before \p LoopVectorPreHeader and adjusts
/// the branches to branch to the vector preheader or \p Bypass, depending on
/// the generated condition.
- BasicBlock *emitMemRuntimeChecks(BasicBlock *Bypass,
- BasicBlock *LoopVectorPreHeader) {
+ std::pair<Value *, BasicBlock *> emitMemRuntimeChecks() {
// Check if we generated code that checks in runtime if arrays overlap.
if (!MemRuntimeCheckCond)
- return nullptr;
-
- auto *Pred = LoopVectorPreHeader->getSinglePredecessor();
- Pred->getTerminator()->replaceSuccessorWith(LoopVectorPreHeader,
- MemCheckBlock);
-
- MemCheckBlock->moveBefore(LoopVectorPreHeader);
-
- BranchInst &BI =
- *BranchInst::Create(Bypass, LoopVectorPreHeader, MemRuntimeCheckCond);
- if (AddBranchWeights) {
- setBranchWeights(BI, MemCheckBypassWeights, /*IsExpected=*/false);
- }
- ReplaceInstWithInst(MemCheckBlock->getTerminator(), &BI);
- MemCheckBlock->getTerminator()->setDebugLoc(
- Pred->getTerminator()->getDebugLoc());
+ return {nullptr, nullptr};
// Mark the check as used, to prevent it from being removed during cleanup.
+ Value *Cond = MemRuntimeCheckCond;
MemRuntimeCheckCond = nullptr;
AddedAnyChecks = true;
- return MemCheckBlock;
+ return {Cond, MemCheckBlock};
}
+ BasicBlock *getSCEVCheckBlock() const { return SCEVCheckBlock; }
+ BasicBlock *getMemCheckBlock() const { return MemCheckBlock; }
+
/// Return true if any runtime checks have been added
bool hasChecks() const { return AddedAnyChecks; }
};
@@ -2451,53 +2407,6 @@ void InnerLoopVectorizer::emitIterationCountCheck(BasicBlock *Bypass) {
"Plan's entry must be TCCCheckBlock");
}
-BasicBlock *InnerLoopVectorizer::emitSCEVChecks(BasicBlock *Bypass) {
- BasicBlock *const SCEVCheckBlock =
- RTChecks.emitSCEVChecks(Bypass, LoopVectorPreHeader);
- if (!SCEVCheckBlock)
- return nullptr;
-
- assert((!Cost->OptForSize ||
- Cost->Hints->getForce() == LoopVectorizeHints::FK_Enabled) &&
- "Cannot SCEV check stride or overflow when optimizing for size");
-
- introduceCheckBlockInVPlan(SCEVCheckBlock);
- return SCEVCheckBlock;
-}
-
-BasicBlock *InnerLoopVectorizer::emitMemRuntimeChecks(BasicBlock *Bypass) {
- BasicBlock *const MemCheckBlock =
- RTChecks.emitMemRuntimeChecks(Bypass, LoopVectorPreHeader);
-
- // Check if we generated code that checks in runtime if arrays overlap. We put
- // the checks into a separate block to make the more common case of few
- // elements faster.
- if (!MemCheckBlock)
- return nullptr;
-
- // VPlan-native path does not do any analysis for runtime checks currently.
- assert((!EnableVPlanNativePath || OrigLoop->begin() == OrigLoop->end()) &&
- "Runtime checks are not supported for outer loops yet");
-
- if (Cost->OptForSize) {
- assert(Cost->Hints->getForce() == LoopVectorizeHints::FK_Enabled &&
- "Cannot emit memory checks when optimizing for size, unless forced "
- "to vectorize.");
- ORE->emit([&]() {
- return OptimizationRemarkAnalysis(DEBUG_TYPE, "VectorizationCodeSize",
- OrigLoop->getStartLoc(),
- OrigLoop->getHeader())
- << "Code-size may be reduced by not forcing "
- "vectorization, or by source-code modifications "
- "eliminating the need for runtime checks "
- "(e.g., adding 'restrict').";
- });
- }
-
- introduceCheckBlockInVPlan(MemCheckBlock);
- return MemCheckBlock;
-}
-
/// Replace \p VPBB with a VPIRBasicBlock wrapping \p IRBB. All recipes from \p
/// VPBB are moved to the end of the newly created VPIRBasicBlock. VPBB must
/// have a single predecessor, which is rewired to the new VPIRBasicBlock. All
@@ -2614,15 +2523,6 @@ BasicBlock *InnerLoopVectorizer::createVectorizedLoopSkeleton() {
// to the scalar loop.
emitIterationCountCheck(LoopScalarPreHeader);
- // Generate the code to check any assumptions that we've made for SCEV
- // expressions.
- emitSCEVChecks(LoopScalarPreHeader);
-
- // Generate the code that checks in runtime if arrays overlap. We put the
- // checks into a separate block to make the more common case of few elements
- // faster.
- emitMemRuntimeChecks(LoopScalarPreHeader);
-
replaceVPBBWithIRVPBB(Plan.getScalarPreheader(), LoopScalarPreHeader);
return LoopVectorPreHeader;
}
@@ -7312,6 +7212,11 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
VPlanTransforms::runPass(VPlanTransforms::unrollByUF, BestVPlan, BestUF,
OrigLoop->getHeader()->getContext());
VPlanTransforms::runPass(VPlanTransforms::materializeBroadcasts, BestVPlan);
+
+ if (!VectorizingEpilogue)
+ addRuntimeChecks(BestVPlan, ILV.RTChecks);
+
+ VPBasicBlock *VectorPH = cast<VPBasicBlock>(BestVPlan.getVectorPreheader());
VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);
VPlanTransforms::simplifyRecipes(BestVPlan, *Legal->getWidestInductionType());
VPlanTransforms::narrowInterleaveGroups(
@@ -7359,7 +7264,8 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
// 1. Set up the skeleton for vectorization, including vector pre-header and
// middle block. The vector loop is created during VPlan execution.
- VPBasicBlock *VectorPH = cast<VPBasicBlock>(Entry->getSuccessors()[1]);
+ BasicBlock *EntryBB =
+ cast<VPIRBasicBlock>(BestVPlan.getEntry())->getIRBasicBlock();
State.CFG.PrevBB = ILV.createVectorizedLoopSkeleton();
if (VectorizingEpilogue)
VPlanTransforms::removeDeadRecipes(BestVPlan);
@@ -7383,6 +7289,12 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
ILV.getOrCreateVectorTripCount(ILV.LoopVectorPreHeader), State);
replaceVPBBWithIRVPBB(VectorPH, State.CFG.PrevBB);
+ // Move check blocks to their final position.
+ if (BasicBlock *MemCheckBlock = ILV.RTChecks.getMemCheckBlock())
+ MemCheckBlock->moveAfter(EntryBB);
+ if (BasicBlock *SCEVCheckBlock = ILV.RTChecks.getSCEVCheckBlock())
+ SCEVCheckBlock->moveAfter(EntryBB);
+
BestVPlan.execute(&State);
// 2.5 When vectorizing the epilogue, fix reduction resume values from the
@@ -7483,15 +7395,6 @@ BasicBlock *EpilogueVectorizerMainLoop::createEpilogueVectorizedLoopSkeleton() {
emitIterationCountCheck(LoopScalarPreHeader, true);
EPI.EpilogueIterationCountCheck->setName("iter.check");
- // Generate the code to check any assumptions that we've made for SCEV
- // expressions.
- EPI.SCEVSafetyCheck = emitSCEVChecks(LoopScalarPreHeader);
-
- // Generate the code that checks at runtime if arrays overlap. We put the
- // checks into a separate block to make the more common case of few elements
- // faster.
- EPI.MemSafetyCheck = emitMemRuntimeChecks(LoopScalarPreHeader);
-
// Generate the iteration count check for the main loop, *after* the check
// for the epilogue loop, so that the path-length is shorter for the case
// that goes directly through the vector epilogue. The longer-path length for
@@ -7608,6 +7511,13 @@ EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton() {
EPI.EpilogueIterationCountCheck->getTerminator()->replaceUsesOfWith(
VecEpilogueIterationCountCheck, LoopScalarPreHeader);
+ BasicBlock *SCEVCheckBlock = RTChecks.getSCEVCheckBlock();
+ if (SCEVCheckBlock && SCEVCheckBlock->hasNPredecessorsOrMore(1))
+ EPI.SCEVSafetyCheck = SCEVCheckBlock;
+
+ BasicBlock *MemCheckBlock = RTChecks.getMemCheckBlock();
+ if (MemCheckBlock && MemCheckBlock->hasNPredecessorsOrMore(1))
+ EPI.MemSafetyCheck = MemCheckBlock;
if (EPI.SCEVSafetyCheck)
EPI.SCEVSafetyCheck->getTerminator()->replaceUsesOfWith(
VecEpilogueIterationCountCheck, LoopScalarPreHeader);
@@ -9325,6 +9235,47 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
VPlanTransforms::runPass(VPlanTransforms::clearReductionWrapFlags, *Plan);
}
+void LoopVectorizationPlanner::addRuntimeChecks(
+ VPlan &Plan, GeneratedRTChecks &RTChecks) const {
+ SmallVector<std::pair<VPValue *, VPIRBasicBlock *>> Checks;
+ const auto &[SCEVCheckCond, SCEVCheckBlock] = RTChecks.emitSCEVChecks();
+ if (SCEVCheckBlock) {
+ assert((!CM.OptForSize ||
+ CM.Hints->getForce() == LoopVectorizeHints::FK_Enabled) &&
+ "Cannot SCEV check stride or overflow when optimizing for size");
+ Checks.emplace_back(Plan.getOrAddLiveIn(SCEVCheckCond),
+ Plan.createVPIRBasicBlock(SCEVCheckBlock));
+ }
+ const auto &[MemCheckCond, MemCheckBlock] = RTChecks.emitMemRuntimeChecks();
+ if (MemCheckBlock) {
+ // VPlan-native path does not do any analysis for runtime checks
+ // currently.
+ assert((!EnableVPlanNativePath || OrigLoop->begin() == OrigLoop->end()) &&
+ "Runtime checks are not supported for outer loops yet");
+
+ if (CM.OptForSize) {
+ assert(
+ CM.Hints->getForce() == LoopVectorizeHints::FK_Enabled &&
+ "Cannot emit memory checks when optimizing for size, unless forced "
+ "to vectorize.");
+ ORE->emit([&]() {
+ return OptimizationRemarkAnalysis(DEBUG_TYPE, "VectorizationCodeSize",
+ OrigLoop->getStartLoc(),
+ OrigLoop->getHeader())
+ << "Code-size may be reduced by not forcing "
+ "vectorization, or by source-code modifications "
+ "eliminating the need for runtime checks "
+ "(e.g., adding 'restrict').";
+ });
+ }
+ Checks.emplace_back(Plan.getOrAddLiveIn(MemCheckCond),
+ Plan.createVPIRBasicBlock(MemCheckBlock));
+ }
+ VPlanTransforms::connectCheckBlocks(
+ Plan, Checks,
+ hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator()));
+}
+
void VPDerivedIVRecipe::execute(VPTransformState &State) {
assert(!State.Lane && "VPDerivedIVRecipe being replicated.");
@@ -9446,10 +9397,7 @@ static bool processLoopInVPlanNativePath(
VPlan &BestPlan = LVP.getPlanFor(VF.Width);
{
- bool AddBranchWeights =
- hasBranchWeightMD(*L->getLoopLatch()->getTerminator());
- GeneratedRTChecks Checks(PSE, DT, LI, TTI, F->getDataLayout(),
- AddBranchWeights, CM.CostKind);
+ GeneratedRTChecks Checks(PSE, DT, LI, TTI, F->getDataLayout(), CM.CostKind);
InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width,
VF.Width, 1, &CM, BFI, PSI, Checks, BestPlan);
LLVM_DEBUG(dbgs() << "Vectorizing outer loop in \""
@@ -10085,10 +10033,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
if (ORE->allowExtraAnalysis(LV_NAME))
LVP.emitInvalidCostRemarks(ORE);
- bool AddBranchWeights =
- hasBranchWeightMD(*L->getLoopLatch()->getTerminator());
- GeneratedRTChecks Checks(PSE, DT, LI, TTI, F->getDataLayout(),
- AddBranchWeights, CM.CostKind);
+ GeneratedRTChecks Checks(PSE, DT, LI, TTI, F->getDataLayout(), CM.CostKind);
if (LVP.hasPlanWithVF(VF.Width)) {
// Select the interleave count.
IC = CM.selectInterleaveCount(LVP.getPlanFor(VF.Width), VF.Width, VF.Cost);
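The order of the two `moveAfter(EntryBB)` calls in `executePlan` matters: each call places its block immediately after the entry block, so moving the memory check block first and the SCEV check block second yields the IR layout entry, SCEV check, memory check. That matches the order in which `addRuntimeChecks` connects the checks in VPlan. The sketch below models a function's block list as a `std::list` of names, with a toy `moveAfter` that mimics `BasicBlock::moveAfter`; the initial layout in the usage is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Function layout as an ordered list of block names.
using Layout = std::list<std::string>;

// Toy version of BasicBlock::moveAfter: unlink Block and reinsert it directly
// after Anchor. Block is taken by value so erasing it from F cannot dangle.
void moveAfter(Layout &F, std::string Block, const std::string &Anchor) {
  auto It = std::find(F.begin(), F.end(), Block);
  assert(It != F.end() && "block not in function");
  F.erase(It);
  auto AnchorIt = std::find(F.begin(), F.end(), Anchor);
  assert(AnchorIt != F.end() && "anchor not in function");
  F.insert(std::next(AnchorIt), Block);
}
```

Moving `vector.memcheck` and then `vector.scevcheck` after `entry` leaves the SCEV check first, directly followed by the memory check.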
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 1838562f26b82..8068a6b3b968f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -484,7 +484,8 @@ void VPBasicBlock::connectToPredecessors(VPTransformState &State) {
unsigned idx = PredVPSuccessors.front() == this ? 0 : 1;
assert((TermBr && (!TermBr->getSuccessor(idx) ||
(isa<VPIRBasicBlock>(this) &&
- TermBr->getSuccessor(idx) == NewBB))) &&
+ (TermBr->getSuccessor(idx) == NewBB ||
+ PredVPBlock == getPlan()->getEntry())))) &&
"Trying to reset an existing successor block.");
TermBr->setSuccessor(idx, NewBB);
}
diff --git a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
index 593e5063802ba..a55e95ef274b7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
@@ -20,6 +20,7 @@
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/LoopIterator.h"
#include "llvm/Analysis/ScalarEvolution.h"
+#include "llvm/IR/MDBuilder.h"
#define DEBUG_TYPE "vplan"
@@ -589,3 +590,41 @@ void VPlanTransforms::createLoopRegions(VPlan &Plan) {
TopRegion->setName("vector loop");
TopRegion->getEntryBasicBlock()->setName("vector.body");
}
+
+// Likelihood of bypassing the vectorized loop because SCEV assumptions or
+// memory runtime checks fail.
+static constexpr uint32_t CheckBypassWeights[] = {1, 127};
+
+void VPlanTransforms::connectCheckBlocks(
+ VPlan &Plan, ArrayRef<std::pair<VPValue *, VPIRBasicBlock *>> Checks,
+ bool AddBranchWeights) {
+ VPBlockBase *VectorPH = Plan.getVectorPreheader();
+ VPBlockBase *ScalarPH = Plan.getScalarPreheader();
+ for (const auto &[Cond, CheckBlock] : Checks) {
+ VPBlockBase *PreVectorPH = VectorPH->getSinglePredecessor();
+ VPBlockUtils::insertOnEdge(PreVectorPH, VectorPH, CheckBlock);
+ VPBlockUtils::connectBlocks(CheckBlock, ScalarPH);
+ CheckBlock->swapSuccessors();
+
+ // We just connected a new block to the scalar preheader. Update all
+ // VPPhis by adding an incoming value for it, replicating the last value.
+ unsigned NumPredecessors = ScalarPH->getNumPredecessors();
+ for (VPRecipeBase &R : cast<VPBasicBlock>(ScalarPH)->phis()) {
+ assert(isa<VPPhi>(&R) && "Phi expected to be VPPhi");
+ assert(cast<VPPhi>(&R)->getNumIncoming() == NumPredecessors - 1 &&
+ "must have incoming values for all operands");
+ R.addOperand(R.getOperand(NumPredecessors - 2));
+ }
+
+ VPIRMetadata VPBranchWeights;
+ auto *Term = VPBuilder(CheckBlock)
+ .createNaryOp(VPInstruction::BranchOnCond, {Cond},
+ Plan.getCanonicalIV()->getDebugLoc());
+ if (AddBranchWeights) {
+ MDBuilder MDB(Plan.getScalarHeader()->getIRBasicBlock()->getContext());
+ MDNode *BranchWeights =
+ MDB.createBranchWeights(CheckBypassWeights, /*IsExpected=*/false);
+ Term->addMetadata(LLVMContext::MD_prof, BranchWeights);
+ }
+ }
+}
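The wiring done by `connectCheckBlocks` can be modeled on a toy CFG with ordered successor and predecessor lists. This sketch assumes `VPBlockUtils::insertOnEdge` keeps the edge's slot in both lists (the toy `insertOnEdge` below is hypothetical, not the VPlan API) and models each scalar-preheader phi as a list of incoming values in predecessor order. It shows why `swapSuccessors` is needed: after it, successor 0 (taken when the condition is true) is the bypass to the scalar preheader. It also shows how each new predecessor of the scalar preheader receives a copy of the last incoming value. Branch weights are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy CFG node: ordered successor/predecessor lists, as in VPlan.
struct Block {
  std::string Name;
  std::vector<Block *> Succs, Preds;
  // Each phi holds one incoming value per predecessor, in Preds order.
  std::vector<std::vector<std::string>> Phis;
  std::string Cond; // Branch condition; empty means unconditional.
};

// Append an edge From -> To.
void connect(Block *From, Block *To) {
  From->Succs.push_back(To);
  To->Preds.push_back(From);
}

// Put New on the edge From -> To, keeping the edge's slot in From's
// successor list and To's predecessor list (assumed insertOnEdge behavior).
void insertOnEdge(Block *From, Block *To, Block *New) {
  std::replace(From->Succs.begin(), From->Succs.end(), To, New);
  std::replace(To->Preds.begin(), To->Preds.end(), From, New);
  New->Preds.push_back(From);
  New->Succs.push_back(To);
}

// Toy version of the connectCheckBlocks loop.
void connectChecks(Block *VectorPH, Block *ScalarPH,
                   const std::vector<std::pair<std::string, Block *>> &Checks) {
  for (const auto &[Cond, CheckBB] : Checks) {
    assert(VectorPH->Preds.size() == 1 && "expected a single predecessor");
    Block *PreVectorPH = VectorPH->Preds.front();
    insertOnEdge(PreVectorPH, VectorPH, CheckBB);
    connect(CheckBB, ScalarPH);
    // Successors are now {VectorPH, ScalarPH}; swap so that successor 0,
    // taken when Cond is true, is the bypass to the scalar preheader.
    std::swap(CheckBB->Succs[0], CheckBB->Succs[1]);
    // The scalar preheader gained a predecessor: replicate the last incoming
    // value of every phi for the new edge.
    std::size_t NumPreds = ScalarPH->Preds.size();
    for (std::vector<std::string> &Incoming : ScalarPH->Phis) {
      assert(Incoming.size() == NumPreds - 1 && "one value per old predecessor");
      std::string Last = Incoming[NumPreds - 2];
      Incoming.push_back(Last);
    }
    CheckBB->Cond = Cond;
  }
}
```

Connecting an SCEV check and then a memory check yields the chain entry, SCEV check, memory check, vector preheader, with each check's successor 0 being the scalar preheader. Each scalar-preheader phi reuses the entry block's incoming value for both new bypass edges.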
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 34e2de4eb3b74..c13cabc87ce31 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -74,6 +74,13 @@ struct VPlanTransforms {
/// flat CFG into a hierarchical CFG.
static void createLoopRegions(VPlan &Plan);
+ /// Connect the blocks in \p Checks to \p Plan, using the corresponding
+ /// VPValue as branch condition.
+ static void
+ connectCheckBlocks(VPlan &Plan,
+ ArrayRef<std::pair<VPValue *, VPIRBasicBlock *>> ...
[truncated]
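`CheckBypassWeights = {1, 127}` in VPlanConstruction.cpp is attached as `!prof` branch weights to each check block's `BranchOnCond`, whose successor 0 is the scalar preheader once `swapSuccessors()` has run. Branch weights are relative, one per successor, so the bypass is expected with probability 1/(1 + 127) = 1/128 (about 0.78%). A small hypothetical helper (not an LLVM API) makes the arithmetic explicit:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Probability that a branch takes successor Idx, given its branch weights
// (one relative weight per successor, in successor order).
// Hypothetical helper for illustration; not part of LLVM.
double takenProbability(const std::vector<std::uint32_t> &Weights,
                        std::size_t Idx) {
  assert(Idx < Weights.size() && "one weight per successor");
  std::uint64_t Total =
      std::accumulate(Weights.begin(), Weights.end(), std::uint64_t{0});
  if (Total == 0)
    return 0.0;
  return static_cast<double>(Weights[Idx]) / static_cast<double>(Total);
}
```

For `{1, 127}`, successor 0 (the bypass) gets 1/128 and successor 1 (the vector preheader) gets 127/128. This is the same split the removed `SCEVCheckBypassWeights` and `MemCheckBypassWeights` used, which is why a single shared constant suffices.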
+ "(e.g., adding 'restrict').";
+ });
+ }
+ Checks.emplace_back(Plan.getOrAddLiveIn(MemCheckCond),
+ Plan.createVPIRBasicBlock(MemCheckBlock));
+ }
+ VPlanTransforms::connectCheckBlocks(
+ Plan, Checks,
+ hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator()));
+}
+
void VPDerivedIVRecipe::execute(VPTransformState &State) {
assert(!State.Lane && "VPDerivedIVRecipe being replicated.");
@@ -9446,10 +9397,7 @@ static bool processLoopInVPlanNativePath(
VPlan &BestPlan = LVP.getPlanFor(VF.Width);
{
- bool AddBranchWeights =
- hasBranchWeightMD(*L->getLoopLatch()->getTerminator());
- GeneratedRTChecks Checks(PSE, DT, LI, TTI, F->getDataLayout(),
- AddBranchWeights, CM.CostKind);
+ GeneratedRTChecks Checks(PSE, DT, LI, TTI, F->getDataLayout(), CM.CostKind);
InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width,
VF.Width, 1, &CM, BFI, PSI, Checks, BestPlan);
LLVM_DEBUG(dbgs() << "Vectorizing outer loop in \""
@@ -10085,10 +10033,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
if (ORE->allowExtraAnalysis(LV_NAME))
LVP.emitInvalidCostRemarks(ORE);
- bool AddBranchWeights =
- hasBranchWeightMD(*L->getLoopLatch()->getTerminator());
- GeneratedRTChecks Checks(PSE, DT, LI, TTI, F->getDataLayout(),
- AddBranchWeights, CM.CostKind);
+ GeneratedRTChecks Checks(PSE, DT, LI, TTI, F->getDataLayout(), CM.CostKind);
if (LVP.hasPlanWithVF(VF.Width)) {
// Select the interleave count.
IC = CM.selectInterleaveCount(LVP.getPlanFor(VF.Width), VF.Width, VF.Cost);
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 1838562f26b82..8068a6b3b968f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -484,7 +484,8 @@ void VPBasicBlock::connectToPredecessors(VPTransformState &State) {
unsigned idx = PredVPSuccessors.front() == this ? 0 : 1;
assert((TermBr && (!TermBr->getSuccessor(idx) ||
(isa<VPIRBasicBlock>(this) &&
- TermBr->getSuccessor(idx) == NewBB))) &&
+ (TermBr->getSuccessor(idx) == NewBB ||
+ PredVPBlock == getPlan()->getEntry())))) &&
"Trying to reset an existing successor block.");
TermBr->setSuccessor(idx, NewBB);
}
diff --git a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
index 593e5063802ba..a55e95ef274b7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
@@ -20,6 +20,7 @@
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/LoopIterator.h"
#include "llvm/Analysis/ScalarEvolution.h"
+#include "llvm/IR/MDBuilder.h"
#define DEBUG_TYPE "vplan"
@@ -589,3 +590,41 @@ void VPlanTransforms::createLoopRegions(VPlan &Plan) {
TopRegion->setName("vector loop");
TopRegion->getEntryBasicBlock()->setName("vector.body");
}
+
+// Likelihood of bypassing the vectorized loop because of SCEV assumptions or
+// memory runtime checks.
+static constexpr uint32_t CheckBypassWeights[] = {1, 127};
+
+void VPlanTransforms::connectCheckBlocks(
+ VPlan &Plan, ArrayRef<std::pair<VPValue *, VPIRBasicBlock *>> Checks,
+ bool AddBranchWeights) {
+ VPBlockBase *VectorPH = Plan.getVectorPreheader();
+ VPBlockBase *ScalarPH = Plan.getScalarPreheader();
+ for (const auto &[Cond, CheckBlock] : Checks) {
+ VPBlockBase *PreVectorPH = VectorPH->getSinglePredecessor();
+ VPBlockUtils::insertOnEdge(PreVectorPH, VectorPH, CheckBlock);
+ VPBlockUtils::connectBlocks(CheckBlock, ScalarPH);
+ CheckBlock->swapSuccessors();
+
+ // We just connected a new block to the scalar preheader. Update all
+ // VPPhis by adding an incoming value for it, replicating the last value.
+ unsigned NumPredecessors = ScalarPH->getNumPredecessors();
+ for (VPRecipeBase &R : cast<VPBasicBlock>(ScalarPH)->phis()) {
+ assert(isa<VPPhi>(&R) && "Phi expected to be VPPhi");
+ assert(cast<VPPhi>(&R)->getNumIncoming() == NumPredecessors - 1 &&
+ "must have incoming values for all operands");
+ R.addOperand(R.getOperand(NumPredecessors - 2));
+ }
+
+ VPIRMetadata VPBranchWeights;
+ auto *Term = VPBuilder(CheckBlock)
+ .createNaryOp(VPInstruction::BranchOnCond, {Cond},
+ Plan.getCanonicalIV()->getDebugLoc());
+ if (AddBranchWeights) {
+ MDBuilder MDB(Plan.getScalarHeader()->getIRBasicBlock()->getContext());
+ MDNode *BranchWeights =
+ MDB.createBranchWeights(CheckBypassWeights, /*IsExpected=*/false);
+ Term->addMetadata(LLVMContext::MD_prof, BranchWeights);
+ }
+ }
+}
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 34e2de4eb3b74..c13cabc87ce31 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -74,6 +74,13 @@ struct VPlanTransforms {
/// flat CFG into a hierarchical CFG.
static void createLoopRegions(VPlan &Plan);
+ /// Connect the blocks in \p Checks to \p Plan, using the corresponding
+ /// VPValue as branch condition.
+ static void
+ connectCheckBlocks(VPlan &Plan,
+ ArrayRef<std::pair<VPValue *, VPIRBasicBlock *>> ...
[truncated]
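The CFG rewiring done by connectCheckBlocks above can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not VPlan code. ToyBlock, insertOnEdge, connect and attachCheck are hypothetical stand-ins for VPBlockBase and the VPBlockUtils helpers, and the scalar preheader's phis are modeled as plain vectors of incoming values. The model shows three properties the diff relies on. First, each check is inserted between the vector preheader and its single predecessor, so checks are chained in the order they are attached. Second, after swapSuccessors, successor 0 of a check block (the one taken when the condition is true) is the scalar preheader. Third, every phi in the scalar preheader gets one extra incoming value per new predecessor, copied from its last one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a VPlan block (NOT the real VPBlockBase API).
struct ToyBlock {
  std::string Name;
  std::vector<ToyBlock *> Preds;
  std::vector<ToyBlock *> Succs;
  // Each phi keeps one incoming value per predecessor, in predecessor order.
  std::vector<std::vector<int>> Phis;
};

// Turn the edge From->To into From->New->To (New is assumed unconnected).
static void insertOnEdge(ToyBlock *From, ToyBlock *To, ToyBlock *New) {
  std::replace(From->Succs.begin(), From->Succs.end(), To, New);
  std::replace(To->Preds.begin(), To->Preds.end(), From, New);
  New->Preds.push_back(From);
  New->Succs.push_back(To);
}

// Append the edge From->To.
static void connect(ToyBlock *From, ToyBlock *To) {
  From->Succs.push_back(To);
  To->Preds.push_back(From);
}

// Models the loop body of connectCheckBlocks for a single check block.
static void attachCheck(ToyBlock *VectorPH, ToyBlock *ScalarPH,
                        ToyBlock *Check) {
  assert(VectorPH->Preds.size() == 1 && "vector preheader needs one pred");
  ToyBlock *PreVectorPH = VectorPH->Preds.front();
  insertOnEdge(PreVectorPH, VectorPH, Check);
  connect(Check, ScalarPH);
  // Successors are now {VectorPH, ScalarPH}; swap them so that successor 0
  // (taken when the check condition is true) bypasses to the scalar PH.
  std::swap(Check->Succs[0], Check->Succs[1]);

  // The scalar preheader gained a predecessor: replicate each phi's last
  // incoming value so that operand and predecessor counts match again.
  std::size_t NumPreds = ScalarPH->Preds.size();
  for (std::vector<int> &Phi : ScalarPH->Phis) {
    assert(Phi.size() == NumPreds - 1 && "one incoming value per old pred");
    int Last = Phi[NumPreds - 2];
    Phi.push_back(Last);
  }
}
```

Attaching a SCEV check and then a memory check to a toy plan whose entry branches to both preheaders yields entry -> SCEV check -> memory check -> vector preheader, with both checks also branching to the scalar preheader.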
Force-pushed from 6cc3fc0 to 205fad6.
/// Adds the generated SCEVCheckBlock before \p LoopVectorPreHeader and
/// adjusts the branches to branch to the vector preheader or \p Bypass,
/// depending on the generated condition.
Documentation needs updating. This method no longer adds or adjusts anything, mostly retrieves information.
Updated, thanks
setBranchWeights(BI, SCEVCheckBypassWeights, /*IsExpected=*/false);
ReplaceInstWithInst(SCEVCheckBlock->getTerminator(), &BI);
// Mark the check as used, to prevent it from being removed during cleanup.
Value *Cond = SCEVCheckCond;
Value *Cond = SCEVCheckCond;
// Mark the check as used, to prevent it from being removed during cleanup.
Value *Cond = SCEVCheckCond;
Is this prevention still needed? Comment could be improved, this seems to prevent repeated retrieval / reuse.
This is not really needed independent of the patch, simplified in 786ccb2
BasicBlock *LoopVectorPreHeader) {
std::pair<Value *, BasicBlock *> emitSCEVChecks() {
using namespace llvm::PatternMatch;
if (!SCEVCheckCond || match(SCEVCheckCond, m_ZeroInt()))
Better optimize branch-on-false elsewhere? (SCEVCheckCond being zero)
Yes, this will be possible after this change, although it is probably best done and tested separately as follow-up?
/// depending on the generated condition.
BasicBlock *emitSCEVChecks(BasicBlock *Bypass,
BasicBlock *LoopVectorPreHeader) {
std::pair<Value *, BasicBlock *> emitSCEVChecks() {
Should this simply retrieve SCEVCheckCond and SCEVCheckBlock, i.e., getSCEVChecks(), provided other things are done elsewhere? See below.
Updated to return the block + cond, with still checking if there's a non-trivial check, thanks
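The getter shape discussed in this thread (return the condition and block together, but only for a non-trivial check) can be sketched as follows. ToyCond, ToyBlock and ToyRTChecks are hypothetical simplifications. The real code tests the LLVM Value with match(SCEVCheckCond, m_ZeroInt()), which this model replaces with an explicit constant flag. The sketch's getter is const and has no side effects, in line with the reviewers' suggestion to keep retrieval separate from any bookkeeping.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-ins, not LLVM types: a condition is either an opaque
// runtime value or a known constant; a block is just an identity.
struct ToyCond {
  bool IsConstant = false;
  bool ConstantValue = false;
};
struct ToyBlock {};

struct ToyRTChecks {
  ToyCond *SCEVCheckCond = nullptr;
  ToyBlock *SCEVCheckBlock = nullptr;

  // Return {condition, block} for a non-trivial SCEV check, otherwise
  // {nullptr, nullptr}: either no check was generated, or the condition is
  // a constant false, so the bypass branch could never be taken.
  std::pair<ToyCond *, ToyBlock *> getSCEVChecks() const {
    if (!SCEVCheckCond ||
        (SCEVCheckCond->IsConstant && !SCEVCheckCond->ConstantValue))
      return {nullptr, nullptr};
    return {SCEVCheckCond, SCEVCheckBlock};
  }
};
```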
Still called emit and documented as such.
Updated to getSCEVChecks, thanks
// Mark the check as used, to prevent it from being removed during cleanup.
Value *Cond = SCEVCheckCond;
SCEVCheckCond = nullptr;
AddedAnyChecks = true;
Should AddedAnyChecks be accumulated elsewhere rather than here upon retrieval?
Yes, we could probably just check if the check blocks have been inserted, will check separately, thanks
; CHECK-NEXT: br i1 %min.iters.check, label %scalar.ph, label %vector.ph
; CHECK-NEXT: LV: vectorizing VPBB:ir-bb<vector.scevcheck> in BB:vector.scevcheck
; CHECK-NEXT: LV: filled BB:
; CHECK-NEXT: vector.scevcheck: ; preds = %for.body.preheader
; CHECK-NEXT: vector.scevcheck: ; No predecessors!
Is this correct?
Also below.
Yep, at this point it has not been connected yet; the message is printed after executing all recipes in a block, but before connecting to predecessors.
// Move check blocks to their final position.
if (BasicBlock *MemCheckBlock = ILV.RTChecks.getMemCheckBlock())
MemCheckBlock->moveAfter(EntryBB);
if (BasicBlock *SCEVCheckBlock = ILV.RTChecks.getSCEVCheckBlock())
SCEVCheckBlock->moveAfter(EntryBB);
Should this be part of VPIRBB::execute(), for VPIRBBs that are not placed correctly when generated? VPlan should model the final position of EntryVPBB->SCEVCheckBlock->MemCheckBlock, and VPlan::execute should take care of (re)wiring its new and existing IRBBs accordingly?
It could be part of VPIRBB's execute, except that it would change the order for epilogue vectorization, which inserts an additional block with an iteration count check after the original entry block. Could be adjusted as a follow-up?
Sure, can leave behind a TODO for later. Ideally VPlan should model the final control-flow fully and accurately, also for epilog vectorization, and when executed should make sure the generated control-flow matches that modeled by VPlan.
The control-flow is completely modeled in VPlan. This is just the position in the function, added a TODO
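Because each moveAfter(EntryBB) call places its block immediately after the entry block, moving the memory check block first and the SCEV check block second yields the layout entry, SCEV check, memory check, which matches the order in which VPlan chains the checks. A minimal model using std::list is shown below. The moveAfter helper is hypothetical and stands in for llvm::BasicBlock::moveAfter: it changes only a block's position in the function layout, never any CFG edge. The starting layout used in the example is an arbitrary assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Toy function layout: an ordered list of block names.
using BlockList = std::list<std::string>;

// Hypothetical helper modeling llvm::BasicBlock::moveAfter: it changes only
// where BB sits in the layout, never any CFG edge.
static void moveAfter(BlockList &Blocks, const std::string &BB,
                      const std::string &MovePos) {
  auto BBIt = std::find(Blocks.begin(), Blocks.end(), BB);
  auto PosIt = std::find(Blocks.begin(), Blocks.end(), MovePos);
  assert(BBIt != Blocks.end() && PosIt != Blocks.end());
  // splice relinks the node in place; no element is copied.
  Blocks.splice(std::next(PosIt), Blocks, BBIt);
}
```

For example, starting from entry, vector.ph, vector.scevcheck, vector.memcheck, applying the two calls in the diff's order produces entry, vector.scevcheck, vector.memcheck, vector.ph.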
BasicBlock *SCEVCheckBlock = RTChecks.getSCEVCheckBlock();
if (SCEVCheckBlock && SCEVCheckBlock->hasNPredecessorsOrMore(1))
EPI.SCEVSafetyCheck = SCEVCheckBlock;

BasicBlock *MemCheckBlock = RTChecks.getMemCheckBlock();
if (MemCheckBlock && MemCheckBlock->hasNPredecessorsOrMore(1))
EPI.MemSafetyCheck = MemCheckBlock;
Sigh, worth a comment to explain this hook up?
Added, also removed SCEVSafetyCheck and MemSafetyCheck from EPI, as it is now only needed in this function.
EPI.SCEVSafetyCheck = emitSCEVChecks(LoopScalarPreHeader);

// Generate the code that checks at runtime if arrays overlap. We put the
// checks into a separate block to make the more common case of few elements
// faster.
EPI.MemSafetyCheck = emitMemRuntimeChecks(LoopScalarPreHeader);
These are replaced by calling getSCEVCheckBlock() and getMemCheckBlock() instead, so can emitSCEVChecks() and emitMemRuntimeChecks() be simplified, regarding "Mark the check as used, to prevent it from being removed during cleanup"?
Yep should be done in the latest version, thanks
BasicBlock *getSCEVCheckBlock() const { return SCEVCheckBlock; }
BasicBlock *getMemCheckBlock() const { return MemCheckBlock; }
Mostly needed for hooking up epilog vectorization.
removed, now just uses getSCEVChecks()/getMemRuntimeChecks()
Still seem to be here.
- BasicBlock *getSCEVCheckBlock() const { return SCEVCheckBlock; }
- BasicBlock *getMemCheckBlock() const { return MemCheckBlock; }
Note that getSCEVChecks()/getMemRuntimeChecks() set the AddedAnyChecks sticky bit when called.
If SCEVCheckCond is zero, getSCEVCheckBlock() should return null?
Hm, yeah it looks like some updates weren't pushed, sorry about that
Connect SCEV and memory runtime check block directly in VPlan as
VPIRBasicBlocks, removing ILV::emitSCEVChecks and ILV::emitMemRuntimeChecks.
The new logic is currently split across LoopVectorizationPlanner::addRuntimeChecks
which collects a list of {Condition, CheckBlock} pairs and performs some
checks and emits remarks if needed. The list of checks is then added to
VPlan in VPlanTransforms::connectCheckBlocks.
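The split described above (the planner collects {Condition, CheckBlock} pairs, validates them and may emit a remark, and a separate transform then wires them into the plan) can be sketched as a collection step. ToyPlanner and the Check alias are hypothetical. An empty block name stands for "no check was generated", and the assertions mirror, in simplified form, the OptForSize preconditions in addRuntimeChecks. Only the remark name VectorizationCodeSize is taken from the diff; everything else is illustrative.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplification: a check is a (condition, block) pair of
// names, and an empty block name means no check was generated.
using Check = std::pair<std::string, std::string>;

struct ToyPlanner {
  bool OptForSize = false;
  bool ForcedToVectorize = false;
  std::vector<std::string> Remarks;

  // Phase 1: collect the generated checks (SCEV first, then memory),
  // validating them and recording a code-size remark when needed.
  std::vector<Check> collectChecks(const Check &SCEV, const Check &Mem) {
    std::vector<Check> Checks;
    if (!SCEV.second.empty()) {
      assert((!OptForSize || ForcedToVectorize) &&
             "SCEV checks under OptForSize require forced vectorization");
      Checks.push_back(SCEV);
    }
    if (!Mem.second.empty()) {
      if (OptForSize) {
        assert(ForcedToVectorize &&
               "memory checks under OptForSize require forced vectorization");
        Remarks.push_back("VectorizationCodeSize");
      }
      Checks.push_back(Mem);
    }
    // Phase 2 (not modeled here) hands this list to a separate transform
    // that wires each block in front of the vector preheader.
    return Checks;
  }
};
```

Keeping the list ordered SCEV-then-memory matters because each attached block is placed directly before the vector preheader, so the attachment order becomes the execution order of the checks.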
Force-pushed from 205fad6 to 7ba9734.
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed from 7ba9734 to 044d73a.
VPlanTransforms::connectCheckBlocks(
Plan, Checks,
hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator()));
Better call attachCheckBlock() given each pair of cond and check blocks separately, twice, above?
Perhaps "attach" is more accurate than "add" or "connect", as this transformation wraps existing IR check conditions and blocks (held detached from original IR, and detached from VPlan until now) in VPValues and VPIRBBs, as when forming the initial VPlan from IR, rather than connecting existing VPlan entities together, or adding completely new entities to VPlan.
Updated to be called attachCheckBlock, thanks!
if (!VectorizingEpilogue)
addRuntimeChecks(BestVPlan, ILV.RTChecks);
Worth noting, e.g.,
// Checks are the same for all VPlans, added to BestVPlan only for compactness.
?
/// Add the runtime checks from \p RTChecks to \p VPlan.
void addRuntimeChecks(VPlan &Plan, GeneratedRTChecks &RTChecks) const;
- /// Add the runtime checks from \p RTChecks to \p VPlan.
- void addRuntimeChecks(VPlan &Plan, GeneratedRTChecks &RTChecks) const;
+ /// Attach the runtime checks of \p RTChecks to \p VPlan.
+ void attachRuntimeChecks(VPlan &Plan, GeneratedRTChecks &RTChecks) const;
?
The runtime checks are conceptually "there", but held detached from VPlan(s) until now for compactness, and detached from IR until VPlan execution.
This method wraps connectRuntimeChecks() with asserts and remarks. Suffice to have a single VPlanTransform::attachRuntimeCheck() with a unified assert and remark - of a conflict between a runtime check and OptForSize?
updated the naming, but still left the separate function in LoopVectorizationPlanner, to avoid having to pass Hints/OptForSize (and ORE). I think we can move the asserts to getSCEVChecks/getMemRuntimeChecks and emit a remark in both cases, but this should probably also be done separately?
/// depending on the generated condition.
BasicBlock *emitSCEVChecks(BasicBlock *Bypass,
BasicBlock *LoopVectorPreHeader) {
std::pair<Value *, BasicBlock *> emitSCEVChecks() {
Still called emit and documented as such.
/// Adds the generated MemCheckBlock before \p LoopVectorPreHeader and adjusts
/// the branches to branch to the vector preheader or \p Bypass, depending on
/// the generated condition.
BasicBlock *emitMemRuntimeChecks(BasicBlock *Bypass,
BasicBlock *LoopVectorPreHeader) {
std::pair<Value *, BasicBlock *> emitMemRuntimeChecks() {
Still called emit and documented as such.
OrigLoop->getHeader()->getContext());
VPlanTransforms::runPass(VPlanTransforms::replicateByVF, BestVPlan, BestVF);
VPlanTransforms::runPass(VPlanTransforms::materializeBroadcasts, BestVPlan);
if (hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator()))
- if (hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator()))
+ bool hasBranchWeights = hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator());
+ if (hasBranchWeights)
Updated, thanks
if (!VectorizingEpilogue)
addRuntimeChecks(BestVPlan, ILV.RTChecks);
- if (!VectorizingEpilogue)
- addRuntimeChecks(BestVPlan, ILV.RTChecks);
+ // Runtime checks are the same for all VPlans, attach them to main loop of BestVPlan only, for compactness.
+ if (!VectorizingEpilogue)
+ attachRuntimeChecks(BestVPlan, ILV.RTChecks, hasBranchWeights);
Done thanks
void LoopVectorizationPlanner::addRuntimeChecks(
VPlan &Plan, GeneratedRTChecks &RTChecks) const {
SmallVector<std::pair<VPValue *, VPIRBasicBlock *>> Checks;
- void LoopVectorizationPlanner::addRuntimeChecks(
- VPlan &Plan, GeneratedRTChecks &RTChecks) const {
- SmallVector<std::pair<VPValue *, VPIRBasicBlock *>> Checks;
+ void LoopVectorizationPlanner::attachRuntimeChecks(
+ VPlan &Plan, GeneratedRTChecks &RTChecks, bool hasBranchWeights) const {
Updated thanks
// Likelihood of bypassing the vectorized loop due to SCEV or memory runtime
// checks.
- // Likelihood of bypassing the vectorized loop due to SCEV or memory runtime
- // checks.
+ // Likelihood of bypassing the vectorized loop due to a runtime check block, including memory overlap checks block and wrapping/unit-stride checks block.
Updated, thanks
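The bypass weights {1, 127} from the diff translate into probabilities as follows. In LLVM branch_weights metadata the weights are relative, so successor I is taken with probability Weights[I] divided by the sum of all weights. Since swapSuccessors makes the scalar preheader successor 0 of each check block, the vector loop is assumed to be bypassed with probability 1/128 (about 0.78%). The helper successorProbability below is hypothetical and only performs this arithmetic; it is not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Weights as in the diff: successor 0 (the scalar preheader, i.e. bypassing
// the vector loop) gets weight 1, successor 1 (the vector preheader) 127.
static constexpr std::uint32_t CheckBypassWeights[] = {1, 127};

// Branch weights are relative: the probability of taking successor I is
// Weights[I] divided by the sum of all weights.
static double successorProbability(const std::uint32_t (&Weights)[2],
                                   unsigned I) {
  return static_cast<double>(Weights[I]) / (Weights[0] + Weights[1]);
}
```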
void VPlanTransforms::connectCheckBlocks(
VPlan &Plan, ArrayRef<std::pair<VPValue *, VPIRBasicBlock *>> Checks,
bool AddBranchWeights) {
VPBlockBase *VectorPH = Plan.getVectorPreheader();
Would be good to
assert((!CM.OptForSize ||
CM.Hints->getForce() == LoopVectorizeHints::FK_Enabled) &&
"Runtime checks require forcing vectorization when optimizing for size");
here, but may need to convey OptForSize and ForcedToVectorize.
I think it would be best to leave the assertions as is for now, but potentially move them to getSCEVChecks/getMemRuntimeChecks?
ayalz left a comment
This LGTM, thanks for accommodating!
// Retrieve blocks with SCEV and memory runtime checks, if they have been
// connected to the CFG, otherwise they are unused and will be deleted. Their
// terminators and phis using them need adjusting below.
- // Retrieve blocks with SCEV and memory runtime checks, if they have been
- // connected to the CFG, otherwise they are unused and will be deleted. Their
- // terminators and phis using them need adjusting below.
+ // Adjust the terminators of runtime check blocks and phis using them.
Updated, thanks
void LoopVectorizationPlanner::attachRuntimeChecks(
VPlan &Plan, GeneratedRTChecks &RTChecks, bool HasBranchWeights) const {
SmallVector<std::pair<VPValue *, VPIRBasicBlock *>> Checks;
- SmallVector<std::pair<VPValue *, VPIRBasicBlock *>> Checks;
Removed thanks
/// flat CFG into a hierarchical CFG.
static void createLoopRegions(VPlan &Plan);

/// Wrap runtime check block \p CHeckBlock in a VPIRBB and \p Cond in a
- /// Wrap runtime check block \p CHeckBlock in a VPIRBB and \p Cond in a
+ /// Wrap runtime check block \p CheckBlock in a VPIRBB and \p Cond in a
Fixed, thanks
As suggested in #143879, remove AddedAnyChecks member and directly check if there are any relevant runtime check blocks.
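One way to read "directly check if there are any relevant runtime check blocks" is to derive the answer from the blocks themselves instead of keeping a sticky AddedAnyChecks flag that is set as a side effect of retrieving a check. The sketch below is hypothetical (ToyBB and ToyRTChecks are not LLVM types), and treating "relevant" as "has at least one predecessor" is an assumption that mirrors the hasNPredecessorsOrMore(1) test used for epilogue vectorization earlier in this PR.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an IR block that only tracks predecessors.
struct ToyBB {
  std::vector<ToyBB *> Preds;
};

// A check block that was generated but never connected to the CFG has no
// predecessors, so it does not count as an inserted check.
struct ToyRTChecks {
  ToyBB *SCEVCheckBlock = nullptr;
  ToyBB *MemCheckBlock = nullptr;

  static bool isConnected(const ToyBB *BB) {
    return BB && !BB->Preds.empty();
  }

  // Derived on demand; there is no mutable flag to keep in sync.
  bool hasChecks() const {
    return isConnected(SCEVCheckBlock) || isConnected(MemCheckBlock);
  }
};
```

A derived query like this cannot drift out of sync with the CFG, whereas a flag set on retrieval reports checks even when a retrieved block is later left unconnected.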
…nsform (NFC). (#143879)
Connect SCEV and memory runtime check block directly in VPlan as
VPIRBasicBlocks, removing ILV::emitSCEVChecks and
ILV::emitMemRuntimeChecks.
The new logic is currently split across
LoopVectorizationPlanner::addRuntimeChecks which collects a list of
{Condition, CheckBlock} pairs and performs some checks and emits remarks
if needed. The list of checks is then added to VPlan in
VPlanTransforms::connectCheckBlocks.
PR: llvm/llvm-project#143879