[AMDGPU][Scheduler] Scoring system for rematerializations #175050
Conversation
@llvm/pr-subscribers-backend-amdgpu

Author: Lucas Ramirez (lucas-rami)

Changes

This is simply the last rebased version of #153092, which GitHub seems unable to open due to history length. All existing feedback was addressed, the last notable change being improvements to rollback, which no longer has to re-create the original MI in the MIR.

This is a significant refactoring of the scheduler's rematerialization stage meant to improve rematerialization capabilities and lay strong foundations for future improvements. As before, the stage identifies scheduling regions in which RP must be reduced (so-called "target regions"), then rematerializes registers to try and achieve the desired reduction. All regions affected by rematerializations are re-scheduled, and, if the MIR is deemed worse than before, rematerializations are rolled back to leave the MIR in its pre-stage state.

The core contribution is a scoring system to estimate the benefit of each rematerialization candidate. This score favors rematerializing candidates which, in order, would

1. (if the function is spilling) reduce RP in the highest-frequency target regions,
2. be rematerialized to the lowest-frequency target regions, and
3. reduce RP in the highest number of target regions.

All rematerialization opportunities are initially scored and rematerialized in decreasing score order until RP objectives are met or pre-computed scores diverge from reality; in the latter case remaining candidates are re-scored and the process repeats. New tests in `machine-scheduler-rematerialization-scoring.mir` showcase how the scoring system dictates which rematerializations are the most beneficial and are therefore performed first.

A minor contribution included in this PR following previous feedback is that rollback now happens in-place, i.e., without having to re-create the rematerialized MI. This leaves original slot indices and registers untouched. We achieve this by temporarily switching the opcode of rollback-able instructions to a debug opcode during re-scheduling so that they are ignored.

Patch is 174.92 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/175050.diff

7 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index c8ce3aab3f303..cb0cb6510ecd4 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -28,11 +28,20 @@
#include "GCNRegPressure.h"
#include "SIMachineFunctionInfo.h"
#include "Utils/AMDGPUBaseInfo.h"
+#include "llvm/ADT/BitVector.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/CodeGen/CalcSpillWeights.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
+#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
#include "llvm/CodeGen/RegisterClassInfo.h"
#include "llvm/MC/LaneBitmask.h"
+#include "llvm/MC/MCInstrItineraries.h"
+#include "llvm/MC/MCSchedule.h"
+#include "llvm/MC/TargetRegistry.h"
#include "llvm/Support/ErrorHandling.h"
+#include <limits>
+#include <string>
#define DEBUG_TYPE "machine-scheduler"
@@ -970,6 +979,8 @@ void GCNScheduleDAGMILive::schedule() {
GCNRegPressure
GCNScheduleDAGMILive::getRealRegPressure(unsigned RegionIdx) const {
+ if (Regions[RegionIdx].first == Regions[RegionIdx].second)
+ return llvm::getRegPressure(MRI, LiveIns[RegionIdx]);
GCNDownwardRPTracker RPTracker(*LIS);
RPTracker.advance(Regions[RegionIdx].first, Regions[RegionIdx].second,
&LiveIns[RegionIdx]);
@@ -1272,33 +1283,222 @@ bool ClusteredLowOccStage::initGCNSchedStage() {
#define REMAT_PREFIX "[PreRARemat] "
#define REMAT_DEBUG(X) LLVM_DEBUG(dbgs() << REMAT_PREFIX; X;)
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+Printable PreRARematStage::ScoredRemat::print() const {
+ return Printable([&](raw_ostream &OS) {
+ OS << '(' << MaxFreq << ", " << FreqDiff << ", " << RegionImpact << ')';
+ });
+}
+#endif
+
bool PreRARematStage::initGCNSchedStage() {
// FIXME: This pass will invalidate cached BBLiveInMap and MBBLiveIns for
// regions inbetween the defs and region we sinked the def to. Will need to be
// fixed if there is another pass after this pass.
assert(!S.hasNextStage());
- if (!GCNSchedStage::initGCNSchedStage() || DAG.Regions.size() == 1)
+ if (!GCNSchedStage::initGCNSchedStage() || DAG.Regions.size() <= 1)
return false;
+ // Maps all MIs (except lone terminators, which are not part of any region) to
+ // their parent region. Non-lone terminators are considered part of the region
+ // they delimitate.
+ DenseMap<MachineInstr *, unsigned> MIRegion(MF.getInstructionCount());
+
// Before performing any IR modification record the parent region of each MI
// and the parent MBB of each region.
const unsigned NumRegions = DAG.Regions.size();
- RegionBB.reserve(NumRegions);
for (unsigned I = 0; I < NumRegions; ++I) {
RegionBoundaries Region = DAG.Regions[I];
for (auto MI = Region.first; MI != Region.second; ++MI)
MIRegion.insert({&*MI, I});
- RegionBB.push_back(Region.first->getParent());
+ MachineBasicBlock *ParentMBB = Region.first->getParent();
+ if (Region.second != ParentMBB->end())
+ MIRegion.insert({&*Region.second, I});
+ RegionBB.push_back(ParentMBB);
+ }
+
+#ifndef NDEBUG
+ auto PrintTargetRegions = [&]() -> void {
+ if (TargetRegions.none()) {
+ dbgs() << REMAT_PREFIX << "No target regions\n";
+ return;
+ }
+ dbgs() << REMAT_PREFIX << "Target regions:\n";
+ for (unsigned I : TargetRegions.set_bits())
+ dbgs() << REMAT_PREFIX << " [" << I << "] " << RPTargets[I] << '\n';
+ };
+ auto PrintRematReg = [&](const RematReg &Remat) -> Printable {
+ return Printable([&, Remat](raw_ostream &OS) {
+ // Concatenate all region numbers in which the register is unused and
+ // live-through.
+ bool HasLiveThroughRegion = false;
+ OS << '[' << Remat.DefRegion << " -";
+ for (unsigned I = 0; I < NumRegions; ++I) {
+ if (Remat.isUnusedLiveThrough(I)) {
+ if (HasLiveThroughRegion) {
+ OS << ',';
+ } else {
+ OS << "- ";
+ HasLiveThroughRegion = true;
+ }
+ OS << I;
+ }
+ }
+ if (HasLiveThroughRegion)
+ OS << " -";
+ OS << "-> " << Remat.UseRegion << "] ";
+ Remat.DefMI->print(OS, /*IsStandalone=*/true, /*SkipOpers=*/false,
+ /*SkipDebugLoc=*/false, /*AddNewLine=*/false);
+ });
+ };
+#endif
+
+ // Set an objective for the stage based on current RP in each region.
+ REMAT_DEBUG({
+ dbgs() << "Analyzing ";
+ MF.getFunction().printAsOperand(dbgs(), false);
+ dbgs() << ": ";
+ });
+ if (!setObjective()) {
+ LLVM_DEBUG(dbgs() << "no objective to achieve, occupancy is maximal at "
+ << MFI.getMaxWavesPerEU() << '\n');
+ return false;
}
+ LLVM_DEBUG({
+ if (TargetOcc) {
+ dbgs() << "increase occupancy from " << *TargetOcc - 1 << '\n';
+ } else {
+ dbgs() << "reduce spilling (minimum target occupancy is "
+ << MFI.getMinWavesPerEU() << ")\n";
+ }
+ PrintTargetRegions();
+ });
+
+ if (!collectRematRegs(MIRegion)) {
+ REMAT_DEBUG(dbgs() << "No rematerializable registers\n");
+ return false;
+ }
+ const ScoredRemat::FreqInfo FreqInfo(MF, DAG);
+ REMAT_DEBUG({
+ dbgs() << "Rematerializable registers:\n";
+ for (const RematReg &Remat : RematRegs)
+ dbgs() << REMAT_PREFIX << " " << PrintRematReg(Remat) << '\n';
+ dbgs() << REMAT_PREFIX << "Region frequencies\n";
+ for (auto [I, Freq] : enumerate(FreqInfo.Regions)) {
+ dbgs() << REMAT_PREFIX << " [" << I << "] ";
+ if (Freq)
+ dbgs() << Freq;
+ else
+ dbgs() << "unknown ";
+ dbgs() << " | " << *DAG.Regions[I].first;
+ }
+ });
- if (!canIncreaseOccupancyOrReduceSpill())
+ SmallVector<ScoredRemat> ScoredRemats;
+ for (const RematReg &Remat : RematRegs)
+ ScoredRemats.emplace_back(&Remat, FreqInfo, DAG);
+
+// Rematerialize registers in successive rounds until all RP targets are
// satisfied or until we run out of rematerialization candidates.
+#ifndef NDEBUG
+ unsigned RoundNum = 0;
+#endif
+ BitVector RecomputeRP(NumRegions);
+ do {
+ assert(!ScoredRemats.empty() && "no more remat candidates");
+
+ // (Re-)Score and (re-)sort all remats in increasing score order.
+ for (ScoredRemat &Remat : ScoredRemats)
+ Remat.update(TargetRegions, RPTargets, FreqInfo, !TargetOcc);
+ sort(ScoredRemats);
+
+ REMAT_DEBUG({
+ dbgs() << "==== ROUND " << RoundNum++ << " ====\n"
+ << REMAT_PREFIX
+ << "Candidates with non-null score, in rematerialization order:\n";
+ for (const ScoredRemat &RematDecision : reverse(ScoredRemats)) {
+ if (RematDecision.hasNullScore())
+ break;
+ dbgs() << REMAT_PREFIX << " " << RematDecision.print() << " | "
+ << *RematDecision.Remat->DefMI;
+ }
+ PrintTargetRegions();
+ });
+
+ RecomputeRP.reset();
+ unsigned RematIdx = ScoredRemats.size();
+
+ // Rematerialize registers in decreasing score order until we estimate
+ // that all RP targets are satisfied or until rematerialization candidates
+ // are no longer useful to decrease RP.
+ for (; RematIdx && TargetRegions.any(); --RematIdx) {
+ const ScoredRemat &Candidate = ScoredRemats[RematIdx - 1];
+ // Stop rematerializing on encountering a null score. Since scores
+ // monotonically decrease as we rematerialize, we know there is nothing
+ // useful left to do in such cases, even if we were to re-score.
+ if (Candidate.hasNullScore()) {
+ RematIdx = 0;
+ break;
+ }
+
+ const RematReg &Remat = *Candidate.Remat;
+ // When previous rematerializations in this round have already satisfied
+ // RP targets in all regions this rematerialization can impact, we have a
+ // good indication that our scores have diverged significantly from
+ // reality, in which case we interrupt this round and re-score. This also
+ // ensures that every rematerialization we perform is possibly impactful
+ // in at least one target region.
+ if (!Remat.maybeBeneficial(TargetRegions, RPTargets))
+ break;
+
+ REMAT_DEBUG(dbgs() << "** REMAT " << PrintRematReg(Remat) << '\n';);
+ // Every rematerialization we do here is likely to move the instruction
+ // into a higher frequency region, increasing the total sum latency of the
+ // instruction itself. This is acceptable if we are eliminating a spill in
+ // the process, but when the goal is increasing occupancy we get nothing
+ // out of rematerialization if occupancy is not increased in the end; in
+ // such cases we want to roll back the rematerialization.
+ RollbackInfo *Rollback =
+ TargetOcc ? &Rollbacks.emplace_back(&Remat) : nullptr;
+ rematerialize(Remat, RecomputeRP, Rollback);
+ unsetSatisifedRPTargets(Remat.Live);
+ }
+
+ REMAT_DEBUG({
+ if (!TargetRegions.any()) {
+ dbgs() << "** Interrupt round on all targets achieved\n";
+ } else if (RematIdx) {
+ dbgs() << "** Interrupt round on stale score for "
+ << *ScoredRemats[RematIdx - 1].Remat->DefMI;
+ } else {
+ dbgs() << "** Stop on exhausted rematerialization candidates\n";
+ }
+ });
+
+ // Peel off registers we already rematerialized from the vector's tail.
+ ScoredRemats.truncate(RematIdx);
+ } while ((updateAndVerifyRPTargets(RecomputeRP) || TargetRegions.any()) &&
+ !ScoredRemats.empty());
+ if (RescheduleRegions.none())
return false;
- // Rematerialize identified instructions and update scheduler's state.
- rematerialize();
- if (GCNTrackers)
- DAG.RegionLiveOuts.buildLiveRegMap();
+ // Commit all pressure changes to the DAG and compute minimum achieved
+ // occupancy in impacted regions.
+ REMAT_DEBUG(dbgs() << "==== REMAT RESULTS ====\n");
+ unsigned DynamicVGPRBlockSize = MFI.getDynamicVGPRBlockSize();
+ for (unsigned I : RescheduleRegions.set_bits()) {
+ DAG.Pressure[I] = RPTargets[I].getCurrentRP();
+ REMAT_DEBUG(dbgs() << '[' << I << "] Achieved occupancy "
+ << DAG.Pressure[I].getOccupancy(ST, DynamicVGPRBlockSize)
+ << " (" << RPTargets[I] << ")\n");
+ }
+ AchievedOcc = MFI.getMaxWavesPerEU();
+ for (const GCNRegPressure &RP : DAG.Pressure) {
+ AchievedOcc =
+ std::min(AchievedOcc, RP.getOccupancy(ST, DynamicVGPRBlockSize));
+ }
+
REMAT_DEBUG({
dbgs() << "Retrying function scheduling with new min. occupancy of "
<< AchievedOcc << " from rematerializing (original was "
@@ -1307,7 +1507,6 @@ bool PreRARematStage::initGCNSchedStage() {
dbgs() << ", target was " << *TargetOcc;
dbgs() << ")\n";
});
-
if (AchievedOcc > DAG.MinOccupancy) {
DAG.MinOccupancy = AchievedOcc;
SIMachineFunctionInfo &MFI = *MF.getInfo<SIMachineFunctionInfo>();
@@ -1341,6 +1540,10 @@ void UnclusteredHighRPStage::finalizeGCNSchedStage() {
}
bool GCNSchedStage::initGCNRegion() {
+ // Skip empty scheduling region.
+ if (DAG.begin() == DAG.end())
+ return false;
+
// Check whether this new region is also a new block.
if (DAG.RegionBegin->getParent() != CurrentMBB)
setupNewBlock();
@@ -1348,8 +1551,8 @@ bool GCNSchedStage::initGCNRegion() {
unsigned NumRegionInstrs = std::distance(DAG.begin(), DAG.end());
DAG.enterRegion(CurrentMBB, DAG.begin(), DAG.end(), NumRegionInstrs);
- // Skip empty scheduling regions (0 or 1 schedulable instructions).
- if (DAG.begin() == DAG.end() || DAG.begin() == std::prev(DAG.end()))
+ // Skip regions with 1 schedulable instruction.
+ if (DAG.begin() == std::prev(DAG.end()))
return false;
LLVM_DEBUG(dbgs() << "********** MI Scheduling **********\n");
@@ -1837,27 +2040,20 @@ void GCNSchedStage::revertScheduling() {
DAG.Regions[RegionIdx] = std::pair(DAG.RegionBegin, DAG.RegionEnd);
}
-bool PreRARematStage::canIncreaseOccupancyOrReduceSpill() {
+bool PreRARematStage::setObjective() {
const Function &F = MF.getFunction();
- // Maps optimizable regions (i.e., regions at minimum and register-limited
- // occupancy, or regions with spilling) to the target RP we would like to
- // reach.
- DenseMap<unsigned, GCNRPTarget> OptRegions;
+ // Set up "spilling targets" for all regions.
unsigned MaxSGPRs = ST.getMaxNumSGPRs(F);
unsigned MaxVGPRs = ST.getMaxNumVGPRs(F);
- auto ResetTargetRegions = [&]() {
- OptRegions.clear();
- for (unsigned I = 0, E = DAG.Regions.size(); I != E; ++I) {
- const GCNRegPressure &RP = DAG.Pressure[I];
- GCNRPTarget Target(MaxSGPRs, MaxVGPRs, MF, RP);
- if (!Target.satisfied())
- OptRegions.insert({I, Target});
- }
- };
+ for (unsigned I = 0, E = DAG.Regions.size(); I != E; ++I) {
+ const GCNRegPressure &RP = DAG.Pressure[I];
+ GCNRPTarget &Target = RPTargets.emplace_back(MaxSGPRs, MaxVGPRs, MF, RP);
+ if (!Target.satisfied())
+ TargetRegions.set(I);
+ }
- ResetTargetRegions();
- if (!OptRegions.empty() || DAG.MinOccupancy >= MFI.getMaxWavesPerEU()) {
+ if (TargetRegions.any() || DAG.MinOccupancy >= MFI.getMaxWavesPerEU()) {
// In addition to register usage being above addressable limits, occupancy
// below the minimum is considered like "spilling" as well.
TargetOcc = std::nullopt;
@@ -1865,94 +2061,68 @@ bool PreRARematStage::canIncreaseOccupancyOrReduceSpill() {
// There is no spilling and room to improve occupancy; set up "increased
// occupancy targets" for all regions.
TargetOcc = DAG.MinOccupancy + 1;
- unsigned VGPRBlockSize =
- MF.getInfo<SIMachineFunctionInfo>()->getDynamicVGPRBlockSize();
+ const unsigned VGPRBlockSize = MFI.getDynamicVGPRBlockSize();
MaxSGPRs = ST.getMaxNumSGPRs(*TargetOcc, false);
MaxVGPRs = ST.getMaxNumVGPRs(*TargetOcc, VGPRBlockSize);
- ResetTargetRegions();
- }
- REMAT_DEBUG({
- dbgs() << "Analyzing ";
- MF.getFunction().printAsOperand(dbgs(), false);
- dbgs() << ": ";
- if (OptRegions.empty()) {
- dbgs() << "no objective to achieve, occupancy is maximal at "
- << MFI.getMaxWavesPerEU();
- } else if (!TargetOcc) {
- dbgs() << "reduce spilling (minimum target occupancy is "
- << MFI.getMinWavesPerEU() << ')';
- } else {
- dbgs() << "increase occupancy from " << DAG.MinOccupancy << " to "
- << TargetOcc;
- }
- dbgs() << '\n';
- for (unsigned I = 0, E = DAG.Regions.size(); I != E; ++I) {
- if (auto OptIt = OptRegions.find(I); OptIt != OptRegions.end()) {
- dbgs() << REMAT_PREFIX << " [" << I << "] " << OptIt->getSecond()
- << '\n';
- }
+ for (auto [I, Target] : enumerate(RPTargets)) {
+ Target.setTarget(MaxSGPRs, MaxVGPRs);
+ if (!Target.satisfied())
+ TargetRegions.set(I);
}
- });
- if (OptRegions.empty())
- return false;
+ }
- // Accounts for a reduction in RP in an optimizable region. Returns whether we
- // estimate that we have identified enough rematerialization opportunities to
- // achieve our goal, and sets Progress to true when this particular reduction
- // in pressure was helpful toward that goal.
- auto ReduceRPInRegion = [&](auto OptIt, Register Reg, LaneBitmask Mask,
- bool &Progress) -> bool {
- GCNRPTarget &Target = OptIt->getSecond();
- if (!Target.isSaveBeneficial(Reg))
- return false;
- Progress = true;
- Target.saveReg(Reg, Mask, DAG.MRI);
- if (Target.satisfied())
- OptRegions.erase(OptIt->getFirst());
- return OptRegions.empty();
- };
+ return TargetRegions.any();
+}
+bool PreRARematStage::collectRematRegs(
+ const DenseMap<MachineInstr *, unsigned> &MIRegion) {
// We need up-to-date live-out info. to query live-out register masks in
// regions containing rematerializable instructions.
DAG.RegionLiveOuts.buildLiveRegMap();
- // Cache set of registers that are going to be rematerialized.
- DenseSet<unsigned> RematRegs;
+  // Set of registers already marked for potential rematerialization; used to
+ // avoid rematerialization chains.
+ SmallSet<Register, 4> MarkedRegs;
+ auto IsMarkedForRemat = [&MarkedRegs](const MachineOperand &MO) -> bool {
+ return MO.isReg() && MarkedRegs.contains(MO.getReg());
+ };
// Identify rematerializable instructions in the function.
for (unsigned I = 0, E = DAG.Regions.size(); I != E; ++I) {
- auto Region = DAG.Regions[I];
- for (auto MI = Region.first; MI != Region.second; ++MI) {
+ RegionBoundaries Bounds = DAG.Regions[I];
+ for (auto MI = Bounds.first; MI != Bounds.second; ++MI) {
// The instruction must be rematerializable.
MachineInstr &DefMI = *MI;
if (!isReMaterializable(DefMI))
continue;
- // We only support rematerializing virtual registers with one definition.
+ // We only support rematerializing virtual registers with one
+ // definition.
Register Reg = DefMI.getOperand(0).getReg();
if (!Reg.isVirtual() || !DAG.MRI.hasOneDef(Reg))
continue;
// We only care to rematerialize the instruction if it has a single
- // non-debug user in a different region. The using MI may not belong to a
- // region if it is a lone region terminator.
+ // non-debug user in a different region.
+ // FIXME: Allow rematerializations with multiple uses. This should be
+ // relatively easy to support using the current cost model.
MachineInstr *UseMI = DAG.MRI.getOneNonDBGUser(Reg);
if (!UseMI)
continue;
auto UseRegion = MIRegion.find(UseMI);
- if (UseRegion != MIRegion.end() && UseRegion->second == I)
+ if (UseRegion == MIRegion.end() || UseRegion->second == I)
continue;
// Do not rematerialize an instruction if it uses or is used by an
// instruction that we have designated for rematerialization.
// FIXME: Allow for rematerialization chains: this requires 1. updating
- // remat points to account for uses that are rematerialized, and 2. either
- // rematerializing the candidates in careful ordering, or deferring the
- // MBB RP walk until the entire chain has been rematerialized.
- if (Rematerializations.contains(UseMI) ||
- llvm::any_of(DefMI.operands(), [&RematRegs](MachineOperand &MO) {
- return MO.isReg() && RematRegs.contains(MO.getReg());
- }))
+ // remat points to account for uses that are rematerialized, and 2.
+ // either rematerializing the candidates in careful ordering, or
+ // deferring the MBB RP walk until the entire chain has been
+ // rematerialized.
+ const MachineOperand &UseMO = UseMI->getOperand(0);
+ if (IsMarkedForRemat(UseMO) ||
+ llvm::any_of(DefMI.operands(), IsMarkedForRemat))
continue;
      // Do not rematerialize an instruction if it uses registers that aren't
@@ -1963,106 +2133,182 @@ bool PreRARematStage::canIncreaseOccupancyOrReduceSpill() {
*DAG.TII))
continue;
- REMAT_DEBUG(dbgs() << "Region " << I << ": remat instruction " << DefMI);
- RematInstruction &Remat =
- Rematerializations.try_emplace(&DefMI, UseMI).first->second;
-
- bool RematUseful = false;
- if (auto It = OptRegions.find(I); It != OptRegions.end()) {
- // Optimistically consider that moving the instruction out of its
- // defining region will reduce RP in the latter; this assumes that
- // maximum RP in the region is reached somewhere between the defining
- // instruction and the end of the region.
- REMAT_DEBUG(dbgs() << " Defining region is optimizable\n");
- LaneBitmask Mask = DAG.RegionLiveOuts.getLiveRegsForRegionIdx(I)[Reg];
- if (ReduceRPInRegion(It, Reg, Mask, RematUseful))
- return true;
- }
-
- for (unsigned LIRegion = 0; LIRegion != E; ++LIRegion) {
- // We are only collecting regions in which the register is a live-in
- // (and may be live-through).
- auto It = DAG.LiveIns[LIRegion].find(Reg);
- if (It == DAG.LiveIns[LIRegion].end() || It->second.none())
- continue;
- Remat.LiveInRegions.insert(LIRegion);
-
- // Account for the reduction in RP due to the rematerialization in an
- ...
[truncated]
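Stripping away the LLVM specifics, the round-based loop in the patch above (score, sort, rematerialize from the highest score down until targets are met or a null score is hit, peel off what was done, and repeat) can be sketched as follows. `Candidate`, `runRounds`, and the integer pressure model are illustrative stand-ins for the stage's actual types, not the real API:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scored rematerialization candidate.
struct Candidate {
  int Score;     // estimated benefit; 0 means "nothing useful left to do"
  int Reduction; // estimated register-pressure reduction of this remat
};

// Rematerializes candidates in decreasing score order until the pressure
// target is met or candidates are exhausted; returns the number of remats.
int runRounds(std::vector<Candidate> Cands, int Pressure, int Target) {
  int NumRemats = 0;
  while (Pressure > Target && !Cands.empty()) {
    // (Re-)sort in increasing score order so the best candidates sit at the
    // tail of the vector and can be peeled off cheaply.
    std::sort(Cands.begin(), Cands.end(),
              [](const Candidate &A, const Candidate &B) {
                return A.Score < B.Score;
              });
    std::size_t Idx = Cands.size();
    for (; Idx && Pressure > Target; --Idx) {
      const Candidate &C = Cands[Idx - 1];
      // Scores monotonically decrease as we rematerialize, so a null score
      // means nothing useful remains even after re-scoring.
      if (C.Score == 0)
        return NumRemats;
      Pressure -= C.Reduction;
      ++NumRemats;
    }
    Cands.resize(Idx); // drop already-rematerialized candidates from the tail
  }
  return NumRemats;
}
```

The early return on a null score mirrors the patch's reasoning: since scores only decrease as rematerializations are performed, re-scoring cannot make a null-score candidate useful again.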
qcolombet left a comment:
I believe I had already approved the original one @lucas-rami.
Let me know if that's not the case and I'll take a closer look.
@qcolombet I don't see the approval on the original one. In any case, I landed this one and closed the original (which managed to load this time).
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/187/builds/15476. Here is the relevant piece of the build log for reference.
@lucas-rami this is failing on my msvc build:
@RKSimon Looking into this. Should I revert in the meantime?
We see failures on our HIP bot after this one landed. Reverting would give us feedback on whether it is indeed the culprit. (I suspect it is)
#175755 should fix this.
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/24/builds/16530. Here is the relevant piece of the build log for reference.
…#175755) On some configurations, sorting `ScoredRemat` objects, which contain const members, causes a compile failure due to the impossibility of swapping/moving such objects. The problem was introduced in #175050. This removes const from those fields to address the issue. The design will soon change anyway to not rely on sorting objects of this type, and the consts were only there for semantic clarity.
…175807) A buildbot was failing with a use-after-poison (https://lab.llvm.org/buildbot/#/builders/24/builds/16530) after #175050: ``` ==llc==1532559==ERROR: AddressSanitizer: use-after-poison on address 0xe26e74e12368 at pc 0xb36d41bd74dc bp 0xffffed72a450 sp 0xffffed72a448 READ of size 8 at 0xe26e74e12368 thread T0 #0 0xb36d41bd74d8 in llvm::MachineInstr::print(llvm::raw_ostream&, bool, bool, bool, bool, llvm::TargetInstrInfo const*) const /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineInstr.cpp:1796:35 #1 0xb36d3e221b08 in operator<< /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:2150:6 #2 0xb36d3e221b08 in llvm::PreRARematStage::rollback(llvm::PreRARematStage::RollbackInfo const&, llvm::BitVector&) const /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:2363:57 ... ``` This is because it was printing an instruction that had already been deleted. This patch fixes this by reversing the order.
…vm#175050)" This reverts commit 6aaa7fd.
This is a significant refactoring of the scheduler's rematerialization stage meant to improve rematerialization capabilities and lay strong foundations for future improvements. As before, the stage identifies scheduling regions in which RP must be reduced (so-called "target regions"), then rematerializes registers to try and achieve the desired reduction. All regions affected by rematerializations are re-scheduled, and, if the MIR is deemed worse than before, rematerializations are rolled back to leave the MIR in its pre-stage state. The core contribution is a scoring system to estimate the benefit of each rematerialization candidate. This score favors rematerializing candidates which, in order, would 1. (if the function is spilling) reduce RP in highest-frequency target regions, 2. be rematerialized to lowest-frequency target regions, and 3. reduce RP in the highest number of target regions. All rematerialization opportunities are initially scored and rematerialized in decreasing score order until RP objectives are met or pre-computed scores diverge from reality; in the latter case remaining candidates are re-scored and the process repeats. New tests in `machine-scheduler-rematerialization-scoring.mir` showcase how the scoring system dictates which rematerialization are the most beneficial and therefore performed first A minor contribution included in this PR following previous feedback is that rollback now happens in-place i.e., without having to re-create the rematerialized MI. This leaves original slot indices and registers untouched. We achieve this by temporarily switching the opcode of rollback-able instructions to a debug opcode during re-scheduling so that they are ignored.
…vm#175050)" (llvm#175813) This reverts 8ab7937 and f21e359 which are causing a HIP failure in a Blender test.
…lvm#175050)" This re-applies commit f21e359 along with the compile fix failure introduced in 8ab7937 before the initial patch was reverted and fixes for the previously observed assert failure. We were hitting the assert in the HIP Blender due to a combination of two issues that could happen when rematerializations are being rolled back. 1. Small changes in slots indices (while preserving instruction order) compared to the pre-re-scheduling state meand that we have to re-compute live ranges for all register operands of rolled back rematerializations. This was not being done before. 2. Re-scheduling can move registers that were rematerialized at arbitrary positions in their respective regions while their opcode is set to DBG_VALUE, even before their read operands are defined. This makes re-scheduling reverts mandatory before rolling back rematerializations, as otherwise def-use chains may be broken. The original patch did not guatantee that, but previous refactoring of the rollback/revert logic for the rematerialization stage now ensures that reverts always precede rollbacks.
…lvm#175050)" This re-applies commit f21e359 along with the compile fix failure introduced in 8ab7937 before the initial patch was reverted and fixes for the previously observed assert failure. We were hitting the assert in the HIP Blender due to a combination of two issues that could happen when rematerializations are being rolled back. 1. Small changes in slots indices (while preserving instruction order) compared to the pre-re-scheduling state meand that we have to re-compute live ranges for all register operands of rolled back rematerializations. This was not being done before. 2. Re-scheduling can move registers that were rematerialized at arbitrary positions in their respective regions while their opcode is set to DBG_VALUE, even before their read operands are defined. This makes re-scheduling reverts mandatory before rolling back rematerializations, as otherwise def-use chains may be broken. The original patch did not guatantee that, but previous refactoring of the rollback/revert logic for the rematerialization stage now ensures that reverts always precede rollbacks.
…lvm#175807) A buildbot was failing with a use-after-poison (https://lab.llvm.org/buildbot/#/builders/24/builds/16530) after llvm#175050:

```
==llc==1532559==ERROR: AddressSanitizer: use-after-poison on address 0xe26e74e12368 at pc 0xb36d41bd74dc bp 0xffffed72a450 sp 0xffffed72a448
READ of size 8 at 0xe26e74e12368 thread T0
#0 0xb36d41bd74d8 in llvm::MachineInstr::print(llvm::raw_ostream&, bool, bool, bool, bool, llvm::TargetInstrInfo const*) const /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineInstr.cpp:1796:35
#1 0xb36d3e221b08 in operator<< /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:2150:6
#2 0xb36d3e221b08 in llvm::PreRARematStage::rollback(llvm::PreRARematStage::RollbackInfo const&, llvm::BitVector&) const /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:2363:57
...
```

This is because it was printing an instruction that had already been deleted. This patch fixes this by reversing the order.
…vm#175050)" (llvm#175813) This reverts 8ab7937 and f21e359 which are causing a HIP failure in a Blender test.
This is simply the last rebased version of #153092, which GitHub seems unable to open due to history length. All existing feedback was addressed, the last notable change being improvements to rollback, which no longer has to re-create the original MI in the MIR.
This is a significant refactoring of the scheduler's rematerialization stage meant to improve rematerialization capabilities and lay strong foundations for future improvements.
As before, the stage identifies scheduling regions in which RP must be reduced (so-called "target regions"), then rematerializes registers to try and achieve the desired reduction. All regions affected by rematerializations are re-scheduled, and, if the MIR is deemed worse than before, rematerializations are rolled back to leave the MIR in its pre-stage state.
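The identify-then-rollback flow described above can be sketched with a self-contained toy model. Every name and number below is illustrative (a made-up `Region` struct, a fixed per-region saving and re-scheduling penalty, a summed-pressure "worse than before" check); none of it comes from the actual `PreRARematStage` implementation:

```cpp
#include <cassert>
#include <vector>

// Illustrative model only; names do not match the real PreRARematStage.
struct Region {
  int Pressure; // current register-pressure estimate
  int Target;   // pressure objective for this region
};

// "Target regions" are those whose RP exceeds the objective.
bool anyOverTarget(const std::vector<Region> &Regions) {
  for (const Region &R : Regions)
    if (R.Pressure > R.Target)
      return true;
  return false;
}

// One stage run: rematerialize into target regions (modeled as a fixed
// per-region saving minus a re-scheduling penalty), then keep the result
// only if overall pressure actually improved. Returns true if the changes
// were kept, false if they were "rolled back" (or nothing was done).
bool runStage(std::vector<Region> &Regions, int SavingPerRegion,
              int ReschedulePenalty) {
  if (!anyOverTarget(Regions))
    return false; // no target regions, nothing to do
  std::vector<Region> Backup = Regions; // snapshot for rollback
  int Before = 0, After = 0;
  for (Region &R : Regions) {
    Before += R.Pressure;
    if (R.Pressure > R.Target)
      R.Pressure += ReschedulePenalty - SavingPerRegion;
    After += R.Pressure;
  }
  if (After >= Before) { // deemed worse (or no better): roll back
    Regions = Backup;
    return false;
  }
  return true;
}
```

In the real stage the quality check compares full scheduling outcomes rather than summed pressure, and rollback restores the pre-stage MIR in place rather than copying a snapshot.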
The core contribution is a scoring system to estimate the benefit of each rematerialization candidate. This score favors rematerializing candidates which, in order, would
All rematerialization opportunities are initially scored and rematerialized in decreasing score order until RP objectives are met or pre-computed scores diverge from reality; in the latter case remaining candidates are re-scored and the process repeats. New tests in `machine-scheduler-rematerialization-scoring.mir` showcase how the scoring system dictates which rematerializations are the most beneficial and are therefore performed first.

A minor contribution included in this PR, following previous feedback, is that rollback now happens in-place, i.e., without having to re-create the rematerialized MI. This leaves original slot indices and registers untouched. We achieve this by temporarily switching the opcode of rollback-able instructions to a debug opcode during re-scheduling so that they are ignored.
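The score-apply-rescore loop can be illustrated with a small stand-alone model. Here `RematCandidate`, its precomputed `Score`, and the `ActualSaving` it yields when applied are all hypothetical stand-ins (the real stage recomputes scores from live MIR state), but the control flow mirrors the description above:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of the scoring loop; not the GCNSchedStrategy API.
struct RematCandidate {
  int Score;        // estimated benefit (higher is better)
  int ActualSaving; // RP units actually saved when applied
};

// Apply candidates greedily in decreasing score order until the RP
// target is met, re-scoring whenever an estimate diverges from the
// observed saving. Returns the total RP reduction achieved.
int runRematLoop(std::vector<RematCandidate> Cands, int RPTarget) {
  int Saved = 0;
  bool NeedRescore = false;
  while (Saved < RPTarget && !Cands.empty()) {
    if (NeedRescore) {
      // The real stage recomputes scores from current MIR state; this
      // toy model simply refreshes each estimate from the known saving.
      for (RematCandidate &C : Cands)
        C.Score = C.ActualSaving;
      NeedRescore = false;
    }
    // Pick the highest-scored remaining candidate.
    auto Best = std::max_element(
        Cands.begin(), Cands.end(),
        [](const RematCandidate &A, const RematCandidate &B) {
          return A.Score < B.Score;
        });
    int Predicted = Best->Score;
    Saved += Best->ActualSaving;
    // If the precomputed score diverged from reality, re-score the rest.
    if (Best->ActualSaving != Predicted)
      NeedRescore = true;
    Cands.erase(Best);
  }
  return Saved;
}
```

For example, with candidates scoring 5, 3, and 2 and a target reduction of 6, the top candidate delivers its predicted 5; if the next one delivers only 1, the loop still reaches the target and stops, while a shortfall would instead trigger a re-score of the remaining candidates.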