[AMDGPU][Scheduler] Scoring system for rematerializations #175050
Conversation
@llvm/pr-subscribers-backend-amdgpu

Author: Lucas Ramirez (lucas-rami)

Changes

This is simply the last rebased version of #153092, which GitHub seems unable to open due to history length. All existing feedback was addressed, the last notable change being improvements to rollback, which no longer has to re-create the original MI in the MIR.

This is a significant refactoring of the scheduler's rematerialization stage meant to improve rematerialization capabilities and lay strong foundations for future improvements. As before, the stage identifies scheduling regions in which RP must be reduced (so-called "target regions"), then rematerializes registers to try and achieve the desired reduction. All regions affected by rematerializations are re-scheduled, and, if the MIR is deemed worse than before, rematerializations are rolled back to leave the MIR in its pre-stage state.

The core contribution is a scoring system to estimate the benefit of each rematerialization candidate. This score favors rematerializing candidates which, in order, would

1. (if the function is spilling) reduce RP in the highest-frequency target regions,
2. be rematerialized to the lowest-frequency target regions, and
3. reduce RP in the highest number of target regions.

All rematerialization opportunities are initially scored and rematerialized in decreasing score order until RP objectives are met or pre-computed scores diverge from reality; in the latter case remaining candidates are re-scored and the process repeats. New tests in `machine-scheduler-rematerialization-scoring.mir` showcase how the scoring system dictates which rematerializations are the most beneficial and are therefore performed first.

A minor contribution included in this PR following previous feedback is that rollback now happens in-place, i.e., without having to re-create the rematerialized MI. This leaves original slot indices and registers untouched. We achieve this by temporarily switching the opcode of rollback-able instructions to a debug opcode during re-scheduling so that they are ignored.

Patch is 174.92 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/175050.diff

7 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index c8ce3aab3f303..cb0cb6510ecd4 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -28,11 +28,20 @@
#include "GCNRegPressure.h"
#include "SIMachineFunctionInfo.h"
#include "Utils/AMDGPUBaseInfo.h"
+#include "llvm/ADT/BitVector.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/CodeGen/CalcSpillWeights.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
+#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
#include "llvm/CodeGen/RegisterClassInfo.h"
#include "llvm/MC/LaneBitmask.h"
+#include "llvm/MC/MCInstrItineraries.h"
+#include "llvm/MC/MCSchedule.h"
+#include "llvm/MC/TargetRegistry.h"
#include "llvm/Support/ErrorHandling.h"
+#include <limits>
+#include <string>
#define DEBUG_TYPE "machine-scheduler"
@@ -970,6 +979,8 @@ void GCNScheduleDAGMILive::schedule() {
GCNRegPressure
GCNScheduleDAGMILive::getRealRegPressure(unsigned RegionIdx) const {
+ if (Regions[RegionIdx].first == Regions[RegionIdx].second)
+ return llvm::getRegPressure(MRI, LiveIns[RegionIdx]);
GCNDownwardRPTracker RPTracker(*LIS);
RPTracker.advance(Regions[RegionIdx].first, Regions[RegionIdx].second,
&LiveIns[RegionIdx]);
@@ -1272,33 +1283,222 @@ bool ClusteredLowOccStage::initGCNSchedStage() {
#define REMAT_PREFIX "[PreRARemat] "
#define REMAT_DEBUG(X) LLVM_DEBUG(dbgs() << REMAT_PREFIX; X;)
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+Printable PreRARematStage::ScoredRemat::print() const {
+ return Printable([&](raw_ostream &OS) {
+ OS << '(' << MaxFreq << ", " << FreqDiff << ", " << RegionImpact << ')';
+ });
+}
+#endif
+
bool PreRARematStage::initGCNSchedStage() {
// FIXME: This pass will invalidate cached BBLiveInMap and MBBLiveIns for
// regions inbetween the defs and region we sinked the def to. Will need to be
// fixed if there is another pass after this pass.
assert(!S.hasNextStage());
- if (!GCNSchedStage::initGCNSchedStage() || DAG.Regions.size() == 1)
+ if (!GCNSchedStage::initGCNSchedStage() || DAG.Regions.size() <= 1)
return false;
+ // Maps all MIs (except lone terminators, which are not part of any region) to
+ // their parent region. Non-lone terminators are considered part of the region
+ // they delimitate.
+ DenseMap<MachineInstr *, unsigned> MIRegion(MF.getInstructionCount());
+
// Before performing any IR modification record the parent region of each MI
// and the parent MBB of each region.
const unsigned NumRegions = DAG.Regions.size();
- RegionBB.reserve(NumRegions);
for (unsigned I = 0; I < NumRegions; ++I) {
RegionBoundaries Region = DAG.Regions[I];
for (auto MI = Region.first; MI != Region.second; ++MI)
MIRegion.insert({&*MI, I});
- RegionBB.push_back(Region.first->getParent());
+ MachineBasicBlock *ParentMBB = Region.first->getParent();
+ if (Region.second != ParentMBB->end())
+ MIRegion.insert({&*Region.second, I});
+ RegionBB.push_back(ParentMBB);
+ }
+
+#ifndef NDEBUG
+ auto PrintTargetRegions = [&]() -> void {
+ if (TargetRegions.none()) {
+ dbgs() << REMAT_PREFIX << "No target regions\n";
+ return;
+ }
+ dbgs() << REMAT_PREFIX << "Target regions:\n";
+ for (unsigned I : TargetRegions.set_bits())
+ dbgs() << REMAT_PREFIX << " [" << I << "] " << RPTargets[I] << '\n';
+ };
+ auto PrintRematReg = [&](const RematReg &Remat) -> Printable {
+ return Printable([&, Remat](raw_ostream &OS) {
+ // Concatenate all region numbers in which the register is unused and
+ // live-through.
+ bool HasLiveThroughRegion = false;
+ OS << '[' << Remat.DefRegion << " -";
+ for (unsigned I = 0; I < NumRegions; ++I) {
+ if (Remat.isUnusedLiveThrough(I)) {
+ if (HasLiveThroughRegion) {
+ OS << ',';
+ } else {
+ OS << "- ";
+ HasLiveThroughRegion = true;
+ }
+ OS << I;
+ }
+ }
+ if (HasLiveThroughRegion)
+ OS << " -";
+ OS << "-> " << Remat.UseRegion << "] ";
+ Remat.DefMI->print(OS, /*IsStandalone=*/true, /*SkipOpers=*/false,
+ /*SkipDebugLoc=*/false, /*AddNewLine=*/false);
+ });
+ };
+#endif
+
+ // Set an objective for the stage based on current RP in each region.
+ REMAT_DEBUG({
+ dbgs() << "Analyzing ";
+ MF.getFunction().printAsOperand(dbgs(), false);
+ dbgs() << ": ";
+ });
+ if (!setObjective()) {
+ LLVM_DEBUG(dbgs() << "no objective to achieve, occupancy is maximal at "
+ << MFI.getMaxWavesPerEU() << '\n');
+ return false;
}
+ LLVM_DEBUG({
+ if (TargetOcc) {
+ dbgs() << "increase occupancy from " << *TargetOcc - 1 << '\n';
+ } else {
+ dbgs() << "reduce spilling (minimum target occupancy is "
+ << MFI.getMinWavesPerEU() << ")\n";
+ }
+ PrintTargetRegions();
+ });
+
+ if (!collectRematRegs(MIRegion)) {
+ REMAT_DEBUG(dbgs() << "No rematerializable registers\n");
+ return false;
+ }
+ const ScoredRemat::FreqInfo FreqInfo(MF, DAG);
+ REMAT_DEBUG({
+ dbgs() << "Rematerializable registers:\n";
+ for (const RematReg &Remat : RematRegs)
+ dbgs() << REMAT_PREFIX << " " << PrintRematReg(Remat) << '\n';
+ dbgs() << REMAT_PREFIX << "Region frequencies\n";
+ for (auto [I, Freq] : enumerate(FreqInfo.Regions)) {
+ dbgs() << REMAT_PREFIX << " [" << I << "] ";
+ if (Freq)
+ dbgs() << Freq;
+ else
+ dbgs() << "unknown ";
+ dbgs() << " | " << *DAG.Regions[I].first;
+ }
+ });
- if (!canIncreaseOccupancyOrReduceSpill())
+ SmallVector<ScoredRemat> ScoredRemats;
+ for (const RematReg &Remat : RematRegs)
+ ScoredRemats.emplace_back(&Remat, FreqInfo, DAG);
+
+// Rematerialize registers in successive rounds until all RP targets are
// satisfied or until we run out of rematerialization candidates.
+#ifndef NDEBUG
+ unsigned RoundNum = 0;
+#endif
+ BitVector RecomputeRP(NumRegions);
+ do {
+ assert(!ScoredRemats.empty() && "no more remat candidates");
+
+ // (Re-)Score and (re-)sort all remats in increasing score order.
+ for (ScoredRemat &Remat : ScoredRemats)
+ Remat.update(TargetRegions, RPTargets, FreqInfo, !TargetOcc);
+ sort(ScoredRemats);
+
+ REMAT_DEBUG({
+ dbgs() << "==== ROUND " << RoundNum++ << " ====\n"
+ << REMAT_PREFIX
+ << "Candidates with non-null score, in rematerialization order:\n";
+ for (const ScoredRemat &RematDecision : reverse(ScoredRemats)) {
+ if (RematDecision.hasNullScore())
+ break;
+ dbgs() << REMAT_PREFIX << " " << RematDecision.print() << " | "
+ << *RematDecision.Remat->DefMI;
+ }
+ PrintTargetRegions();
+ });
+
+ RecomputeRP.reset();
+ unsigned RematIdx = ScoredRemats.size();
+
+ // Rematerialize registers in decreasing score order until we estimate
+ // that all RP targets are satisfied or until rematerialization candidates
+ // are no longer useful to decrease RP.
+ for (; RematIdx && TargetRegions.any(); --RematIdx) {
+ const ScoredRemat &Candidate = ScoredRemats[RematIdx - 1];
+ // Stop rematerializing on encountering a null score. Since scores
+ // monotonically decrease as we rematerialize, we know there is nothing
+ // useful left to do in such cases, even if we were to re-score.
+ if (Candidate.hasNullScore()) {
+ RematIdx = 0;
+ break;
+ }
+
+ const RematReg &Remat = *Candidate.Remat;
+ // When previous rematerializations in this round have already satisfied
+ // RP targets in all regions this rematerialization can impact, we have a
+ // good indication that our scores have diverged significantly from
+ // reality, in which case we interrupt this round and re-score. This also
+ // ensures that every rematerialization we perform is possibly impactful
+ // in at least one target region.
+ if (!Remat.maybeBeneficial(TargetRegions, RPTargets))
+ break;
+
+ REMAT_DEBUG(dbgs() << "** REMAT " << PrintRematReg(Remat) << '\n';);
+ // Every rematerialization we do here is likely to move the instruction
+ // into a higher frequency region, increasing the total sum latency of the
+ // instruction itself. This is acceptable if we are eliminating a spill in
+ // the process, but when the goal is increasing occupancy we get nothing
+ // out of rematerialization if occupancy is not increased in the end; in
+ // such cases we want to roll back the rematerialization.
+ RollbackInfo *Rollback =
+ TargetOcc ? &Rollbacks.emplace_back(&Remat) : nullptr;
+ rematerialize(Remat, RecomputeRP, Rollback);
+ unsetSatisifedRPTargets(Remat.Live);
+ }
+
+ REMAT_DEBUG({
+ if (!TargetRegions.any()) {
+ dbgs() << "** Interrupt round on all targets achieved\n";
+ } else if (RematIdx) {
+ dbgs() << "** Interrupt round on stale score for "
+ << *ScoredRemats[RematIdx - 1].Remat->DefMI;
+ } else {
+ dbgs() << "** Stop on exhausted rematerialization candidates\n";
+ }
+ });
+
+ // Peel off registers we already rematerialized from the vector's tail.
+ ScoredRemats.truncate(RematIdx);
+ } while ((updateAndVerifyRPTargets(RecomputeRP) || TargetRegions.any()) &&
+ !ScoredRemats.empty());
+ if (RescheduleRegions.none())
return false;
- // Rematerialize identified instructions and update scheduler's state.
- rematerialize();
- if (GCNTrackers)
- DAG.RegionLiveOuts.buildLiveRegMap();
+ // Commit all pressure changes to the DAG and compute minimum achieved
+ // occupancy in impacted regions.
+ REMAT_DEBUG(dbgs() << "==== REMAT RESULTS ====\n");
+ unsigned DynamicVGPRBlockSize = MFI.getDynamicVGPRBlockSize();
+ for (unsigned I : RescheduleRegions.set_bits()) {
+ DAG.Pressure[I] = RPTargets[I].getCurrentRP();
+ REMAT_DEBUG(dbgs() << '[' << I << "] Achieved occupancy "
+ << DAG.Pressure[I].getOccupancy(ST, DynamicVGPRBlockSize)
+ << " (" << RPTargets[I] << ")\n");
+ }
+ AchievedOcc = MFI.getMaxWavesPerEU();
+ for (const GCNRegPressure &RP : DAG.Pressure) {
+ AchievedOcc =
+ std::min(AchievedOcc, RP.getOccupancy(ST, DynamicVGPRBlockSize));
+ }
+
REMAT_DEBUG({
dbgs() << "Retrying function scheduling with new min. occupancy of "
<< AchievedOcc << " from rematerializing (original was "
@@ -1307,7 +1507,6 @@ bool PreRARematStage::initGCNSchedStage() {
dbgs() << ", target was " << *TargetOcc;
dbgs() << ")\n";
});
-
if (AchievedOcc > DAG.MinOccupancy) {
DAG.MinOccupancy = AchievedOcc;
SIMachineFunctionInfo &MFI = *MF.getInfo<SIMachineFunctionInfo>();
@@ -1341,6 +1540,10 @@ void UnclusteredHighRPStage::finalizeGCNSchedStage() {
}
bool GCNSchedStage::initGCNRegion() {
+ // Skip empty scheduling region.
+ if (DAG.begin() == DAG.end())
+ return false;
+
// Check whether this new region is also a new block.
if (DAG.RegionBegin->getParent() != CurrentMBB)
setupNewBlock();
@@ -1348,8 +1551,8 @@ bool GCNSchedStage::initGCNRegion() {
unsigned NumRegionInstrs = std::distance(DAG.begin(), DAG.end());
DAG.enterRegion(CurrentMBB, DAG.begin(), DAG.end(), NumRegionInstrs);
- // Skip empty scheduling regions (0 or 1 schedulable instructions).
- if (DAG.begin() == DAG.end() || DAG.begin() == std::prev(DAG.end()))
+ // Skip regions with 1 schedulable instruction.
+ if (DAG.begin() == std::prev(DAG.end()))
return false;
LLVM_DEBUG(dbgs() << "********** MI Scheduling **********\n");
@@ -1837,27 +2040,20 @@ void GCNSchedStage::revertScheduling() {
DAG.Regions[RegionIdx] = std::pair(DAG.RegionBegin, DAG.RegionEnd);
}
-bool PreRARematStage::canIncreaseOccupancyOrReduceSpill() {
+bool PreRARematStage::setObjective() {
const Function &F = MF.getFunction();
- // Maps optimizable regions (i.e., regions at minimum and register-limited
- // occupancy, or regions with spilling) to the target RP we would like to
- // reach.
- DenseMap<unsigned, GCNRPTarget> OptRegions;
+ // Set up "spilling targets" for all regions.
unsigned MaxSGPRs = ST.getMaxNumSGPRs(F);
unsigned MaxVGPRs = ST.getMaxNumVGPRs(F);
- auto ResetTargetRegions = [&]() {
- OptRegions.clear();
- for (unsigned I = 0, E = DAG.Regions.size(); I != E; ++I) {
- const GCNRegPressure &RP = DAG.Pressure[I];
- GCNRPTarget Target(MaxSGPRs, MaxVGPRs, MF, RP);
- if (!Target.satisfied())
- OptRegions.insert({I, Target});
- }
- };
+ for (unsigned I = 0, E = DAG.Regions.size(); I != E; ++I) {
+ const GCNRegPressure &RP = DAG.Pressure[I];
+ GCNRPTarget &Target = RPTargets.emplace_back(MaxSGPRs, MaxVGPRs, MF, RP);
+ if (!Target.satisfied())
+ TargetRegions.set(I);
+ }
- ResetTargetRegions();
- if (!OptRegions.empty() || DAG.MinOccupancy >= MFI.getMaxWavesPerEU()) {
+ if (TargetRegions.any() || DAG.MinOccupancy >= MFI.getMaxWavesPerEU()) {
// In addition to register usage being above addressable limits, occupancy
// below the minimum is considered like "spilling" as well.
TargetOcc = std::nullopt;
@@ -1865,94 +2061,68 @@ bool PreRARematStage::canIncreaseOccupancyOrReduceSpill() {
// There is no spilling and room to improve occupancy; set up "increased
// occupancy targets" for all regions.
TargetOcc = DAG.MinOccupancy + 1;
- unsigned VGPRBlockSize =
- MF.getInfo<SIMachineFunctionInfo>()->getDynamicVGPRBlockSize();
+ const unsigned VGPRBlockSize = MFI.getDynamicVGPRBlockSize();
MaxSGPRs = ST.getMaxNumSGPRs(*TargetOcc, false);
MaxVGPRs = ST.getMaxNumVGPRs(*TargetOcc, VGPRBlockSize);
- ResetTargetRegions();
- }
- REMAT_DEBUG({
- dbgs() << "Analyzing ";
- MF.getFunction().printAsOperand(dbgs(), false);
- dbgs() << ": ";
- if (OptRegions.empty()) {
- dbgs() << "no objective to achieve, occupancy is maximal at "
- << MFI.getMaxWavesPerEU();
- } else if (!TargetOcc) {
- dbgs() << "reduce spilling (minimum target occupancy is "
- << MFI.getMinWavesPerEU() << ')';
- } else {
- dbgs() << "increase occupancy from " << DAG.MinOccupancy << " to "
- << TargetOcc;
- }
- dbgs() << '\n';
- for (unsigned I = 0, E = DAG.Regions.size(); I != E; ++I) {
- if (auto OptIt = OptRegions.find(I); OptIt != OptRegions.end()) {
- dbgs() << REMAT_PREFIX << " [" << I << "] " << OptIt->getSecond()
- << '\n';
- }
+ for (auto [I, Target] : enumerate(RPTargets)) {
+ Target.setTarget(MaxSGPRs, MaxVGPRs);
+ if (!Target.satisfied())
+ TargetRegions.set(I);
}
- });
- if (OptRegions.empty())
- return false;
+ }
- // Accounts for a reduction in RP in an optimizable region. Returns whether we
- // estimate that we have identified enough rematerialization opportunities to
- // achieve our goal, and sets Progress to true when this particular reduction
- // in pressure was helpful toward that goal.
- auto ReduceRPInRegion = [&](auto OptIt, Register Reg, LaneBitmask Mask,
- bool &Progress) -> bool {
- GCNRPTarget &Target = OptIt->getSecond();
- if (!Target.isSaveBeneficial(Reg))
- return false;
- Progress = true;
- Target.saveReg(Reg, Mask, DAG.MRI);
- if (Target.satisfied())
- OptRegions.erase(OptIt->getFirst());
- return OptRegions.empty();
- };
+ return TargetRegions.any();
+}
+bool PreRARematStage::collectRematRegs(
+ const DenseMap<MachineInstr *, unsigned> &MIRegion) {
// We need up-to-date live-out info. to query live-out register masks in
// regions containing rematerializable instructions.
DAG.RegionLiveOuts.buildLiveRegMap();
- // Cache set of registers that are going to be rematerialized.
- DenseSet<unsigned> RematRegs;
+  // Set of registers already marked for potential rematerialization; used to
+ // avoid rematerialization chains.
+ SmallSet<Register, 4> MarkedRegs;
+ auto IsMarkedForRemat = [&MarkedRegs](const MachineOperand &MO) -> bool {
+ return MO.isReg() && MarkedRegs.contains(MO.getReg());
+ };
// Identify rematerializable instructions in the function.
for (unsigned I = 0, E = DAG.Regions.size(); I != E; ++I) {
- auto Region = DAG.Regions[I];
- for (auto MI = Region.first; MI != Region.second; ++MI) {
+ RegionBoundaries Bounds = DAG.Regions[I];
+ for (auto MI = Bounds.first; MI != Bounds.second; ++MI) {
// The instruction must be rematerializable.
MachineInstr &DefMI = *MI;
if (!isReMaterializable(DefMI))
continue;
- // We only support rematerializing virtual registers with one definition.
+ // We only support rematerializing virtual registers with one
+ // definition.
Register Reg = DefMI.getOperand(0).getReg();
if (!Reg.isVirtual() || !DAG.MRI.hasOneDef(Reg))
continue;
// We only care to rematerialize the instruction if it has a single
- // non-debug user in a different region. The using MI may not belong to a
- // region if it is a lone region terminator.
+ // non-debug user in a different region.
+ // FIXME: Allow rematerializations with multiple uses. This should be
+ // relatively easy to support using the current cost model.
MachineInstr *UseMI = DAG.MRI.getOneNonDBGUser(Reg);
if (!UseMI)
continue;
auto UseRegion = MIRegion.find(UseMI);
- if (UseRegion != MIRegion.end() && UseRegion->second == I)
+ if (UseRegion == MIRegion.end() || UseRegion->second == I)
continue;
// Do not rematerialize an instruction if it uses or is used by an
// instruction that we have designated for rematerialization.
// FIXME: Allow for rematerialization chains: this requires 1. updating
- // remat points to account for uses that are rematerialized, and 2. either
- // rematerializing the candidates in careful ordering, or deferring the
- // MBB RP walk until the entire chain has been rematerialized.
- if (Rematerializations.contains(UseMI) ||
- llvm::any_of(DefMI.operands(), [&RematRegs](MachineOperand &MO) {
- return MO.isReg() && RematRegs.contains(MO.getReg());
- }))
+ // remat points to account for uses that are rematerialized, and 2.
+ // either rematerializing the candidates in careful ordering, or
+ // deferring the MBB RP walk until the entire chain has been
+ // rematerialized.
+ const MachineOperand &UseMO = UseMI->getOperand(0);
+ if (IsMarkedForRemat(UseMO) ||
+ llvm::any_of(DefMI.operands(), IsMarkedForRemat))
continue;
      // Do not rematerialize an instruction if it uses registers that aren't
@@ -1963,106 +2133,182 @@ bool PreRARematStage::canIncreaseOccupancyOrReduceSpill() {
*DAG.TII))
continue;
- REMAT_DEBUG(dbgs() << "Region " << I << ": remat instruction " << DefMI);
- RematInstruction &Remat =
- Rematerializations.try_emplace(&DefMI, UseMI).first->second;
-
- bool RematUseful = false;
- if (auto It = OptRegions.find(I); It != OptRegions.end()) {
- // Optimistically consider that moving the instruction out of its
- // defining region will reduce RP in the latter; this assumes that
- // maximum RP in the region is reached somewhere between the defining
- // instruction and the end of the region.
- REMAT_DEBUG(dbgs() << " Defining region is optimizable\n");
- LaneBitmask Mask = DAG.RegionLiveOuts.getLiveRegsForRegionIdx(I)[Reg];
- if (ReduceRPInRegion(It, Reg, Mask, RematUseful))
- return true;
- }
-
- for (unsigned LIRegion = 0; LIRegion != E; ++LIRegion) {
- // We are only collecting regions in which the register is a live-in
- // (and may be live-through).
- auto It = DAG.LiveIns[LIRegion].find(Reg);
- if (It == DAG.LiveIns[LIRegion].end() || It->second.none())
- continue;
- Remat.LiveInRegions.insert(LIRegion);
-
- // Account for the reduction in RP due to the rematerialization in an
- ...
[truncated]
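Stripping away the LLVM specifics, the round-based loop in the patch above (score, sort, rematerialize from the highest score down until targets are met or a null score is hit, peel off what was done, and repeat) can be sketched as follows. `Candidate`, `runRounds`, and the integer pressure model are illustrative stand-ins for the stage's actual types, not the real API:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scored rematerialization candidate.
struct Candidate {
  int Score;     // estimated benefit; 0 means "nothing useful left to do"
  int Reduction; // estimated register-pressure reduction of this remat
};

// Rematerializes candidates in decreasing score order until the pressure
// target is met or candidates are exhausted; returns the number of remats.
int runRounds(std::vector<Candidate> Cands, int Pressure, int Target) {
  int NumRemats = 0;
  while (Pressure > Target && !Cands.empty()) {
    // (Re-)sort in increasing score order so the best candidates sit at the
    // tail of the vector and can be peeled off cheaply.
    std::sort(Cands.begin(), Cands.end(),
              [](const Candidate &A, const Candidate &B) {
                return A.Score < B.Score;
              });
    std::size_t Idx = Cands.size();
    for (; Idx && Pressure > Target; --Idx) {
      const Candidate &C = Cands[Idx - 1];
      // Scores monotonically decrease as we rematerialize, so a null score
      // means nothing useful remains even after re-scoring.
      if (C.Score == 0)
        return NumRemats;
      Pressure -= C.Reduction;
      ++NumRemats;
    }
    Cands.resize(Idx); // drop already-rematerialized candidates from the tail
  }
  return NumRemats;
}
```

The early return on a null score mirrors the patch's reasoning: since scores only decrease as rematerializations are performed, re-scoring cannot make a null-score candidate useful again.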
qcolombet left a comment:
I believe I had already approved the original one @lucas-rami.
Let me know if that's not the case and I'll take a closer look.
@qcolombet I don't see the approval on the original one. In any case, I landed this one and closed the original (which managed to load this time).
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/187/builds/15476. Here is the relevant piece of the build log for reference.
@lucas-rami this is failing on my msvc build:
@RKSimon Looking into this. Should I revert in the meantime?
We see failures on our HIP bot after this one landed. Reverting would give us feedback on whether it is indeed the culprit. (I suspect it is)
#175755 should fix this.
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/24/builds/16530. Here is the relevant piece of the build log for reference.
…#175755) On some configurations, sorting `ScoredRemat` objects, which contain const members, causes a compile failure due to the impossibility of swapping/moving such objects. The problem was introduced in #175050. This removes const from those fields to address the issue. The design will soon change anyway to not rely on sorting objects of this type, and the consts were only there for semantic clarity.
…175807) A buildbot was failing with a use-after-poison (https://lab.llvm.org/buildbot/#/builders/24/builds/16530) after #175050: ``` ==llc==1532559==ERROR: AddressSanitizer: use-after-poison on address 0xe26e74e12368 at pc 0xb36d41bd74dc bp 0xffffed72a450 sp 0xffffed72a448 READ of size 8 at 0xe26e74e12368 thread T0 #0 0xb36d41bd74d8 in llvm::MachineInstr::print(llvm::raw_ostream&, bool, bool, bool, bool, llvm::TargetInstrInfo const*) const /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineInstr.cpp:1796:35 #1 0xb36d3e221b08 in operator<< /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:2150:6 #2 0xb36d3e221b08 in llvm::PreRARematStage::rollback(llvm::PreRARematStage::RollbackInfo const&, llvm::BitVector&) const /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:2363:57 ... ``` This is because it was printing an instruction that had already been deleted. This patch fixes this by reversing the order.
…vm#175050)" This reverts commit 6aaa7fd.
This is a significant refactoring of the scheduler's rematerialization stage meant to improve rematerialization capabilities and lay strong foundations for future improvements. As before, the stage identifies scheduling regions in which RP must be reduced (so-called "target regions"), then rematerializes registers to try and achieve the desired reduction. All regions affected by rematerializations are re-scheduled, and, if the MIR is deemed worse than before, rematerializations are rolled back to leave the MIR in its pre-stage state. The core contribution is a scoring system to estimate the benefit of each rematerialization candidate. This score favors rematerializing candidates which, in order, would 1. (if the function is spilling) reduce RP in highest-frequency target regions, 2. be rematerialized to lowest-frequency target regions, and 3. reduce RP in the highest number of target regions. All rematerialization opportunities are initially scored and rematerialized in decreasing score order until RP objectives are met or pre-computed scores diverge from reality; in the latter case remaining candidates are re-scored and the process repeats. New tests in `machine-scheduler-rematerialization-scoring.mir` showcase how the scoring system dictates which rematerialization are the most beneficial and therefore performed first A minor contribution included in this PR following previous feedback is that rollback now happens in-place i.e., without having to re-create the rematerialized MI. This leaves original slot indices and registers untouched. We achieve this by temporarily switching the opcode of rollback-able instructions to a debug opcode during re-scheduling so that they are ignored.
…vm#175050)" (llvm#175813) This reverts 8ab7937 and f21e359 which are causing a HIP failure in a Blender test.
…lvm#175050)" This re-applies commit f21e359 along with the compile fix failure introduced in 8ab7937 before the initial patch was reverted and fixes for the previously observed assert failure. We were hitting the assert in the HIP Blender due to a combination of two issues that could happen when rematerializations are being rolled back. 1. Small changes in slots indices (while preserving instruction order) compared to the pre-re-scheduling state meand that we have to re-compute live ranges for all register operands of rolled back rematerializations. This was not being done before. 2. Re-scheduling can move registers that were rematerialized at arbitrary positions in their respective regions while their opcode is set to DBG_VALUE, even before their read operands are defined. This makes re-scheduling reverts mandatory before rolling back rematerializations, as otherwise def-use chains may be broken. The original patch did not guatantee that, but previous refactoring of the rollback/revert logic for the rematerialization stage now ensures that reverts always precede rollbacks.
…lvm#175050)" This re-applies commit f21e359 along with the compile fix failure introduced in 8ab7937 before the initial patch was reverted and fixes for the previously observed assert failure. We were hitting the assert in the HIP Blender due to a combination of two issues that could happen when rematerializations are being rolled back. 1. Small changes in slots indices (while preserving instruction order) compared to the pre-re-scheduling state meand that we have to re-compute live ranges for all register operands of rolled back rematerializations. This was not being done before. 2. Re-scheduling can move registers that were rematerialized at arbitrary positions in their respective regions while their opcode is set to DBG_VALUE, even before their read operands are defined. This makes re-scheduling reverts mandatory before rolling back rematerializations, as otherwise def-use chains may be broken. The original patch did not guatantee that, but previous refactoring of the rollback/revert logic for the rematerialization stage now ensures that reverts always precede rollbacks.
…lvm#175807) A buildbot was failing with a use-after-poison (https://lab.llvm.org/buildbot/#/builders/24/builds/16530) after llvm#175050:

```
==llc==1532559==ERROR: AddressSanitizer: use-after-poison on address 0xe26e74e12368 at pc 0xb36d41bd74dc bp 0xffffed72a450 sp 0xffffed72a448
READ of size 8 at 0xe26e74e12368 thread T0
#0 0xb36d41bd74d8 in llvm::MachineInstr::print(llvm::raw_ostream&, bool, bool, bool, bool, llvm::TargetInstrInfo const*) const /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineInstr.cpp:1796:35
#1 0xb36d3e221b08 in operator<< /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:2150:6
#2 0xb36d3e221b08 in llvm::PreRARematStage::rollback(llvm::PreRARematStage::RollbackInfo const&, llvm::BitVector&) const /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:2363:57
...
```

This is because it was printing an instruction that had already been deleted. This patch fixes this by reversing the order.
…vm#175050)" (llvm#175813) This reverts 8ab7937 and f21e359 which are causing a HIP failure in a Blender test.
This is simply the last rebased version of #153092, which GitHub seems unable to open due to history length. All existing feedback was addressed, the last notable change being improvements to rollback, which no longer has to re-create the original MI in the MIR.
This is a significant refactoring of the scheduler's rematerialization stage meant to improve rematerialization capabilities and lay strong foundations for future improvements.
As before, the stage identifies scheduling regions in which RP must be reduced (so-called "target regions"), then rematerializes registers to try and achieve the desired reduction. All regions affected by rematerializations are re-scheduled, and, if the MIR is deemed worse than before, rematerializations are rolled back to leave the MIR in its pre-stage state.
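The identify-then-rollback flow described above can be sketched with a self-contained toy model. Every name and number below is illustrative (a made-up `Region` struct, a fixed per-region saving and re-scheduling penalty, a summed-pressure "worse than before" check); none of it comes from the actual `PreRARematStage` implementation:

```cpp
#include <cassert>
#include <vector>

// Illustrative model only; names do not match the real PreRARematStage.
struct Region {
  int Pressure; // current register-pressure estimate
  int Target;   // pressure objective for this region
};

// "Target regions" are those whose RP exceeds the objective.
bool anyOverTarget(const std::vector<Region> &Regions) {
  for (const Region &R : Regions)
    if (R.Pressure > R.Target)
      return true;
  return false;
}

// One stage run: rematerialize into target regions (modeled as a fixed
// per-region saving minus a re-scheduling penalty), then keep the result
// only if overall pressure actually improved. Returns true if the changes
// were kept, false if they were "rolled back" (or nothing was done).
bool runStage(std::vector<Region> &Regions, int SavingPerRegion,
              int ReschedulePenalty) {
  if (!anyOverTarget(Regions))
    return false; // no target regions, nothing to do
  std::vector<Region> Backup = Regions; // snapshot for rollback
  int Before = 0, After = 0;
  for (Region &R : Regions) {
    Before += R.Pressure;
    if (R.Pressure > R.Target)
      R.Pressure += ReschedulePenalty - SavingPerRegion;
    After += R.Pressure;
  }
  if (After >= Before) { // deemed worse (or no better): roll back
    Regions = Backup;
    return false;
  }
  return true;
}
```

In the real stage the quality check compares full scheduling outcomes rather than summed pressure, and rollback restores the pre-stage MIR in place rather than copying a snapshot.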
The core contribution is a scoring system to estimate the benefit of each rematerialization candidate. This score favors rematerializing candidates which, in order, would
All rematerialization opportunities are initially scored and rematerialized in decreasing score order until RP objectives are met or pre-computed scores diverge from reality; in the latter case remaining candidates are re-scored and the process repeats. New tests in `machine-scheduler-rematerialization-scoring.mir` showcase how the scoring system dictates which rematerializations are the most beneficial and are therefore performed first.

A minor contribution included in this PR, following previous feedback, is that rollback now happens in-place, i.e., without having to re-create the rematerialized MI. This leaves original slot indices and registers untouched. We achieve this by temporarily switching the opcode of rollback-able instructions to a debug opcode during re-scheduling so that they are ignored.
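The score-apply-rescore loop can be illustrated with a small stand-alone model. Here `RematCandidate`, its precomputed `Score`, and the `ActualSaving` it yields when applied are all hypothetical stand-ins (the real stage recomputes scores from live MIR state), but the control flow mirrors the description above:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of the scoring loop; not the GCNSchedStrategy API.
struct RematCandidate {
  int Score;        // estimated benefit (higher is better)
  int ActualSaving; // RP units actually saved when applied
};

// Apply candidates greedily in decreasing score order until the RP
// target is met, re-scoring whenever an estimate diverges from the
// observed saving. Returns the total RP reduction achieved.
int runRematLoop(std::vector<RematCandidate> Cands, int RPTarget) {
  int Saved = 0;
  bool NeedRescore = false;
  while (Saved < RPTarget && !Cands.empty()) {
    if (NeedRescore) {
      // The real stage recomputes scores from current MIR state; this
      // toy model simply refreshes each estimate from the known saving.
      for (RematCandidate &C : Cands)
        C.Score = C.ActualSaving;
      NeedRescore = false;
    }
    // Pick the highest-scored remaining candidate.
    auto Best = std::max_element(
        Cands.begin(), Cands.end(),
        [](const RematCandidate &A, const RematCandidate &B) {
          return A.Score < B.Score;
        });
    int Predicted = Best->Score;
    Saved += Best->ActualSaving;
    // If the precomputed score diverged from reality, re-score the rest.
    if (Best->ActualSaving != Predicted)
      NeedRescore = true;
    Cands.erase(Best);
  }
  return Saved;
}
```

For example, with candidates scoring 5, 3, and 2 and a target reduction of 6, the top candidate delivers its predicted 5; if the next one delivers only 1, the loop still reaches the target and stops, while a shortfall would instead trigger a re-score of the remaining candidates.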