@lucas-rami
Contributor

@lucas-rami lucas-rami commented Jan 8, 2026

This is simply the last rebased version of #153092, which GitHub seems unable to open due to history length. All existing feedback was addressed, the last notable change being an improvement to rollback, which no longer has to re-create the original MI in the MIR.


This is a significant refactoring of the scheduler's rematerialization stage meant to improve rematerialization capabilities and lay strong foundations for future improvements.

As before, the stage identifies scheduling regions in which RP must be reduced (so-called "target regions"), then rematerializes registers to try and achieve the desired reduction. All regions affected by rematerializations are re-scheduled, and, if the MIR is deemed worse than before, rematerializations are rolled back to leave the MIR in its pre-stage state.

The core contribution is a scoring system to estimate the benefit of each rematerialization candidate. This score favors rematerializing candidates which, in order, would

  1. (if the function is spilling) reduce RP in highest-frequency target regions,
  2. be rematerialized to lowest-frequency target regions, and
  3. reduce RP in the highest number of target regions.
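
The "in order" priority above can be read as a lexicographic comparison. The following is a minimal sketch of that idea, not the PR's implementation: the field names mirror those printed by `ScoredRemat::print()` in the diff (`MaxFreq`, `FreqDiff`, `RegionImpact`), but the `Score` struct, the `Id` field, the `rematOrder` helper, and the exact meaning given to each field are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a scored rematerialization candidate.
struct Score {
  std::uint64_t MaxFreq; // Criterion 1 (assumed): highest frequency among the
                         // target regions helped; 0 when not spilling.
  std::int64_t FreqDiff; // Criterion 2 (assumed): larger when the register is
                         // rematerialized into a lower-frequency region.
  unsigned RegionImpact; // Criterion 3 (assumed): number of target regions
                         // in which RP would be reduced.
  int Id;                // Illustrative candidate identifier.

  // Lexicographic comparison: later fields only break ties in earlier ones.
  bool operator<(const Score &O) const {
    return std::tie(MaxFreq, FreqDiff, RegionImpact) <
           std::tie(O.MaxFreq, O.FreqDiff, O.RegionImpact);
  }
};

// Sorts candidates in increasing score order, then walks the vector from the
// tail so that the highest-scoring candidate comes first.
std::vector<int> rematOrder(std::vector<Score> Cands) {
  std::sort(Cands.begin(), Cands.end());
  std::vector<int> Order;
  for (auto It = Cands.rbegin(); It != Cands.rend(); ++It)
    Order.push_back(It->Id);
  assert(Order.size() == Cands.size());
  return Order;
}
```

Sorting in increasing order and consuming from the tail matches the shape of the loop in the diff, where processed candidates are peeled off with `truncate`.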

All rematerialization opportunities are initially scored and rematerialized in decreasing score order until RP objectives are met or pre-computed scores diverge from reality; in the latter case, remaining candidates are re-scored and the process repeats. New tests in machine-scheduler-rematerialization-scoring.mir showcase how the scoring system dictates which rematerializations are the most beneficial and therefore performed first.
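
As a rough illustration of the score / rematerialize / re-score rounds described above, here is a toy model under simplifying assumptions: each candidate saves exactly one register in a fixed set of regions, a region is a "target" while its excess pressure is positive, and a candidate's score is simply the number of target regions it touches. The `Candidate` type and the `score` and `rematInRounds` functions are hypothetical and much simpler than the PR's `ScoredRemat` logic.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical candidate: rematerializing it saves one register in each
// region listed in Regions.
struct Candidate {
  int Id;
  std::vector<unsigned> Regions;
  unsigned Score = 0;
};

// Toy score: number of regions still above target that the candidate helps.
static unsigned score(const Candidate &C, const std::vector<int> &Excess) {
  unsigned S = 0;
  for (unsigned R : C.Regions) {
    assert(R < Excess.size());
    if (Excess[R] > 0)
      ++S;
  }
  return S;
}

// Returns the ids of rematerialized candidates, in the order performed.
std::vector<int> rematInRounds(std::vector<Candidate> Cands,
                               std::vector<int> Excess) {
  std::vector<int> Done;
  auto AnyTarget = [&] {
    return std::any_of(Excess.begin(), Excess.end(),
                       [](int E) { return E > 0; });
  };
  while (AnyTarget() && !Cands.empty()) {
    // (Re-)score and sort in increasing score order; best candidate is last.
    for (Candidate &C : Cands)
      C.Score = score(C, Excess);
    std::stable_sort(Cands.begin(), Cands.end(),
                     [](const Candidate &A, const Candidate &B) {
                       return A.Score < B.Score;
                     });
    while (!Cands.empty() && AnyTarget()) {
      const Candidate &Best = Cands.back();
      if (Best.Score == 0)
        return Done; // Nothing useful left, even after re-scoring.
      if (score(Best, Excess) == 0)
        break; // Pre-computed score is stale: start a new round.
      for (unsigned R : Best.Regions)
        --Excess[R];
      Done.push_back(Best.Id);
      Cands.pop_back();
    }
  }
  return Done;
}
```

Each round either rematerializes at least one candidate or stops on a zero score, so the loop terminates once targets are met or candidates stop being useful.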

A minor contribution included in this PR, following previous feedback, is that rollback now happens in place, i.e., without having to re-create the rematerialized MI. This leaves original slot indices and registers untouched. We achieve this by temporarily switching the opcode of rollback-able instructions to a debug opcode during re-scheduling so that they are ignored.
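
The in-place rollback idea can be sketched with a toy instruction model. Everything below is hypothetical (the `Instr`, `Opcode`, and `RollbackInfo` types and the helper functions are not LLVM APIs, and the diff's own `RollbackInfo` is not shown in full); it only illustrates hiding an instruction behind a placeholder opcode and restoring it later without touching its slot or register.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy opcodes; DebugMarker stands in for "a debug opcode" that the
// scheduler skips. Which opcode the PR actually uses is not assumed here.
enum class Opcode { Mov, Add, DebugMarker };

struct Instr {
  Opcode Op;
  unsigned Slot; // Stand-in for a slot index.
  unsigned Reg;  // Stand-in for the defined register.
};

// Remembers the original opcode while the instruction is hidden.
struct RollbackInfo {
  std::size_t Idx;
  Opcode OrigOp;
};

// Hides Block[Idx] from scheduling by swapping in the placeholder opcode.
RollbackInfo hideForRescheduling(std::vector<Instr> &Block, std::size_t Idx) {
  assert(Idx < Block.size());
  RollbackInfo Info{Idx, Block[Idx].Op};
  Block[Idx].Op = Opcode::DebugMarker;
  return Info;
}

// Restores the original opcode; slot and register were never modified.
void rollback(std::vector<Instr> &Block, const RollbackInfo &Info) {
  Block[Info.Idx].Op = Info.OrigOp;
}

// Counts instructions a scheduler would consider in this toy model.
std::size_t countSchedulable(const std::vector<Instr> &Block) {
  std::size_t N = 0;
  for (const Instr &I : Block)
    if (I.Op != Opcode::DebugMarker)
      ++N;
  return N;
}
```

Because only the opcode changes, rolling back is a single assignment and no instruction has to be re-created or re-indexed.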

@llvmbot
Member

llvmbot commented Jan 8, 2026

@llvm/pr-subscribers-backend-amdgpu

Author: Lucas Ramirez (lucas-rami)

Changes


Patch is 174.92 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/175050.diff

7 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp (+505-291)
  • (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.h (+207-49)
  • (added) llvm/test/CodeGen/AMDGPU/machine-scheduler-rematerialization-scoring.mir (+523)
  • (modified) llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-attr.mir (+194-194)
  • (modified) llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-debug.mir (+5-5)
  • (modified) llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats.mir (+242-35)
  • (modified) llvm/test/CodeGen/AMDGPU/mfma-loop.ll (+1-1)
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index c8ce3aab3f303..cb0cb6510ecd4 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -28,11 +28,20 @@
 #include "GCNRegPressure.h"
 #include "SIMachineFunctionInfo.h"
 #include "Utils/AMDGPUBaseInfo.h"
+#include "llvm/ADT/BitVector.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/CodeGen/CalcSpillWeights.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
+#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
 #include "llvm/CodeGen/RegisterClassInfo.h"
 #include "llvm/MC/LaneBitmask.h"
+#include "llvm/MC/MCInstrItineraries.h"
+#include "llvm/MC/MCSchedule.h"
+#include "llvm/MC/TargetRegistry.h"
 #include "llvm/Support/ErrorHandling.h"
+#include <limits>
+#include <string>
 
 #define DEBUG_TYPE "machine-scheduler"
 
@@ -970,6 +979,8 @@ void GCNScheduleDAGMILive::schedule() {
 
 GCNRegPressure
 GCNScheduleDAGMILive::getRealRegPressure(unsigned RegionIdx) const {
+  if (Regions[RegionIdx].first == Regions[RegionIdx].second)
+    return llvm::getRegPressure(MRI, LiveIns[RegionIdx]);
   GCNDownwardRPTracker RPTracker(*LIS);
   RPTracker.advance(Regions[RegionIdx].first, Regions[RegionIdx].second,
                     &LiveIns[RegionIdx]);
@@ -1272,33 +1283,222 @@ bool ClusteredLowOccStage::initGCNSchedStage() {
 #define REMAT_PREFIX "[PreRARemat] "
 #define REMAT_DEBUG(X) LLVM_DEBUG(dbgs() << REMAT_PREFIX; X;)
 
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+Printable PreRARematStage::ScoredRemat::print() const {
+  return Printable([&](raw_ostream &OS) {
+    OS << '(' << MaxFreq << ", " << FreqDiff << ", " << RegionImpact << ')';
+  });
+}
+#endif
+
 bool PreRARematStage::initGCNSchedStage() {
   // FIXME: This pass will invalidate cached BBLiveInMap and MBBLiveIns for
   // regions inbetween the defs and region we sinked the def to. Will need to be
   // fixed if there is another pass after this pass.
   assert(!S.hasNextStage());
 
-  if (!GCNSchedStage::initGCNSchedStage() || DAG.Regions.size() == 1)
+  if (!GCNSchedStage::initGCNSchedStage() || DAG.Regions.size() <= 1)
     return false;
 
+  // Maps all MIs (except lone terminators, which are not part of any region) to
+  // their parent region. Non-lone terminators are considered part of the region
+  // they delimitate.
+  DenseMap<MachineInstr *, unsigned> MIRegion(MF.getInstructionCount());
+
   // Before performing any IR modification record the parent region of each MI
   // and the parent MBB of each region.
   const unsigned NumRegions = DAG.Regions.size();
-  RegionBB.reserve(NumRegions);
   for (unsigned I = 0; I < NumRegions; ++I) {
     RegionBoundaries Region = DAG.Regions[I];
     for (auto MI = Region.first; MI != Region.second; ++MI)
       MIRegion.insert({&*MI, I});
-    RegionBB.push_back(Region.first->getParent());
+    MachineBasicBlock *ParentMBB = Region.first->getParent();
+    if (Region.second != ParentMBB->end())
+      MIRegion.insert({&*Region.second, I});
+    RegionBB.push_back(ParentMBB);
+  }
+
+#ifndef NDEBUG
+  auto PrintTargetRegions = [&]() -> void {
+    if (TargetRegions.none()) {
+      dbgs() << REMAT_PREFIX << "No target regions\n";
+      return;
+    }
+    dbgs() << REMAT_PREFIX << "Target regions:\n";
+    for (unsigned I : TargetRegions.set_bits())
+      dbgs() << REMAT_PREFIX << "  [" << I << "] " << RPTargets[I] << '\n';
+  };
+  auto PrintRematReg = [&](const RematReg &Remat) -> Printable {
+    return Printable([&, Remat](raw_ostream &OS) {
+      // Concatenate all region numbers in which the register is unused and
+      // live-through.
+      bool HasLiveThroughRegion = false;
+      OS << '[' << Remat.DefRegion << " -";
+      for (unsigned I = 0; I < NumRegions; ++I) {
+        if (Remat.isUnusedLiveThrough(I)) {
+          if (HasLiveThroughRegion) {
+            OS << ',';
+          } else {
+            OS << "- ";
+            HasLiveThroughRegion = true;
+          }
+          OS << I;
+        }
+      }
+      if (HasLiveThroughRegion)
+        OS << " -";
+      OS << "-> " << Remat.UseRegion << "] ";
+      Remat.DefMI->print(OS, /*IsStandalone=*/true, /*SkipOpers=*/false,
+                         /*SkipDebugLoc=*/false, /*AddNewLine=*/false);
+    });
+  };
+#endif
+
+  // Set an objective for the stage based on current RP in each region.
+  REMAT_DEBUG({
+    dbgs() << "Analyzing ";
+    MF.getFunction().printAsOperand(dbgs(), false);
+    dbgs() << ": ";
+  });
+  if (!setObjective()) {
+    LLVM_DEBUG(dbgs() << "no objective to achieve, occupancy is maximal at "
+                      << MFI.getMaxWavesPerEU() << '\n');
+    return false;
   }
+  LLVM_DEBUG({
+    if (TargetOcc) {
+      dbgs() << "increase occupancy from " << *TargetOcc - 1 << '\n';
+    } else {
+      dbgs() << "reduce spilling (minimum target occupancy is "
+             << MFI.getMinWavesPerEU() << ")\n";
+    }
+    PrintTargetRegions();
+  });
+
+  if (!collectRematRegs(MIRegion)) {
+    REMAT_DEBUG(dbgs() << "No rematerializable registers\n");
+    return false;
+  }
+  const ScoredRemat::FreqInfo FreqInfo(MF, DAG);
+  REMAT_DEBUG({
+    dbgs() << "Rematerializable registers:\n";
+    for (const RematReg &Remat : RematRegs)
+      dbgs() << REMAT_PREFIX << "  " << PrintRematReg(Remat) << '\n';
+    dbgs() << REMAT_PREFIX << "Region frequencies\n";
+    for (auto [I, Freq] : enumerate(FreqInfo.Regions)) {
+      dbgs() << REMAT_PREFIX << "  [" << I << "] ";
+      if (Freq)
+        dbgs() << Freq;
+      else
+        dbgs() << "unknown ";
+      dbgs() << " | " << *DAG.Regions[I].first;
+    }
+  });
 
-  if (!canIncreaseOccupancyOrReduceSpill())
+  SmallVector<ScoredRemat> ScoredRemats;
+  for (const RematReg &Remat : RematRegs)
+    ScoredRemats.emplace_back(&Remat, FreqInfo, DAG);
+
+// Rematerialize registers in successive rounds until all RP targets are
+// satisifed or until we run out of rematerialization candidates.
+#ifndef NDEBUG
+  unsigned RoundNum = 0;
+#endif
+  BitVector RecomputeRP(NumRegions);
+  do {
+    assert(!ScoredRemats.empty() && "no more remat candidates");
+
+    // (Re-)Score and (re-)sort all remats in increasing score order.
+    for (ScoredRemat &Remat : ScoredRemats)
+      Remat.update(TargetRegions, RPTargets, FreqInfo, !TargetOcc);
+    sort(ScoredRemats);
+
+    REMAT_DEBUG({
+      dbgs() << "==== ROUND " << RoundNum++ << " ====\n"
+             << REMAT_PREFIX
+             << "Candidates with non-null score, in rematerialization order:\n";
+      for (const ScoredRemat &RematDecision : reverse(ScoredRemats)) {
+        if (RematDecision.hasNullScore())
+          break;
+        dbgs() << REMAT_PREFIX << "  " << RematDecision.print() << " | "
+               << *RematDecision.Remat->DefMI;
+      }
+      PrintTargetRegions();
+    });
+
+    RecomputeRP.reset();
+    unsigned RematIdx = ScoredRemats.size();
+
+    // Rematerialize registers in decreasing score order until we estimate
+    // that all RP targets are satisfied or until rematerialization candidates
+    // are no longer useful to decrease RP.
+    for (; RematIdx && TargetRegions.any(); --RematIdx) {
+      const ScoredRemat &Candidate = ScoredRemats[RematIdx - 1];
+      // Stop rematerializing on encountering a null score. Since scores
+      // monotonically decrease as we rematerialize, we know there is nothing
+      // useful left to do in such cases, even if we were to re-score.
+      if (Candidate.hasNullScore()) {
+        RematIdx = 0;
+        break;
+      }
+
+      const RematReg &Remat = *Candidate.Remat;
+      // When previous rematerializations in this round have already satisfied
+      // RP targets in all regions this rematerialization can impact, we have a
+      // good indication that our scores have diverged significantly from
+      // reality, in which case we interrupt this round and re-score. This also
+      // ensures that every rematerialization we perform is possibly impactful
+      // in at least one target region.
+      if (!Remat.maybeBeneficial(TargetRegions, RPTargets))
+        break;
+
+      REMAT_DEBUG(dbgs() << "** REMAT " << PrintRematReg(Remat) << '\n';);
+      // Every rematerialization we do here is likely to move the instruction
+      // into a higher frequency region, increasing the total sum latency of the
+      // instruction itself. This is acceptable if we are eliminating a spill in
+      // the process, but when the goal is increasing occupancy we get nothing
+      // out of rematerialization if occupancy is not increased in the end; in
+      // such cases we want to roll back the rematerialization.
+      RollbackInfo *Rollback =
+          TargetOcc ? &Rollbacks.emplace_back(&Remat) : nullptr;
+      rematerialize(Remat, RecomputeRP, Rollback);
+      unsetSatisifedRPTargets(Remat.Live);
+    }
+
+    REMAT_DEBUG({
+      if (!TargetRegions.any()) {
+        dbgs() << "** Interrupt round on all targets achieved\n";
+      } else if (RematIdx) {
+        dbgs() << "** Interrupt round on stale score for "
+               << *ScoredRemats[RematIdx - 1].Remat->DefMI;
+      } else {
+        dbgs() << "** Stop on exhausted rematerialization candidates\n";
+      }
+    });
+
+    // Peel off registers we already rematerialized from the vector's tail.
+    ScoredRemats.truncate(RematIdx);
+  } while ((updateAndVerifyRPTargets(RecomputeRP) || TargetRegions.any()) &&
+           !ScoredRemats.empty());
+  if (RescheduleRegions.none())
     return false;
 
-  // Rematerialize identified instructions and update scheduler's state.
-  rematerialize();
-  if (GCNTrackers)
-    DAG.RegionLiveOuts.buildLiveRegMap();
+  // Commit all pressure changes to the DAG and compute minimum achieved
+  // occupancy in impacted regions.
+  REMAT_DEBUG(dbgs() << "==== REMAT RESULTS ====\n");
+  unsigned DynamicVGPRBlockSize = MFI.getDynamicVGPRBlockSize();
+  for (unsigned I : RescheduleRegions.set_bits()) {
+    DAG.Pressure[I] = RPTargets[I].getCurrentRP();
+    REMAT_DEBUG(dbgs() << '[' << I << "] Achieved occupancy "
+                       << DAG.Pressure[I].getOccupancy(ST, DynamicVGPRBlockSize)
+                       << " (" << RPTargets[I] << ")\n");
+  }
+  AchievedOcc = MFI.getMaxWavesPerEU();
+  for (const GCNRegPressure &RP : DAG.Pressure) {
+    AchievedOcc =
+        std::min(AchievedOcc, RP.getOccupancy(ST, DynamicVGPRBlockSize));
+  }
+
   REMAT_DEBUG({
     dbgs() << "Retrying function scheduling with new min. occupancy of "
            << AchievedOcc << " from rematerializing (original was "
@@ -1307,7 +1507,6 @@ bool PreRARematStage::initGCNSchedStage() {
       dbgs() << ", target was " << *TargetOcc;
     dbgs() << ")\n";
   });
-
   if (AchievedOcc > DAG.MinOccupancy) {
     DAG.MinOccupancy = AchievedOcc;
     SIMachineFunctionInfo &MFI = *MF.getInfo<SIMachineFunctionInfo>();
@@ -1341,6 +1540,10 @@ void UnclusteredHighRPStage::finalizeGCNSchedStage() {
 }
 
 bool GCNSchedStage::initGCNRegion() {
+  // Skip empty scheduling region.
+  if (DAG.begin() == DAG.end())
+    return false;
+
   // Check whether this new region is also a new block.
   if (DAG.RegionBegin->getParent() != CurrentMBB)
     setupNewBlock();
@@ -1348,8 +1551,8 @@ bool GCNSchedStage::initGCNRegion() {
   unsigned NumRegionInstrs = std::distance(DAG.begin(), DAG.end());
   DAG.enterRegion(CurrentMBB, DAG.begin(), DAG.end(), NumRegionInstrs);
 
-  // Skip empty scheduling regions (0 or 1 schedulable instructions).
-  if (DAG.begin() == DAG.end() || DAG.begin() == std::prev(DAG.end()))
+  // Skip regions with 1 schedulable instruction.
+  if (DAG.begin() == std::prev(DAG.end()))
     return false;
 
   LLVM_DEBUG(dbgs() << "********** MI Scheduling **********\n");
@@ -1837,27 +2040,20 @@ void GCNSchedStage::revertScheduling() {
   DAG.Regions[RegionIdx] = std::pair(DAG.RegionBegin, DAG.RegionEnd);
 }
 
-bool PreRARematStage::canIncreaseOccupancyOrReduceSpill() {
+bool PreRARematStage::setObjective() {
   const Function &F = MF.getFunction();
 
-  // Maps optimizable regions (i.e., regions at minimum and register-limited
-  // occupancy, or regions with spilling) to the target RP we would like to
-  // reach.
-  DenseMap<unsigned, GCNRPTarget> OptRegions;
+  // Set up "spilling targets" for all regions.
   unsigned MaxSGPRs = ST.getMaxNumSGPRs(F);
   unsigned MaxVGPRs = ST.getMaxNumVGPRs(F);
-  auto ResetTargetRegions = [&]() {
-    OptRegions.clear();
-    for (unsigned I = 0, E = DAG.Regions.size(); I != E; ++I) {
-      const GCNRegPressure &RP = DAG.Pressure[I];
-      GCNRPTarget Target(MaxSGPRs, MaxVGPRs, MF, RP);
-      if (!Target.satisfied())
-        OptRegions.insert({I, Target});
-    }
-  };
+  for (unsigned I = 0, E = DAG.Regions.size(); I != E; ++I) {
+    const GCNRegPressure &RP = DAG.Pressure[I];
+    GCNRPTarget &Target = RPTargets.emplace_back(MaxSGPRs, MaxVGPRs, MF, RP);
+    if (!Target.satisfied())
+      TargetRegions.set(I);
+  }
 
-  ResetTargetRegions();
-  if (!OptRegions.empty() || DAG.MinOccupancy >= MFI.getMaxWavesPerEU()) {
+  if (TargetRegions.any() || DAG.MinOccupancy >= MFI.getMaxWavesPerEU()) {
     // In addition to register usage being above addressable limits, occupancy
     // below the minimum is considered like "spilling" as well.
     TargetOcc = std::nullopt;
@@ -1865,94 +2061,68 @@ bool PreRARematStage::canIncreaseOccupancyOrReduceSpill() {
     // There is no spilling and room to improve occupancy; set up "increased
     // occupancy targets" for all regions.
     TargetOcc = DAG.MinOccupancy + 1;
-    unsigned VGPRBlockSize =
-        MF.getInfo<SIMachineFunctionInfo>()->getDynamicVGPRBlockSize();
+    const unsigned VGPRBlockSize = MFI.getDynamicVGPRBlockSize();
     MaxSGPRs = ST.getMaxNumSGPRs(*TargetOcc, false);
     MaxVGPRs = ST.getMaxNumVGPRs(*TargetOcc, VGPRBlockSize);
-    ResetTargetRegions();
-  }
-  REMAT_DEBUG({
-    dbgs() << "Analyzing ";
-    MF.getFunction().printAsOperand(dbgs(), false);
-    dbgs() << ": ";
-    if (OptRegions.empty()) {
-      dbgs() << "no objective to achieve, occupancy is maximal at "
-             << MFI.getMaxWavesPerEU();
-    } else if (!TargetOcc) {
-      dbgs() << "reduce spilling (minimum target occupancy is "
-             << MFI.getMinWavesPerEU() << ')';
-    } else {
-      dbgs() << "increase occupancy from " << DAG.MinOccupancy << " to "
-             << TargetOcc;
-    }
-    dbgs() << '\n';
-    for (unsigned I = 0, E = DAG.Regions.size(); I != E; ++I) {
-      if (auto OptIt = OptRegions.find(I); OptIt != OptRegions.end()) {
-        dbgs() << REMAT_PREFIX << "  [" << I << "] " << OptIt->getSecond()
-               << '\n';
-      }
+    for (auto [I, Target] : enumerate(RPTargets)) {
+      Target.setTarget(MaxSGPRs, MaxVGPRs);
+      if (!Target.satisfied())
+        TargetRegions.set(I);
     }
-  });
-  if (OptRegions.empty())
-    return false;
+  }
 
-  // Accounts for a reduction in RP in an optimizable region. Returns whether we
-  // estimate that we have identified enough rematerialization opportunities to
-  // achieve our goal, and sets Progress to true when this particular reduction
-  // in pressure was helpful toward that goal.
-  auto ReduceRPInRegion = [&](auto OptIt, Register Reg, LaneBitmask Mask,
-                              bool &Progress) -> bool {
-    GCNRPTarget &Target = OptIt->getSecond();
-    if (!Target.isSaveBeneficial(Reg))
-      return false;
-    Progress = true;
-    Target.saveReg(Reg, Mask, DAG.MRI);
-    if (Target.satisfied())
-      OptRegions.erase(OptIt->getFirst());
-    return OptRegions.empty();
-  };
+  return TargetRegions.any();
+}
 
+bool PreRARematStage::collectRematRegs(
+    const DenseMap<MachineInstr *, unsigned> &MIRegion) {
   // We need up-to-date live-out info. to query live-out register masks in
   // regions containing rematerializable instructions.
   DAG.RegionLiveOuts.buildLiveRegMap();
 
-  // Cache set of registers that are going to be rematerialized.
-  DenseSet<unsigned> RematRegs;
+  // Set of registers already marked for potential remterialization; used to
+  // avoid rematerialization chains.
+  SmallSet<Register, 4> MarkedRegs;
+  auto IsMarkedForRemat = [&MarkedRegs](const MachineOperand &MO) -> bool {
+    return MO.isReg() && MarkedRegs.contains(MO.getReg());
+  };
 
   // Identify rematerializable instructions in the function.
   for (unsigned I = 0, E = DAG.Regions.size(); I != E; ++I) {
-    auto Region = DAG.Regions[I];
-    for (auto MI = Region.first; MI != Region.second; ++MI) {
+    RegionBoundaries Bounds = DAG.Regions[I];
+    for (auto MI = Bounds.first; MI != Bounds.second; ++MI) {
       // The instruction must be rematerializable.
       MachineInstr &DefMI = *MI;
       if (!isReMaterializable(DefMI))
         continue;
 
-      // We only support rematerializing virtual registers with one definition.
+      // We only support rematerializing virtual registers with one
+      // definition.
       Register Reg = DefMI.getOperand(0).getReg();
       if (!Reg.isVirtual() || !DAG.MRI.hasOneDef(Reg))
         continue;
 
       // We only care to rematerialize the instruction if it has a single
-      // non-debug user in a different region. The using MI may not belong to a
-      // region if it is a lone region terminator.
+      // non-debug user in a different region.
+      // FIXME: Allow rematerializations with multiple uses. This should be
+      // relatively easy to support using the current cost model.
       MachineInstr *UseMI = DAG.MRI.getOneNonDBGUser(Reg);
       if (!UseMI)
         continue;
       auto UseRegion = MIRegion.find(UseMI);
-      if (UseRegion != MIRegion.end() && UseRegion->second == I)
+      if (UseRegion == MIRegion.end() || UseRegion->second == I)
         continue;
 
       // Do not rematerialize an instruction if it uses or is used by an
       // instruction that we have designated for rematerialization.
       // FIXME: Allow for rematerialization chains: this requires 1. updating
-      // remat points to account for uses that are rematerialized, and 2. either
-      // rematerializing the candidates in careful ordering, or deferring the
-      // MBB RP walk until the entire chain has been rematerialized.
-      if (Rematerializations.contains(UseMI) ||
-          llvm::any_of(DefMI.operands(), [&RematRegs](MachineOperand &MO) {
-            return MO.isReg() && RematRegs.contains(MO.getReg());
-          }))
+      // remat points to account for uses that are rematerialized, and 2.
+      // either rematerializing the candidates in careful ordering, or
+      // deferring the MBB RP walk until the entire chain has been
+      // rematerialized.
+      const MachineOperand &UseMO = UseMI->getOperand(0);
+      if (IsMarkedForRemat(UseMO) ||
+          llvm::any_of(DefMI.operands(), IsMarkedForRemat))
         continue;
 
       // Do not rematerialize an instruction it it uses registers that aren't
@@ -1963,106 +2133,182 @@ bool PreRARematStage::canIncreaseOccupancyOrReduceSpill() {
                                               *DAG.TII))
         continue;
 
-      REMAT_DEBUG(dbgs() << "Region " << I << ": remat instruction " << DefMI);
-      RematInstruction &Remat =
-          Rematerializations.try_emplace(&DefMI, UseMI).first->second;
-
-      bool RematUseful = false;
-      if (auto It = OptRegions.find(I); It != OptRegions.end()) {
-        // Optimistically consider that moving the instruction out of its
-        // defining region will reduce RP in the latter; this assumes that
-        // maximum RP in the region is reached somewhere between the defining
-        // instruction and the end of the region.
-        REMAT_DEBUG(dbgs() << "  Defining region is optimizable\n");
-        LaneBitmask Mask = DAG.RegionLiveOuts.getLiveRegsForRegionIdx(I)[Reg];
-        if (ReduceRPInRegion(It, Reg, Mask, RematUseful))
-          return true;
-      }
-
-      for (unsigned LIRegion = 0; LIRegion != E; ++LIRegion) {
-        // We are only collecting regions in which the register is a live-in
-        // (and may be live-through).
-        auto It = DAG.LiveIns[LIRegion].find(Reg);
-        if (It == DAG.LiveIns[LIRegion].end() || It->second.none())
-          continue;
-        Remat.LiveInRegions.insert(LIRegion);
-
-        // Account for the reduction in RP due to the rematerialization in an
-     ...
[truncated]

Collaborator

@qcolombet qcolombet left a comment

I believe I had already approved the original one, @lucas-rami.
Let me know if that's not the case and I'll take a closer look.

@lucas-rami lucas-rami merged commit 6aaa7fd into llvm:main Jan 13, 2026
12 checks passed
@lucas-rami
Contributor Author

@qcolombet I don't see the approval on the original one. In any case, I landed this one and closed the original (which managed to load this time).

@llvm-ci
Collaborator

llvm-ci commented Jan 13, 2026

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-ubuntu running on as-builder-4 while building llvm at step 6 "build-default".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/187/builds/15476

Here is the relevant piece of the build log for reference:
Step 6 (build-default) failure: cmake (failure)
...
78.459 [29/12/4090] Linking CXX executable bin/llvm-profgen
78.477 [29/11/4091] Building CXX object tools/llc/CMakeFiles/llc.dir/NewPMDriver.cpp.o
78.478 [29/10/4092] Linking CXX executable bin/sancov
78.485 [29/9/4093] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUISelDAGToDAG.cpp.o
78.614 [29/8/4094] Building CXX object tools/opt/CMakeFiles/LLVMOptDriver.dir/NewPMDriver.cpp.o
78.625 [29/7/4095] Building CXX object tools/opt/CMakeFiles/LLVMOptDriver.dir/optdriver.cpp.o
78.899 [29/6/4096] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/R600ISelDAGToDAG.cpp.o
79.081 [29/5/4097] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIRegisterInfo.cpp.o
79.849 [29/4/4098] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o
85.309 [29/3/4099] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNSchedStrategy.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNSchedStrategy.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes CCACHE_SLOPPINESS=pch_defines,time_macros /usr/bin/ccache /usr/bin/clang++-21 -DEXPENSIVE_CHECKS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GLIBCXX_DEBUG -D_GLIBCXX_USE_CXX11_ABI=1 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/lib/Target/AMDGPU -I/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/lib/Target/AMDGPU -I/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/include -I/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/include -U_GLIBCXX_DEBUG -Wno-misleading-indentation -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wno-pass-failed -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden -UNDEBUG -fno-exceptions -funwind-tables -fno-rtti -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNSchedStrategy.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNSchedStrategy.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNSchedStrategy.cpp.o -c /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
In file included from /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:26:
In file included from /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h:16:
In file included from /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/lib/Target/AMDGPU/GCNRegPressure.h:20:
In file included from /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/lib/Target/AMDGPU/GCNSubtarget.h:17:
In file included from /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.h:17:
In file included from /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/include/llvm/CodeGen/GlobalISel/CallLowering.h:17:
In file included from /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/include/llvm/ADT/ArrayRef.h:12:
In file included from /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/include/llvm/ADT/Hashing.h:47:
In file included from /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/include/llvm/ADT/ADL.h:13:
In file included from /usr/include/c++/13/iterator:66:
In file included from /usr/include/c++/13/bits/streambuf_iterator.h:35:
In file included from /usr/include/c++/13/streambuf:43:
In file included from /usr/include/c++/13/bits/ios_base.h:41:
In file included from /usr/include/c++/13/bits/locale_classes.h:40:
In file included from /usr/include/c++/13/string:51:
/usr/include/c++/13/bits/stl_algobase.h:185:7: error: no matching function for call to 'swap'
  185 |       swap(*__a, *__b);
      |       ^~~~
/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/include/llvm/ADT/STLExtras.h:1538:12: note: in instantiation of function template specialization 'std::iter_swap<llvm::PreRARematStage::ScoredRemat *, llvm::PreRARematStage::ScoredRemat *>' requested here
 1538 |       std::iter_swap(first, first + offset);
      |            ^
/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/include/llvm/ADT/STLExtras.h:1573:9: note: in instantiation of function template specialization 'llvm::shuffle<llvm::PreRARematStage::ScoredRemat *, std::mersenne_twister_engine<unsigned long, 32, 624, 397, 31, 2567483615, 11, 4294967295, 7, 2636928640, 15, 4022730752, 18, 1812433253> &>' requested here
 1573 |   llvm::shuffle(Start, End, Generator);
      |         ^
/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/include/llvm/ADT/STLExtras.h:1600:11: note: in instantiation of function template specialization 'llvm::detail::presortShuffle<llvm::PreRARematStage::ScoredRemat *>' requested here
 1600 |   detail::presortShuffle<IteratorTy>(Start, End);
      |           ^
/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/include/llvm/ADT/STLExtras.h:1638:5: note: in instantiation of function template specialization 'llvm::array_pod_sort<llvm::PreRARematStage::ScoredRemat *>' requested here
 1638 |     array_pod_sort(Start, End);
      |     ^
/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/include/llvm/ADT/STLExtras.h:1648:9: note: in instantiation of function template specialization 'llvm::sort<llvm::PreRARematStage::ScoredRemat *>' requested here
 1648 |   llvm::sort(adl_begin(C), adl_end(C));
      |         ^
/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:1414:5: note: in instantiation of function template specialization 'llvm::sort<llvm::SmallVector<llvm::PreRARematStage::ScoredRemat> &>' requested here
 1414 |     sort(ScoredRemats);
      |     ^
/usr/include/c++/13/bits/move.h:189:5: note: candidate template ignored: requirement '__and_<std::__not_<std::__is_tuple_like<llvm::PreRARematStage::ScoredRemat>>, std::is_move_constructible<llvm::PreRARematStage::ScoredRemat>, std::is_move_assignable<llvm::PreRARematStage::ScoredRemat>>::value' was not satisfied [with _Tp = llvm::PreRARematStage::ScoredRemat]

@RKSimon
Collaborator

RKSimon commented Jan 13, 2026

@lucas-rami this is failing on my msvc build:

C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.44.35207\include\algorithm(3789): error C2672: 'swap': no matching overloaded function found
C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.44.35207\include\optional(968): note: could be 'void std::swap(std::optional<_Ty> &,std::optional<_Ty> &) noexcept(<expr>)'
C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.44.35207\include\algorithm(3789): note: 'void std::swap(std::optional<_Ty> &,std::optional<_Ty> &) noexcept(<expr>)': could not deduce template argument for 'std::optional<_Ty> &' from 'T'
        with
        [
            T=llvm::PreRARematStage::ScoredRemat
        ]
C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.44.35207\include\utility(489): note: or       'void std::swap(std::pair<_Ty1,_Ty2> &,std::pair<_Ty1,_Ty2> &) noexcept(<expr>)'
C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.44.35207\include\algorithm(3789): note: 'void std::swap(std::pair<_Ty1,_Ty2> &,std::pair<_Ty1,_Ty2> &) noexcept(<expr>)': could not deduce template argument for 'std::pair<_Ty1,_Ty2> &' from 'T'
        with
        [
            T=llvm::PreRARematStage::ScoredRemat
        ]
C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.44.35207\include\type_traits(2077): note: or       'void std::swap(_Ty (&)[_Size],_Ty (&)[_Size]) noexcept(<expr>)'
C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.44.35207\include\algorithm(3789): note: 'void std::swap(_Ty (&)[_Size],_Ty (&)[_Size]) noexcept(<expr>)': could not deduce template argument for '_Ty (&)[_Size]' from 'T'
        with
        [
            T=llvm::PreRARematStage::ScoredRemat
        ]
C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.44.35207\include\type_traits(2074): note: or       'void std::swap(_Ty &,_Ty &) noexcept(<expr>)'
C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.44.35207\include\algorithm(3789): note: 'void std::swap(_Ty &,_Ty &) noexcept(<expr>)': could not deduce template argument for '__formal'
C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.44.35207\include\type_traits(2070): note: 'std::enable_if_t<false,int>' : Failed to specialize alias template
C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.44.35207\include\algorithm(3789): note: the template instantiation context (the oldest one first) is
E:\llvm\llvm-project\llvm\lib\Target\AMDGPU\GCNSchedStrategy.cpp(1414): note: see reference to function template instantiation 'void llvm::sort<llvm::SmallVector<llvm::PreRARematStage::ScoredRemat,1>&>(Container)' being compiled
        with
        [
            Container=llvm::SmallVector<llvm::PreRARematStage::ScoredRemat,1> &
        ]
E:\llvm\llvm-project\llvm\include\llvm/ADT/STLExtras.h(1648): note: see reference to function template instantiation 'void llvm::sort<llvm::PreRARematStage::ScoredRemat*>(IteratorTy,IteratorTy)' being compiled
        with
        [
            IteratorTy=llvm::PreRARematStage::ScoredRemat *
        ]
E:\llvm\llvm-project\llvm\include\llvm/ADT/STLExtras.h(1638): note: see reference to function template instantiation 'void llvm::array_pod_sort<IteratorTy>(IteratorTy,IteratorTy)' being compiled
        with
        [
            IteratorTy=llvm::PreRARematStage::ScoredRemat *
        ]
E:\llvm\llvm-project\llvm\include\llvm/ADT/STLExtras.h(1600): note: see reference to function template instantiation 'void llvm::detail::presortShuffle<IteratorTy>(IteratorTy,IteratorTy)' being compiled
        with
        [
            IteratorTy=llvm::PreRARematStage::ScoredRemat *
        ]
E:\llvm\llvm-project\llvm\include\llvm/ADT/STLExtras.h(1573): note: see reference to function template instantiation 'void llvm::shuffle<IteratorTy,std::mt19937&>(Iterator,Iterator,RNG)' being compiled
        with
        [
            IteratorTy=llvm::PreRARematStage::ScoredRemat *,
            Iterator=llvm::PreRARematStage::ScoredRemat *,
            RNG=std::mt19937 &
        ]
E:\llvm\llvm-project\llvm\include\llvm/ADT/STLExtras.h(1538): note: see reference to function template instantiation 'void std::iter_swap<Iterator,Iterator>(_FwdIt1,_FwdIt2)' being compiled
        with
        [
            Iterator=llvm::PreRARematStage::ScoredRemat *,
            _FwdIt1=llvm::PreRARematStage::ScoredRemat *,
            _FwdIt2=llvm::PreRARematStage::ScoredRemat *
        ]
ninja: build stopped: subcommand failed.

@lucas-rami
Contributor Author

@RKSimon Looking into this. Should I revert in the meantime?

@jplehr
Contributor

jplehr commented Jan 13, 2026

We see failures on our HIP bot after this one landed. Reverting would give us feedback on whether it is indeed the culprit. (I suspect it is)
Edit: https://lab.llvm.org/buildbot/#/builders/123/builds/33477

@lucas-rami
Contributor Author

#175755 should fix this.

@llvm-ci
Collaborator

llvm-ci commented Jan 13, 2026

LLVM Buildbot has detected a new failure on builder sanitizer-aarch64-linux-bootstrap-asan running on sanitizer-buildbot8 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/24/builds/16530

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:561: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:561: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:561: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:561: note: using ld.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:561: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:561: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:561: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:74: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 93974 tests, 72 workers --
Testing:  0.. 10.. 20.. 30
FAIL: LLVM :: CodeGen/AMDGPU/high-RP-reschedule.mir (31971 of 93974)
******************** TEST 'LLVM :: CodeGen/AMDGPU/high-RP-reschedule.mir' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -mtriple=amdgcn -mcpu=gfx908 -verify-misched -run-pass=machine-scheduler -debug-only=machine-scheduler -o - /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AMDGPU/high-RP-reschedule.mir 2>&1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck -check-prefix=GCN /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AMDGPU/high-RP-reschedule.mir
# executed command: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -mtriple=amdgcn -mcpu=gfx908 -verify-misched -run-pass=machine-scheduler -debug-only=machine-scheduler -o - /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AMDGPU/high-RP-reschedule.mir
# note: command had no output on stdout or stderr
# error: command failed with exit status: 1
# executed command: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck -check-prefix=GCN /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AMDGPU/high-RP-reschedule.mir
# note: command had no output on stdout or stderr

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
Slowest Tests:
--------------------------------------------------------------------------
142.54s: Clang :: Preprocessor/riscv-target-features.c
136.42s: LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir
132.18s: Clang :: Driver/arm-cortex-cpus-1.c
131.32s: Clang :: Driver/arm-cortex-cpus-2.c
128.29s: Clang :: OpenMP/target_defaultmap_codegen_01.cpp
124.73s: Clang :: OpenMP/target_update_codegen.cpp
111.14s: Clang :: Preprocessor/aarch64-target-features.c
111.13s: Clang :: Preprocessor/arm-target-features.c
96.80s: LLVM :: CodeGen/RISCV/attributes.ll
92.42s: Clang :: Preprocessor/predefined-arch-macros.c
90.90s: Clang :: Driver/fsanitize.c
89.52s: Clang :: Driver/linux-ld.c
85.44s: LLVM :: tools/llvm-exegesis/AArch64/all-opcodes.test
84.65s: Clang :: Driver/clang_f_opts.c
78.11s: Clang :: Driver/cl-options.c
71.68s: Clang :: Analysis/a_flaky_crash.cpp
70.14s: Clang :: Driver/range-warnings.c
69.95s: Clang :: Driver/x86-target-features.c
68.62s: Clang :: Preprocessor/init.c
Step 11 (stage2/asan check) failure: stage2/asan check (failure)
...
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:561: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:561: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:561: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:561: note: using ld.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:561: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:561: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:561: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:74: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 93974 tests, 72 workers --
Testing:  0.. 10.. 20.. 30
FAIL: LLVM :: CodeGen/AMDGPU/high-RP-reschedule.mir (31971 of 93974)
******************** TEST 'LLVM :: CodeGen/AMDGPU/high-RP-reschedule.mir' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -mtriple=amdgcn -mcpu=gfx908 -verify-misched -run-pass=machine-scheduler -debug-only=machine-scheduler -o - /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AMDGPU/high-RP-reschedule.mir 2>&1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck -check-prefix=GCN /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AMDGPU/high-RP-reschedule.mir
# executed command: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -mtriple=amdgcn -mcpu=gfx908 -verify-misched -run-pass=machine-scheduler -debug-only=machine-scheduler -o - /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AMDGPU/high-RP-reschedule.mir
# note: command had no output on stdout or stderr
# error: command failed with exit status: 1
# executed command: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck -check-prefix=GCN /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AMDGPU/high-RP-reschedule.mir
# note: command had no output on stdout or stderr

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
Slowest Tests:
--------------------------------------------------------------------------
142.54s: Clang :: Preprocessor/riscv-target-features.c
136.42s: LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir
132.18s: Clang :: Driver/arm-cortex-cpus-1.c
131.32s: Clang :: Driver/arm-cortex-cpus-2.c
128.29s: Clang :: OpenMP/target_defaultmap_codegen_01.cpp
124.73s: Clang :: OpenMP/target_update_codegen.cpp
111.14s: Clang :: Preprocessor/aarch64-target-features.c
111.13s: Clang :: Preprocessor/arm-target-features.c
96.80s: LLVM :: CodeGen/RISCV/attributes.ll
92.42s: Clang :: Preprocessor/predefined-arch-macros.c
90.90s: Clang :: Driver/fsanitize.c
89.52s: Clang :: Driver/linux-ld.c
85.44s: LLVM :: tools/llvm-exegesis/AArch64/all-opcodes.test
84.65s: Clang :: Driver/clang_f_opts.c
78.11s: Clang :: Driver/cl-options.c
71.68s: Clang :: Analysis/a_flaky_crash.cpp
70.14s: Clang :: Driver/range-warnings.c
69.95s: Clang :: Driver/x86-target-features.c
68.62s: Clang :: Preprocessor/init.c

lucas-rami added a commit that referenced this pull request Jan 13, 2026
…#175755)

On some configurations, sorting `ScoredRemat` objects, which contain
const members, causes a compile failure because such objects cannot be
swapped/moved. The problem was introduced in #175050.

This removes const from those fields to address the issue. The design
will soon change anyway to not rely on sorting objects of this type, and
the consts were only there for semantic clarity.
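
The failure mode described above can be reproduced in isolation. The following is a minimal sketch assuming C++17; `ConstScored`, `MutableScored`, and `sortedScores` are hypothetical stand-ins and not the actual `ScoredRemat` definition. It shows that a const data member removes move assignment, which in turn makes the type non-swappable and therefore unusable with sorting algorithms that swap elements.

```cpp
#include <algorithm>
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for ScoredRemat; the field name is illustrative.
struct ConstScored {
  const int Score; // const member: implicit copy/move assignment is deleted
};

struct MutableScored {
  int Score; // non-const member: implicit move assignment is available
  bool operator<(const MutableScored &O) const { return Score < O.Score; }
};

// std::swap needs both move construction and move assignment.
static_assert(!std::is_move_assignable_v<ConstScored>);
static_assert(!std::is_swappable_v<ConstScored>);
static_assert(std::is_move_assignable_v<MutableScored>);
static_assert(std::is_swappable_v<MutableScored>);

// Sorts by ascending score and returns the scores in sorted order.
std::vector<int> sortedScores(std::vector<MutableScored> V) {
  std::sort(V.begin(), V.end());
  assert(std::is_sorted(V.begin(), V.end()));
  std::vector<int> Out;
  for (const MutableScored &S : V)
    Out.push_back(S.Score);
  return Out;
}
```

Whether a given toolchain actually rejects sorting the const variant depends on which code path the sort takes, which would be consistent with the failure only showing up on some configurations.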
thurstond added a commit to thurstond/llvm-project that referenced this pull request Jan 13, 2026
A buildbot was failing with a use-after-poison (https://lab.llvm.org/buildbot/#/builders/24/builds/16530) after llvm#175050:
```
==llc==1532559==ERROR: AddressSanitizer: use-after-poison on address 0xe26e74e12368 at pc 0xb36d41bd74dc bp 0xffffed72a450 sp 0xffffed72a448
READ of size 8 at 0xe26e74e12368 thread T0
    #0 0xb36d41bd74d8 in llvm::MachineInstr::print(llvm::raw_ostream&, bool, bool, bool, bool, llvm::TargetInstrInfo const*) const /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineInstr.cpp:1796:35
    #1 0xb36d3e221b08 in operator<< /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:2150:6
    #2 0xb36d3e221b08 in llvm::PreRARematStage::rollback(llvm::PreRARematStage::RollbackInfo const&, llvm::BitVector&) const /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:2363:57
...
```

This is because it was printing a deleted instruction. This patch fixes this by reversing the order.
thurstond added a commit that referenced this pull request Jan 13, 2026
…175807)

A buildbot was failing with a use-after-poison
(https://lab.llvm.org/buildbot/#/builders/24/builds/16530) after
#175050:
```
==llc==1532559==ERROR: AddressSanitizer: use-after-poison on address 0xe26e74e12368 at pc 0xb36d41bd74dc bp 0xffffed72a450 sp 0xffffed72a448
READ of size 8 at 0xe26e74e12368 thread T0
    #0 0xb36d41bd74d8 in llvm::MachineInstr::print(llvm::raw_ostream&, bool, bool, bool, bool, llvm::TargetInstrInfo const*) const /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineInstr.cpp:1796:35
    #1 0xb36d3e221b08 in operator<< /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:2150:6
    #2 0xb36d3e221b08 in llvm::PreRARematStage::rollback(llvm::PreRARematStage::RollbackInfo const&, llvm::BitVector&) const /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:2363:57
...
```

This is because it was printing an instruction that had already been
deleted. This patch fixes this by reversing the order.
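
The ordering issue can be illustrated with a small sketch. `Instr` and `logThenErase` are hypothetical names used only for illustration and are not the LLVM `MachineInstr` API; the point is simply that the debug print has to happen while the object is still alive.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical stand-in for a machine instruction.
struct Instr {
  std::string Text;
};

// Logs the instruction and then destroys it. Printing must come first:
// once reset() has run, the object is gone and reading it is invalid.
std::string logThenErase(std::unique_ptr<Instr> &MI) {
  assert(MI && "instruction was already deleted");
  std::ostringstream OS;
  OS << "Deleting " << MI->Text << '\n'; // 1. print while MI is alive
  MI.reset();                            // 2. only then delete it
  return OS.str();
}
```

Swapping the two numbered statements gives the same kind of read-after-delete that the sanitizer report above flags.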
lucas-rami added a commit to lucas-rami/llvm-project that referenced this pull request Jan 13, 2026
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 13, 2026
…e deleting (#175807)

A buildbot was failing with a use-after-poison
(https://lab.llvm.org/buildbot/#/builders/24/builds/16530) after
llvm/llvm-project#175050:
```
==llc==1532559==ERROR: AddressSanitizer: use-after-poison on address 0xe26e74e12368 at pc 0xb36d41bd74dc bp 0xffffed72a450 sp 0xffffed72a448
READ of size 8 at 0xe26e74e12368 thread T0
    #0 0xb36d41bd74d8 in llvm::MachineInstr::print(llvm::raw_ostream&, bool, bool, bool, bool, llvm::TargetInstrInfo const*) const /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineInstr.cpp:1796:35
    #1 0xb36d3e221b08 in operator<< /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:2150:6
    #2 0xb36d3e221b08 in llvm::PreRARematStage::rollback(llvm::PreRARematStage::RollbackInfo const&, llvm::BitVector&) const /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:2363:57
...
```

This is because it was printing an instruction that had already been
deleted. This patch fixes this by reversing the order.
lucas-rami added a commit to lucas-rami/llvm-project that referenced this pull request Jan 13, 2026
jplehr pushed a commit that referenced this pull request Jan 13, 2026
…75050)" (#175813)

This reverts 8ab7937 and
f21e359 which are causing a HIP failure
in a Blender test.
Priyanshu3820 pushed a commit to Priyanshu3820/llvm-project that referenced this pull request Jan 18, 2026
This is a significant refactoring of the scheduler's rematerialization
stage meant to improve rematerialization capabilities and lay strong
foundations for future improvements.

As before, the stage identifies scheduling regions in which RP must be
reduced (so-called "target regions"), then rematerializes registers to
try and achieve the desired reduction. All regions affected by
rematerializations are re-scheduled, and, if the MIR is deemed worse
than before, rematerializations are rolled back to leave the MIR in its
pre-stage state.

The core contribution is a scoring system to estimate the benefit of
each rematerialization candidate. This score favors rematerializing
candidates which, in order, would

1. (if the function is spilling) reduce RP in highest-frequency target
regions,
2. be rematerialized to lowest-frequency target regions, and
3. reduce RP in the highest number of target regions.

All rematerialization opportunities are initially scored and
rematerialized in decreasing score order until RP objectives are met or
pre-computed scores diverge from reality; in the latter case remaining
candidates are re-scored and the process repeats. New tests in
`machine-scheduler-rematerialization-scoring.mir` showcase how the
scoring system dictates which rematerializations are the most beneficial
and are therefore performed first.

A minor contribution included in this PR following previous feedback is
that rollback now happens in place, i.e., without having to re-create the
rematerialized MI. This leaves original slot indices and registers
untouched. We achieve this by temporarily switching the opcode of
rollback-able instructions to a debug opcode during re-scheduling so
that they are ignored.
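
The three-level priority of the scoring system described above amounts to a lexicographic comparison. The following is only a sketch under stated assumptions: `RematScore`, its fields, `betterThan`, and `rematOrder` are hypothetical names, and the re-scoring loop and the spilling condition on the first criterion are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical score record; field names are illustrative only.
struct RematScore {
  uint64_t ReducedRegionFreq; // criterion 1: higher is better
  uint64_t RematRegionFreq;   // criterion 2: lower is better
  unsigned NumRegionsHelped;  // criterion 3: higher is better
  unsigned Id;                // candidate identifier
};

// Returns true if A should be rematerialized before B.
bool betterThan(const RematScore &A, const RematScore &B) {
  // A and B are swapped on the second field so that lower frequency wins.
  return std::tie(A.ReducedRegionFreq, B.RematRegionFreq, A.NumRegionsHelped) >
         std::tie(B.ReducedRegionFreq, A.RematRegionFreq, B.NumRegionsHelped);
}

// Returns candidate ids in decreasing score order.
std::vector<unsigned> rematOrder(std::vector<RematScore> C) {
  std::sort(C.begin(), C.end(), betterThan);
  assert(std::is_sorted(C.begin(), C.end(), betterThan));
  std::vector<unsigned> Ids;
  for (const RematScore &S : C)
    Ids.push_back(S.Id);
  return Ids;
}
```

Comparing tuples keeps the priority order explicit: a later criterion only matters when all earlier ones tie.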
Priyanshu3820 pushed a commit to Priyanshu3820/llvm-project that referenced this pull request Jan 18, 2026
…llvm#175755)

On some configurations, sorting `ScoredRemat` objects, which contain
const members, causes a compile failure because such objects cannot be
swapped/moved. The problem was introduced in llvm#175050.

This removes const from those fields to address the issue. The design
will soon change anyway to not rely on sorting objects of this type, and
the consts were only there for semantic clarity.
Priyanshu3820 pushed a commit to Priyanshu3820/llvm-project that referenced this pull request Jan 18, 2026
…lvm#175807)

A buildbot was failing with a use-after-poison
(https://lab.llvm.org/buildbot/#/builders/24/builds/16530) after
llvm#175050:
```
==llc==1532559==ERROR: AddressSanitizer: use-after-poison on address 0xe26e74e12368 at pc 0xb36d41bd74dc bp 0xffffed72a450 sp 0xffffed72a448
READ of size 8 at 0xe26e74e12368 thread T0
    #0 0xb36d41bd74d8 in llvm::MachineInstr::print(llvm::raw_ostream&, bool, bool, bool, bool, llvm::TargetInstrInfo const*) const /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineInstr.cpp:1796:35
    #1 0xb36d3e221b08 in operator<< /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:2150:6
    #2 0xb36d3e221b08 in llvm::PreRARematStage::rollback(llvm::PreRARematStage::RollbackInfo const&, llvm::BitVector&) const /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:2363:57
...
```

This is because it was printing an instruction that had already been
deleted. This patch fixes this by reversing the order.
Priyanshu3820 pushed a commit to Priyanshu3820/llvm-project that referenced this pull request Jan 18, 2026
…vm#175050)" (llvm#175813)

This reverts 8ab7937 and
f21e359 which are causing a HIP failure
in a Blender test.
lucas-rami added a commit to lucas-rami/llvm-project that referenced this pull request Jan 21, 2026
…lvm#175050)"

This re-applies commit f21e359 along
with the compile failure fix introduced in
8ab7937 before the initial patch was
reverted, plus fixes for the previously observed assert failure.

We were hitting the assert in the HIP Blender test due to a combination of
two issues that could happen when rematerializations are being rolled
back.

1. Small changes in slot indices (while preserving instruction order)
   compared to the pre-re-scheduling state mean that we have to
   re-compute live ranges for all register operands of rolled back
   rematerializations. This was not being done before.
2. Re-scheduling can move the defining instructions of rematerialized
   registers to arbitrary positions in their respective regions while
   their opcode is set to DBG_VALUE, even before their read operands
   are defined. This makes re-scheduling reverts mandatory before
   rolling back rematerializations, as otherwise def-use chains may be
   broken. The original patch did not guarantee that, but previous
   refactoring of the rollback/revert logic for the rematerialization
   stage now ensures that reverts always precede rollbacks.
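
The ordering constraint from the second item can be sketched with a toy model. Everything here is hypothetical (`Region`, `maskForRollback`, `revertSchedule`, `rollbackRemats` are not LLVM APIs) and live-range recomputation is not modeled; the sketch only shows why restoring opcodes is safe solely after the pre-re-scheduling order is back in place.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical, simplified model; none of these names are LLVM APIs.
enum class Opcode { Def, Remat, DebugValue };

struct Region {
  std::vector<int> Order;      // current order of instruction ids
  std::vector<int> SavedOrder; // order recorded before re-scheduling
  std::vector<Opcode> Op;      // current opcode, indexed by id
  std::vector<Opcode> SavedOp; // original opcode, indexed by id
};

// Builds a region whose instructions 0..N-1 start in program order.
Region makeRegion(std::vector<Opcode> Ops) {
  Region R;
  R.Order.resize(Ops.size());
  std::iota(R.Order.begin(), R.Order.end(), 0);
  R.SavedOrder = R.Order;
  R.Op = Ops;
  R.SavedOp = Ops;
  return R;
}

// Masks an instruction so that a scheduler would treat it as debug-only.
void maskForRollback(Region &R, int Id) { R.Op[Id] = Opcode::DebugValue; }

// Step 1: undo re-scheduling by restoring the recorded order.
void revertSchedule(Region &R) { R.Order = R.SavedOrder; }

// Step 2: restore original opcodes in place. A masked instruction may have
// been moved above the definitions of its operands, so step 1 must run first.
void rollbackRemats(Region &R) {
  assert(R.Order == R.SavedOrder && "revert the schedule before rolling back");
  R.Op = R.SavedOp;
}
```

Calling `rollbackRemats` without a prior `revertSchedule` trips the assertion in this model, mirroring the broken def-use chains mentioned above.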
lucas-rami added a commit to lucas-rami/llvm-project that referenced this pull request Jan 21, 2026
…lvm#175050)"

This re-applies commit f21e359 along
with the compile failure fix introduced in
8ab7937 before the initial patch was
reverted, plus fixes for the previously observed assert failure.

We were hitting the assert in the HIP Blender test due to a combination of
two issues that could happen when rematerializations are being rolled
back.

1. Small changes in slot indices (while preserving instruction order)
   compared to the pre-re-scheduling state mean that we have to
   re-compute live ranges for all register operands of rolled back
   rematerializations. This was not being done before.
2. Re-scheduling can move the defining instructions of rematerialized
   registers to arbitrary positions in their respective regions while
   their opcode is set to DBG_VALUE, even before their read operands
   are defined. This makes re-scheduling reverts mandatory before
   rolling back rematerializations, as otherwise def-use chains may be
   broken. The original patch did not guarantee that, but previous
   refactoring of the rollback/revert logic for the rematerialization
   stage now ensures that reverts always precede rollbacks.

stack-info: PR: #8, branch: users/lucas-rami/stack/rematerialization-rollback-logi/4
lucas-rami added a commit that referenced this pull request Jan 21, 2026
…175050)"

This re-applies commit f21e359 along
with the compile failure fix introduced in
8ab7937 before the initial patch was
reverted, plus fixes for the previously observed assert failure.

We were hitting the assert in the HIP Blender test due to a combination of
two issues that could happen when rematerializations are being rolled
back.

1. Small changes in slot indices (while preserving instruction order)
   compared to the pre-re-scheduling state mean that we have to
   re-compute live ranges for all register operands of rolled back
   rematerializations. This was not being done before.
2. Re-scheduling can move the defining instructions of rematerialized
   registers to arbitrary positions in their respective regions while
   their opcode is set to DBG_VALUE, even before their read operands
   are defined. This makes re-scheduling reverts mandatory before
   rolling back rematerializations, as otherwise def-use chains may be
   broken. The original patch did not guarantee that, but previous
   refactoring of the rollback/revert logic for the rematerialization
   stage now ensures that reverts always precede rollbacks.
BStott6 pushed a commit to BStott6/llvm-project that referenced this pull request Jan 22, 2026
…lvm#175807)

A buildbot was failing with a use-after-poison
(https://lab.llvm.org/buildbot/#/builders/24/builds/16530) after
llvm#175050:
```
==llc==1532559==ERROR: AddressSanitizer: use-after-poison on address 0xe26e74e12368 at pc 0xb36d41bd74dc bp 0xffffed72a450 sp 0xffffed72a448
READ of size 8 at 0xe26e74e12368 thread T0
    #0 0xb36d41bd74d8 in llvm::MachineInstr::print(llvm::raw_ostream&, bool, bool, bool, bool, llvm::TargetInstrInfo const*) const /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineInstr.cpp:1796:35
    #1 0xb36d3e221b08 in operator<< /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:2150:6
    #2 0xb36d3e221b08 in llvm::PreRARematStage::rollback(llvm::PreRARematStage::RollbackInfo const&, llvm::BitVector&) const /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:2363:57
...
```

This is because PreRARematStage::rollback was printing an instruction
that had already been deleted. This patch fixes this by reversing the
order, so that the instruction is printed before it is deleted.
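
As a rough illustration of that ordering, the standalone sketch below reads an object for logging before destroying it. It is not the patched LLVM code: `ToyInstr` and `rollbackAndLog` are hypothetical names, and a `std::unique_ptr` stands in for the instruction's storage.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical stand-in for an instruction that can be printed.
struct ToyInstr {
  std::string Name;
};

// Logs the instruction, then deletes it. The read for logging happens
// strictly before the object is destroyed; swapping the two steps would
// read freed memory.
std::string rollbackAndLog(std::unique_ptr<ToyInstr> MI) {
  std::ostringstream Log;
  Log << "Removing " << MI->Name; // read while the object is still alive
  MI.reset();                     // destroy only after the last read
  return Log.str();
}
```

Calling `rollbackAndLog(std::make_unique<ToyInstr>(ToyInstr{"remat"}))` builds the log message from a live object and only then releases it.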
BStott6 pushed a commit to BStott6/llvm-project that referenced this pull request Jan 22, 2026
…vm#175050)" (llvm#175813)

This reverts 8ab7937 and
f21e359, which are causing a HIP failure
in a Blender test.