[LoopUnroll] Clamp PartialThreshold for large LoopMicroOpBufferSize
The znver3/znver4 scheduling models are outliers, specifying very
large LoopMicroOpBufferSizes of 512, while typical values for
other subtargets are on the order of 50. Even if this information
is micro-architecturally correct (*), this does not mean that we
want to runtime unroll all loops to a size that completely fills
the loop buffer. Unless this is the single hot loop in the entire
application, the massive code size increase will bust the micro-op
and instruction caches.
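
To put the code-size concern in rough numbers, here is a small C++ sketch. It uses a simplified model that is an assumption, not LLVM's exact unroll-count computation: the partial unroll factor is taken to be the micro-op budget divided by the loop body size, ignoring backedge instructions and the power-of-two rounding used for runtime unrolling. `approxUnrollFactor` and `approxUnrolledSize` are hypothetical helpers, and the 10-micro-op loop body is an illustrative value.

```cpp
// Hypothetical helpers for a simplified model, NOT LLVM's exact unroll-count
// computation: backedge instructions and the power-of-two rounding used for
// runtime unrolling are ignored.

// Number of copies of a LoopSize-micro-op body that fit into MaxOps micro-ops.
constexpr unsigned approxUnrollFactor(unsigned MaxOps, unsigned LoopSize) {
  return LoopSize == 0 ? 0 : MaxOps / LoopSize;
}

// Approximate size of the unrolled body in micro-ops.
constexpr unsigned approxUnrolledSize(unsigned MaxOps, unsigned LoopSize) {
  return approxUnrollFactor(MaxOps, LoopSize) * LoopSize;
}

// A hypothetical 10-micro-op loop body:
static_assert(approxUnrollFactor(50, 10) == 5, "typical buffer: 5 copies");
static_assert(approxUnrollFactor(512, 10) == 51, "znver3/znver4: 51 copies");
static_assert(approxUnrolledSize(512, 10) == 510, "about 510 micro-ops");
```

Under this model, a 512 micro-op budget produces roughly ten times as much unrolled code per loop as a budget of 50.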

Protect against this by clamping to the default PartialThreshold
of 150, which is the same as the default full-unroll threshold
and half the aggressive full-unroll threshold. Allowing more
partial unrolling than full unrolling is certainly nonsensical.

(*) I strongly doubt that this is actually correct -- I believe
this may derive from an incorrect reading of Agner Fog's
micro-architecture guide. The number 4096 that was originally
used here is the size of the general micro-op cache, not that of
a loop buffer. A separate loop buffer is not listed for the Zen
microarchitecture. For comparison, the listing for Skylake gives
a 1536 micro-op cache, but only a 64 micro-op loopback buffer,
with a note that it is rarely fully utilized. Our Skylake scheduling
model specifies a LoopMicroOpBufferSize of 50.
nikic committed Sep 28, 2023
1 parent 4d5525e commit 3e0d138
Showing 2 changed files with 30 additions and 742 deletions.
llvm/include/llvm/CodeGen/BasicTTIImpl.h: 8 changes (7 additions, 1 deletion)
@@ -575,7 +575,13 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
     if (PartialUnrollingThreshold.getNumOccurrences() > 0)
       MaxOps = PartialUnrollingThreshold;
     else if (ST->getSchedModel().LoopMicroOpBufferSize > 0)
-      MaxOps = ST->getSchedModel().LoopMicroOpBufferSize;
+      // Upper bound by the default PartialThreshold, which is the same as
+      // the default full-unroll Threshold. Even if the loop micro-op buffer
+      // is very large, this does not mean that we want to unroll all loops
+      // to that length, as it would increase code size beyond the limits of
+      // what unrolling normally allows.
+      MaxOps = std::min(ST->getSchedModel().LoopMicroOpBufferSize,
+                        UP.PartialThreshold);
     else
       return;
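
The patched if / else if / else chain can be modeled in isolation as a compile-time checkable C++14 sketch. `PartialUnrollInputs`, `MaxOpsResult` and `computeMaxOps` are hypothetical stand-ins, not LLVM types; they only mirror the values the hunk reads, and `Configured == false` stands in for the early `return`. The 150 default for PartialThreshold comes from the commit message.

```cpp
#include <algorithm>

// Hypothetical stand-ins, not LLVM types: they only mirror the values
// that the patched hunk in BasicTTIImpl.h reads.
struct PartialUnrollInputs {
  bool ThresholdFlagGiven;        // PartialUnrollingThreshold.getNumOccurrences() > 0
  unsigned ThresholdFlagValue;    // value of the PartialUnrollingThreshold option
  unsigned LoopMicroOpBufferSize; // ST->getSchedModel().LoopMicroOpBufferSize
  unsigned PartialThreshold;      // UP.PartialThreshold (150 by default)
};

struct MaxOpsResult {
  bool Configured; // false where the real code takes the early `return`
  unsigned MaxOps; // meaningful only when Configured is true
};

// Mirrors the patched chain:
//  1. an explicitly given PartialUnrollingThreshold wins and is NOT clamped;
//  2. otherwise a non-zero loop buffer size is clamped to PartialThreshold;
//  3. otherwise the hook bails out.
constexpr MaxOpsResult computeMaxOps(const PartialUnrollInputs &In) {
  return In.ThresholdFlagGiven
             ? MaxOpsResult{true, In.ThresholdFlagValue}
         : In.LoopMicroOpBufferSize > 0
             ? MaxOpsResult{true, std::min(In.LoopMicroOpBufferSize,
                                           In.PartialThreshold)}
             : MaxOpsResult{false, 0};
}

// A znver3/znver4-style buffer of 512 is clamped to the default 150.
static_assert(computeMaxOps(PartialUnrollInputs{false, 0, 512, 150}).MaxOps == 150,
              "large buffer is clamped");
// A typical buffer of 50 is below the clamp and passes through unchanged.
static_assert(computeMaxOps(PartialUnrollInputs{false, 0, 50, 150}).MaxOps == 50,
              "small buffer unchanged");
```

Because the clamp sits only in the `else if` branch, explicitly setting the PartialUnrollingThreshold option still bypasses it; only the value derived from the scheduling model is bounded.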
