JIT: Add a disabled-by-default implementation of strength reduction (#104243)
This adds a disabled-by-default implementation of strength reduction. At
this point the implementation should be correct; however, it is currently
both a size and a perfscore regression when enabled. More work will
be needed to get the heuristics right and to make it kick in for more
cases.
Strength reduction replaces "expensive" operations computed on every
loop iteration with cheaper ones by creating more induction
variables. In C# terms it effectively transforms something like
```
private struct S
{
    public int A, B, C;
}

[MethodImpl(MethodImplOptions.NoInlining)]
private static float Sum(S[] ss)
{
    int sum = 0;
    foreach (S v in ss)
    {
        sum += v.A;
        sum += v.B;
        sum += v.C;
    }

    return sum;
}
```
into an equivalent
```
int sum = 0;
ref S curS = ref ss[0];
for (int i = 0; i < ss.Length; i++)
{
    sum += curS.A;
    sum += curS.B;
    sum += curS.C;
    curS = ref Unsafe.Add(ref curS, 1);
}
```
With strength reduction enabled this PR thus changes codegen of the
standard `foreach` version above from
```asm
G_M63518_IG03:  ;; offset=0x0011
       lea      r10, [rdx+2*rdx]
       lea      r10, bword ptr [rcx+4*r10+0x10]
       mov      r9d, dword ptr [r10]
       mov      r11d, dword ptr [r10+0x04]
       mov      r10d, dword ptr [r10+0x08]
       add      eax, r9d
       add      eax, r11d
       add      eax, r10d
       inc      edx
       cmp      r8d, edx
       jg       SHORT G_M63518_IG03
       ;; size=36 bbWeight=4 PerfScore 39.00
```
to
```asm
G_M63518_IG04:  ;; offset=0x0011
       mov      r8, rcx
       mov      r10d, dword ptr [r8]
       mov      r9d, dword ptr [r8+0x04]
       mov      r8d, dword ptr [r8+0x08]
       add      eax, r10d
       add      eax, r9d
       add      eax, r8d
       add      rcx, 12
       dec      edx
       jne      SHORT G_M63518_IG04
       ;; size=31 bbWeight=4 PerfScore 34.00
```
on x64. Further improvements can be made to enable better address modes.
The current heuristics try to ensure that we do not actually end up with
more primary induction variables. Strength reduction only kicks in
when it thinks that all uses of the primary IV can be replaced by the
new primary IV. However, uses inside loop exit tests are allowed to stay
unreplaced, on the assumption that the downwards loop transformation
will be able to get rid of them.
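In C# terms, the combined effect of strength reduction and the downwards
loop transformation on the `Sum` example can be sketched roughly as
follows. This is a hand-written illustration, not JIT output: the names
`SumIndexed`, `SumReduced` and `remaining` are made up for the example,
and `MemoryMarshal.GetArrayDataReference` is used here only as an
assumption to obtain a reference that is also valid for empty arrays.

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public struct S
{
    public int A, B, C;
}

public static class StrengthReductionSketch
{
    // Straightforward version: 'i' is the primary IV and every access
    // recomputes the element address from it.
    public static int SumIndexed(S[] ss)
    {
        int sum = 0;
        for (int i = 0; i < ss.Length; i++)
        {
            sum += ss[i].A;
            sum += ss[i].B;
            sum += ss[i].C;
        }

        return sum;
    }

    // Hand-written approximation of the transformed loop: 'curS' is the new
    // primary IV (an address stepping by sizeof(S)), and the only remaining
    // use of the old counter is an exit test that counts down to zero.
    public static int SumReduced(S[] ss)
    {
        int sum = 0;
        ref S curS = ref MemoryMarshal.GetArrayDataReference(ss);
        for (int remaining = ss.Length; remaining != 0; remaining--)
        {
            sum += curS.A;
            sum += curS.B;
            sum += curS.C;
            curS = ref Unsafe.Add(ref curS, 1);
        }

        return sum;
    }
}
```

The count-down exit test corresponds to the `dec edx` / `jne` pair in the
x64 listing above, where no compare against the array length remains in
the loop body.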
Getting the cases around overflow right turned out to be hard and
required reasoning about trip counts that was added in a previous PR.
Generally, the issue is that we need to prove that transforming a zero
extension of an add recurrence to a 64-bit add recurrence is legal. For
example, for a simple case of
```
for (int i = 0; i < arr.Length; i++)
    sum += arr[i];
```
the IV analysis is eventually going to end up wanting to show that
`zext<64>(int32 <L, 0, 1>) => int64 <L, 0, 1>` is a correct
transformation. This requires showing that the add recurrence does not
step past 2^32-1, which requires the bound on the trip count that we can
now compute. The reasoning done for both the trip count and around the
overflow is still very limited but can be improved incrementally.
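To make the overflow concern concrete, the sketch below evaluates both
sides of the transformation at a given iteration. It is only an
illustration under simplifying assumptions (unsigned start and step, a
known maximum iteration number); the type and method names are
hypothetical and do not correspond to anything in the JIT.

```csharp
public static class ZeroExtensionSketch
{
    // zext<64>(int32 <L, start, step>) at iteration k: the recurrence is
    // computed with wrapping 32-bit arithmetic and only then widened.
    public static ulong ZextOfNarrow(uint start, uint step, uint k)
    {
        uint narrow = unchecked(start + step * k);
        return narrow;
    }

    // int64 <L, start, step> at iteration k: the recurrence is computed
    // directly in 64 bits, so it never wraps at 2^32.
    public static ulong Wide(uint start, uint step, uint k)
    {
        return (ulong)start + (ulong)step * k;
    }

    // Assumed legality condition for this sketch: the two forms agree on
    // every iteration up to maxIteration exactly when the 64-bit value on
    // the last iteration does not step past 2^32-1.
    public static bool WideningIsSafe(uint start, uint step, uint maxIteration)
    {
        return Wide(start, step, maxIteration) <= uint.MaxValue;
    }
}
```

For the array loop above, the recurrence starts at 0 with step 1 and the
trip count is bounded by `arr.Length`, which fits in an `int`, so the
condition holds. Without a bound on the trip count, something like
`<L, 0xFFFFFFFE, 1>` would wrap to 0 in 32 bits on its third value while
the 64-bit form continues to 2^32, and the two would diverge.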
The implementation works by considering every primary IV of the loop in
turn, and by initializing 'cursors' pointing to each use of the primary
IV. It then repeatedly tries to advance these cursors to the parents of
the uses, for as long as doing so yields a new set of cursors that all
still compute the same (now derived) IV. If it manages to do this at
least once, then replacing the cursors by a new primary IV should result
in the old primary IV no longer being necessary, while having replaced
some operations by cheaper ones.
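The cursor walk can be pictured with a small toy model. The sketch below
is not the JIT's implementation: the `Node` tree, the `NodeKind` values
and the rule that only "add a constant" and "multiply by a constant" keep
a value an add recurrence are all simplifying assumptions made for
illustration, and overflow is ignored by doing the arithmetic in `long`.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum NodeKind { PrimaryIVUse, Constant, Add, Mul, Load }

public sealed class Node
{
    public NodeKind Kind;
    public long Value;                      // only meaningful for Constant
    public Node Parent;
    public Node[] Operands = new Node[0];

    public static Node Leaf(NodeKind kind, long value = 0) =>
        new Node { Kind = kind, Value = value };

    public static Node Op(NodeKind kind, params Node[] operands)
    {
        var node = new Node { Kind = kind, Operands = operands };
        foreach (Node operand in operands) operand.Parent = node;
        return node;
    }
}

public static class CursorSketch
{
    // If 'child' computes the add recurrence 'iv', tries to describe its
    // parent as an add recurrence as well. Returns null when it cannot.
    static (long Start, long Step)? TryAdvance(Node child, (long Start, long Step) iv)
    {
        Node parent = child.Parent;
        if (parent == null || parent.Operands.Length != 2) return null;

        Node other = parent.Operands[0] == child ? parent.Operands[1] : parent.Operands[0];
        if (other.Kind != NodeKind.Constant) return null;

        switch (parent.Kind)
        {
            case NodeKind.Add: return (iv.Start + other.Value, iv.Step);
            case NodeKind.Mul: return (iv.Start * other.Value, iv.Step * other.Value);
            default: return null;
        }
    }

    // Starts one cursor at each use of the primary IV and moves all cursors
    // up one level at a time while every cursor still computes the same IV.
    // Returns that IV and the number of levels the cursors were advanced.
    public static ((long Start, long Step) Iv, int Levels) Advance(
        IReadOnlyList<Node> uses, (long Start, long Step) primary)
    {
        List<Node> cursors = uses.ToList();
        var iv = primary;
        int levels = 0;

        while (true)
        {
            var next = cursors.Select(c => TryAdvance(c, iv)).ToList();
            if (next.Any(n => !n.HasValue) || next.Distinct().Count() != 1) break;

            cursors = cursors.Select(c => c.Parent).ToList();
            iv = next[0].Value;
            levels++;
        }

        return (iv, levels);
    }
}
```

In this model, if every use of a primary IV `<L, 0, 1>` has the shape
`load(i * 12 + 16)`, the cursors advance two levels and end up at
`<L, 16, 12>`, which is the candidate for a new primary IV. If one use
multiplies by a different constant, the cursors disagree after the first
step, nothing is advanced, and no replacement would be attempted.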