-
Notifications
You must be signed in to change notification settings - Fork 2.7k
JIT: don't overlook strong nearby preference in LSRA #16028
Conversation
When allocating an interval, LSRA will look at related intervals for preference hints. LSRA may look past a related interval I1 to its related interval I2 if I1 looks like a simple copy. But if I1 has been previously allocated then it seems we are often better served using I1's preferences instead of using I2's. Addresses some issues seen in #11390.
@CarolEidt realize this is kind of from left field, but the diffs look surprisingly and consistently good.
There are some notes over in #11390 if you wondered how I ended up with this. Happy to post good and bad examples if it helps. |
Doesn't this also help with issues like #15782? It would be better if copies are eliminated earlier but after playing a bit with copy propagation I'm skeptical that the existing copy propagation phase can be significantly improved in this regard. Its reliance on heuristics is a bit of a problem. |
Sadly, no. I'll have to trace through what LSRA is thinking in that example and figure out why it's different. |
For the example in #15782 -- don't fully understand it yet but it's a result of a different part of the heuristic. After the G_M5638_IG02:
4885D2 test rdx, rdx
7507 jne SHORT G_M5638_IG03
33C0 xor rax, rax
4533C0 xor r8d, r8d
EB08 jmp SHORT G_M5638_IG04
G_M5638_IG03:
488D4210 lea rax, bword ptr [rdx+16]
448B4208 mov r8d, dword ptr [rdx+8]
G_M5638_IG04:
488901 mov bword ptr [rcx], rax
44894108 mov dword ptr [rcx+8], r8d
488BC1 mov rax, rcx
G_M5638_IG05:
C3 ret |
@dotnet-bot test CentOS7.1 x64 Release Innerloop Build and Test |
This LGTM, but it would be good to see some of the good & bad examples. |
Examples below. Note in some good cases (eg I didn't look at all the regressions but the last one (extra movs) seems pretty rare. +; *** GOOD ***
; Microsoft.CodeAnalysis.BitVector:AllSet(int):struct
G_M13204_IG03:
lea eax, [rsi-1]
- mov ebx, eax
- sar ebx, 5
- mov eax, ebx
+ sar eax, 5
G_M13204_IG04:
mov ebx, eax
test ebx, ebx
+; *** GOOD ***
; System.CurrentSystemTimeZone:.ctor():this
G_M13204_IG04:
call CachedData:CreateLocal():ref:this
mov rdi, rax
G_M12143_IG03:
- mov rax, rdi
- mov rdi, rax
mov rdx, qword ptr [rdi+56]
+; *** GOOD ***
; System.Memory`1[__Canon][System.__Canon]:Retain(bool):struct:this
G_M12143_IG03:
or rax, 1
mov rcx, rax
mov rax, rcx
- mov rcx, rax
- mov rax, rcx
lea rcx, bword ptr [rsi+16]
mov edx, dword ptr [rdi+8]
+; *** GOOD ***
; Assembly listing for method System.Memory`1[__Canon][System.__Canon]:Equals(ref):bool:this
G_M39589_IG15:
lea rax, bword ptr [rdi+8]
- mov rdx, gword ptr [rax]
- mov r14, rdx
- mov edx, dword ptr [rax+8]
- mov r15d, edx
- mov eax, dword ptr [rax+12]
- mov ecx, eax
+ mov r14, gword ptr [rax]
+ mov r15d, dword ptr [rax+8]
+ mov ecx, dword ptr [rax+12]
G_M39589_IG16:
- mov rdx, r14
- mov eax, ecx
test ebp, ebp
je SHORT G_M39589_IG19
- cmp gword ptr [rbx], rdx
+ cmp gword ptr [rbx], r14
jne SHORT G_M39589_IG17
- ; *** BAD *** Use of R14 incurs prolog/epilog cost
; Assembly listing for method System.RuntimeType:GetPropertyImpl(ref,int,ref,ref,ref,ref):ref:this
-; Lcl frame size = 72
+; Lcl frame size = 80
G_M462_IG01:
+ push r14
push rdi
...
G_M462_IG06:
- mov rcx, gword ptr [rsp+30H]
+ mov rcx, gword ptr [rsp+38H]
cmp dword ptr [rcx+8], 0
jbe G_M462_IG19
- mov rcx, gword ptr [rsp+30H]
- mov rdi, gword ptr [rcx+16]
- mov rcx, rdi
+ mov rcx, gword ptr [rsp+38H]
+ mov rcx, gword ptr [rcx+16]
G_M462_IG07:
- mov rdi, rcx
+ mov r14, rcx
test rbx, rbx
je SHORT G_M462_IG10
- mov rcx, rdi
- mov rax, qword ptr [rdi]
+ mov rcx, r14
+ mov rax, qword ptr [r14]
...
-
G_M462_IG09:
- add rsp, 72
+ add rsp, 80
pop rbx
pop rbp
pop rsi
pop rdi
- ret
+ pop r14
+ ret
- ; *** BAD *** Extra movs
; Assembly listing for method System.Linq.Expressions.Interpreter.NotEqualInstruction:Create(ref,bool):ref
G_M18433_IG05:
- mov rsi, rbp
+ mov rax, rbp
+ mov rsi, rax
jmp G_M18433_IG53
G_M18433_IG06:
call [CORINFO_HELP_READYTORUN_STATIC_BASE]
mov rbx, rax
- mov rbp, gword ptr [rbx+0B88H]
- test rbp, rbp
+ mov rsi, gword ptr [rbx+0B88H]
+ test rsi, rsi
jne SHORT G_M18433_IG07
call [CORINFO_HELP_READYTORUN_NEW]
- mov rbp, rax
+ mov rsi, rax
lea rcx, bword ptr [rbx+0B88H]
- mov rdx, rbp
+ mov rdx, rsi
call [CORINFO_HELP_ASSIGN_REF]
G_M18433_IG07:
- mov rsi, rbp
+ mov rax, rsi
+ mov rsi, rax
jmp G_M18433_IG53 |
@AndyAyersMS - thanks for the examples. In the past I've mostly dismissed the ones that cost an extra callee-reg save and restore, as I think that is a semi-orthogonal issue (and, if the reduction in instrs is in a loop, could be a CQ win even if a size increase). For the other, that one example looks like it might be a resolution move, which may also be somewhat orthogonal. The preferencing heuristics need some focused work, e.g. see #11463, but incremental improvement is great! |
BTW it sure would be nice if jit-diffs could automatically produce markdown-friendly diff output.What I did above required a bunch of manual collation. Opened dotnet/jitutils#125 in case anybody wants a fun side project. |
When allocating an interval, LSRA will look at related intervals
for preference hints.
LSRA may look past a related interval I1 to its related interval
I2 if I1 looks like a simple copy. But if I1 has been previously
allocated then it seems we are often better served using I1's
preferences instead of using I2's.
Addresses some issues seen in #11390.