-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: extend redundant branch opt to catch partially redundant cases #49713
Conversation
If the current relop has PHI inputs, see if any of those inputs would produce the same relop VN as a dominating compare; if so the current relop is partially redundant, and we may be able to optimize some of the paths through the relop block via jump threading. Addresses cases like the one seen in dotnet#48115, though that particular case is not optimized as the current relop block has a side effect.
@EgorBo PTAL SPMI sees ~300 cases across all the collections. Regressions are small.
Sample diff. In the Before picture, ;;; Assembly listing for method System.IO.Pipelines.Pipe:GetReadAsyncStatus():int:this
;;; BEFORE
G_M35282_IG05: ; gcVars=0000000000000000 {}, gcrefRegs=00000002 {rcx}, byrefRegs=00000000 {}, gcvars, byref, isz
; gcrRegs +[rcx]
add rcx, 216
; gcrRegs -[rcx]
; byrRegs +[rcx]
mov rax, gword ptr [rcx]
; gcrRegs +[rax]
test rax, rax
je SHORT G_M35282_IG06
mov rdx, 0xD1FFAB1E
cmp qword ptr [rax], rdx
je SHORT G_M35282_IG06
xor rax, rax
;; bbWeight=0 PerfScore 0.00
G_M35282_IG06: ; gcrefRegs=00000001 {rax}, byrefRegs=00000000 {}, byref, isz
; byrRegs -[rcx]
test rax, rax
je SHORT G_M35282_IG08
mov eax, 2
; gcrRegs -[rax]
;; bbWeight=0 PerfScore 0.00
G_M35282_IG07: ; , epilog, nogc, extend
ret
;; bbWeight=0 PerfScore 0.00
G_M35282_IG08: ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
mov eax, 1
;; bbWeight=0 PerfScore 0.00
G_M35282_IG09: ; , epilog, nogc, extend
ret
;; bbWeight=0 PerfScore 0.00
;;; AFTER
G_M35282_IG05: ; gcVars=0000000000000000 {}, gcrefRegs=00000002 {rcx}, byrefRegs=00000000 {}, gcvars, byref, isz
; gcrRegs +[rcx]
add rcx, 216
; gcrRegs -[rcx]
; byrRegs +[rcx]
mov rax, gword ptr [rcx]
; gcrRegs +[rax]
test rax, rax
je SHORT G_M35282_IG07
mov eax, 2
; gcrRegs -[rax]
;; bbWeight=0 PerfScore 0.00
G_M35282_IG06: ; , epilog, nogc, extend
ret
;; bbWeight=0 PerfScore 0.00
G_M35282_IG07: ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
; byrRegs -[rcx]
mov eax, 1
;; bbWeight=0 PerfScore 0.00
G_M35282_IG08: ; , epilog, nogc, extend
ret
;; bbWeight=0 PerfScore 0.00 Jit dump for the above
Still need to look at TP here as in the worst case (large number of PHI inputs, deep dominance tree) we could be doing a lot of searching. |
Here is some TP data (from spmi's -v v over the aspnnet collection, release builds). Doesn't appear to be anything problematic in the methods processed there (~39000 or so).
|
Test failures indicate there's a flaw in the logic here... not quite sure what it is just yet. Looks like I'll have plenty of cases to choose from when debugging, though. Suspect perhaps that we can only jump thread those preds that bring in the same VN as the one we matched, and right now we're optimizing all the preds, and this is too aggressive. |
A minimal repro: using System;
internal class SmallLoop1
{
public static void Main()
{
int j = 2;
for (int i = 1; i < 5; i++)
j++;
if (j != 6)
Console.WriteLine("FAILED");
Console.ReadKey();
}
}
|
Thanks, seems like there are actually several interesting aspects:
So the upshot is that if we substitute in a phi we can either get:
and we need to account for those paths and pass them down to the jump threader in some fashion; this will feed into the true/false/ambiguous preds detection done there. |
Need to rethink this. |
If the current relop has PHI inputs, see if any of those inputs would
produce the same relop VN as a dominating compare; if so the current
relop is partially redundant, and we may be able to optimize some of
the paths through the relop block via jump threading.
Addresses cases like the one seen in #48115, though that particular
case is not optimized as the current relop block has a side effect.