JIT: Optimize data flow used in assertion prop/CSE #94701
The data flow used by assertion prop and CSE uses "all-succs", which includes handler successors. However, in reality neither analysis needs full treatment of handlers; we do not CSE variables that are live-in to handlers, and assertion prop never kills assertions that would need to be propagated to handlers.
Additionally, the data flow framework used conflates end-of-block propagation of facts with reachability of the handler. If no facts changed at the ends of the preds of the handler, then the handler is not visited. This is despite the fact that the handler in the general case is also reachable with facts from the beginning of every enclosed basic block (or more generally with facts that were invalidated in the middle of enclosed basic blocks).
This ends up working out today only because of the fake successor edges we have from preds of the try to the handler, in addition to the restrictions on the data flows described above. Since I'm removing these fake edges, we need a more explicit solution. The change here provides one by making explicit what the data flow for CSE and AP actually needs: they only need to consider regular successors, with the added catch that they need to consider reachability of handlers once we see the corresponding 'try' being reachable.
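To make the scheme described above concrete, here is a minimal sketch of a forward data flow that follows only regular successors and marks a handler reachable as soon as the start of its 'try' is processed. It is a toy model, not the JIT's implementation: `Block`, `handlerEntry`, `gen`, and `Solve` are hypothetical names, facts are modeled as bits in a `uint32_t`, and it assumes (as the description states for these two analyses) that facts are never killed inside a block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Toy model only; every name here is hypothetical and not taken from the JIT.
struct Block
{
    std::vector<int> succs;             // regular successors only (no EH edges)
    int              handlerEntry = -1; // set on the first block of a try: index of its handler
    uint32_t         gen     = 0;       // facts this block establishes (no kills are modeled)
    uint32_t         in      = ~0u;     // facts on entry; starts optimistic ("everything holds")
    uint32_t         out     = ~0u;     // facts on exit
    bool             reached = false;   // has any path reached this block yet?
};

void Solve(std::vector<Block>& blocks, int entry)
{
    assert(entry >= 0 && static_cast<std::size_t>(entry) < blocks.size());

    std::deque<int> worklist;

    // Intersect 'facts' into the target's in-set; queue the target if its
    // in-set shrank or if this is the first time it has been reached.
    auto propagate = [&](int target, uint32_t facts) {
        Block&   t      = blocks[target];
        uint32_t merged = t.in & facts;
        if ((merged != t.in) || !t.reached)
        {
            t.in      = merged;
            t.reached = true;
            worklist.push_back(target);
        }
    };

    blocks[entry].in      = 0; // nothing is known on method entry
    blocks[entry].reached = true;
    worklist.push_back(entry);

    while (!worklist.empty())
    {
        int idx = worklist.front();
        worklist.pop_front();

        // Reaching the start of a try makes its handler reachable. With no
        // kills, only the facts valid on entry to the try are safe there.
        if (blocks[idx].handlerEntry >= 0)
        {
            propagate(blocks[idx].handlerEntry, blocks[idx].in);
        }

        blocks[idx].out = blocks[idx].in | blocks[idx].gen;

        for (int succ : blocks[idx].succs)
        {
            propagate(succ, blocks[idx].out);
        }
    }
}
```

In this sketch the handler is visited even when no regular edge leads to it, and it only sees the facts that held on entry to the try, so a fact generated partway through the try is not assumed inside the handler.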
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Compiler::optDumpAssertionIndices("out -> ", lastTryBlock->bbAssertionOut, "\n");
}
BitVecOps::IntersectionD(apTraits, block->bbAssertionIn, firstTryBlock->bbAssertionIn);
BitVecOps::IntersectionD(apTraits, block->bbAssertionIn, lastTryBlock->bbAssertionOut);
This was pretty meaningless -- intersecting with the lexically last block of a try region shouldn't be sufficient to guarantee any form of correctness here.
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, Fuzzlyn
Azure Pipelines successfully started running 3 pipeline(s).
There are some x86 diffs. I checked in detail, and they happen from the
BB24 [0038] 1 1 [???..???) internal
BB01 [0000] 1 2 BB24 1 [000..00D)-> BB11 ( cond ) T2 try { keep i
BB02 [0001] 1 2 BB01 1 [00D..00E)-> BB04 (always) T2 i
BB03 [0027] 0 2 0 [00D..00E) (throw ) T2 i rare hascall gcsafe
BB04 [0028] 1 2 BB02 1 [00D..01C)-> BB07 ( cond ) T2 i hascall
BB05 [0002] 1 2 BB04 1 [01C..029)-> BB22 (callf ) T2 i hascall gcsafe nullcheck
BB06 [0022] 1 2 BB22 1 [???..???)-> BB19 (ALWAYS) T2 i internal gcsafe KEEP
BB07 [0004] 1 0 BB04 1 [029..038)-> BB15 ( cond ) T0 try { keep i hascall gcsafe
BB08 [0005] 1 0 BB07 1 [038..045)-> BB22 (callf ) T0 i hascall gcsafe nullcheck
BB09 [0020] 1 0 BB22 1 [???..???)-> BB20 (ALWAYS) T0 } i internal gcsafe KEEP
BB10 [0007] 1 2 0 0 [047..04A)-> BB15 (always) T2 H0 catch { } keep i rare hascall
BB11 [0009] 1 1 BB01 1 [04A..058)-> BB15 ( cond ) T1 try { keep i hascall gcsafe
BB12 [0010] 1 1 BB11 1 [058..05A)-> BB22 (callf ) T1 i gcsafe
BB13 [0018] 1 1 BB22 1 [???..???)-> BB21 (ALWAYS) T1 } i internal gcsafe KEEP
BB14 [0012] 1 2 1 0 [05C..05F) T2 H1 catch { } keep i rare hascall
BB15 [0013] 4 2 BB07,BB10,BB11,BB14 1 [05F..060)-> BB17 ( cond ) T2 i
BB16 [0034] 1 2 BB15 0 [05F..060) (throw ) T2 i rare hascall gcsafe
BB17 [0035] 1 2 BB15 1 [05F..067) T2 } i hascall
BB18 [0037] 1 BB17 1 [067..06E)-> BB23 (always) keep i hascall gcsafe cfb cfe
BB19 [0023] 1 BB06 1 [???..???)-> BB23 (ALWAYS) i internal gcsafe KEEP
BB20 [0021] 1 BB09 1 [???..???)-> BB23 (ALWAYS) i internal gcsafe KEEP
BB21 [0019] 1 BB13 1 [???..???)-> BB23 (ALWAYS) i internal gcsafe KEEP
BB22 [0014] 4 2 BB05,BB08,BB12 1 [067..06E)-> BB06,BB09,BB13 (finret) H2 finally { } keep i hascall gcsafe
BB23 [0015] 4 BB18,BB19,BB20,BB21 1 [06E..06F) (return) i gcsafe
In this flowgraph, there is a way to enter
cc @dotnet/jit-contrib PTAL @AndyAyersMS Diffs -- only a few on x86, that I investigated above. Other than that just some nice throughput improvements.
src/coreclr/jit/assertionprop.cpp
Outdated
// kill assertions in global AP. Note that if we did kill assertions we
// would need to be more careful about our mid-block handling when in a
Note that if we did kill assertions...
In my opinion, it is confusing to talk about the possibility of "killing assertions" in global AP. How would that work? It seems impossible almost by definition of the algorithm.
Do you mean because we use SSA/VN? I mean if facts could become false in other ways than due to control flow joins. We would need to propagate that to the handlers.
Do you mean because we use SSA/VN?
Right. VNs are "immutable", so if a fact about a VN dominates a subgraph, it will always be true on the entirety of that subgraph. I understand the comment; I would just request rewording it to not mention AP, along the lines of "no clients need the concept of 'kill's".
Sure, I can reword it tomorrow.
Do you think the wording looks better now?
Yes, thank you!
src/coreclr/jit/dataflow.h
Outdated
{
FlowEdge* preds = m_pCompiler->BlockPredsWithEH(block);
for (FlowEdge* pred = preds; pred; pred = pred->getNextPredEdge())
for (FlowEdge* pred = block->bbPreds; pred; pred = pred->getNextPredEdge())
Nit: can this loop use the PredEdges iterator?
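For context on the nit, the sketch below shows how a range-style wrapper over a singly linked predecessor list lets the manual `getNextPredEdge()` loop become a range-based `for`. The types are simplified stand-ins written for illustration: `PredEdgeList`, `sourceId`, and `SumPredIds` are hypothetical, and the real JIT classes have different members.

```cpp
#include <cassert>

// Simplified stand-ins for illustration; not the JIT's actual definitions.
struct FlowEdge
{
    int       sourceId; // hypothetical payload identifying the predecessor block
    FlowEdge* next;     // next edge in the predecessor list

    FlowEdge* getNextPredEdge() const
    {
        return next;
    }
};

// Range adaptor that walks the linked list of predecessor edges.
class PredEdgeList
{
    FlowEdge* m_head;

public:
    class iterator
    {
        FlowEdge* m_edge;

    public:
        explicit iterator(FlowEdge* edge) : m_edge(edge)
        {
        }

        FlowEdge* operator*() const
        {
            return m_edge;
        }

        iterator& operator++()
        {
            assert(m_edge != nullptr); // must not advance past the end
            m_edge = m_edge->getNextPredEdge();
            return *this;
        }

        bool operator!=(const iterator& other) const
        {
            return m_edge != other.m_edge;
        }
    };

    explicit PredEdgeList(FlowEdge* head) : m_head(head)
    {
    }

    iterator begin() const
    {
        return iterator(m_head);
    }

    iterator end() const
    {
        return iterator(nullptr);
    }
};

struct BasicBlock
{
    FlowEdge* bbPreds = nullptr;

    PredEdgeList PredEdges() const
    {
        return PredEdgeList(bbPreds);
    }
};

// Equivalent to: for (FlowEdge* pred = block.bbPreds; pred; pred = pred->getNextPredEdge())
int SumPredIds(const BasicBlock& block)
{
    int sum = 0;
    for (FlowEdge* const pred : block.PredEdges())
    {
        sum += pred->sourceId;
    }
    return sum;
}
```

The range-based form removes the hand-written advance step, which is the part of the manual loop that is easiest to get wrong.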
Yeah, let me simplify this tomorrow.
});
}

if (m_pCompiler->bbIsTryBeg(block))
Why do we not need to consider the exceptional successors of filters here?
(Not saying we should)
Do you mean the second pass EH successors?
Yes.
I think that because the filter and its enclosed handlers are all dominated by the same try entry, there is no need to consider the filter as a pred of those handlers.
Ping @AndyAyersMS... when you have time.
Prerequisite to #94672.
No diffs are expected.