Remove mono specific SpanHelpers #79215
Tagging subscribers to this area: @BrzVlad
Issue Details
#73768 made various changes that hurt span performance on mono. The change was afterwards reverted on mono by restoring the old code in #75917. This PR removes the mono-specific code and solves the performance problems via small tweaks to the non-vectorized code inside SpanHelpers, as well as by adding a couple of optimizations to the mono interpreter.
@vargaz Could you take another look at this?
…ssions (dotnet#75917)" This reverts commit 254844a.
This would replace code like
```
load
b.neq
add
ret
load
b.neq
add
ret
load
....
```
with
```
load
b.eq
load
b.eq
load
...
```
This makes the code more compact in the hot loop, reduces overall code size and thus improves performance. This pattern is widely used and it was also used before with Span lookups.
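At source level, the compact form corresponds roughly to an unrolled search whose hot loop holds only compare-and-branch pairs, with the index adjustment moved to shared exits after the loop. The following is a minimal sketch in C, not the SpanHelpers code itself; `index_of_byte` and its labels are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the PR: returns the index of the
 * first occurrence of value in data[0..length), or -1 if it is absent. */
static ptrdiff_t index_of_byte(const unsigned char *data, size_t length,
                               unsigned char value)
{
    assert(data != NULL || length == 0);
    size_t i = 0;

    /* Hot loop: four compare-and-branch pairs and nothing in between. */
    while (length - i >= 4) {
        if (data[i] == value) goto found0;
        if (data[i + 1] == value) goto found1;
        if (data[i + 2] == value) goto found2;
        if (data[i + 3] == value) goto found3;
        i += 4;
    }
    /* Tail: fewer than four elements remain. */
    for (; i < length; i++) {
        if (data[i] == value) goto found0;
    }
    return -1;

    /* Cold exits: the "add, ret" work lives outside the loop. */
found3: return (ptrdiff_t)(i + 3);
found2: return (ptrdiff_t)(i + 2);
found1: return (ptrdiff_t)(i + 1);
found0: return (ptrdiff_t)i;
}
```

Writing each check as `if (...) return i + k;` instead would place the add and the return between consecutive compares, which is the longer shape shown first in the commit message.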
Previously, we marked bblocks as dead if their in_count was 0. This is not enough, however, since it doesn't account for loops: bblocks that only branch to each other keep a non-zero in_count even when nothing else reaches them. We now do a full traversal of the bblock graph to detect unreachable bblocks.
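The difference between the two checks can be sketched with a small graph traversal. This is an illustration under assumptions, not the interpreter's code: the `BasicBlock` struct, the two-successor limit and `mark_reachable` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SUCCESSORS 2

/* Hypothetical, simplified basic block; not the interpreter's real struct. */
typedef struct {
    int successors[MAX_SUCCESSORS]; /* indices of target blocks, -1 if unused */
    int in_count;                   /* number of incoming edges */
} BasicBlock;

/* Marks every block reachable from entry. reachable[] and stack[] are
 * caller-provided arrays with block_count entries each. */
static void mark_reachable(const BasicBlock *blocks, int block_count,
                           int entry, bool *reachable, int *stack)
{
    assert(entry >= 0 && entry < block_count);
    for (int i = 0; i < block_count; i++)
        reachable[i] = false;

    int top = 0;
    stack[top++] = entry;
    reachable[entry] = true;

    /* Each block is pushed at most once, so the stack cannot overflow. */
    while (top > 0) {
        const BasicBlock *bb = &blocks[stack[--top]];
        for (int s = 0; s < MAX_SUCCESSORS; s++) {
            int target = bb->successors[s];
            if (target >= 0 && !reachable[target]) {
                reachable[target] = true;
                stack[top++] = target;
            }
        }
    }
}
```

Two blocks that branch only to each other each have an in_count of 1, so an in_count test keeps them alive, while the traversal never marks them.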
Consider, for example, the following pattern commonly used with conditional branches:
```
br.s [nil <- nil], BB0
...
ceq0.i4 [32 <- 40],
br.s [nil <- nil], BB1
BB0:
ldc.i4.0 [32 <- nil],
BB1:
brfalse.i4.s [nil <- 32], BB_EXIT
BB2:
ldstr [56 <- nil], 2
```
This commit reorders this code to look like:
```
br.s [nil <- nil], BB0
...
ceq0.i4 [32 <- 40],
brfalse.i4.s [nil <- 32], BB_EXIT
br.s [nil <- nil], BB2
BB0:
ldc.i4.0 [32 <- nil],
BB1:
brfalse.i4.s [nil <- 32], BB_EXIT
BB2:
ldstr [56 <- nil], 2
```
This means we will have duplicated brfalse instructions, but every basic block reaching the conditional branch will have information about the condition. For example, ceq0.i4 + brfalse is equivalent to brtrue, and ldc.i4.0 + brfalse is equivalent to an unconditional branch. After other future optimizations are applied on the bblock graph, like removal, merging and propagation of targets, the resulting code in this example would look like:
```
br.s [nil <- nil], BB_EXIT
...
brtrue.i4.s [nil <- 40], BB_EXIT
BB2:
ldstr [56 <- nil], 2
```
This is a great simplification over the original code.
… targets Even though they can become unreachable in the current method, they can still be called when the unoptimized method gets tiered at this point. Add an assert to prevent such issues in the future.
If we are unlikely to gain anything from propagating the condition (if we don't have information about any of the condition operand vars), simply avoid the optimization.
If we store in a var, and this var is redefined before the end of the basic block without being used in between, then we can clear the original store.
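A backward walk over a single block is one way to find such stores. The sketch below assumes a hypothetical `Ins` model in which every instruction is side-effect free, writes at most one var and reads at most two; the names are illustrative and not taken from the interpreter.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_VAR (-1)
#define MAX_VARS 64

/* Hypothetical instruction model: one optional destination, two optional
 * sources. Assumes instructions have no side effects besides the store. */
typedef struct {
    int dst;     /* var written, or NO_VAR */
    int src[2];  /* vars read, or NO_VAR */
    bool dead;   /* set when the store can be cleared */
} Ins;

/* Walks the block backwards and marks stores that are overwritten later in
 * the same block with no read in between. Returns the number cleared. */
static int clear_dead_stores(Ins *ins, int count)
{
    /* true: the var is redefined later in the block before any use. */
    bool redefined_before_use[MAX_VARS] = { false };
    int cleared = 0;

    for (int i = count - 1; i >= 0; i--) {
        int dst = ins[i].dst;
        if (dst != NO_VAR) {
            assert(dst >= 0 && dst < MAX_VARS);
            if (redefined_before_use[dst]) {
                ins[i].dead = true;  /* its reads no longer count as uses */
                cleared++;
                continue;
            }
            redefined_before_use[dst] = true;
        }
        for (int s = 0; s < 2; s++) {
            int src = ins[i].src[s];
            if (src != NO_VAR) {
                assert(src >= 0 && src < MAX_VARS);
                redefined_before_use[src] = false;
            }
        }
    }
    return cleared;
}
```

The destination is handled before the sources so that an instruction such as `v0 = v0 + 1` still counts as a use of the earlier value of `v0`.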
We detect if a var's value never escapes the definition of a bblock. We mark such vars and clear unused definitions of that var from other bblocks.
If a bblock contains only an unconditional br, then all bblocks branching into it can just branch to the target directly instead.
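The retargeting can be sketched on a simplified model where each block has at most one branch target. The arrays `target` and `only_br` and both function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Hypothetical model: target[i] is the block that block i branches to
 * (-1 if none), and only_br[i] is nonzero when block i contains nothing
 * but an unconditional br. */
static int final_target(const int *target, const int *only_br,
                        int block_count, int block)
{
    assert(block >= 0 && block < block_count);
    /* Bound the walk so a cycle of branch-only blocks cannot loop forever. */
    for (int hops = 0; hops < block_count && only_br[block]; hops++) {
        assert(target[block] >= 0 && target[block] < block_count);
        block = target[block];
    }
    return block;
}

/* Rewrites every branch so that it skips over branch-only blocks. */
static void retarget_branches(int *target, const int *only_br, int block_count)
{
    for (int i = 0; i < block_count; i++) {
        if (target[i] >= 0)
            target[i] = final_target(target, only_br, block_count, target[i]);
    }
}
```

After this pass the branch-only blocks have no incoming edges from retargeted blocks, so a later unreachable-block pass can drop them.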
This pattern is used in low-level unsafe code when using (var + ct1) as an index into an array, where ct2 is the size of an array element. Also fix the display of two shorts when dumping instructions.
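As an illustration of the arithmetic involved, the byte offset of such an access is the index plus a constant, scaled by the element size. The helpers below are hypothetical and only show the shape of the computation in C, assuming 32-bit elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: byte offset of element (index + ct1) when each
 * element occupies ct2 bytes. */
static size_t element_offset(size_t index, size_t ct1, size_t ct2)
{
    return (index + ct1) * ct2;
}

/* Reads the int32_t at element (index + ct1) through a raw byte address,
 * the way low-level unsafe code would. The caller must keep it in bounds. */
static int32_t load_i32_at(const int32_t *base, size_t index, size_t ct1)
{
    assert(base != NULL);
    const unsigned char *bytes = (const unsigned char *)base;
    int32_t value;
    memcpy(&value, bytes + element_offset(index, ct1, sizeof(int32_t)),
           sizeof value);
    return value;
}
```

Both the addition and the multiplication use constants known ahead of time, so only `index` varies at run time.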
These new instructions can apply addition and multiplication by a constant to the offset var.
Nice! I see interpreter improvements in the SpanHelper bytes measurements. The chars are mixed: improvements in firefox, some slower in chrome. Overall it is a great improvement. It also improved Index of chars with the aot/simd. I think that's because the current code uses /cc @lewing
@MihaZupan I don't think there are any problems between these 2 PRs. As long as your code is guarded everywhere with