-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Check for potential store-to-load forwarding before reordering ldr -> ldp #105695
JIT: Check for potential store-to-load forwarding before reordering ldr -> ldp #105695
Conversation
…dr -> ldp Very targeted fix for dotnet#93401 and dotnet#101437: before reordering two indirections, check if there is a potential store in the same loop that looks like it could end up being a candidate for store-to-load forwarding into one of those indirections. Some hardware does not handle store-to-load forwarding with the same fidelity when `stp`/`ldp` is involved compared to multiple `str`/`ldr`. If we detect the situation then avoid doing the reordering.
@EgorBot -arm64 --disasm --envvars "DOTNET_JitDisasm:List" // Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.
using BenchmarkDotNet.Attributes;
namespace System.Collections
{
public class AddGivenSize
{
private int[] _uniqueValues;
public int Size = 512;
[GlobalSetup]
public void Setup() => _uniqueValues = Enumerable.Range(0, Size).ToArray();
[Benchmark]
public List<int> List()
{
var collection = new List<int>(Size);
var uniqueValues = _uniqueValues;
for (int i = 0; i < uniqueValues.Length; i++)
collection.Add(uniqueValues[i]);
return collection;
}
}
} |
Benchmark results on Arm64
|
cc @dotnet/jit-contrib PTAL @EgorBo @AndyAyersMS Diffs. Regressions as one would expect. I had feared TP regressions would be bad, but apparently not so. I wonder if similar implementation could be used to allow if-conversion in loops, or if we should perhaps try for some more general loop-carried dependency analysis. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this only really matters for small loops where the loads are on the critical path, so a very small budget might suffice.
} | ||
}; | ||
|
||
pushPreds(m_block); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lower runs in linear block order, is there any concern that we might not have yet lowered some of these preds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume it's not important since it's just a heuritics and STORIND for primitives look generally the same before & after lower (except the addressing modes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't think there is a concern there. gtPeelOffsets
that this is using works both on pre and post lowered forms of address modes.
} | ||
}; | ||
|
||
pushPreds(m_block); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume it's not important since it's just a heuritics and STORIND for primitives look generally the same before & after lower (except the addressing modes)
Improvements from this PR: |
Very targeted fix for #93401 and #101437: before reordering two indirections, check if there is a potential store in the same loop that looks like it could end up being a candidate for store-to-load forwarding into one of those indirections. Some hardware does not handle store-to-load forwarding with the same fidelity when
stp
/ldp
is involved compared to multiplestr
/ldr
.The detection is done by a graph walk that starts at the indirection and then walks backwards until it finds a store that would reach the indirection. The walk is limited to stay within the same loop as the indirections, and also limited by a budget of 100 nodes visited (for a large loop we expect this to not be as important).
If we detect the situation then avoid doing the reordering.
Not so happy with the complexity. I would probably rather disable this transformation entirely, but this is much more surgical.
Fix #93401
Fix #101437