Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Check for potential store-to-load forwarding before reordering ldr -> ldp #105695

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Jul 30, 2024

Very targeted fix for #93401 and #101437: before reordering two indirections, check if there is a potential store in the same loop that looks like it could end up being a candidate for store-to-load forwarding into one of those indirections. Some hardware does not handle store-to-load forwarding with the same fidelity when stp/ldp is involved compared to multiple str/ldr.

The detection is done by a graph walk that starts at the indirection and then walks backwards until it finds a store that would reach the indirection. The walk is limited to stay within the same loop as the indirections, and also limited by a budget of 100 nodes visited (for a large loop we expect this to not be as important).

If we detect the situation then avoid doing the reordering.

Not so happy with the complexity. I would probably rather disable this transformation entirely, but this is much more surgical.

Fix #93401
Fix #101437

…dr -> ldp

Very targeted fix for dotnet#93401 and dotnet#101437: before reordering two
indirections, check if there is a potential store in the same loop that
looks like it could end up being a candidate for store-to-load
forwarding into one of those indirections. Some hardware does not handle
store-to-load forwarding with the same fidelity when `stp`/`ldp` is
involved compared to multiple `str`/`ldr`.

If we detect the situation then avoid doing the reordering.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 30, 2024
@jakobbotsch
Copy link
Member Author

@EgorBot -arm64 --disasm --envvars "DOTNET_JitDisasm:List"

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using BenchmarkDotNet.Attributes;

namespace System.Collections
{
    public class AddGivenSize
    {
        private int[] _uniqueValues;

        public int Size = 512;

        [GlobalSetup]
        public void Setup() => _uniqueValues = Enumerable.Range(0, Size).ToArray();

        [Benchmark]
        public List<int> List()
        {
            var collection = new List<int>(Size);
            var uniqueValues = _uniqueValues;
            for (int i = 0; i < uniqueValues.Length; i++)
                collection.Add(uniqueValues[i]);
            return collection;
        }
    }
}

@EgorBot
Copy link

EgorBot commented Jul 30, 2024

Benchmark results on Arm64
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-CSKZBQ : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-YTIDAA : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
EnvironmentVariables=DOTNET_JitDisasm=List
Method Toolchain Mean Error Ratio Code Size
List Main 1.718 μs 0.0229 μs 1.00 284 B
List PR 1.389 μs 0.0234 μs 0.82 288 B

BDN_Artifacts.zip

@jakobbotsch
Copy link
Member Author

jakobbotsch commented Jul 31, 2024

cc @dotnet/jit-contrib PTAL @EgorBo @AndyAyersMS

Diffs. Regressions as one would expect. I had feared TP regressions would be bad, but apparently not so.

I wonder if similar implementation could be used to allow if-conversion in loops, or if we should perhaps try for some more general loop-carried dependency analysis.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this only really matters for small loops where the loads are on the critical path, so a very small budget might suffice.

}
};

pushPreds(m_block);
Copy link
Member

@AndyAyersMS AndyAyersMS Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lower runs in linear block order, is there any concern that we might not have yet lowered some of these preds?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume it's not important since it's just a heuritics and STORIND for primitives look generally the same before & after lower (except the addressing modes)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think there is a concern there. gtPeelOffsets that this is using works both on pre and post lowered forms of address modes.

}
};

pushPreds(m_block);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume it's not important since it's just a heuritics and STORIND for primitives look generally the same before & after lower (except the addressing modes)

@jakobbotsch jakobbotsch merged commit 69c9bf8 into dotnet:main Aug 1, 2024
108 of 110 checks passed
@jakobbotsch jakobbotsch deleted the ldp-store-to-load-forwarding-dependencies branch August 1, 2024 12:32
@LoopedBard3
Copy link
Member

@github-actions github-actions bot locked and limited conversation to collaborators Sep 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
5 participants