Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calls to Interlocked.CompareExchange<T> prevents inlining #8662

Closed
redknightlois opened this issue Jul 31, 2017 · 9 comments · Fixed by #99265
Closed

Calls to Interlocked.CompareExchange<T> prevents inlining #8662

redknightlois opened this issue Jul 31, 2017 · 9 comments · Fixed by #99265
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions in-pr There is an active PR which will close this issue when it is merged optimization tenet-performance Performance related issue
Milestone

Comments

@redknightlois
Copy link

This is the smaller code that showcase the issue. This code is working very hard to avoid cache issues but still have to pay for the call because it refuses to inline in CoreCLR 1.1 and CoreCLR 2.0 (1w old build).

I may be missing something, but dont see any reason why it shouldn't inline.

using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Threading;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            var p = new Program<object>();
            p.Test();

            Console.WriteLine("Hello World!");
        }
    }

    public class Program<T> where T : class, new()
    {
        [StructLayout(LayoutKind.Sequential, Size = 128)]
        private struct CacheAwareElement
        {
            private readonly long _pad1;
            private T Value;

            [MethodImpl(MethodImplOptions.AggressiveInlining)]
            public static bool TryClaim(ref CacheAwareElement bucket, out T item)
            {
                // Note that the initial read is optimistically not synchronized. That is intentional. 
                // We will interlock only when we have a candidate. in a worst case we may miss some
                // recently returned objects. Not a big deal.
                T inst = bucket.Value;
                if (inst != null && inst == Interlocked.CompareExchange(ref bucket.Value, null, inst))
                    goto Done;

                item = null;
                return false;

                Done:
                item = inst;
                return true;
            }
        }

        private CacheAwareElement _element;

        [MethodImpl(MethodImplOptions.NoInlining)]
        public T Test()
        {
            T instance;
            CacheAwareElement.TryClaim(ref _element, out instance);
            return instance;
        }
    }
}

cc @AndyAyersMS

category:cq
theme:inlining
skill-level:expert
cost:medium

@mikedn
Copy link
Contributor

mikedn commented Jul 31, 2017

Inlining isn't blocked by CompareExchange but by runtime dictionary lookup that's needed in the generic code.

@redknightlois
Copy link
Author

I would have never expect a dictionary lookup underlying Interlocked.CompareExchange when T is class and therefore known to be 8 bytes on 64 bits (or 4 bytes when 32bits architecture).

@mikedn
Copy link
Contributor

mikedn commented Jul 31, 2017

I would have never expect a dictionary lookup underlying Interlocked.CompareExchange when T is class and therefore known to be 8 bytes on 64 bits (or 4 bytes when 32bits architecture).

It is exactly when T is a class that you do get runtime lookups, that's how shared generic code works. But yes, the JIT could probably try a bit harder in the specific case of Interlocked.CompareExchange<T>. Probably it could make it look like a call to Interlocked.CompareExchange(ref object, object object) so runtime lookup is not required.

For example, if you change your code to not be generic you get the following code for Test:

G_M2434_IG01:
       56                   push     rsi
       4883EC20             sub      rsp, 32
G_M2434_IG02:
       4883C108             add      rcx, 8
       488B31               mov      rsi, gword ptr [rcx]
       4885F6               test     rsi, rsi
       740F                 je       SHORT G_M2434_IG03
       4C8BC6               mov      r8, rsi
       33D2                 xor      rdx, rdx
       E8C5B1505F           call     System.Threading.Interlocked:CompareExchange(byref,ref,ref):ref
       483BC6               cmp      rax, rsi
       7404                 je       SHORT G_M2434_IG04
G_M2434_IG03:
       33C0                 xor      rax, rax
       EB03                 jmp      SHORT G_M2434_IG05
G_M2434_IG04:
       488BC6               mov      rax, rsi
G_M2434_IG05:
       4883C420             add      rsp, 32
       5E                   pop      rsi
       C3                   ret

@AndyAyersMS
Copy link
Member

The inlines done are:

Inlines into 06000003 ConsoleApp1.Program`1[__Canon][System.__Canon]:Test():ref:this
  [0 IL=0008 TR=000006 06000005] [FAILED: runtime dictionary lookup] CacheAwareElement[__Canon][System.__Canon]:TryClaim(byref,byref):bool

Inlines into 06000005 CacheAwareElement[__Canon][System.__Canon]:TryClaim(byref,byref):bool
  [1 IL=0037 TR=000037 06001E87] [below ALWAYS_INLINE size] System.Threading.Interlocked:CompareExchange(byref,ref,ref):ref

You often can't inline from one shared generic class into another, so Program<T> and CacheAwareElement<T> are the culprits blocking the inline of TryClaim. The jit cannot foresee that if it did this inline and then the second-level inline of CompareExchange, the runtime lookups would not be required or would go unused.

Making the inner class non-generic or instantiating the outer class over a value type (eg via the struct wrapped ref class trick if you really wanted a ref class all along) would likely unblock this case.

From the second inline you can see that interlocked compare exchange is inlineable (and there's already a magic transformation to call the non-generic version thanks to IL intrinsics).

The inlined version calls into native code so there's nothing more the jit can do here. Even if all the inlines were unblocked you would still see a call to do the compare exchange.

I haven't verified this, and my understanding of this part of the runtime is sketchy, but the fact that TryClaim thinks it needs a runtime lookup to call CompareExchange may come from fact that the runtime's code-sharing analysis reflects the IL in the original method body for CompareExchange and not the less demanding IL intrinsic version. Since we don't ever expect the original body to be used it might be interesting to replace it with something more minimal and see if that also unblocks things.

@mikedn
Copy link
Contributor

mikedn commented Jul 31, 2017

Since we don't ever expect the original body to be used it might be interesting to replace it with something more minimal and see if that also unblocks things.

Tried, same result.

@redknightlois
Copy link
Author

No luck either.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@AndyAyersMS
Copy link
Member

This is a runtime restriction so not much the jit can do here until there is runtime support.

@davidwrighton has prototyped some changes that would unblock such cases (#31833).

@AndyAyersMS AndyAyersMS removed the JitUntriaged CLR JIT issues needing additional triage label Nov 25, 2020
@AndyAyersMS
Copy link
Member

Nothing much has changed here. @EgorBo weren't you working on some changes to remove the shared-shared inlining restrictions?

@EgorBo
Copy link
Member

EgorBo commented Aug 16, 2023

Nothing much has changed here. @EgorBo weren't you working on some changes to remove the shared-shared inlining restrictions?

Perhaps, for .NET 9.0 🙂

@ghost ghost added in-pr There is an active PR which will close this issue when it is merged and removed in-pr There is an active PR which will close this issue when it is merged labels Mar 5, 2024
@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Mar 9, 2024
@EgorBo EgorBo self-assigned this Mar 9, 2024
@EgorBo EgorBo modified the milestones: Future, 9.0.0 Mar 9, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Apr 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions in-pr There is an active PR which will close this issue when it is merged optimization tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants