Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

default(T) produces non-optimized code in contrast to new T() on (generic) value types when the instance is not used #51825

Open
Enderlook opened this issue Apr 24, 2021 · 4 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@Enderlook
Copy link

Enderlook commented Apr 24, 2021

Description

The runtime is not optimizing the instantiation of a struct (by default) when it's actually not used at all (due to optimizations of the code that make the instance not required).

The following minimal snippet:

using SharpLab.Runtime;

public class C
{    
    [JitGeneric(typeof(One))]
    public int New<T>()
        where T : struct, IValue
        => new T().Value;
    
    [JitGeneric(typeof(One))]
    public int Default<T>()
        where T : struct, IValue
        => default(T).Value;
}

public interface IValue
{
     int Value { get; }
}

public struct One : IValue
{
     public int Value => 1;
}

Produces this JIT asm:

C.New[[One, _]]()
    L0000: mov eax, 1
    L0005: ret

C.Default[[One, _]]()
    L0000: push eax
    L0001: xor eax, eax
    L0003: mov [esp], eax
    L0006: lea eax, [esp]
    L0009: xor edx, edx
    L000b: mov [eax], dl
    L000d: mov eax, 1
    L0012: pop ecx
    L0013: ret

The expected code generation of C.Default[[One, _]]() was the same as C.New[[One, _]]().

Configuration

Run on SharpLab (Core CLR v5.0.421.11614 on x86) on release mode.

Regression?

Don't know. Sorry.

Other information

No idea how to fix it. Sorry.

Though you currently can use new T() instead of default(T) to get the optimized code as a workaround.

By the way, I noticed the generated code is optimized if the type is actually returned to the caller.
For example:

[JitGeneric(typeof(One))]
public T New2<T>()
    where T : struct, IValue
    => new T();

[JitGeneric(typeof(One))]
public T Default2<T>()
    where T : struct, IValue
    => default(T);

Produces:

C.New2[[One, _]]()
    L0000: xor eax, eax
    L0002: ret

C.Default2[[One, _]]()
    L0000: xor eax, eax
    L0002: ret

Or when it's used as a parameter to another function (even if the function is actually inlined):

[JitGeneric(typeof(One))]
public int New3<T>()
    where T : struct, IValue
    => Consume(new T());
    
[JitGeneric(typeof(One))]
public int Default3<T>()
    where T : struct, IValue
    => Consume(default(T));
    
public int Consume<T>(T t)
    where T : IValue
    => t.Value;

Produces:

C.New3[[One, _]]()
    L0000: mov eax, 1
    L0005: ret

C.Default3[[One, _]]()
    L0000: mov eax, 1
    L0005: ret

Or if the instance is mutated:

[JitGeneric(typeof(One))]
public int New4<T>()
    where T : struct, IValue
{
    T t = new T();
    t.Value = 4;
    return t.Value;
}

[JitGeneric(typeof(One))]
public int Default4<T>()
    where T : struct, IValue
{
    T t = default(T);
    t.Value = 4;
    return t.Value;
}

public interface IValue
{
    int Value { get; set; }
}

public struct One : IValue
{
    public int Value { get; set; }
}

Produces:

C.New4[[One, _]]()
    L0000: mov eax, 4
    L0005: ret

C.Default4[[One, _]]()
    L0000: mov eax, 4
    L0005: ret

So, I think the problem only happens when an instance is created but not returned, passed as parameter nor mutated.

category:cq
theme:generics
skill-level:intermediate
cost:medium
impact:small

@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Apr 24, 2021
@acaly
Copy link

acaly commented Apr 25, 2021

I guess it's related to the dup IL instruction generated by C# compiler, which it thinks is better than another ldloca, but actually the JIT can't optimize that dup.

Your two examples gives IL output like

New:

IL_0000: call !!0 [System.Private.CoreLib]System.Activator::CreateInstance<!!T>()
IL_0005: stloc.0
IL_0006: ldloca.s 0
IL_0008: constrained. !!T
IL_000e: callvirt instance int32 IValue::get_Value()
IL_0013: ret

Default (notice the dup at IL_0002):

IL_0000: ldloca.s 0
IL_0002: dup
IL_0003: initobj !!T
IL_0009: constrained. !!T
IL_000f: callvirt instance int32 IValue::get_Value()
IL_0014: ret

Actually, if you explicitly store that default into a local, the C# compiler won't do that and the JIT can properly optimize it out, although the IL output is otherwise the same:

IL_0000: ldloca.s 0
IL_0002: initobj !!T
IL_0008: ldloca.s 0
IL_000a: constrained. !!T
IL_0010: callvirt instance int32 IValue::get_Value()
IL_0015: ret
[JitGeneric(typeof(One))]
public int Local<T>() where T : struct, IValue
{
    T local = default;
    return local.Value;
}

So it's probably not the runtime's issue but the compiler's. Also it would be easier to make the compiler not to emit the dup than to teach the runtime to handle dups following a ldloca.

@SingleAccretion
Copy link
Contributor

SingleAccretion commented Apr 25, 2021

Yep, it is the dup causing issues. Relevant imported IR looks like this for an equivalent reproduction:

    [ 0]   0 (0x000) ldloca.s 0
    [ 1]   2 (0x002) dup

STMT00000 (IL 0x000...  ???)
               [000003] -A----------              *  ASG       byref 
               [000002] D------N----              +--*  LCL_VAR   byref  V03 tmp1         
               [000001] ------------              \--*  ADDR      byref 
               [000000] -------N----                 \--*  LCL_VAR   struct<S, 1> V01 loc0         

    [ 2]   3 (0x003) initobj 02000003
STMT00001 (IL   ???...  ???)
               [000008] IA----------              *  ASG       struct (init)
               [000007] -------N----              +--*  BLK       struct<1>
               [000005] ------------              |  \--*  LCL_VAR   byref  V03 tmp1         
               [000006] ------------              \--*  CNS_INT   int    0

    [ 1]   9 (0x009) call 0A000008
STMT00002 (IL   ???...  ???)
               [000009] I-C-G-------              *  CALL      int    S.Value (exactContextHnd=0x00000000D1FFAB1E)
               [000004] ------------ this in rcx  \--*  LCL_VAR   byref  V03 tmp1         

While for the case when we have a local as the source:

    [ 0]   0 (0x000) ldloca.s 0
    [ 1]   2 (0x002) initobj 02000003
STMT00000 (IL 0x000...  ???)
               [000003] IA----------              *  ASG       struct (init)
               [000000] D------N----              +--*  LCL_VAR   struct<S, 1> V01 loc0         
               [000002] ------------              \--*  CNS_INT   int    0

    [ 0]   8 (0x008) ldloca.s 0
    [ 1]  10 (0x00a) call 0A000008
STMT00001 (IL 0x008...  ???)
               [000006] I-C-G-------              *  CALL      int    S.Value (exactContextHnd=0x00000000D1FFAB1E)
               [000005] ------------ this in rcx  \--*  ADDR      byref 
               [000004] -------N----                 \--*  LCL_VAR   struct<S, 1> V01 loc0         

The difference in how initobj is imported is because gtNewBlockVal has a small peep that looks for the ADDR(LCL_VAR) pattern and returns the raw LCL_VAR instead of the full BLK if it is present. In the dup case however, the importer assigns ADDR(LCL_VAR) to a temporary because it considers it a complex expression, so no optimization takes place.

My thinking on the direction of the fix is along the lines of importing ldlocas as LCL_VAR_ADDRs and not spilling them when duped. This has non-trivial implications.

@EgorBo
Copy link
Member

EgorBo commented Apr 26, 2021

I guess it's one of those issues where Forward-Sub could help us again 🙂 to substitute that local so the peephole optimization could fold GT_BLK

@EgorBo EgorBo added this to the Future milestone Apr 26, 2021
@EgorBo EgorBo removed the untriaged New issue has not been triaged by the area owner label Apr 26, 2021
@AndyAyersMS
Copy link
Member

Address of local should not be considered "too complex to clone".

Seems like the dup logic should be willing to use the same heuristic that gtClone uses. In fact it probably should just call gtClone (with the "complex ok" flag set to true).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
Status: Optimizations
Development

No branches or pull requests

5 participants