[NativeAOT] 230 kB size regression in Hello World from Enum changes #79437

MichalStrehovsky · 2022-12-09T07:38:38Z

230 kB is about 8% of the hello world.

Caused by the enum changes in #78580.

Map files before/after:

There's no clear culprit, it's death by a thousand papercuts. There are some odd things in the diff, like why do we need so many new delegates, arraysorthelpers, System.Half, System.Int128, etc.

ghost · 2022-12-09T07:38:43Z

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.


stephentoub · 2022-12-09T14:44:42Z

@MichalStrehovsky, would it be relatively straightforward for you to determine how much of this increase is due to:

1. The changes in Enum itself to pivot to using EnumInfo<TUnderlyingType> rather than storing everything as a long

2. The special-casing of typeof(T).IsEnum in the interpolated string handlers, e.g.

runtime/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/DefaultInterpolatedStringHandler.cs, lines 309 to 319 in add75d0:

    if (typeof(T).IsEnum)
    {
        int charsWritten;
        while (!Enum.TryFormatUnconstrained(value, _chars.Slice(_pos), out charsWritten))
        {
            Grow();
        }
        _pos += charsWritten;
        return;
    }

3. Making Enum implement ISpanFormattable:

runtime/src/libraries/System.Private.CoreLib/src/System/Enum.cs, line 26 in add75d0:

    public abstract partial class Enum : ValueType, IComparable, ISpanFormattable, IConvertible

?

If it turns out the size increase in both NativeAOT and mono AOT are due to (2) and/or (3), I can temporarily back those out until we come up with a solution.

There are some odd things in the diff, like why do we need so many new delegates, arraysorthelpers, System.Half, System.Int128, etc.

Yeah, I don't know how the enum changes would have impacted that, though they obviously did.

jkotas · 2022-12-09T15:13:36Z

arraysorthelpers

Each Array.Sort instantiation costs about 10kB of AOT binary footprint. The change introduced 13 Array.Sort instantiations, so that explains more than half of the regression.
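
For reference, the arithmetic behind "more than half", using only the approximate figures quoted in this thread (about 10 kB per instantiation, 13 instantiations, 230 kB total regression):

```latex
\begin{aligned}
13 \times 10\,\text{kB} &\approx 130\,\text{kB} \\
\frac{130\,\text{kB}}{230\,\text{kB}} &\approx 0.57
\end{aligned}
```

So roughly 57% of the regression would be attributable to the sort instantiations, if the 10 kB estimate holds for each of them.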

System.Half, System.Int128,

The numeric interface methods have less than ideal behavior with trimming. If a numeric interface method is used on one type, it gets preserved on all types that implement the interface. That explains the new methods on System.Half, System.Int128.

stephentoub · 2022-12-09T15:22:15Z

Each Array.Sort instantiation costs about 10kB of AOT binary footprint. The change introduced 13 Array.Sort instantiations, so that explains more than half of the regression.

I see. Well, native aot doesn't support 5 of those (right?). We could trivially ifdef out the types not expressible in C#. That'd eliminate about 40% of that. We could pay the one-time-per-enum costs to use the non-generic sort to address the rest, at least for aot?
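
To make the generic vs. non-generic trade-off concrete, here is a minimal sketch. `EnumSortSketch`, `SortPairsGeneric` and `SortPairsNonGeneric` are hypothetical names for illustration only, and this is not the code from #79473; the only real APIs used are the `Array.Sort` overloads.

```csharp
using System;

public static class EnumSortSketch
{
    // Generic overload: resolves to Array.Sort<TKey, string>. An AOT compiler
    // has to emit a separate copy of the sorting code for every value-type
    // TKey this is instantiated over (byte, int, ulong, ...).
    public static void SortPairsGeneric<TKey>(TKey[] values, string[] names)
        => Array.Sort(values, names);

    // Non-generic overload: resolves to Array.Sort(Array, Array). There is a
    // single compiled body regardless of the element type; the price is that
    // element comparisons may go through boxing and IComparable at run time.
    public static void SortPairsNonGeneric(Array values, string[] names)
        => Array.Sort(values, names);
}
```

Both helpers reorder `names` in step with `values`, so calling either one on `new byte[] { 3, 1, 2 }` and `new[] { "c", "a", "b" }` leaves the names in the order a, b, c. The difference is only in how much code ends up in the binary versus how much work is done per comparison.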

stephentoub · 2022-12-09T15:25:02Z

The numeric interface methods have less than ideal behavior with trimming. If a numeric interface method is used on one type, it gets preserved on all types that implement. It explains the new methods on System.Half, System.Int128.

I can change the implementations to not use the generic interfaces and just special-case everything to each of the underlying types, but that's very far from ideal, and these instantiations will all come right back the moment someone else in the app uses them. What would you recommend?

agocke · 2022-12-09T20:01:05Z

I think there were some significant improvements in trimming added to ILLink recently. @jtschuster would know more about whether they would work in these cases.

If we can improve trimming such that these interfaces are removable when unused I think that's probably the right direction.

marek-safar · 2022-12-10T08:06:51Z

@agocke the linker is not used for code trimming for nativeaot

MichalStrehovsky · 2022-12-12T09:00:31Z

@MichalStrehovsky, would it be relatively straightforward for you to determine how much of this increase is due to:

It's not immediately obvious from the logs in the top post and actually trying it would involve manually backing out pieces of the change and recompiling. Not keen on doing that.

System.Half, System.Int128,

The numeric interface methods have less than ideal behavior with trimming. If a numeric interface method is used on one type, it gets preserved on all types that implement. It explains the new methods on System.Half, System.Int128.

I started drilling into this part - the numeric interfaces add about 29 kB of the total cost. I think it's doable to get rid of that in the compiler. The problem is patterns like this:

runtime/src/libraries/System.Private.CoreLib/src/System/Byte.cs, lines 1088 to 1093 in b1a2080:

    else if (typeof(TOther) == typeof(Half))
    {
        Half actualResult = value;
        result = (TOther)(object)actualResult;
        return true;
    }

This brings a boxed Half into the system. RyuJIT can already eliminate the entire branch (because we compile instantiated code), but we run the compilation in two phases (the second one is RyuJIT) and the first phase ends up unnecessarily rooting a boxed Half. There are two ways to approach this: either remove unreachable branches in the first phase (we can do it without the whole program view we're building because this scans instantiated code too), or make it so that we don't pass the rooted Half to the second phase. I already have a working prototype for the former, but looking into the latter as well because it might be more generally applicable.
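
A reduced sketch of the pattern in question, for readers who haven't seen it. `TypeofBranchSketch` and `TryConvertFromByte` are hypothetical names and the body is a simplification, not the actual Byte.cs code. In any single instantiation at most one branch can be taken, but a scanner that looks at the method body without pruning dead branches still sees the `(object)` box of a `Half`.

```csharp
using System;

public static class TypeofBranchSketch
{
    // Hypothetical, simplified version of the typeof(TOther) dispatch pattern.
    public static bool TryConvertFromByte<TOther>(byte value, out TOther result)
    {
        if (typeof(TOther) == typeof(int))
        {
            int actualResult = value;
            // The cast through object is a box followed by an unbox. When
            // TOther is int, a JIT can fold it away entirely.
            result = (TOther)(object)actualResult;
            return true;
        }
        else if (typeof(TOther) == typeof(Half))
        {
            // Dead code in every instantiation except TOther == Half, yet an
            // analysis that does not eliminate the branch still records that
            // a boxed Half is needed.
            Half actualResult = (Half)(float)value;
            result = (TOther)(object)actualResult;
            return true;
        }

        result = default!;
        return false;
    }
}
```

With this sketch, `TryConvertFromByte<int>(5, out int i)` only ever executes the first branch, which is why removing the unreachable `Half` branch before the first compilation phase records its dependencies would avoid rooting the boxed `Half`.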

That will get rid of the 29 kB but there's still plenty left.

stephentoub · 2022-12-12T13:21:19Z

actually trying it would involve manually backing out pieces of the change

In case you change your mind:
(2): https://github.com/stephentoub/runtime/tree/commentoutisenum
(3): https://github.com/stephentoub/runtime/tree/commentoutenumspanformattable
(2+3): https://github.com/stephentoub/runtime/tree/commentoutboth

agocke · 2022-12-12T17:23:09Z

@agocke the linker is not used for code trimming for nativeaot

Right, we would implement the same thing @jtschuster did for Native AOT, or more advanced if necessary.

However, that typeof pattern is tricky. Branch elimination was not what I had in mind.

MichalStrehovsky · 2022-12-13T09:15:11Z

In case you change your mind:

Thanks, I'll take a look at that tomorrow.

In the meantime, #79594 gets rid of the costs associated with Half/Int128/UInt128 (29 kB) and a little bit extra on top.

MichalStrehovsky · 2022-12-14T08:46:18Z

With:

Minus 100 kB: Use non-generic Array.Sort in EnumInfo on nativeaot #79473
Minus 36 kB: ifdef out unsupported Enum underlying types for nativeaot #79472
Minus 30 kB: Reduce the number of forced MethodTables #79594

We're down to a ~50 kB regression (2%).

Here are the MAP files (diffable with windiff):

With the enum change reverted: NAotHello.map.txt
After all 3 pull requests: NAotHello.map.txt

I think we could get rid of a good chunk if we could do something about the sorting. We now sort the enums with the object-based Array.Sort, but that means it brings code to compare DateTime, decimal, strings and a bunch of other stuff. Commenting out the Array.Sort saves another 24 kB. If we could save that, we could close this issue.

stephentoub · 2022-12-14T09:14:55Z

I think we could get rid of a good chunk if we could do something about the sorting

What if we just used a simple but O(n^2) sort like insertion sort, banking on enums being relatively small in general? If we went back to it being generic, such that there were still eight copies, but each copy was small, I wonder if that would suffice. We could try calling

runtime/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/ArraySortHelper.cs, line 1034 in c635ae2:

    private static void InsertionSort(Span<TKey> keys, Span<TValue> values)

?

jkotas · 2022-12-14T14:57:13Z

We now sort the enums with the object-based Array.Sort, but that means it brings code to compare DateTime, decimal, strings and a bunch of other stuff.

We used ulong-based sort before. How did the ulong-based sort avoid bringing in all this stuff?

MichalStrehovsky · 2022-12-14T21:33:46Z

We now sort the enums with the object-based Array.Sort, but that means it brings code to compare DateTime, decimal, strings and a bunch of other stuff.

We used ulong-based sort before. How did ulong-based sort avoided bringing in all this stuff?

Here's the relevant chunk from the WhyDgml output:

        (Secondary) VirtualMethodUse [S.P.CoreLib]System.IComparable.CompareTo(object)
          (Interface method use) __InterfaceDispatchCell_S_P_CoreLib_System_IComparable__CompareTo
            (callvirt) S_P_CoreLib_System_Collections_Comparer__Compare
              (Instance method on a constructed type) (Tentative instance method: S_P_CoreLib_System_Collections_Comparer__Compare, ??_7S_P_CoreLib_System_Collections_Comparer@@6B@ constructed)
                (Primary) Tentative instance method: S_P_CoreLib_System_Collections_Comparer__Compare
                  (callvirt) S_P_CoreLib_System_Collections_Generic_ObjectComparer_1<System___Canon>__Compare
                    (Virtual method) (??_7S_P_CoreLib_System_Collections_Generic_ObjectComparer_1<System___Canon>@@6B@, VirtualMethodUse [S.P.CoreLib]System.Collections.Generic.Comparer`1<System.__Canon>.Compare(__Canon,__Canon))
                      (Primary) ??_7S_P_CoreLib_System_Collections_Generic_ObjectComparer_1<System___Canon>@@6B@
                        (Template MethodTable) NativeLayoutTemplateTypeLayoutVertexNode_S_P_CoreLib_System_Collections_Generic_ObjectComparer_1<T_System___Canon>
                          (Generic comparer) S_P_CoreLib_System_Collections_Generic_Comparer_1<System___Canon>__Create
                            (call) S_P_CoreLib_System_Collections_Generic_Comparer_1<System___Canon>__get_Default
                              (call) S_P_CoreLib_System_Collections_Generic_ArraySortHelper_2<System___Canon__System___Canon>__Sort

The default comparers are special cased by the compiler. If we're in unshared code, we know exactly which comparer to use. If we're in shared generic code, ObjectComparer it is and that one uses IComparable.CompareTo, bringing comparison functionality for everything that was ever allocated or boxed in the program.
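
One way to sidestep the `ObjectComparer` path is to hand the sort an explicit comparer that only knows about a single type, so no `IComparable.CompareTo` interface dispatch is needed. The sketch below illustrates that idea under stated assumptions: `BoxedUInt64Comparer` and `SortWithExplicitComparer` are hypothetical names, and this is not necessarily what any of the pull requests in this thread do.

```csharp
#nullable enable
using System;
using System.Collections;

// Hypothetical comparer that assumes every key is a boxed ulong.
public sealed class BoxedUInt64Comparer : IComparer
{
    public int Compare(object? x, object? y)
    {
        // Unbox directly to the one type we expect. There is no call to
        // IComparable.CompareTo here, so nothing forces the CompareTo
        // implementations of unrelated types to be kept.
        ulong a = (ulong)x!;
        ulong b = (ulong)y!;
        return a < b ? -1 : (a > b ? 1 : 0);
    }
}

public static class ComparerSortSketch
{
    // Uses the non-generic Array.Sort(Array, Array, IComparer) overload.
    public static void SortWithExplicitComparer(Array keys, Array items)
        => Array.Sort(keys, items, new BoxedUInt64Comparer());
}
```

The comparer throws `InvalidCastException` if a key is anything other than a boxed `ulong`, which is acceptable in a sketch where the caller controls the key type but would need handling in general-purpose code.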

kg · 2022-12-20T18:27:24Z

I think we could get rid of a good chunk if we could do something about the sorting

What if we just used a simple but O(n^2) sort like insertion sort, banking on enum's being relatively small in size in general? If we went back to it being generic, such that there were still eight copies, but it was small, I wonder if that would suffice. We could try calling

What's "relatively small" in this context? For example, just searching some of the C# projects I've recently touched on my machine, I have an enum of wasm opcodes that's got about 250 values in it (and would have more if I added further extensions). I've got a D3DFORMAT enum with upwards of 50 entries by my count. I also see a bunch of mid-to-large-sized enums in libraries like SDL2, mojoshader, sharpfont, etc. How badly would this stuff degrade for a large enum? Is there a path for end users to work around the bad sort? I don't have a good intuition for how bad insertion sort would be for an enum with 250 values in it, but I suspect regular production software has enums big enough to get into the Trouble Zone for O(N^2) unless O is very fast or this code path is only hit for obscure scenarios.

stephentoub · 2022-12-20T20:07:19Z

I have an enum of wasm opcodes that's got about 250 values in it ... How badly would this stuff degrade for a large enum?

It would depend on how close to sorted the values were. Here, for 250 values, you can see cases where the data is already perfectly sorted (Mode 0), where the first and last elements are swapped (Mode 1), and where the elements are entirely reversed (Mode 2). If the data is already sorted, insertion sort is typically really fast. If it's mostly sorted, it's reasonably on par. If it's not at all sorted, it's an order of magnitude slower.

Method	Mode	Mean
IntroSort	0	2.233 us
InsertionSort	0	1.174 us
IntroSort	1	2.281 us
InsertionSort	1	2.724 us
IntroSort	2	4.639 us
InsertionSort	2	98.910 us
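
As a sanity check on these numbers, the count of element shifts (iterations of the inner `while` loop) that the benchmark's insertion sort performs can be worked out for each mode with n = 250. This counts shifts only, not wall-clock time, so it indicates the trend rather than predicting the measurements:

```latex
\begin{aligned}
\text{Mode 0 (sorted):} \quad & 0 \\
\text{Mode 1 (first and last swapped):} \quad & 248 + 249 = 497 \\
\text{Mode 2 (reversed):} \quad & \sum_{k=1}^{n-1} k = \frac{n(n-1)}{2} = \frac{250 \cdot 249}{2} = 31{,}125
\end{aligned}
```

In Mode 1, each of the 248 middle elements moves past the misplaced maximum once, and the final element then moves past all 249 others. The jump from hundreds of shifts to tens of thousands is in line with the InsertionSort mean going from about 2.7 us to about 98.9 us.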

Benchmark

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Linq;
using System.Numerics;

[MemoryDiagnoser]
public partial class Program
{
    private string[] _names = Enumerable.Range(1, 250).Select(i => $"Value{i}").ToArray();
    private int[] _values = Enumerable.Range(1, 250).ToArray();

    static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

    [Params(0, 1, 2)]
    public int Mode { get; set; }

    private void HandleMode()
    {
        switch (Mode)
        {
            case 0:
                break;
            case 1:
                (_values[0], _values[^1]) = (_values[^1], _values[0]);
                break;
            case 2:
                Array.Reverse(_values);
                break;
        }
    }

    [Benchmark]
    public void IntroSort()
    {
        HandleMode();
        Array.Sort(_values, _names);
    }

    [Benchmark]
    public void InsertionSort()
    {
        HandleMode();
        InsertionSort(_values, _names);
    }

    private static void InsertionSort<TKey, TValue>(TKey[] keys, TValue[] values) where TKey : IComparisonOperators<TKey, TKey, bool>
    {
        for (int i = 0; i < keys.Length - 1; i++)
        {
            TKey t = keys[i + 1];
            TValue tValue = values[i + 1];

            int j = i;
            while (j >= 0 && (t == null || t < keys[j]))
            {
                keys[j + 1] = keys[j];
                values[j + 1] = values[j];
                j--;
            }

            keys[j + 1] = t!;
            values[j + 1] = tValue;
        }
    }
}

But it sounds like my question/thought is mostly moot anyway as it seems Michal has a solution that fixes the problem with a custom comparer.

stephentoub · 2023-01-04T17:58:01Z

We now sort the enums with the object-based Array.Sort, but that means it brings code to compare DateTime, decimal, strings and a bunch of other stuff. Commenting out the Array.Sort saves another 24 kB. If we could save that, we could close this issue.

@MichalStrehovsky, is there still more to do here, or does #79845 address this?

MichalStrehovsky · 2023-01-05T08:34:10Z

We now sort the enums with the object-based Array.Sort, but that means it brings code to compare DateTime, decimal, strings and a bunch of other stuff. Commenting out the Array.Sort saves another 24 kB. If we could save that, we could close this issue.

@MichalStrehovsky, is there still more to do here, or does #79845 address this?

#79594, mentioned in the calculations above, recovers 30 kB of the regression but hasn't merged yet.

MichalStrehovsky · 2023-01-11T02:51:06Z

All of the pull requests have been merged.

MichalStrehovsky added the area-NativeAOT-coreclr label Dec 9, 2022

ghost added the untriaged label Dec 9, 2022

MichalStrehovsky mentioned this issue Dec 9, 2022

[mono][perf] iOS and WASM disk size regressions on 06 Dec 2022 #79285

Closed

MichalStrehovsky changed the title ~~[NativeAOT] 130 kB size regression in Hello World from Enum changes~~ [NativeAOT] 230 kB size regression in Hello World from Enum changes Dec 9, 2022

This was referenced Dec 9, 2022

ifdef out unsupported Enum underlying types for nativeaot #79472

Merged

Use non-generic Array.Sort in EnumInfo on nativeaot #79473

Merged

lewing mentioned this issue Dec 9, 2022

[Perf] Linux/x64: 73 Regressions on 12/8/2022 11:11:00 AM dotnet/perf-autofiling-issues#10544

Open

MichalStrehovsky mentioned this issue Dec 20, 2022

Fix size regression from enum sorting #79845

Merged

MichalStrehovsky closed this as completed Jan 11, 2023

ghost removed the untriaged label Jan 11, 2023

ghost locked as resolved and limited conversation to collaborators Feb 10, 2023
