[NativeAOT] 230 kB size regression in Hello World from Enum changes #79437
Comments
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
Issue Details
230 kB is about 8% of the hello world. Caused by the enum changes in #78580. Map files before/after: NAotHello.map.txt There's no clear culprit, it's a death by a thousand papercuts. There are some odd things in the diff, like why do we need so many new delegates, arraysorthelpers, System.Half, System.Int128, etc.
@MichalStrehovsky, would it be relatively straightforward for you to determine how much of this increase is due to:
? If it turns out the size increase in both NativeAOT and mono AOT is due to (2) and/or (3), I can temporarily back those out until we come up with a solution.
Yeah, I don't know how the enum changes would have impacted that, though they obviously did. |
Each Array.Sort instantiation costs about 10kB of AOT binary footprint. The change introduced 13 Array.Sort instantiations, so that explains more than half of the regression.
The numeric interface methods have less than ideal behavior with trimming. If a numeric interface method is used on one type, it gets preserved on all types that implement the interface. That explains the new methods on System.Half and System.Int128. |
I see. Well native aot doesn't support 5 of those (right?). We could trivially ifdef out the types not expressible in C#. That'd eliminate 40% of that. We could pay the one-time-per-enum costs to use the non-generic sort to address the rest, at least for aot? |
I can change the implementations to not use the generic interfaces and just special-case everything to each of the underlying types, but that's very far from ideal, and these instantiations will all come right back the moment someone else in the app uses them. What would you recommend? |
I think there were some significant improvements in trimming added to ILLink recently. @jtschuster would know more about whether they would work in these cases. If we can improve trimming such that these interfaces are removable when unused I think that's probably the right direction. |
@agocke the linker is not used for code trimming for nativeaot |
It's not immediately obvious from the logs in the top post and actually trying it would involve manually backing out pieces of the change and recompiling. Not keen on doing that.
I started drilling into this part - the numeric interfaces add about 29 kB of the total cost. I think it's doable to get rid of that in the compiler. The problem is patterns like this: runtime/src/libraries/System.Private.CoreLib/src/System/Byte.cs Lines 1088 to 1093 in b1a2080
This brings a boxed That will get rid of the 29 kB but there's still plenty left. |
In case you change your mind: |
Right, we would implement the same thing @jtschuster did for Native AOT, or more advanced if necessary. However, that typeof pattern is tricky. Branch elimination was not what I had in mind. |
Thanks, I'll take a look at that tomorrow. In the meantime, #79594 gets rid of the costs associated with Half/Int128/UInt128 (29 kB) and a little bit extra on top. |
With:
We're down to a ~50 kB regression (2%). Here are the MAP files (diffable with windiff):
I think we could get rid of a good chunk if we could do something about the sorting. We now sort the enums with the object-based Array.Sort, but that means it brings code to compare DateTime, decimal, strings and a bunch of other stuff. Commenting out the Array.Sort saves another 24 kB. If we could save that, we could close this issue. |
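For readers less familiar with the overloads being discussed, here is a minimal sketch contrasting the object-based `Array.Sort(Array, Array)` call with the generic overload that takes an explicit `IComparer<TKey>`. The class `EnumSortSketch` and the comparer `UInt64Comparer` are hypothetical names made up for illustration; this is not the code from the runtime or from any of the pull requests, and it does not by itself demonstrate any size difference under Native AOT.

```csharp
using System;
using System.Collections.Generic;

public static class EnumSortSketch
{
    // Hypothetical comparer (not from the runtime): orders two ulong keys directly.
    private sealed class UInt64Comparer : IComparer<ulong>
    {
        public int Compare(ulong x, ulong y) => x < y ? -1 : (x > y ? 1 : 0);
    }

    // Object-based shape: both arrays are typed as System.Array, so the call
    // binds to the non-generic Array.Sort(Array keys, Array items) overload.
    public static void SortObjectBased(Array keys, Array names) =>
        Array.Sort(keys, names);

    // Generic shape with an explicit comparer: binds to
    // Array.Sort<TKey, TValue>(TKey[] keys, TValue[] items, IComparer<TKey> comparer).
    public static void SortWithComparer(ulong[] keys, string[] names) =>
        Array.Sort(keys, names, new UInt64Comparer());
}
```

Both methods reorder `names` in step with `keys`; the only difference shown is which overload is called and where the comparison logic comes from.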
What if we just used a simple but O(n^2) sort like insertion sort, banking on enums being relatively small in size in general? If we went back to it being generic, such that there were still eight copies, but it was small, I wonder if that would suffice. We could try calling runtime/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/ArraySortHelper.cs Line 1034 in c635ae2
We used ulong-based sort before. How did ulong-based sort avoid bringing in all this stuff? |
Here's the relevant chunk from the WhyDgml output:
The default comparers are special cased by the compiler. If we're in unshared code, we know exactly which comparer to use. If we're in shared generic code, |
What's "relatively small" in this context? For example, just searching some of the C# projects I've recently touched on my machine, I have an |
It would depend on how close to sorted the values were. Here, for 250 values, you can see cases where the data is already perfectly sorted (Mode 0), where the first and last elements are swapped (Mode 1), and where the elements are entirely reversed (Mode 2). If the data is already sorted, insertion sort is typically really fast. If it's mostly sorted, it's reasonably on par. If it's not at all sorted, it's an order of magnitude slower.
Benchmark

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Linq;
using System.Numerics;

[MemoryDiagnoser]
public partial class Program
{
    private string[] _names = Enumerable.Range(1, 250).Select(i => $"Value{i}").ToArray();
    private int[] _values = Enumerable.Range(1, 250).ToArray();

    static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

    [Params(0, 1, 2)]
    public int Mode { get; set; }

    private void HandleMode()
    {
        switch (Mode)
        {
            case 0:
                break;
            case 1:
                (_values[0], _values[^1]) = (_values[^1], _values[0]);
                break;
            case 2:
                Array.Reverse(_values);
                break;
        }
    }

    [Benchmark]
    public void IntroSort()
    {
        HandleMode();
        Array.Sort(_values, _names);
    }

    [Benchmark]
    public void InsertionSort()
    {
        HandleMode();
        InsertionSort(_values, _names);
    }

    private static void InsertionSort<TKey, TValue>(TKey[] keys, TValue[] values) where TKey : IComparisonOperators<TKey, TKey, bool>
    {
        for (int i = 0; i < keys.Length - 1; i++)
        {
            TKey t = keys[i + 1];
            TValue tValue = values[i + 1];
            int j = i;
            while (j >= 0 && (t == null || t < keys[j]))
            {
                keys[j + 1] = keys[j];
                values[j + 1] = values[j];
                j--;
            }
            keys[j + 1] = t!;
            values[j + 1] = tValue;
        }
    }
}

But it sounds like my question/thought is mostly moot anyway as it seems Michal has a solution that fixes the problem with a custom comparer. |
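To make the gap between the three modes concrete without running a benchmark, one can count how many element moves insertion sort performs on each input. The following is a rough sketch, not part of the original benchmark: `InsertionSortCost`, `CountShifts`, and `MakeInput` are hypothetical helper names, and the count only approximates the work done (it ignores comparisons and the paired `values` array).

```csharp
using System;
using System.Linq;

public static class InsertionSortCost
{
    // Builds the same three inputs as the benchmark modes:
    // 0 = already sorted, 1 = first and last swapped, 2 = fully reversed.
    public static int[] MakeInput(int mode, int count = 250)
    {
        int[] values = Enumerable.Range(1, count).ToArray();
        if (mode == 1)
            (values[0], values[^1]) = (values[^1], values[0]);
        else if (mode == 2)
            Array.Reverse(values);
        return values;
    }

    // Runs insertion sort on a copy of the input and returns the number of
    // element moves performed by the inner loop.
    public static long CountShifts(int[] source)
    {
        int[] keys = (int[])source.Clone();
        long shifts = 0;
        for (int i = 0; i < keys.Length - 1; i++)
        {
            int t = keys[i + 1];
            int j = i;
            while (j >= 0 && t < keys[j])
            {
                keys[j + 1] = keys[j];
                j--;
                shifts++;
            }
            keys[j + 1] = t;
        }
        return shifts;
    }
}
```

For 250 values, `CountShifts(MakeInput(0))` is 0, `CountShifts(MakeInput(1))` is 497, and `CountShifts(MakeInput(2))` is 31125 (that is, 250 × 249 / 2), which lines up with the sorted and nearly sorted cases being cheap and the reversed case being far more expensive.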
@MichalStrehovsky, is there still more to do here, or does #79845 address this? |
#79594, mentioned in the calculations above, recovers 30 kB of the regression but hasn't merged yet. |
All of the pull requests have been merged. |
230 kB is about 8% of the hello world.
Caused by the enum changes in #78580.
Map files before/after:
NAotHello.map.txt
NAotHello.map.txt
There's no clear culprit, it's a death by thousand papercuts. There are some odd things in the diff, like why do we need so many new delegates, arraysorthelpers, System.Half, System.Int128, etc.