Improve perf of Enumerable.Sum/Average/Max/Min for arrays and lists #64624
Conversation
Tagging subscribers to this area: @dotnet/area-system-linq

Issue Details

It's common to use these terminal functions for quick stats on arrays and lists of values. Just the overhead of enumerating as an enumerable (involving multiple interface dispatches per iteration) is significant, and it's much faster to directly enumerate the contents of the array or the list. In some cases, we can further use vectorization to speed up the processing. This change:
- Adds a helper that does a fast check to see if it can extract a span from an enumerable that's actually an array or a list. It could be augmented to detect other interesting types, but T[] and List<T> are the most relevant from the data I've seen, and we can fairly quickly do type checks to get the most benefit for a small amount of cost.
- Uses that helper in the int/long/float/double/decimal overloads of Sum/Average/Min/Max to add a span-based path.
- Vectorizes Sum for float and double.
- Vectorizes Average for int, float, and double (the latter two via use of Sum).

@tannergooding, I assume the use of vectorization for floats/doubles could change the answer in some cases due to lack of associativity, yes? Thoughts on how much we should care about that? Also, it seemed like it should be possible to vectorize some of the methods doing checked arithmetic, but I wasn't sure how to do so correctly and skipped those. It also seemed like it should be possible to vectorize min/max for floats/doubles, but they have special handling of NaN that I couldn't figure out how to replicate with Vector.

@eiriktsarpalis, please let me know in general if you're ok with the extra code here for this special-casing. We don't have to do it, but it seems like an inexpensive win for what appears to be common, e.g. building up a List<int> and calling Average/Sum/Min/Max on it. There are certainly other patterns common with these, e.g. calling Min on the result of a Select or on other collection types. We could subsequently choose to also special-case IList<T> in order to save an interface dispatch per invocation, though I'm hopeful we'll get most of that for free with dynamic PGO, and we could subsequently choose to special-case the internal partitioning interfaces; the difficulty with those is that the check for whether a type implements them is more expensive, and there's some aspect of diminishing returns because you're already doing more work to compute each element (whereas with arrays/lists, the amount of work per element is tiny).

Benchmark used:
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Program
{
    public static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

    [Params(2, 32)]
    public int Length { get; set; }

    [Params("Array", "Enumerable")]
    public string Mode { get; set; }

    private IEnumerable<int> _ints;
    private IEnumerable<long> _longs;
    private IEnumerable<float> _floats;
    private IEnumerable<decimal> _decimals;

    [Benchmark] public float MinFloat() => _floats.Min();
    [Benchmark] public int SumInt32() => _ints.Sum();
    [Benchmark] public long SumInt64() => _longs.Sum();
    [Benchmark] public float SumFloat() => _floats.Sum();
    [Benchmark] public decimal SumDecimal() => _decimals.Sum();
    [Benchmark] public double AverageInt32() => _ints.Average();
    [Benchmark] public double AverageInt64() => _longs.Average();
    [Benchmark] public float AverageFloat() => _floats.Average();
    [Benchmark] public decimal AverageDecimal() => _decimals.Average();

    [GlobalSetup]
    public void Setup()
    {
        _ints = Enumerable.Range(1, Length);
        if (Mode == "Array") _ints = _ints.ToArray();

        _longs = Enumerable.Range(1, Length).Select(i => (long)i);
        if (Mode == "Array") _longs = _longs.ToArray();

        _floats = Enumerable.Range(1, Length).Select(i => (float)i);
        if (Mode == "Array") _floats = _floats.ToArray();

        _decimals = Enumerable.Range(1, Length).Select(i => (decimal)i);
        if (Mode == "Array") _decimals = _decimals.ToArray();
    }
}
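The change's span-extraction helper isn't shown in this thread; here is a minimal sketch of the idea, with a hypothetical TryGetSpan name (not the PR's actual code). The List<T> case assumes CollectionsMarshal.AsSpan, available since .NET 5:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;

// Hypothetical sketch: cheap type checks for the two most common cases
// (T[] and List<T>); anything else falls back to the enumerator path.
static bool TryGetSpan<T>(IEnumerable<T> source, out ReadOnlySpan<T> span)
{
    if (source is T[] array)
    {
        span = array;
        return true;
    }

    if (source is List<T> list)
    {
        // CollectionsMarshal.AsSpan exposes the list's backing array.
        span = CollectionsMarshal.AsSpan(list);
        return true;
    }

    span = default;
    return false;
}

int[] values = { 1, 2, 3 };
Console.WriteLine(TryGetSpan(values, out ReadOnlySpan<int> s) ? s.Length : -1); // 3
```

Note the span returned for a List<T> must not be used across mutations of the list, since the backing array can be replaced.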
Should be ok, but per your own comment in #64470 (comment) it would be nice if the code could be moved out of Linq eventually.
@stephentoub right. There are a number of cases where we cannot trivially vectorize float/double, particularly operations that compute a new value (like Sum).

The simplest scenario to explain is that with float/double, addition is not associative, so the order in which values are combined can change the result. What this means is that if you take a case like Sum, vectorization is impacted because it ends up adding the elements in a different grouping than a sequential loop would.

Things that do not compute a "new value", like Min/Max (which only do a comparison and return one of the inputs), can be vectorized, but it needs to take into account things like NaN handling, as you indicated.

For overflow checking, it can be simple when you know the inputs are always positive or always negative. It's not as simple when they can randomly be positive or negative.
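The associativity concern is easy to show concretely; a minimal example (not from the PR) where regrouping the same three float additions changes the answer:

```csharp
using System;

// Float addition is not associative: summing the same three values in a
// different grouping (as a vectorized Sum effectively does) changes the result.
float a = 1e8f, b = -1e8f, c = 1f;

float sequential = (a + b) + c; // (1e8 - 1e8) + 1 == 1
float regrouped  = a + (b + c); // b + c rounds back to -1e8f, so this is 0

Console.WriteLine(sequential); // 1
Console.WriteLine(regrouped);  // 0
```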
Ok, deleting. |
Vector<long> sums = default;
do
{
    Vector.Widen(new Vector<int>(span.Slice(i)), out Vector<long> low, out Vector<long> high);
Are the bounds checks actually getting elided here for new Vector<int>(span.Slice(i))? This seems like it's going to be doing a bunch of extra work each iteration.
I don't believe so. But I also didn't want to introduce unsafe code. Is there a way to write it that's "safe" (i.e. if I made a mistake it would result in an exception rather than potential corruption / security problems) and that avoids the bounds checks? My assumption is even if I eliminate the checks in Slice by using a loop pattern the JIT recognizes, Vector itself will still be doing a length check on the length of the span supplied, and that won't be removed (or will it)?
Some other places in the runtime use a helper built around Unsafe.ReadUnaligned, and the xplat helpers now have Vector128.LoadUnsafe(ref T, nuint index).

There's never really going to be a "safe" way to do this unless the JIT gets special support, however. You functionally have some T and want to read 2-32 of that T (depending on what the actual type is). So the best you'll generally get is doing the right checks up front and potentially adding some asserts. Otherwise, you pay the cost of slicing, copying the span, and doing the relevant bounds checks each iteration.
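A rough sketch of that pattern, assuming the .NET 7 Vector128.LoadUnsafe API and a hypothetical SumVector128 helper; the bounds reasoning is hoisted to one up-front check, and note this omits the overflow checking that LINQ's Sum actually performs:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical sketch (not the PR's code): load vectors directly from a ref
// instead of slicing the span each iteration. Unchecked arithmetic only.
static int SumVector128(ReadOnlySpan<int> span)
{
    int result = 0;
    int i = 0;

    if (Vector128.IsHardwareAccelerated && span.Length >= Vector128<int>.Count)
    {
        ref int start = ref MemoryMarshal.GetReference(span);
        var sums = Vector128<int>.Zero;

        // One up-front bound: the last full vector starts here.
        int lastVectorStart = span.Length - Vector128<int>.Count;
        do
        {
            sums += Vector128.LoadUnsafe(ref start, (nuint)i);
            i += Vector128<int>.Count;
        }
        while (i <= lastVectorStart);

        result = Vector128.Sum(sums);
    }

    // Scalar tail for any remaining elements.
    for (; i < span.Length; i++)
    {
        result += span[i];
    }

    return result;
}

Console.WriteLine(SumVector128(new[] { 1, 2, 3, 4, 5, 6, 7, 8 })); // 36
```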
I'm fine with that in this code. If these helpers are ever moved into MemoryExtensions, we can go to town on optimizing the heck out of them, both with avoiding bounds checks and with adding whatever additional paths are necessary to get the best perf. For LINQ, I think this is good enough, and there's benefit to not proliferating unsafe code here.
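For contrast, the safe, span-slicing shape being accepted here could look roughly like this (a hypothetical sketch with an assumed SumFloats name, not the PR's actual code; the slice and constructor keep their bounds checks, which is the trade-off discussed above):

```csharp
using System;
using System.Numerics;

// Hypothetical sketch of a safe vectorized float Sum: a mistake in the
// indexing would throw (the Vector<float> constructor validates the slice
// length) rather than corrupt memory.
static float SumFloats(ReadOnlySpan<float> span)
{
    float result = 0f;
    int i = 0;

    if (Vector.IsHardwareAccelerated && span.Length >= Vector<float>.Count)
    {
        var sums = Vector<float>.Zero;
        do
        {
            sums += new Vector<float>(span.Slice(i));
            i += Vector<float>.Count;
        }
        while (i <= span.Length - Vector<float>.Count);

        result = Vector.Sum(sums);
    }

    // Scalar tail for the remaining elements.
    for (; i < span.Length; i++)
    {
        result += span[i];
    }

    return result;
}

Console.WriteLine(SumFloats(new float[] { 1f, 2f, 3f })); // 6
```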