Performance issue of long-running small loops #43743
Replies: 3 comments
-
Can you please open an issue with one or more code examples in C# and Java, plus some data about the configuration, how long each takes relative to the other, etc.?
-
// Adapted from CountUppercase.java (https://www.graalvm.org/examples/java-performance-examples/).
public class CountUppercase {
    public static boolean isUpperCase(int ch) {
        return 'A' <= ch && ch <= 'Z';
    }

    static final int ITERATIONS = Math.max(Integer.getInteger("iterations", 1), 1);

    public static void main(String[] args) {
        String sentence = String.join(" ", args);
        for (int iter = 0; iter < ITERATIONS; iter++) {
            if (ITERATIONS != 1) System.out.println("-- iteration " + (iter + 1) + " --");
            long total = 0, start = System.currentTimeMillis(), last = start;
            for (int i = 1; i < 10_000_000; i++) {
                total += sentence.chars().filter(CountUppercase::isUpperCase).count();
                if (i % 1_000_000 == 0) {
                    long now = System.currentTimeMillis();
                    System.out.printf("%d (%d ms)%n", i / 1_000_000, now - last);
                    last = now;
                }
            }
            System.out.printf("total: %d (%d ms)%n", total, System.currentTimeMillis() - start);
        }
    }
}
Run with
Here is my C# code.
using Internal;
using System;

public class CountUppercase {
    struct IsUpperCaseT : IPredicate<char> {
        bool IPredicate<char>.Invoke(char arg) => IsUpperCase(arg);
    }

    static bool IsUpperCase(char arg) => 'A' <= arg && arg <= 'Z';

    static readonly int ITERATIONS = Math.Max(Convert.ToInt32(Environment.GetEnvironmentVariable("iterations") ?? $"{1}"), 1);

    public static unsafe void Main(String[] args) {
        System.Diagnostics.Process.GetCurrentProcess().PriorityClass = System.Diagnostics.ProcessPriorityClass.High;
        String sentence = String.Join(" ", args);
        for (int iter = 0; iter < ITERATIONS; iter++) {
            if (ITERATIONS != 1) Console.Out.WriteLine("-- iteration " + (iter + 1) + " --");
            long total = 0, start = Environment.TickCount64, last = start;
            for (int i = 1; i < 10_000_000; i++) {
                total += sentence.AsSpan().Where(IsUpperCase).Count();
                // total += sentence.AsSpan().Where(&IsUpperCase).Count();
                // total += sentence.AsSpan().Where(default(IsUpperCaseT)).Count();
                if (i % 1_000_000 == 0) {
                    long now = Environment.TickCount64;
                    Console.Out.Write("{0} ({1} ms)\n", i / 1_000_000, now - last);
                    last = now;
                }
            }
            Console.Out.Write("total: {0} ({1} ms)\n", total, Environment.TickCount64 - start);
        }
    }
}
namespace Internal {
    using System.Runtime.CompilerServices;

    public interface IPredicate<T> {
        bool Invoke(T arg);
    }

    public static class Extensions {
        public static unsafe SpanWhereEnumerable<TSource, TPredicate> Where<TSource, TPredicate>(this ReadOnlySpan<TSource> source, TPredicate predicate)
            where TPredicate : IPredicate<TSource> {
            return new SpanWhereEnumerable<TSource, TPredicate>(source, predicate);
        }

        public static unsafe SpanWhereEnumerable<TSource> Where<TSource>(this ReadOnlySpan<TSource> source, delegate*<TSource, bool> predicate) {
            return new SpanWhereEnumerable<TSource>(source, predicate);
        }

        public static unsafe SpanWhereEnumerableDelegate<TSource> Where<TSource>(this ReadOnlySpan<TSource> source, Predicate<TSource> predicate) {
            return new SpanWhereEnumerableDelegate<TSource>(source, predicate);
        }

        public readonly ref struct SpanWhereEnumerable<TSource, TPredicate>
            where TPredicate : IPredicate<TSource> {
            private readonly ReadOnlySpan<TSource> source;
            private readonly TPredicate predicate;

            public unsafe SpanWhereEnumerable(ReadOnlySpan<TSource> source, TPredicate predicate) {
                this.source = source;
                this.predicate = predicate;
            }

            [MethodImpl(MethodImplOptions.AggressiveOptimization)]
            public unsafe int Count() {
                int result = 0;
                for (var i = 0; source.Length > i; ++i) {
                    if (predicate.Invoke(source[i])) {
                        ++result;
                    }
                }
                return result;
            }
        }

        public readonly ref struct SpanWhereEnumerable<TSource> {
            private readonly ReadOnlySpan<TSource> source;
            private unsafe readonly delegate*<TSource, bool> predicate;

            public unsafe SpanWhereEnumerable(ReadOnlySpan<TSource> source, delegate*<TSource, bool> predicate) {
                this.source = source;
                this.predicate = predicate;
            }

            [MethodImpl(MethodImplOptions.AggressiveOptimization)]
            public unsafe int Count() {
                int result = 0;
                for (var i = 0; source.Length > i; ++i) {
                    if (predicate(source[i])) {
                        ++result;
                    }
                }
                return result;
            }
        }

        public readonly ref struct SpanWhereEnumerableDelegate<TSource> {
            private readonly ReadOnlySpan<TSource> source;
            private unsafe readonly Predicate<TSource> predicate;

            public unsafe SpanWhereEnumerableDelegate(ReadOnlySpan<TSource> source, Predicate<TSource> predicate) {
                this.source = source;
                this.predicate = predicate;
            }

            [MethodImpl(MethodImplOptions.AggressiveOptimization)]
            public unsafe int Count() {
                int result = 0;
                for (var i = 0; source.Length > i; ++i) {
                    if (predicate(source[i])) {
                        ++result;
                    }
                }
                return result;
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization)]
        public static unsafe int Count<TSource>(this ReadOnlySpan<TSource> source, delegate*<TSource, bool> predicate) {
            int result = 0;
            for (var i = 0; source.Length > i; ++i) {
                if (predicate(source[i])) {
                    ++result;
                }
            }
            return result;
        }
    }
}
Outputs:
Function pointer
Value-typed devirtualization
A test with the second example, Blender.java (https://www.graalvm.org/examples/java-performance-examples/), reveals much worse results. Even
-
This test mixes too many things: delegates and potential delegate inlining, small loops, a mixture of cold-start and steady-state performance, and suboptimal timing with
The optimized implementation is still not optimal. Accessing the span through a field incurs a bounds check (which is required, because the delegate has a way to change it), while
Since .NET 5, the JIT has applied more loop optimizations, such as code alignment. Performance issues posted to discussions are unlikely to get attention from the team.
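One way to untangle the concerns listed in this comment is to measure only after a warm-up phase, use a monotonic high-resolution clock, and drop the stream pipeline in favor of a plain loop. The following is a minimal Java sketch under those assumptions, not the benchmark from this thread: the class name `SteadyStateTiming`, the helper names, the default sentence, and the repetition counts are all hypothetical and chosen only for illustration.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch: warm up first, then time steady-state rounds with System.nanoTime().
class SteadyStateTiming {
    // Repetition counts are arbitrary assumptions, not tuned values.
    static final int WARMUP_REPS = 200_000;
    static final int MEASURED_REPS = 1_000_000;
    static final int ROUNDS = 5;

    static boolean isUpperCase(int ch) {
        return 'A' <= ch && ch <= 'Z';
    }

    // Plain indexed loop over the string; no stream pipeline is allocated per call.
    static long countMatching(String s, IntPredicate predicate) {
        long count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (predicate.test(s.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    // Repeats the workload and returns the accumulated count so the work is observable.
    static long run(String s, int reps) {
        long total = 0;
        for (int i = 0; i < reps; i++) {
            total += countMatching(s, SteadyStateTiming::isUpperCase);
        }
        return total;
    }

    public static void main(String[] args) {
        String sentence = args.length > 0
                ? String.join(" ", args)
                : "A Sample Sentence With Some UPPERCASE Letters";

        // Warm-up phase: gives the JIT a chance to compile the hot loop before timing starts.
        long sink = run(sentence, WARMUP_REPS);

        // Measured phase: each round is timed separately with a monotonic clock.
        for (int round = 1; round <= ROUNDS; round++) {
            long start = System.nanoTime();
            sink += run(sentence, MEASURED_REPS);
            long elapsedNs = System.nanoTime() - start;
            System.out.printf("round %d: %.3f ms%n", round, elapsedNs / 1_000_000.0);
        }

        // Printing the accumulated result keeps the loops from being treated as dead code.
        System.out.println("sink: " + sink);
    }
}
```

Reporting each round separately makes it easier to see whether the numbers have stabilized. For results meant to be compared across runtimes, a dedicated harness such as JMH on the Java side or BenchmarkDotNet on the .NET side is usually a safer choice than a hand-rolled loop like this one.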
-
Long-running small loops are generally ~30% slower than the equivalent Java code running on GraalVM. :(