Count substring occurrences in a string #73441
-
As the title describes, I find an interesting question. using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
[MemoryDiagnoser]
public class SubStringCountTest
{
private readonly string Source;
private readonly string Value;
public SubStringCountTest()
{
Source = "eteeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeteeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeete";
Value = "ete";
}
[Benchmark(Baseline = true)]
public int Replace_Count()
{
string strReplaced = Source.Replace(Value, "");
return (Source.Length - strReplaced.Length) / Value.Length;
}
[Benchmark]
public int IndexOf_Count()
{
int count = 0;
int index = Source.IndexOf(Value);
while (index >= 0)
{
count++;
index = Source.IndexOf(Value, index + Value.Length);
}
return count;
}
[Benchmark]
public int Regex_Count()
{
return Regex.Matches(Source, Value).Count;
}
} The benchmark result is that, BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1826 (21H2)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.302
[Host] : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT
DefaultJob : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT
I'm surprised about the result which tells |
Beta Was this translation helpful? Give feedback.
Replies: 2 comments 1 reply
-
Culture-aware IndexOf is slow by definition, it uses ICU under the hood. If it works for you consider setting Ordinal mode, e.g.: public int IndexOf_Count()
{
int count = 0;
int index = Source.IndexOf(Value, StringComparison.Ordinal);
while (index >= 0)
{
count++;
index = Source.IndexOf(Value, index + Value.Length, StringComparison.Ordinal);
}
return count;
} However, for the best performance you need .NET 7.0 where we introduced a new algorithm for IndexOf for substrings - #63285
|
Beta Was this translation helpful? Give feedback.
-
This time I get a more reasonable result. using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
[MemoryDiagnoser]
public class SubStringCountTest
{
private readonly string Source;
private readonly string Value;
public SubStringCountTest()
{
Source = "eteeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeteeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeete";
Value = "ete";
}
[Benchmark(Baseline = true)]
public int Replace_Count()
{
string strReplaced = Source.Replace(Value, "", StringComparison.Ordinal);
return (Source.Length - strReplaced.Length) / Value.Length;
}
[Benchmark]
public int IndexOf_Count()
{
int count = 0;
int index = Source.IndexOf(Value, StringComparison.Ordinal);
while (index >= 0)
{
count++;
index = Source.IndexOf(Value, index + Value.Length, StringComparison.Ordinal);
}
return count;
}
[Benchmark]
public int Regex_Count()
{
return Regex.Matches(Source, Value).Count;
}
} The result:
But if I modify That's really interesting and I can't wait to see the result on .NET 7. |
Beta Was this translation helpful? Give feedback.
Culture-aware IndexOf is slow by definition, it uses ICU under the hood. If it works for you consider setting Ordinal mode, e.g.:
However, for the best performance you need .NET 7.0 where we introduced a new algorithm for IndexOf for substrings - #63285