-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SearchValues<char> implementation for two sets of 128 chars #103216
Add SearchValues<char> implementation for two sets of 128 chars #103216
Conversation
Tagging subscribers to this area: @dotnet/area-system-buffers |
is it worth adding one or two more non ASCII? |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
3eb335f
to
a10c8ae
Compare
@MihuBot benchmark RustLang_Sherlock https://github.com/MihaZupan/performance/tree/compiled-regex-only -medium |
@MihuBot fuzz SearchValues |
This comment was marked as outdated.
This comment was marked as outdated.
@MihaZupan you need to either omit BenchmarkRunner.Run<Bench>(args); Otherwise |
Aaah right, thanks. I also forgot to remove the |
@EgorBot -intel using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Buffers;
#nullable disable
public class Bench
{
private static readonly SearchValues<char> _allowedAscii = SearchValues.Create("1234567890abcdefghijklmnopqrstuvwxyz");
private static readonly SearchValues<char> _allowedMixed = SearchValues.Create("äöü1234567890abcdefghijklmnopqrstuvw");
private string _asciiInput;
private string _mixedInput;
[Params(16, 100, 10_000)]
public int Length;
[GlobalSetup]
public void Setup()
{
_asciiInput = new string('a', Length);
_mixedInput = 'ä' + new string('a', Length - 1);
}
[Benchmark]
public bool ContainsOnlyAscii() => !_asciiInput.AsSpan().ContainsAnyExcept(_allowedAscii);
[Benchmark]
public bool ContainsOnlyMixed() => !_mixedInput.AsSpan().ContainsAnyExcept(_allowedMixed);
} |
This comment was marked as outdated.
This comment was marked as outdated.
Benchmark results on Intel
|
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock
|
src/libraries/System.Private.CoreLib/src/System/SearchValues/IndexOfAnyAsciiSearcher.cs
Show resolved
Hide resolved
Vector512<byte> secondBitmap512 = state.SecondBitmap512; | ||
Vector512<ushort> offset512 = Vector512.Create(state.Offset); | ||
|
||
if (searchSpaceLength > 2 * Vector512<short>.Count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GrabYourPitchforks, in the UTF8 experiment you shared with me, you had code that, after having validated that Vector256 was hardware accelerated, aligned an address and then read a full vector, without concern for whether that vector read under or overread the target region. Is that safe to do on all platforms? It seems we could avoid some branching with similar techniques in many of our implementations that use vectorization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that safe to do on all platforms?
As long as the data is pinned, otherwise GC can interrupt at any point and break the alignment assumption
f378aca
to
e747fd0
Compare
Does this mean someone searching mostly ASCII text with IgnoreCase for [a-z] will see a 50% regression? |
If you had a long run of text that was just ASCII and didn't match, yes, it would be slower. I reran Sherlock, and the throughput difference doesn't seem to be affecting it: MihuBot/runtime-utils#505 (comment) We could also choose to use the two bitmaps approach only on |
Any concerns with merging this one as-is and seeing if benchmarks complain? |
Ok, let's give it a try but be ready to back it out if any meaningful regressions pop up. |
#101001 significantly improved the performance of the non-vectorized
-Except
paths of non-ASCIISearchValues<char>
.However, they are still not vectorized, and this PR changes that for values where the non-ASCII part can fit into a 128-bit bitmap.
This adds an implementation that's almost the same as the
AsciiCharSearchValues
, but where the core lookup checks against two 128-bit bitmaps, with the second one at a variable offset:(All the numbers below are using the Avx2 path of the new implementation -- measured before #103710)
Avx2 where the text is mostly non-ASCII (previous 'Mixed' would use the prob map / scalar fallback)
Avx512 machine (the probabilistic map is a lot faster than on Avx2)
Early matches
From that, we can see that the Ascii-only search is at ~1.5x the throughput of the two bitmaps impl.
The two bitmaps have ~1.5x the throughput of the probabilistic map on Avx512 and ~2.5x on Avx2.
In other words, this change is a throughput regression for
ProbabilisticWithAsciiCharSearchValues
if the text is all ASCII, and an improvement otherwise. It's always chepaer for early matches though.For the
-Except
paths whereProbabilisticWithAsciiCharSearchValues
uses a scalar fallback, the two bitmaps approach is obviously a lot (10x+) faster.The change doesn't seem to impact existing Regex benchmarks much, likely because they're very focused on ASCII.
I also tried different implementation approaches to try and reduce code duplication between the existing Ascii and "ascii with second set" implementations:
AsciiState
, which does save some duplication, but increases the memory consumption of all existing Ascii-onlySearchValues
.Looking at patterns from Regex_RealWorldPatterns.json, ~75% of non-ASCII sets would use the new implementation over the Ascii+ProbMap, most of which because of the kelvin sign.
Parsed data for the above:
Regex_RealWorldPatterns.SearchValues.json