-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a SearchValues ProbabilisticMap implementation that uses an ASCII fast path #89155
Add a SearchValues ProbabilisticMap implementation that uses an ASCII fast path #89155
Conversation
Tagging subscribers to this area: @dotnet/area-system-buffers Issue DetailsRelated: #89140 This is an implementation that we would use when we have a mixed set of ASCII and non-ASCII characters. For For the cases where the probabilistic map is not vectorized ( The throughput difference between the ASCII fast path and the probabilistic map is large, but it might shrink somewhat once we add AVX512 support to these types, as AVX512 can use a more efficient ProbMap approach (similar to ARM64). ReplaceLineEndings benchmarks
IndexOfAnyExcept and ASCII early-match overhead benchmarksThis is for a set of only 6 values. The more values you use, the larger the factor would be for
To highlight the difference when finding early matches with ASCII text:
With all that in mind, do we still want to do this?
|
I'd like to see this change be merged, as not all target-machines have AVX-512 (now). |
src/libraries/System.Private.CoreLib/src/System/SearchValues/ProbabilisticCharSearchValues.cs
Show resolved
Hide resolved
...ies/System.Private.CoreLib/src/System/SearchValues/ProbabilisticWithAsciiCharSearchValues.cs
Show resolved
Hide resolved
What is the AsciiText=False case in the benchmark? Is that with the input being entirely non-ASCII so that every call for the ASCII search immediately fails and is pure overhead? |
Yes, |
Related: #89140
This is an implementation that we would use when we have a mixed set of ASCII and non-ASCII characters.
It's similar to the existing
ProbabilisticMap
implementation, but with a fast path that only looks for the ASCII characters with a faster implementation first.For
IndexOfAny
, theReplaceLineEndings
benchmark shows large improvements for ASCII inputs, and asNewLineFrequency
increases, it highlights the worst-case per-call overhead the new fast path introduces for non-ASCII texts.For the cases where the probabilistic map is not vectorized (
IndexOfAnyExcept
,LastIndexOfAny
,LastIndexOfAnyExcept
), this brings massive improvements with the worst-case of relatively minor regressions.The
-Except
variants even more so as they otherwise fall back toO(n * m)
implementations.The throughput difference between the ASCII fast path and the vectorized probabilistic map is large, but it might shrink somewhat once we add AVX512 support to these types, as AVX512 can use a more efficient ProbMap approach (similar to ARM64).
ReplaceLineEndings benchmarks
This was done with a CR => CR replace, such that there is no allocation overhead, to highlight the
SearchValues
part of the cost.IndexOfAnyExcept and ASCII early-match overhead benchmarks
This is for a set of only 6 values. The more values you use, the larger the factor would be for
IndexOfAnyExcept
.To highlight the difference when finding early matches with ASCII text:
With all that in mind, do we still want to do this?