[Perf] Linux/x64: 2 Regressions in Regex on 7/11/2024 6:16:28 PM #104975
Comments
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
Seems related to this PR: #102655.
My guess is that the 7%–8% perf regression on this (Twain) benchmark is related to a trade-off in the search heuristics.
Some trade-offs were definitely expected; I will have a closer look at the benchmarks soon. There's one substantial regression in dotnet/perf-autofiling-issues#38275, going from 55.81 μs to 76.42 μs.
This reproduces on my machine as well. The problem here is that we're doing more initial checks per match to speed up the inner loop, which causes the ~5% slowdown, but we never really use the inner loop to its full extent either. These two benchmarks could be ~25% faster with findOptimizations disabled. The change would be to turn this line (Line 213 in 27776e2) into something like:
if (findOptimizations.FindMode is FindNextStartingPositionMode.LeadingSet_LeftToRight or FindNextStartingPositionMode.FixedDistanceSets_LeftToRight)
{
if (findOptimizations.FixedDistanceSets![0].Negated)
{
_findOpts = null;
}
}
else
{
_findOpts = findOptimizations;
} This would disable avx for searching negated sets like @stephentoub thoughts on special casing this? |
This does not reproduce on my machine at all; can someone else confirm?
I'd be OK special-casing negated sets for non-backtracking, if we measure on our known set of benchmarks and demonstrate that it does more good than harm. I'd prefer, however, that we do it as part of actually picking the optimizations, e.g. at Lines 223 to 239 in fcb9b18,
rather than subsequently determining in the non-backtracking engine whether to store the ones already picked.
That said, I'm still nervous about the possible ramifications of turning off the find optimizations. It's quite possible it's the right choice, but doing it in response to this feels incongruent. We were using these find optimizations with these patterns both before and after the recent round of non-backtracking optimizations, right? So the problem is actually what's cited above: "we're doing more initial checks per match to speed up the inner loop which causes the 5% slowdown here"... is there no way to re-reduce that overhead? Which initial checks are you referring to?
Yes, this is the same as before. My initial thought was that the overhead could be from the extra work done entering/leaving this function per match, but I'm uncertain about this. Lines 625 to 729 in 2746c03
What I wanted to point out is that if the performance of this individual benchmark is important, then a lot more could be gained elsewhere. I understand it's better for maintainability overall not to make the engine too different from the rest, so it's definitely OK to leave it on as well. Profiling shows that 99% of the time is spent in Lines 663 to 675 in 2746c03
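For context on where that time goes (a generic DFA-style sketch in Python; the real code is the C# non-backtracking matcher, and this toy machine, its state table, and the restart-on-accept behavior are all simplifications): the hot loop is essentially "map the character to a minterm id, take one table transition" per input character, which is why it dominates the profile.

```python
def run_dfa(transitions, accepting, minterm, text):
    """Tight inner loop of a DFA-style matcher: one table lookup per char.
    transitions[state][minterm_id] -> next state."""
    state = 0
    count = 0
    for ch in text:
        state = transitions[state][minterm(ch)]
        if state in accepting:
            count += 1
            state = 0  # restart after a match (simplified; real engines differ)
    return count

# Toy 2-state machine that counts occurrences of "ab".
# minterm ids: 0 = 'a', 1 = 'b', 2 = anything else
minterm = lambda ch: 0 if ch == "a" else (1 if ch == "b" else 2)
transitions = [
    [1, 0, 0],  # state 0: start; 'a' -> state 1
    [1, 2, 0],  # state 1: saw 'a'; 'b' -> state 2 (accepting)
]
accepting = {2}
print(run_dfa(transitions, accepting, minterm, "abcab"))  # → 2
```

With a loop this tight, any per-match setup (like re-initializing find optimizations on every match in a Count scenario) is proportionally expensive.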
Another source of overhead could be memory: the minterm lookup array used to be strictly 128 elements for ASCII, and the benchmark input only contains ASCII, I think.

With that said, the performance loss itself, the way I see it, is marginal. If there are no large performance losses and in exchange we increased performance somewhere else, then it's fine leaving things as is. If ASCII performance is vital, then making the engine work on bytes directly would benefit a lot more. If Regex.Count performance is vital, then it could be a separate loop that doesn't do extra work per match. If consistency is vital, then findOptimizations could always be off, to work like RE2. It's not possible to win in all cases without making sacrifices somewhere.
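To illustrate the minterm-lookup idea mentioned above (a conceptual sketch in Python, not the actual runtime code; the function names and character classes here are made up for illustration): each input character is mapped through a dense table to a small equivalence-class id, so a 128-entry table suffices when input is pure ASCII, and anything larger or indirect costs cache locality in the hot loop.

```python
def build_ascii_minterm_table(char_classes):
    """Build a dense 128-entry lookup: char code -> minterm id.
    char_classes: list of predicates; id 0 is reserved for 'no class'."""
    table = [0] * 128
    for code in range(128):
        ch = chr(code)
        for i, pred in enumerate(char_classes, start=1):
            if pred(ch):
                table[code] = i
                break
    return table

# Two illustrative classes: uppercase letters and lowercase letters.
table = build_ascii_minterm_table([str.isupper, str.islower])

def minterm_id(ch, table):
    code = ord(ch)
    # ASCII-only fast path: a single dense-table lookup. Non-ASCII needs a
    # fallback (e.g. a secondary structure), which is the cost in question.
    return table[code] if code < 128 else -1

print([minterm_id(c, table) for c in "Tom!"])  # → [1, 2, 2, 0]
```

The appeal of the strict 128-entry layout is that the whole table fits in two cache lines, which matters when the lookup sits inside the per-character inner loop.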
Let me also add that I agree with Ian. I don't see it as worthwhile to special-case this, and think we could leave it as is. I am going to meet with Ian in a couple of weeks in Estonia, where one thing we will discuss is further steps for other optimizations.
The regressions were indeed fixed (more than fixed) by #105668.
Awesome that you figured out the fix for these regressions, Stephen. Thank you!
Run Information
Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig
Test Report
Repro
General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig.Count(Pattern: ".{2,4}(Tom|Sawyer|Huckleberry|Finn)", Options: NonBacktracking)
ETL Files
Histogram
JIT Disasms
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig.Count(Pattern: ".{0,2}(Tom|Sawyer|Huckleberry|Finn)", Options: NonBacktracking)
ETL Files
Histogram
JIT Disasms
Docs
Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository