
[Perf] Linux/x64: 2 Regressions in Regex on 7/11/2024 6:16:28 PM #104975

Status: Closed · Fixed by #105668
Opened by performanceautofiler bot on Jul 16, 2024 · 14 comments

Labels: arch-x64, area-System.Text.RegularExpressions, os-linux (Linux OS, any supported distro), runtime-coreclr (specific to the CoreCLR runtime)
Milestone: 9.0.0

@performanceautofiler (bot)
Run Information

| Name | Value |
| ---- | ----- |
| Architecture | x64 |
| OS | ubuntu 22.04 |
| Queue | ViperUbuntu |
| Baseline | 8ba8249272917366e2382bb4c67a2347d19d2fb6 |
| Compare | cae3aec80d32316026d85228e12a94dd4887d57f |
| Configs | CompilationMode:tiered, RunKind:micro |

Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector |
| --------- | -------- | ---- | --------- | ------------ | ------------- |
| Count(Pattern: ".{2,4}(Tom\|Sawyer\|Huckleberry\|Finn)", Options: NonBacktracking) | 45.62 ms | 48.73 ms | 1.07 | 0.03 | False |
| Count(Pattern: ".{0,2}(Tom\|Sawyer\|Huckleberry\|Finn)", Options: NonBacktracking) | 44.85 ms | 48.43 ms | 1.08 | 0.03 | False |

Test Report

Repro

General docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

```bash
git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig*'
```
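For reference, a minimal standalone sketch of what these benchmarks measure, outside the BenchmarkDotNet harness (the input file name is a placeholder; the real benchmarks feed the Leipzig corpus, and RegexOptions.NonBacktracking selects the engine under discussion):

```csharp
// Minimal sketch: time Regex.Count with the NonBacktracking engine over a large text.
// "leipzig.txt" is a hypothetical local copy of the corpus used by the benchmark.
using System;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;

string input = File.ReadAllText("leipzig.txt");
var regex = new Regex(".{2,4}(Tom|Sawyer|Huckleberry|Finn)", RegexOptions.NonBacktracking);

var sw = Stopwatch.StartNew();
int count = regex.Count(input); // the operation measured by the regressed benchmarks
sw.Stop();

Console.WriteLine($"{count} matches in {sw.ElapsedMilliseconds} ms");
```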

System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig.Count(Pattern: ".{2,4}(Tom|Sawyer|Huckleberry|Finn)", Options: NonBacktracking)

ETL Files

Histogram

JIT Disasms

System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig.Count(Pattern: ".{0,2}(Tom|Sawyer|Huckleberry|Finn)", Options: NonBacktracking)

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@performanceautofiler performanceautofiler bot added arch-x64 os-linux Linux OS (any supported distro) runtime-coreclr specific to the CoreCLR runtime untriaged New issue has not been triaged by the area owner labels Jul 16, 2024
@DrewScoggins DrewScoggins transferred this issue from dotnet/perf-autofiling-issues Jul 16, 2024
@DrewScoggins DrewScoggins changed the title [Perf] Linux/x64: 2 Regressions on 7/11/2024 6:16:28 PM [Perf] Linux/x64: 2 Regressions in Regex on 7/11/2024 6:16:28 PM Jul 16, 2024

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

@DrewScoggins (Member)

Seems related to this PR: #102655.

@veanes (Contributor) commented Jul 16, 2024

My guess is that the 7–8% perf regression on this (Twain) benchmark is related to a tradeoff in the search heuristics that, on the other hand, boosted other benchmarks, but I've asked @ieviev to take a closer look into this too.

@ericstj ericstj removed the untriaged New issue has not been triaged by the area owner label Jul 17, 2024
@ericstj ericstj added this to the 9.0.0 milestone Jul 17, 2024
@ericstj (Member) commented Jul 17, 2024

cc @stephentoub @veanes @ieviev

@ieviev (Contributor) commented Jul 17, 2024

Some trade-offs were definitely expected, I will have a closer look at the benchmarks soon.

There's one substantial regression in dotnet/perf-autofiling-issues#38275, going from 55.81 μs to 76.42 μs with (?m)^Sherlock Holmes|Sherlock Holmes$.

I remember the patterns .{0,2}(Tom|Sawyer|Huckleberry|Finn) and .{2,4}(Tom|Sawyer|Huckleberry|Finn) being faster than before; perhaps that changed somewhere along the way.

@ieviev (Contributor) commented Jul 17, 2024

This reproduces on my machine as well:

The problem here is that we're doing more initial checks per match to speed up the inner loop, which causes the ~5% slowdown, but for these patterns we never really use the inner loop to its full extent either. These two benchmarks could be ~25% faster with findOptimizations disabled.


The change would be to replace the `_findOpts` assignment with:

```csharp
if ((findOptimizations.FindMode is FindNextStartingPositionMode.LeadingSet_LeftToRight or FindNextStartingPositionMode.FixedDistanceSets_LeftToRight) &&
    findOptimizations.FixedDistanceSets![0].Negated)
{
    // A negated leading set (e.g. [^\n]) matches almost every character,
    // so the vectorized prefix search rarely skips ahead; turn it off.
    _findOpts = null;
}
else
{
    _findOpts = findOptimizations;
}
```

This would disable the vectorized (AVX) search for negated sets like [^\n]. At the end of the PR I decided to avoid special-casing, since it's complex and hard to know when it's slower or faster, but perhaps turning it off for large negated sets is reliable enough and speeds things up in most cases. Generally I think the speedup in other cases is big enough to justify some losses of around 5% here and there, but there will be input-dependent tradeoffs either way.
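As a standalone illustration of why a negated set makes a poor prefilter (a sketch, not runtime code): nearly every character of typical text is in [^\n], so a vectorized scan for the set's next occurrence usually stops at the very first position, paying its call overhead on every match attempt without skipping any input:

```csharp
using System;

string haystack = "the quick brown fox jumps over the lazy dog\n";

// Finding the next character in [^\n] is the same as skipping newlines;
// on typical text this returns 0, i.e. the scan advances nowhere.
int next = haystack.AsSpan().IndexOfAnyExcept('\n');
Console.WriteLine(next); // 0
```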

@stephentoub thoughts on special casing this?

@ieviev (Contributor) commented Jul 17, 2024

> There's one substantial regression in dotnet/perf-autofiling-issues#38275, going from 55.81 μs to 76.42 μs with (?m)^Sherlock Holmes|Sherlock Holmes$.

This does not reproduce on my machine at all; can someone else confirm it?


@stephentoub (Member) commented Jul 22, 2024

> @stephentoub thoughts on special casing this?

I'd be OK with special-casing negated sets for non-backtracking if we measure on our known set of benchmarks and demonstrate that it does more good than harm. I'd prefer, however, that we do it as part of actually picking the optimizations, e.g. at:

```csharp
else
{
    // Limit how many sets we use to avoid doing lots of unnecessary work.  The list was already
    // sorted from best to worst, so just keep the first ones up to our limit.
    const int MaxSetsToUse = 3; // arbitrary tuned limit
    if (fixedDistanceSets.Count > MaxSetsToUse)
    {
        fixedDistanceSets.RemoveRange(MaxSetsToUse, fixedDistanceSets.Count - MaxSetsToUse);
    }

    // Store the sets, and compute which mode to use.
    FixedDistanceSets = fixedDistanceSets;
    FindMode = (fixedDistanceSets.Count == 1 && fixedDistanceSets[0].Distance == 0) ?
        FindNextStartingPositionMode.LeadingSet_LeftToRight :
        FindNextStartingPositionMode.FixedDistanceSets_LeftToRight;
    _asciiLookups = new uint[fixedDistanceSets.Count][];
}
```

rather than subsequently determining in the non-backtracking engine whether to store the ones already picked.
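A hypothetical, simplified sketch of that idea (the type and member names are assumptions mirroring the quoted snippet, not the actual runtime API): fold the negated-set check into set selection itself, so a negated best set never becomes a prefilter in the first place:

```csharp
using System.Collections.Generic;

internal sealed record FixedDistanceSet(string Chars, bool Negated, int Distance);

internal static class PrefilterSelection
{
    public static List<FixedDistanceSet>? PickSets(List<FixedDistanceSet> candidates)
    {
        const int MaxSetsToUse = 3; // same arbitrary tuned limit as the quoted code

        if (candidates.Count == 0 || candidates[0].Negated)
        {
            // The best available set is negated (e.g. [^\n]) and matches nearly
            // every character, so a vectorized scan for it rarely advances the
            // search; return no prefilter and let the engine run directly.
            return null;
        }

        // Keep only the best few sets, as in the quoted selection logic.
        if (candidates.Count > MaxSetsToUse)
        {
            candidates.RemoveRange(MaxSetsToUse, candidates.Count - MaxSetsToUse);
        }

        return candidates;
    }
}
```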

@stephentoub (Member)

That said, I'm still nervous about the possible ramifications of turning off the find optimizations. It's quite possible it's the right choice, but doing it in response to this feels incongruent. We were using these find optimizations with these patterns both before and after the recent round of non-backtracking optimizations, right? So the problem is actually what's cited as "we're doing more initial checks per match to speed up the inner loop, which causes the ~5% slowdown"... is there no way to re-reduce that overhead? Which initial checks are you referring to?

@ieviev
Copy link
Contributor

ieviev commented Jul 22, 2024

@stephentoub

> That said, I'm still nervous about the possible ramifications of turning off the find optimizations. It's quite possible it's the right choice, but doing it in response to this feels incongruent. We were using these find optimizations with these patterns both before and after the recent round of non-backtracking optimizations, right? So the problem is actually what's cited as "we're doing more initial checks per match to speed up the inner loop, which causes the ~5% slowdown"... is there no way to re-reduce that overhead? Which initial checks are you referring to?

Yes, this is the same as before. My initial thought was that the overhead could be from the extra work done entering/leaving this function per match, but I'm uncertain about that.

```csharp
private bool FindEndPositionDeltasDFAOptimized<TInitialStateHandler, TOptimizedNullabilityHandler>(
    ReadOnlySpan<char> input, int lengthMinus1, RegexRunnerMode mode,
    long timeoutOccursAt, ref int posRef, ref int currentStateIdRef, ref int endPosRef)
    where TInitialStateHandler : struct, IInitialStateHandler
    where TOptimizedNullabilityHandler : struct, IDfaNoZAnchorOptimizedNullabilityHandler
{
    // Initial check for input end lifted out of the subsequent hot-path loop.
    if (posRef == input.Length)
    {
        if (_stateArray[currentStateIdRef]!.IsNullableFor(_positionKinds[0]))
        {
            // the end position kind was nullable
            endPosRef = posRef;
        }

        return true;
    }

    // To avoid frequent reads/writes to ref and out values, make and operate on local copies,
    // which we then copy back once before returning.
    int pos = posRef;
    int currStateId = currentStateIdRef;
    int endPos = endPosRef;
    byte[] mintermsLookup = _mintermClassifier.ByteLookup!;
    int deadStateId = _deadStateId;
    int initialStateId = _initialStateId;

    try
    {
        // The goal is to make this loop as fast as it can possibly be;
        // every single piece of overhead should be removed here.
        while (true)
        {
            if (currStateId == deadStateId)
            {
                return true;
            }

            if (TInitialStateHandler.IsOptimized && currStateId == initialStateId)
            {
                TInitialStateHandler.TryFindNextStartingPosition(this, input, ref currStateId, ref pos, mintermsLookup);
                if (pos == input.Length)
                {
                    // patterns such as ^$ can be nullable right away
                    if (_stateArray[currStateId]!.IsNullableFor(_positionKinds[0]))
                    {
                        // the end position kind was nullable
                        endPos = pos;
                    }

                    currStateId = deadStateId;
                    return true;
                }
            }

            // Get the next character.
            char c = input[pos];

            // If the state is nullable for the next character, we found a potential end state.
            if (TOptimizedNullabilityHandler.IsNullable(this, _nullabilityArray[currStateId], c, mintermsLookup))
            {
                endPos = pos;

                // A match is known to exist.  If that's all we need to know, we're done.
                if (mode == RegexRunnerMode.ExistenceRequired)
                {
                    return true;
                }
            }

            // If there is more input available, try to transition with the next character.
            // Note: the order here is important so the transition itself gets taken.
            if (!DfaStateHandler.TryTakeDFATransition(this, ref currStateId, GetMintermId(mintermsLookup, c), timeoutOccursAt) ||
                pos >= lengthMinus1)
            {
                if (pos + 1 < input.Length)
                {
                    return false;
                }

                pos++;

                // One-off check for the final position,
                // just to move it out of the hot loop.
                if (!(_stateFlagsArray[currStateId].IsNullable() ||
                      _stateArray[currStateId]!.IsNullableFor(GetPositionKind(-1))))
                {
                    return true;
                }

                // the end position (-1) was nullable
                endPos = pos;
                return true;
            }

            // We successfully transitioned, so update our current input index to match.
            pos++;
        }
    }
    finally
    {
        // Write back the local copies of the ref values.
        posRef = pos;
        endPosRef = endPos;
        currentStateIdRef = currStateId;
    }
}
```

What I wanted to point out is that if the performance of this individual benchmark is important, then a lot more could be gained elsewhere. I understand it's better for maintainability overall not to make the engine too different from the rest, so it's definitely OK to leave it on as well.

Profiling shows that 99% of the time is spent in FindEndPositionDeltasDFAOptimized, so the only source of extra loop overhead compared to the pre-PR state I can think of is this if case after the prefix search:

```csharp
TInitialStateHandler.TryFindNextStartingPosition(this, input, ref currStateId, ref pos, mintermsLookup);
if (pos == input.Length)
{
    // patterns such as ^$ can be nullable right away
    if (_stateArray[currStateId]!.IsNullableFor(_positionKinds[0]))
    {
        // the end position kind was nullable
        endPos = pos;
    }

    currStateId = deadStateId;
    return true;
}
```

Another source of overhead could be memory: the minterm lookup array used to be strictly 128 elements for ASCII, and the benchmark input contains only ASCII. I think the `.` in these patterns forces a larger table here, which perhaps lands in a different level of cache.
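To make the cache point concrete, a toy sketch (the class shape and fallback behavior are assumptions for illustration, not the actual MintermClassifier): a 128-entry byte table for ASCII fits in two 64-byte cache lines, while a pattern containing `.` needs classifications for a much larger character range, so even pure-ASCII input may walk a bigger, less cache-friendly table:

```csharp
internal sealed class MintermLookupSketch
{
    private readonly byte[] _lookup;

    // tableSize: 128 for an ASCII-only pattern; far larger once `.` (any char
    // except \n) forces classifications across the full character range.
    public MintermLookupSketch(int tableSize) => _lookup = new byte[tableSize];

    public int GetMintermId(char c) =>
        // Characters beyond the table share one fallback id (an assumption in
        // this sketch; the real classifier handles the remainder differently).
        c < (uint)_lookup.Length ? _lookup[c] : 0;
}
```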

With that said, the performance loss here looks marginal to me. If there are no large performance losses, and in exchange we gained performance somewhere else, then it's fine to leave things as is. If ASCII performance is vital, making the engine work on bytes directly would yield far more. If Regex.Count performance is vital, it could use a separate loop that doesn't do extra per-match work. If consistency is vital, findOptimizations could always be off, so it behaves like RE2. It's not possible to win in all cases without making sacrifices somewhere.

@veanes (Contributor) commented Jul 26, 2024

Let me also add that I agree with Ian here: I don't see special-casing this as worthwhile and think we can leave it as is. I'm going to meet with Ian in a couple of weeks in Estonia, where one of the things we'll discuss is further steps for other optimizations.

@stephentoub (Member)

The regressions were indeed fixed (more than fixed) by #105668.

@veanes (Contributor) commented Aug 5, 2024

Awesome that you figured out the fix for these regressions, Stephen. Thank you!
