Optimize regex patterns with both beginning and end anchors for early fail-fast #120916
Implement fail-fast optimization for patterns like ^abc\z that have both a leading Beginning anchor and a trailing End anchor with fixed length. Instead of checking if input is at least N characters, check if input is exactly N characters AND position is 0.

Co-authored-by: stephentoub <[email protected]>
- Remove trailing anchor computation from RegexFindOptimizations.cs
- Compute trailing anchor lazily in RegexGenerator.Emitter.cs only when needed
- Add same optimization to RegexCompiler.cs for consistency

This avoids adding overhead to the interpreter while keeping benefits for compiled/generated code.

Co-authored-by: stephentoub <[email protected]>
Pull Request Overview
This PR optimizes regex patterns with both beginning (^ or \A) and end (\z) anchors where the pattern has a fixed length. The optimization enables earlier fail-fast by checking if the input length exactly matches the pattern length (instead of just checking minimum length), avoiding unnecessary entry into TryMatchAtCurrentPosition when the length doesn't match.
- Moved trailing anchor computation from RegexFindOptimizations.cs to the emitters (source generator and compiler) to avoid interpreter overhead
- Added exact length check for patterns with both beginning and end anchors with fixed length
- Added comprehensive test cases covering exact match, too long, too short, and offset scenarios
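The difference between the minimum-length check and the exact-length check described above can be sketched roughly as follows. This is a simplified model, not the code the emitters generate: the class `FailFastSketch` and both method names are hypothetical, and `minRequiredLength` stands in for the pattern's fixed length.

```csharp
using System;

// Hypothetical model of the pre-match check; not the runtime's actual code.
public static class FailFastSketch
{
    // Minimum-length check: reject only when too few characters remain for any match.
    public static bool PassesMinLengthCheck(string input, int pos, int minRequiredLength)
        => pos <= input.Length - minRequiredLength;

    // Exact-length check for a fixed-length pattern anchored at both ends:
    // a match has to start at position 0 and consume the whole input.
    public static bool PassesExactLengthCheck(string input, int pos, int minRequiredLength)
        => pos == 0 && input.Length == minRequiredLength;

    public static void Main()
    {
        // For a pattern like ^abc\z the fixed length is 3.
        Console.WriteLine(PassesMinLengthCheck("abcdef", 0, 3));   // True: long enough, so matching would still be attempted
        Console.WriteLine(PassesExactLengthCheck("abcdef", 0, 3)); // False: rejected before any matching work
        Console.WriteLine(PassesExactLengthCheck("abc", 0, 3));    // True: only this input can possibly match
    }
}
```

With the minimum-length check alone, an over-long input still reaches the matching routine and fails there; the exact-length check rejects it up front.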
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| Regex.Match.Tests.cs | Added test cases verifying the optimization works for various scenarios with fixed-length anchored patterns |
| RegexCompiler.cs | Added optimization logic to compiled regexes for exact length checking when both anchors are present |
| RegexGenerator.Emitter.cs | Added optimization logic to source-generated regexes for exact length checking when both anchors are present |
- Simplify trailing anchor check to directly compare ComputeMaxLength() with MinRequiredLength
- Remove unnecessary success label in RegexCompiler and use Bne(returnFalse) directly
- Move minRequiredLength declaration after the condition check

Co-authored-by: stephentoub <[email protected]>
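The applicability test mentioned in this commit (maximum length equal to minimum required length) might be modeled like this. Everything here is an illustrative assumption: `PatternInfo` and `CanUseExactLengthCheck` are hypothetical names, and the nullable `MaxLength` is assumed to be null for patterns with no upper bound; the real emitters work on the regex tree instead.

```csharp
using System;

// Hypothetical sketch of the emit-time decision; the real emitters inspect the regex tree.
public static class ExactLengthApplicability
{
    // Assumed summary of a pattern: MaxLength is null when the pattern has no upper bound.
    public sealed record PatternInfo(bool LeadsWithBeginning, bool EndsWithEnd, int MinRequiredLength, int? MaxLength);

    // The exact-length check is only sound when both anchors are present
    // and the pattern can match exactly one length (max == min).
    public static bool CanUseExactLengthCheck(PatternInfo p)
        => p.LeadsWithBeginning && p.EndsWithEnd && p.MaxLength == p.MinRequiredLength;

    public static void Main()
    {
        Console.WriteLine(CanUseExactLengthCheck(new PatternInfo(true, true, 3, 3)));    // ^abc\z   -> True
        Console.WriteLine(CanUseExactLengthCheck(new PatternInfo(true, true, 2, 3)));    // ^ab?c\z  -> False (variable length)
        Console.WriteLine(CanUseExactLengthCheck(new PatternInfo(true, true, 1, null))); // ^a+\z    -> False (unbounded)
        Console.WriteLine(CanUseExactLengthCheck(new PatternInfo(false, true, 3, 3)));   // abc\z    -> False (no leading anchor)
    }
}
```

Comparing a nullable maximum against the minimum handles the unbounded case without a separate branch, since a null maximum never equals an integer.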
- Remove unnecessary braces around if statements (no longer needed after removing extra local variables)
- Update comment in RegexCompiler.cs to use 'minRequiredLength' instead of 'exactLength'
- Update comment in RegexGenerator.Emitter.cs to mention both '^' and '\A' anchors (^/\A)

Co-authored-by: stephentoub <[email protected]>
…variations

- Update comment in RegexGenerator.Emitter.cs to use 'leads with' wording
- Inline MinRequiredLength directly in Ldc call in RegexCompiler.cs
- Add test variations for edge cases: without starting anchor, with Multiline, with \Z/$ anchors, with content before/after anchors

Co-authored-by: stephentoub <[email protected]>
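The edge cases listed in this commit matter because an exact-length check would give wrong answers for them. The following is a small illustration using the public `Regex.IsMatch` API; it is not taken from the PR's test file, and the class `AnchorEdgeCases` and helper `Matches` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative only: shows observable matching behavior, not the PR's actual tests.
public static class AnchorEdgeCases
{
    public static bool Matches(string pattern, string input, RegexOptions options = RegexOptions.None)
        => Regex.IsMatch(input, pattern, options);

    public static void Main()
    {
        // Both anchors, fixed length 3: only a 3-character input can match.
        Console.WriteLine(Matches(@"^abc\z", "abc"));   // True
        Console.WriteLine(Matches(@"^abc\z", "abcd"));  // False (too long)
        Console.WriteLine(Matches(@"^abc\z", "ab"));    // False (too short)

        // \Z and $ also match before a final newline, so the input may be one character longer.
        Console.WriteLine(Matches(@"^abc\Z", "abc\n")); // True
        Console.WriteLine(Matches(@"^abc$", "abc\n"));  // True

        // With Multiline, ^ can match after a newline, so the match need not start at position 0.
        Console.WriteLine(Matches(@"^abc\z", "x\nabc", RegexOptions.Multiline)); // True

        // Without a leading anchor, earlier content is allowed.
        Console.WriteLine(Matches(@"abc\z", "xabc"));   // True
    }
}
```

In the last four cases the input length differs from the pattern's length yet the match succeeds, which is why those shapes have to stay on the ordinary minimum-length path.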
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
Issue Summary
Optimize regex patterns with both leading Beginning anchor (^) and trailing End anchor (\z) with fixed length. Instead of checking if input is at least N characters, check if input is exactly N characters AND position is 0.
Implementation Details (Updated)
Modified three files:
- RegexGenerator.Emitter.cs
- RegexCompiler.cs
- Regex.Match.Tests.cs
The optimization only applies when:
Changes from Previous Versions
Test Results
✅ All unit tests pass (1,005 tests)
✅ All functional tests pass (30,391 tests - includes 56 new tests for edge cases)
✅ No security vulnerabilities detected
Security Summary
No security vulnerabilities were introduced or discovered.
Original prompt
Fixes #118489