Closed
Labels
area-System.Text.RegularExpressions, tenet-performance (Performance related issue)
Description
When a regex pattern has both a leading and a trailing anchor and is of fixed length, it seems to me that we might be missing an opportunity to fail fast.
Take, for example, the regex pattern ^1234$.
Currently the emitted TryFindNextPossibleStartingPosition checks that the input has at least 4 characters to continue.

    if (pos <= inputSpan.Length - 4 && pos == 0)

When invoking IsMatch("12345") on the generated regex, the flow will be:
- pass TryFindNextPossibleStartingPosition, because the input has at least 4 characters,
- enter TryMatchAtCurrentPosition and call .StartsWith("1234"),
- and only then fail on 5 < slice.Length because the input was too long.
If we tightened the check in TryFindNextPossibleStartingPosition to "exactly 4 characters", we wouldn't have to enter TryMatchAtCurrentPosition.
If I understand the code correctly, we can detect this situation in EmitTryFindNextPossibleStartingPosition when:
- rm.Tree.FindOptimizations.LeadingAnchor is RegexNodeKind.Beginning, and
- rm.Tree.FindOptimizations.FindMode is FindNextStartingPositionMode.TrailingAnchor_FixedLength_LeftToRight_End or FindNextStartingPositionMode.TrailingAnchor_FixedLength_LeftToRight_EndZ
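
To make the idea concrete, here is a minimal sketch of what the tightened check might look like for ^1234$. It is not the code the source generator emits: the class name, the method names, and the FixedLength constant are hypothetical, and only the first method mirrors the check quoted above. One assumption worth calling out: without RegexOptions.Multiline, $ also matches before a final newline, so for the EndZ mode "exactly 4 characters" would presumably have to become "4 characters, or 5 characters ending in '\n'".

```csharp
using System;

static class FixedLengthAnchorSketch
{
    // Length of the literal in ^1234$ (the pattern from this issue).
    const int FixedLength = 4;

    // Models the currently emitted check quoted above:
    // "at least 4 characters remaining, and we are at position 0".
    public static bool CurrentCheck(ReadOnlySpan<char> inputSpan, int pos) =>
        pos <= inputSpan.Length - FixedLength && pos == 0;

    // Assumed tightening for a trailing \z (End):
    // the whole input must be exactly FixedLength characters long.
    public static bool TightenedCheckEnd(ReadOnlySpan<char> inputSpan, int pos) =>
        pos == 0 && inputSpan.Length == FixedLength;

    // Assumed tightening for a trailing $ (EndZ):
    // additionally accept a single final '\n' after the fixed-length part.
    public static bool TightenedCheckEndZ(ReadOnlySpan<char> inputSpan, int pos) =>
        pos == 0 &&
        (inputSpan.Length == FixedLength ||
         (inputSpan.Length == FixedLength + 1 && inputSpan[FixedLength] == '\n'));

    static void Main()
    {
        foreach (string input in new[] { "1234", "12345", "1234\n", "123" })
        {
            // Escape the newline so each input prints on a single line.
            string shown = input.Replace("\n", "\\n");
            Console.WriteLine(
                $"{shown}: current={CurrentCheck(input, 0)}, " +
                $"End={TightenedCheckEnd(input, 0)}, EndZ={TightenedCheckEndZ(input, 0)}");
        }
    }
}
```

For the input "12345", CurrentCheck passes, so the match attempt would go on to TryMatchAtCurrentPosition, while both tightened variants reject the input up front.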
Am I missing something?