Closed
Labels: area-System.Text.RegularExpressions, tenet-performance (Performance related issue)
Description
When a regex pattern has both a leading and a trailing anchor and can only match input of a fixed length, it seems to me that we might be missing an opportunity to fail fast.
Take e.g. the regex pattern ^1234$.
Currently, the emitted TryFindNextPossibleStartingPosition only checks that the input has at least 4 characters before continuing:

    if (pos <= inputSpan.Length - 4 && pos == 0)

When invoking IsMatch("12345") on the generated regex, the flow is:
- pass TryFindNextPossibleStartingPosition, as the input has at least 4 characters,
- enter TryMatchAtCurrentPosition and call .StartsWith("1234"),
- and only then fail on 5 < slice.Length because the input was too long.
If we tightened the check in TryFindNextPossibleStartingPosition to "exactly 4 characters", we wouldn't have to enter TryMatchAtCurrentPosition.
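To make the difference concrete, here is a minimal sketch of the two guards for ^1234$. It is a model, not the source generator's actual output: the class and method names (FailFastSketch, CurrentGuard, TightenedGuard, Matches) are hypothetical. One detail it assumes is worth spelling out: in .NET, $ without RegexOptions.Multiline also matches just before a final '\n', so for $ the tightened guard has to accept 4 characters or 4 characters plus a trailing newline; for \z it would be exactly 4.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FailFastSketch
{
    private const string Literal = "1234";

    // Models the guard quoted above: only a minimum length is enforced.
    public static bool CurrentGuard(ReadOnlySpan<char> input, int pos) =>
        pos <= input.Length - Literal.Length && pos == 0;

    // Hypothetical tightened guard for ^1234$: the input must be exactly the
    // literal's length, or one longer with a trailing '\n' (the $ rule).
    public static bool TightenedGuard(ReadOnlySpan<char> input, int pos)
    {
        if (pos != 0) return false;
        int extra = input.Length - Literal.Length;
        return extra == 0 || (extra == 1 && input[input.Length - 1] == '\n');
    }

    // With the tightened guard, all that is left to verify is the literal itself.
    public static bool Matches(ReadOnlySpan<char> input) =>
        TightenedGuard(input, 0) && input.StartsWith(Literal.AsSpan(), StringComparison.Ordinal);

    public static void Main()
    {
        var regex = new Regex("^1234$");
        foreach (string s in new[] { "1234", "12345", "1234\n", "1234\n\n", "123", "1234x" })
        {
            // Compare the sketch against the real engine for each input.
            Console.WriteLine(
                $"{s.Replace("\n", "\\n"),-10} current={CurrentGuard(s, 0),-5} " +
                $"tightened={TightenedGuard(s, 0),-5} sketch={Matches(s),-5} regex={regex.IsMatch(s)}");
        }
    }
}
```

For "12345" the current guard passes and the rejection only happens later, whereas the tightened guard rejects the input before any character comparison.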
If I understand the code correctly, we can detect this situation in EmitTryFindNextPossibleStartingPosition when:
- rm.Tree.FindOptimizations.LeadingAnchor is RegexNodeKind.Beginning, and
- rm.Tree.FindOptimizations.FindMode is FindNextStartingPositionMode.TrailingAnchor_FixedLength_LeftToRight_End or FindNextStartingPositionMode.TrailingAnchor_FixedLength_LeftToRight_EndZ.
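The two conditions above could be combined into a single predicate, roughly as follows. This is only a sketch: the real RegexNodeKind and FindNextStartingPositionMode types are internal to System.Text.RegularExpressions, so the enums below are local stand-ins that reuse just the member names mentioned in this issue, and CanEmitExactLengthCheck is a hypothetical helper name.

```csharp
using System;

// Local stand-ins for the internal enums; only the members named in this
// issue are reproduced, plus a catch-all value for everything else.
public enum RegexNodeKind { Other, Beginning }

public enum FindNextStartingPositionMode
{
    Other,
    TrailingAnchor_FixedLength_LeftToRight_End,
    TrailingAnchor_FixedLength_LeftToRight_EndZ,
}

public static class ExactLengthDetectionSketch
{
    // Hypothetical helper: true when the pattern starts with the beginning
    // anchor and ends with a fixed-length trailing anchor (End or EndZ).
    public static bool CanEmitExactLengthCheck(
        RegexNodeKind leadingAnchor, FindNextStartingPositionMode findMode) =>
        leadingAnchor == RegexNodeKind.Beginning &&
        (findMode == FindNextStartingPositionMode.TrailingAnchor_FixedLength_LeftToRight_End ||
         findMode == FindNextStartingPositionMode.TrailingAnchor_FixedLength_LeftToRight_EndZ);

    public static void Main()
    {
        Console.WriteLine(CanEmitExactLengthCheck(
            RegexNodeKind.Beginning,
            FindNextStartingPositionMode.TrailingAnchor_FixedLength_LeftToRight_EndZ));
        Console.WriteLine(CanEmitExactLengthCheck(
            RegexNodeKind.Other,
            FindNextStartingPositionMode.TrailingAnchor_FixedLength_LeftToRight_End));
    }
}
```

In the EndZ case the emitted check would still need to allow for one trailing '\n', so "exact length" there means the fixed length or the fixed length plus one.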
Am I missing something?