Enable regex to use IndexOf(..., OrdinalIgnoreCase) for prefix searching #85438

Merged (3 commits) on May 1, 2023

stephentoub (Member) commented Apr 27, 2023

As one of its many ways of finding the next possible match starting location, Regex recognizes a string known to start the expression and uses IndexOf to find it. With this change, it can also do so for OrdinalIgnoreCase. With improvements to IndexOf(..., OrdinalIgnoreCase), this now yields significantly faster searches through longer inputs, in addition to leading to simpler code in source generated regexes.
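To make the idea concrete, here is a minimal sketch of prefix-driven candidate search. It is not the code Regex actually emits: `CandidateStarts` is a hypothetical helper, and the sketch only assumes the public `string.IndexOf(string, int, StringComparison)` overload. It shows how a known leading string lets the engine jump straight to plausible match starts instead of trying every position.

```csharp
using System;
using System.Collections.Generic;

// Usage: list every position where a case-insensitive match could begin.
foreach (int start in CandidateStarts("Hello, HELLO, hello world", "hello"))
{
    Console.WriteLine(start); // prints 0, 7, and 14 on separate lines
}

// Hypothetical helper (not part of Regex): yields each index at which `prefix`
// occurs in `input` under OrdinalIgnoreCase. A regex engine would attempt a
// full match only at these candidates.
static IEnumerable<int> CandidateStarts(string input, string prefix)
{
    int pos = 0;
    while (pos <= input.Length - prefix.Length)
    {
        // One library call skips over everything that cannot start a match.
        int i = input.IndexOf(prefix, pos, StringComparison.OrdinalIgnoreCase);
        if (i < 0)
        {
            yield break;
        }

        yield return i;
        pos = i + 1; // resume the search just past this candidate
    }
}
```

The benefit depends entirely on how fast the underlying `IndexOf(..., OrdinalIgnoreCase)` is, which is why this change is tied to the improvements in that method.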

With #85437 applied, here are the results on my machine for the benchmark at https://github.com/dotnet/performance/blob/6dccc9979e9a99ebabee2a9b8b9e657c08c3f4a0/src/benchmarks/micro/libraries/System.Text.RegularExpressions/Perf.Regex.Industry.cs#L86:

| Method | Toolchain | Options | Mean | Error | StdDev | Median | Min | Max | Ratio |
|---|---|---|---|---|---|---|---|---|---|
| Count | \main\corerun.exe | IgnoreCase, Compiled | 1,708.2 ms | 17.58 ms | 2.72 ms | 1,708.6 ms | 1,704.7 ms | 1,711.2 ms | 1.00 |
| Count | \pr\corerun.exe | IgnoreCase, Compiled | 769.1 ms | 14.94 ms | 3.88 ms | 766.7 ms | 765.8 ms | 773.8 ms | 0.45 |

Note that without #85437, this PR will make some usage slower. The compiler / source generator already takes the same approach that IndexOf(..., OrdinalIgnoreCase) takes today, searching for a set of characters with IndexOfAny, but it frequently picks a better set of characters to search for based on frequency analysis. So we shouldn't merge this without the other PR (though this does have other benefits, like simpler codegen).
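The set-of-characters approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the generator's real output: `FindViaIndexOfAny` and its `anchorOffset` parameter are hypothetical, the sketch handles only ASCII letters, and the caller picks the anchor character by hand where the real code would use frequency analysis.

```csharp
using System;

// Usage: both anchors find the same match, but anchoring on 'l' (offset 2)
// avoids the false candidate that the lowercase 'h' at index 2 produces.
Console.WriteLine(FindViaIndexOfAny("xxhexHELLO", "hello", 0)); // prints 5
Console.WriteLine(FindViaIndexOfAny("xxhexHELLO", "hello", 2)); // prints 5

// Hypothetical helper: finds `prefix` in `input`, ignoring case, by scanning
// for both casings of the prefix character at `anchorOffset` and then
// verifying the whole prefix at the implied start position.
static int FindViaIndexOfAny(string input, string prefix, int anchorOffset)
{
    char c = prefix[anchorOffset];
    // Assumption: ASCII letter, so two casings cover every possible match.
    char[] set = { char.ToLowerInvariant(c), char.ToUpperInvariant(c) };

    int pos = anchorOffset; // earliest index where the anchor character can sit
    while (pos < input.Length)
    {
        int hit = input.IndexOfAny(set, pos);
        if (hit < 0)
        {
            return -1;
        }

        int start = hit - anchorOffset;
        if (start + prefix.Length <= input.Length &&
            string.Compare(input, start, prefix, 0, prefix.Length,
                           StringComparison.OrdinalIgnoreCase) == 0)
        {
            return start;
        }

        pos = hit + 1; // false candidate: keep scanning
    }

    return -1;
}
```

Choosing an anchor character that is rare in typical input means fewer false candidates to verify, which is the advantage the existing codegen has over a search that always anchors on the first character.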

ghost commented Apr 27, 2023

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: stephentoub
Assignees: -
Labels: area-System.Text.RegularExpressions, tenet-performance
Milestone: 8.0.0

stephentoub merged commit b4ecf10 into dotnet:main on May 1, 2023
stephentoub deleted the regexcaseinsensitiveprefix branch on May 1, 2023 16:56
joperezr (Member) left a comment

LGTM, sorry for the delay.

ghost locked as resolved and limited conversation to collaborators on Jun 11, 2023