RFC: Add match! function for in-place matching #12353
This seems like such an unfortunate API. What about making the RegexMatch object part of the Regex object itself? I mean, what else are you going to do with a regex, after all? #threadsafe |
That's a good idea. Since we're already allocating just one PCRE-internal buffer for the matches in the |
(Which is to say, not thread safe) |
I think we'd still need something like |
(more |
¡match! |
Another option is to make a breaking API change where |
Or maybe best is to add a new default argument to |
Ah, but the point is that I want the fast version to be the default without any ugly nonsense. Not sure if we can accomplish that, but I'd like to. |
Just brainstorming here, we could also have a new "lightweight" match result object that is stack-allocatable - it only stores offset indices (not substrings), and uses tuples instead of arrays for storing the indices (so the type is parameterized by the number of capture groups). |
Instead of introducing |
Now we're getting somewhere... Since the actual data is in the string itself, which is immutable (by convention), that's not an issue either. In practice, I suspect it's rare to hang onto anything from the match object but the substrings that are matched. |
@StefanKarpinski not sure which of my comments is the one that's 'getting somewhere'. |
I was talking about this idea:
That sounds like the kind of interface we'd want if you can make it work. |
It's difficult to do while maintaining a nice API - if the match object maintains a reference to the regex and subject string, then it's going to be heap-allocated and performance is compromised. But if it doesn't, you'll have to manually dereference the subject with the indices stored in the match object to get the match and captures as substrings (or call some new method that takes both the match and the subject). |
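To make the trade-off concrete, here is a rough sketch of what an offsets-only match object and the "method that takes both the match and the subject" could look like. Everything named here (`LightMatch`, `lightmatch`, `matched`, `capture`) is hypothetical and not part of Base or of this PR, and the syntax assumes Julia 1.x. The constructor goes through the ordinary `match`, so it allocates and is not the fast path itself; it only illustrates the shape of the interface.

```julia
# Hypothetical offsets-only match result. It holds no reference to the regex
# or the subject string, and uses a tuple for the capture ranges, so the type
# is parameterized by the number of capture groups N.
struct LightMatch{N}
    offset::Int                         # byte index where the overall match starts
    stop::Int                           # byte index of the last matched character
    captures::NTuple{N,UnitRange{Int}}  # byte range per group (empty range if unset)
end

# Illustration only: build a LightMatch from the regular (allocating) `match`.
function lightmatch(re::Regex, subject::AbstractString)
    m = match(re, subject)
    m === nothing && return nothing
    n = length(m.captures)
    caps = ntuple(n) do i
        c = m.captures[i]
        # An unset group is stored as the empty range 1:0.
        c === nothing ? (1:0) : (m.offsets[i]:(m.offsets[i] + lastindex(c) - 1))
    end
    return LightMatch{n}(m.offset, m.offset + lastindex(m.match) - 1, caps)
end

# The caller has to pass the subject back in to get substrings out.
matched(m::LightMatch, subject::AbstractString) =
    SubString(subject, m.offset, m.stop)

capture(m::LightMatch, subject::AbstractString, i::Integer) =
    SubString(subject, first(m.captures[i]), last(m.captures[i]))

subject = "xxabcxx"
lm = lightmatch(r"a(.)(.)", subject)
whole = matched(lm, subject)       # the full match as a SubString
second = capture(lm, subject, 2)   # the second capture group as a SubString
```

The cost to the API is visible in the last two lines: the subject has to be threaded through every accessor, which is the ergonomic price of not storing a reference to it in the match object.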
Hi, it looks like this was merged a few years ago. Perhaps this issue can be closed? Matt. |
This is to speed up inner loops that keep calling match with the same regex. Here's a benchmark for a hypothetical use case of collecting all letters that occur 1 or 2 characters after 'a' in a document: