Invalid match offsets with optional groups #7
Comments
Yeah; this is a confusing issue to solve, more so due to ambiguity in semantics than anything. For instance, what should be the behaviour of something like:
I've had an ignored test case for something like this sitting around, maybe now's a good time to figure out what the semantics should be haha. Following Can you add a bit of detail about the types of cases you'll be using it for, so I can make sure I cover all the cases in the test suite? Thanks!
Looks like this is actually affecting the match groups too:
Can hopefully solve them both at once 🤞
Hello. I need to parse GitHub URLs, using something such as:
>>> fileContent = "hello https://github.com/foo/bar/issues/123 hello I'm a cow https://github.com/guibou/PyF/issues/12345 https://www.github.com/Foo/Bar/issues/1234"
>>> mapM_ print $ toListOf (regex [rx|(https?://)?(www.)?github.com/(.*?)/(.*?)/issues/([0-9]+)|] . matchAndGroups) fileContent
("https://github.com/foo/bar/issues/123",["http://","","/12","",""])
("https://github.com/guibou/PyF/issues/12345",["https://","","","",""])
("https://www.github.com/Foo/Bar/issues/1234",["https://","www.","Foo","Bar","1234"])
My issue is that non-optional capturing groups can be empty when they should not be, or non-empty but wrong. Especially see the Note that if the "optional" groups are set to non-capturing (with
If I want to capture
In this implementation, all "optional" groups are not capturing. Actually that's satisfying.
Here
This is not intuitive, but at least the groups contain values that represent the content of the actual groups as defined by the regex. That one is weird too ;):
Thanks to our discussion, I'm now convinced that the problem is more fundamental and cannot be easily solved, and that the best solution is to rewrite the regex in a non-ambiguous way. However, I agree with you that it may be nice to follow the result of
Thanks for the explanation, I'm glad you found something that works for you! I've adapted the behaviour for optional groups to be more in line with pcre-heavy on latest head, so you can use: e83feec if you need the new behaviour. I'm going to investigate a little further before cutting a new release because it looks like there are some issues with handling nested groups (which is what's showing up in your last example there:
I'll leave this open until I cut a release; let me know if you discover anything else. Hope the library is working out for you despite the issues!
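To make the alignment concern concrete, here is a minimal sketch using only `base`. It is not the lens-regex-pcre implementation, and the thread does not confirm that this was the actual cause. It relies on one fact about PCRE: a capture group that did not take part in a match is reported with the offset pair (-1, -1). If such a pair is turned into an empty string instead of being dropped, the later groups keep their positions. The names `Offsets`, `groupText` and `matchAndGroupTexts` are hypothetical, and the offsets in the usage note are written by hand for an assumed pattern such as `A(x)?(B)`.

```haskell
-- Hypothetical helper types and functions, for illustration only.

-- | (start, end) offsets of a capture group within the subject string.
-- PCRE reports a group that did not participate in the match as (-1, -1).
type Offsets = (Int, Int)

-- | Text covered by one group. An unset group becomes the empty string,
-- so the groups that follow it are not shifted in the result list.
groupText :: String -> Offsets -> String
groupText subject (start, end)
  | start < 0 || end < start = ""
  | otherwise = take (end - start) (drop start subject)

-- | Pair the whole match with the text of every group, in regex order.
-- The first offset pair is assumed to describe the whole match.
matchAndGroupTexts :: String -> [Offsets] -> Maybe (String, [String])
matchAndGroupTexts _ [] = Nothing
matchAndGroupTexts subject (whole : groups) =
  Just (groupText subject whole, map (groupText subject) groups)
```

With the hand-written offsets `[(0,2),(-1,-1),(1,2)]` for the subject `"AB"`, `matchAndGroupTexts` returns `Just ("AB", ["", "B"])`; with `[(0,3),(1,2),(2,3)]` for `"AxB"` it returns `Just ("AxB", ["x", "B"])`.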
Thank you. The fix indeed works! The library is indeed wonderful and works for most of my needs. For this specific topic I ended up using raw
Shipped an update: http://hackage.haskell.org/package/lens-regex-pcre-1.1.0.0 Also adds traversals for named groups which might also help with this 👍 Cheers!
The following example is invalid:
I was expecting ("AB", ["", "B"]). Note that matching AxB returns the correct result [("AxB", ["x", "B"])], and changing the regex to Ax?(B) returns the correct content for the last group.
Note that the same issue happens with the Text and ByteString interfaces on version 1.0.0.0.
pcre-heavy does not have this problem: