Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected behaviour in matchall #26049

Closed
dataPulverizer opened this issue Feb 14, 2018 · 5 comments · Fixed by #26071
Closed

Unexpected behaviour in matchall #26049

dataPulverizer opened this issue Feb 14, 2018 · 5 comments · Fixed by #26071
Labels
search & find The find* family of functions

Comments

@dataPulverizer
Copy link

For example the code below returns the captured () SubString with the rest of the expression rather than just the capture request.

matchall(r"<tag>(.*?)</tag>", "<tag>I</tag> <tag>like</tag> <tag>writing</tag> <tag>code</tag>")
# 4-element Array{SubString{String},1}:
#  "<tag>I</tag>"      
#  "<tag>like</tag>"   
#  "<tag>writing</tag>"
#  "<tag>code</tag>"

The expected behaviour should return:

"I"
"like"
"writing"
"code"

Or maybe:

["I"]
["like"]
["writing"]
["code"]

To account for multiple possible captures in each match.

I am currently using the eachmatch function as an alternative.

@nalimilan
Copy link
Member

AFAICT it works as documented, i.e. it returns m.match rather than m.captures. I guess the docs could be made more explicit.

As you note, the problem with returning captures is that there can be several of them, and it's going to be inefficient to return a vector of vectors. In that case, better use eachmatch and handle the multiple captures as appropriate.

We could also deprecate matchall since the best behavior isn't completely obvious. It could be changed to return a vector of RegexMatch objects, equivalent to collect(eachmatch(...)). That would be consistent with the to-be-added findeach function proposed in the Search & Find Julep. But then for full consistency match should be called matchfirst.

@nalimilan nalimilan added the search & find The find* family of functions label Feb 14, 2018
@sschelm
Copy link

sschelm commented Feb 14, 2018

Another issue to consider is the return type discrepancy between match and matchall.
I find that confusing, despite the correct documentation.

@dataPulverizer
Copy link
Author

@sschelm that's a good point! matchall should return a Array{RegEx} type object.

@StefanKarpinski
Copy link
Sponsor Member

Let's just deprecate it. We can add something else back later.

@nalimilan
Copy link
Member

See #26071.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
search & find The find* family of functions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants