regexp/syntax: add Cut #44254
Comments
I would use this feature for config files that take user input. It would be great for my end users. Currently I force them to cope with single/double quoting rules, which is not good when they need to match on quotes. By allowing Instead of “remainder of string” I would prefer returning bytes consumed. |
I'm fine either way, but I'm curious what your rationale for preferring bytes consumed is. It seems like one would always take that count and just slice the string to get the remainder. |
I've wanted something like this in the past to support handling
Given how package syntax
func ParseUntil(s string, flags Flags, delim string) (_ *Regexp, rest string, err error)
package regexp
func CompileSyntax(re *syntax.Regexp, longest bool) (*Regexp, error)
This has some advantages:
|
This proposal has been added to the active column of the proposals project |
We don't have to do the full parse, just enough to find the delimiter.
I think the flags are needed to tell where to stop in Cut("[[:alpha:]/]", "/", ...). |
Any objections to syntax.Cut? |
@carlmjohnson the delimiter is a fixed string and it is skipped when returning The |
I misunderstood how this worked. I withdraw the comment. The signature sounds good. |
Based on the discussion above, this proposal seems like a likely accept. |
@aclements could you update the proposal description to the new function signature? |
Note to self: should use regexp.Cut to fix #39904. |
@rsc If I understand #39904 correctly, |
To be clear,
Any instance of sep that is not itself enclosed in [ ] or ( ) counts as a separator. "Implicit" parens do not matter. |
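The rule stated above can be sketched in a few lines of Go. This is only an illustration, not the accepted regexp/syntax API: the name cutTopLevel and its signature are hypothetical, and the sketch assumes that backslash escapes and POSIX classes such as [[:alpha:]] are enabled, which is the flag-dependent part mentioned earlier in the thread.

```go
package main

import (
	"fmt"
	"strings"
)

// cutTopLevel is a hypothetical helper (not part of regexp/syntax).
// It splits s at the first occurrence of sep that is not enclosed in
// [ ] or ( ), treating a backslash as escaping the following byte.
func cutTopLevel(s, sep string) (before, after string, found bool) {
	depth := 0       // nesting depth of ( )
	inClass := false // true while inside a [ ] character class
	for i := 0; i < len(s); i++ {
		switch c := s[i]; {
		case c == '\\':
			i++ // skip the escaped byte
		case inClass:
			if strings.HasPrefix(s[i:], "[:") {
				// Skip a POSIX class name such as [:alpha:] (assumed enabled).
				if end := strings.Index(s[i+2:], ":]"); end >= 0 {
					i += 2 + end + 1
				}
			} else if c == ']' {
				inClass = false
			}
		case c == '[':
			inClass = true
			// A leading ^ and a ] right after the opening bracket are literal.
			if i+1 < len(s) && s[i+1] == '^' {
				i++
			}
			if i+1 < len(s) && s[i+1] == ']' {
				i++
			}
		case c == '(':
			depth++
		case c == ')':
			if depth > 0 {
				depth--
			}
		case depth == 0 && strings.HasPrefix(s[i:], sep):
			return s[:i], s[i+len(sep):], true
		}
	}
	return s, "", false
}

func main() {
	for _, expr := range []string{"a[/]b/c", "(a/b)/c", "[[:alpha:]/]", `a\/b/c`} {
		before, after, found := cutTopLevel(expr, "/")
		fmt.Printf("%q -> %q %q %v\n", expr, before, after, found)
	}
}
```

The sketch assumes a separator such as "/" that does not itself begin with a bracket, paren, or backslash; a real implementation would have to decide how to treat those.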
No change in consensus, so accepted. 🎉 |
Hmm. Now that I go to implement this, I've noticed this is not the same as sed expressions. In sed 's/[/]/slash/g' is a syntax error: the slash inside the square brackets is still treated as splitting the expression. Similarly, typing /(/) in ed or sam is also a syntax error. Package testing is doing something very subtle (but mostly correct) by splitting the regexp at a slash and implementing the search for the two different halves. But it's not doing what tools that incorporate regexps into a larger syntax do. The rationale for adding it was that it would help with tools like that, in addition to testing. But it looks like maybe it only helps with testing. Now on the fence about retracting/declining this proposal. If only testing will use it, it's not that useful. |
I'm less interested in using this with the testing package (although it would be useful), and more interested in enabling parsing the "s/PATTERN/REPLACEMENT/FLAGS", "/PATTERN/", and "m@PATTERN@" idioms via Go. These idioms are commonly known and I'd like to reuse them in various places. Implementing this manually is potentially error prone or impractical. Perl has documented their approach to finding the end delimiter. Due to differences, it isn't possible to match how all regexp implementations find their end delimiter. I think it would be best for Go to pick the most well reasoned option that is easiest to use / least likely to trip people up. |
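For comparison, here is a rough sketch of parsing the "s/PATTERN/REPLACEMENT/FLAGS" idiom by hand under one possible rule: the delimiter always splits unless it is preceded by a backslash, so brackets and parens offer no protection. The type substitution and the functions splitUnescaped and parseSubstitution are hypothetical names made up for this example, and the rule is an assumption rather than a statement of what any particular tool does.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// substitution holds the parts of an "s/PATTERN/REPLACEMENT/FLAGS" command.
// It is a hypothetical type used only for this sketch.
type substitution struct {
	Pattern     *regexp.Regexp
	Replacement string
	Flags       string
}

// splitUnescaped splits s at every delim byte that is not preceded by a
// backslash. Brackets and parens are deliberately ignored.
func splitUnescaped(s string, delim byte) []string {
	var parts []string
	start := 0
	for i := 0; i < len(s); i++ {
		switch s[i] {
		case '\\':
			i++ // skip the escaped byte
		case delim:
			parts = append(parts, s[start:i])
			start = i + 1
		}
	}
	return append(parts, s[start:])
}

// parseSubstitution parses cmd, taking the delimiter from the byte after "s".
func parseSubstitution(cmd string) (*substitution, error) {
	if len(cmd) < 2 || cmd[0] != 's' {
		return nil, errors.New("not a substitution command")
	}
	parts := splitUnescaped(cmd[2:], cmd[1])
	if len(parts) != 3 {
		return nil, fmt.Errorf("want 3 delimited fields, got %d", len(parts))
	}
	re, err := regexp.Compile(parts[0])
	if err != nil {
		return nil, err
	}
	return &substitution{Pattern: re, Replacement: parts[1], Flags: parts[2]}, nil
}

func main() {
	sub, err := parseSubstitution(`s/a\/b/X/g`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sub.Pattern.ReplaceAllString("xa/by", sub.Replacement))

	// Under this rule the slash inside [ ] still splits the command.
	_, err = parseSubstitution("s/[/]/slash/g")
	fmt.Println("error:", err)
}
```

Under this rule "s/[/]/slash/g" is rejected because the bracketed slash yields four fields, which is the behaviour described for sed earlier in the thread. A top-level-aware splitter would accept it instead, so the choice of rule is visible to users.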
May I ask what the enum values of Flags are, and what they mean? |
Note: Current proposal is #44254 (comment)
Regular expressions are often embedded in other languages, and the current regexp package makes it difficult to correctly parse such regexps. Common examples of such embedding include awk, Perl, and JavaScript, all of which have a /regexp/ expression syntax. In Go, this appears in the testing package's "-test.run" flag, which is a sequence of /-separated regexps; in benchstat v2's filter syntax; and in at least one other place @rsc mentioned that's now slipping my mind.
In general, this is difficult to implement outside regexp itself because the delimiter may appear nested in the regexp. For example, in the testing package, the run expression a[/]b/c matches subtest c of top-level tests matching a[/]b. The first slash is not a separator because it does not appear at the top level of the regexp. The testing package implements a simple, ad hoc parser for this (splitRegexp) but it doesn't get every corner case.
Since this is now a pattern, the regexp package (or perhaps regexp/syntax) should itself implement a "parse until delimiter" function, which would make it easy to parse regular expressions embedded in a larger syntax.
To make a concrete proposal, I propose we add the following function to regexp/syntax:
I propose this should return the split input string, rather than the parsed regexp, so it can be composed with any other regexp parsing entry point (e.g., regexp/syntax.Parse or regexp.Compile).
I don't think this operation needs to take Flags, but I'm not positive.
/cc @rsc