Interaction with backreferences / variable-width escape sequences #17
Comments
@nikic good question. At this moment we are not escaping anything that has any context, similarly Basically, whenever you need context sensitivity you'd do:
Accounting for all context cases sounds problematic. |
@benjamingr I think there is a difference between this case and something like |
@benjamingr It also is possible to just add a non-capturing group around the whole; that way it is context-independent. At that point it's then also enough to have the default set of escapes which we have currently. … Except that it will bork with So, generally, I think that's an unavoidable issue and must be handled by the programmer. |
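A rough sketch of the non-capturing-group idea from this comment. `groupWrap` is a hypothetical helper, and escaping of metacharacters is left out for brevity. The second half shows a character class as one context where the wrapper breaks down, which is the case raised later in the thread.

```javascript
// Hypothetical helper: wraps the text in a non-capturing group so that
// whatever precedes it in the pattern cannot absorb its first character.
function groupWrap(s) {
  return "(?:" + s + ")";
}

// After a backreference, the group keeps \1 and the literal "1" apart.
const afterBackref = new RegExp("(foo)\\1" + groupWrap("1"));
console.log(afterBackref.test("foofoo1")); // true

// Inside a character class the wrapper is not a group at all:
// "(", "?", ":" and ")" simply become members of the class.
const inClass = new RegExp("[" + groupWrap("1") + "]");
console.log(inClass.test("(")); // true, although "(" was never part of the input
```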
@benjamingr We don't have non capturing groups, but that's just the "proper" way to delimit that… one also could use Issue is that it's not really possible to have real context sensitive escaping… the only idea I have for that is how e.g. prepared statements work with mysql… you basically have placeholders in your query/regex and then can context-sensitively escape them. |
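The placeholder idea might look roughly like the tagged template below. This is only a sketch: the tag name `re` and its escaping rules are assumptions made for illustration, not an existing or proposed API. It does not attempt real context sensitivity, which would require parsing the literal parts of the pattern.

```javascript
// Hypothetical tag, loosely modelled on prepared statements: the literal
// pattern parts are trusted, and every interpolated value is escaped.
function re(strings, ...values) {
  const escapeValue = (v) =>
    String(v)
      // Escape regex syntax characters.
      .replace(/[\\^$.*+?()[\]{}|]/g, "\\$&")
      // Assumption: hex-encode a leading digit so it cannot extend a backreference.
      .replace(/^[0-9]/, (d) => "\\x" + d.charCodeAt(0).toString(16));
  let source = strings.raw[0];
  values.forEach((v, i) => {
    source += escapeValue(v) + strings.raw[i + 1];
  });
  return new RegExp(source);
}

const userInput = "1+1";
const pattern = re`(foo)\1${userInput}`;
console.log(pattern.source);
console.log(pattern.test("foofoo1+1"));
```

Because the tag sees the literal parts and the values separately, a later version could in principle inspect what precedes each placeholder before choosing how to escape it.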
Can you write a short "proof" illustrating that it is impossible (or hard) Sensitive escaping might be our plan for RegExp.tag in the future (name not On Fri, Jun 19, 2015 at 3:45 PM, Bob Weinand [email protected]
|
I don't see what would be hard about it, just escape (Where "escaping" for digits would mean encoding them.) |
@benjamingr Point is that we cannot reliably prevent that very case @nikic did mention in his initial post. If you insert any characters before the string (note that we cannot escape numbers, that'd mean a backreference), it won't be compatible with a character class as it counts as a symbol there. @nikic How do you want to escape numbers? By their octal representation? Then EDIT: Hmm... okay, we can hex-escape that then… that will work. So, turns out I was wrong a bit and it's indeed possible to properly escape it |
@bwoebi As said, by encoding them. I am not terribly familiar with details of JS regex flavor, but presumably it has some kind of fixed-width escape sequence. Probably |
@nikic yeah, see my EDIT, noticed that just now. |
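A minimal sketch of the hex-escape idea the two comments above converge on. `escapeLeadingDigit` is a hypothetical helper, and its exact rules are assumptions rather than the spec text. A fixed-width `\xHH` escape cannot merge with a preceding `\1`, and it remains a single symbol inside a character class.

```javascript
// Hypothetical helper: escapes syntax characters and, as an assumption about
// the fix discussed above, encodes a leading ASCII digit as \xHH.
function escapeLeadingDigit(s) {
  const escaped = s.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
  return escaped.replace(/^[0-9]/, (d) =>
    "\\x" + d.charCodeAt(0).toString(16).padStart(2, "0")
  );
}

// After a backreference: \1 stays \1, followed by an encoded "1".
const afterRef = new RegExp("(foo)\\1" + escapeLeadingDigit("1"));
console.log(afterRef.source);          // (foo)\1\x31
console.log(afterRef.test("foofoo1")); // true

// Inside a character class: \x31 is still just the character "1".
const klass = new RegExp("[" + escapeLeadingDigit("1") + "a]");
console.log(klass.test("1"));          // true
```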
Numbers like Adding a capturing group will mess with back-references. Other solutions seem to have issues with quantifiers and context sensitive inserts. I don't see a way we can solve this from the library level - at least without parsing the RegExp ourselves first. |
I'm not sure I fully understand all the issues here, but the sense I'm getting is that maybe just going for an escape-everything approach (#15) would solve things? (EDIT: I saw the doc about how other languages don't like that idea, interesting.) |
In any case, I think the important thing to distill out of this thread is a list of failure scenarios for the current spec, and put those into the readme. Then we can judge how problematic those are by asking if they're ridiculous edge cases or actually quite believable. |
Python is moving to an escape-only-metacharacters approach in its new (Courtesy of Martijn Pieters) What this issue points out is that escaping everything wouldn't work and The two other participants are PHP core contributors whom I invited to On Fri, Jun 19, 2015 at 6:13 PM, Domenic Denicola [email protected]
|
Formally speaking, the suggestion here is basically this: REPLACE
WITH
(*) Potential addition: "And c is the first element of cpList" Btw, just for the record PHP also ignores the issue with backreferences and escapes only meta characters (albeit a larger number). Doing that has a lot of precedent in other languages. |
Definitely @domenic, though I'm not sure the discussion has exhausted itself yet. @nikic this would help with backreferences but would not help, for example, in matching sets or in other context-sensitive cases - for example It would also have the effect of making the resulting RegExp less readable. Wouldn't it make more sense to just insert an empty capturing group into the RegExp if the first literal matches HexDigit? |
@benjamingr @bwoebi Isn't |
@ljharb oh wow yeah, it is. I have no idea how I totally missed that while I used it last week at least twice or when I read the spec for Just prefixing and postfixing the RegExp with |
I'm not sure I see the issue you're referring to here, could you elaborate? Apart from use of |
@nikic as an expert, do you think the readability impact of the resulting regular expression is an important factor? If so - do you think this guarantee is worth it? |
Superseding this with #29 |
Using something like `new RegExp("(foo)\\1" + RegExp.escape(input))`, if `input` were to start with a number, this would extend the backreference `\1` to something like `\11`. Does this need to be accounted for?
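A minimal sketch of the failure described above. `naiveEscape` is a hypothetical stand-in that escapes only syntax characters; it is an assumption about the behavior under discussion, not the proposed `RegExp.escape` itself. Digits pass through untouched, so a leading `1` is glued onto the preceding `\1`.

```javascript
// Hypothetical helper (not the proposed RegExp.escape): escapes only
// regex syntax characters and leaves digits untouched.
function naiveEscape(s) {
  return s.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

const input = "1";
const re = new RegExp("(foo)\\1" + naiveEscape(input));

// The pattern source now reads (foo)\11 rather than (foo)\1 followed by "1".
console.log(re.source);

// Intended match: "foo", the backreference to "foo", then a literal "1".
console.log(re.test("foofoo1")); // false
```

With only one capturing group and no `u` flag, `\11` no longer refers to group 1, so the string the author intended to match is rejected.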