Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escape (whitespace) control characters #30

Closed
bergus opened this issue Jun 28, 2015 · 7 comments
Closed

Escape (whitespace) control characters #30

bergus opened this issue Jun 28, 2015 · 7 comments

Comments

@bergus
Copy link
Contributor

bergus commented Jun 28, 2015

  • Improves readability
  • Avoids issues with eval (and many other functions based on common assumptions)

Specifically, I'm looking at linebreaks:

> /\n/g.toString()
"/\n/"
> new RegExp("\\n").toString()
"/\n/"
> new RegExp("\n").toString()
"/
/"

I'd love RegExp.escape("\n") to yield "\\n", not the linebreak "\n". Same for all other control characters code units like \r and \t. The algorithm might be based on JSON string escaping, though not using the short form for backspace (\b).

@benjamingr
Copy link
Collaborator

I'm in favour if you can find me one example in real code where this would help someone

@domenic
Copy link
Member

domenic commented Jun 29, 2015

This seems automatically bad given the desire for (new RegExp(RegExp.escape(s))).test(s) === true:

  • If RegExp.escape("\n") === "\n", then new RegExp(RegExp.escape("\n")) ~ new RegExp("\n") ~ /\n/, and /\n/.test("\n") === true
  • If RegExp.escape("\n") === "\\n", then new RegExp(RegExp.escape("\n")) ~ new RegExp("\\n") ~ /\\n/, and /\\n/.test("\n") === false

@domenic
Copy link
Member

domenic commented Jun 29, 2015

Wait, no, it turns out new RegExp("\n") ~ /\n/, but also new RegExp("\\n") ~ /\n/. That's ... unexpected.

@benjamingr
Copy link
Collaborator

Hmm, ok I've given this some thought now, this is something that needs to be solved from the EscapeRegExpPattern side and not the RegExp.escape side. It's about how a pattern can be represented as a string (EscapeRegExpPattern spec algorithm) and not how a string can be made a part of a pattern (RegExp.escape).

Moreover, this is a bug, according to the spec for EscapeRegExpPattern:

The code points / or any LineTerminator occurring in the pattern shall be escaped in S as necessary to ensure that the String value formed by concatenating the Strings "/", S, "/", and F can be parsed (in an appropriate lexical context) as a RegularExpressionLiteral that behaves identically to the constructed regular expression.

So to sum it up:

  • This is a bug in the existing implementation of Chrome, here is the relevant V8 bug.
  • This works correctly on other implementations like Firefox already.

Thanks for the issue! Keep em' coming!

@bergus
Copy link
Contributor Author

bergus commented Jun 29, 2015

Ah, that's right, EscapeRegExpPattern should be fixed indeed to make the string representations work.

However, I think there's nothing wrong with doing this on both sides.

can find me one example in real code

Well, you've found it yourself - the V8 implementation of source. It's just that nobody expects a pattern string to contain control characters. I don't know how to search code bases for such issues, but I'm pretty sure they exist. Doesn't /data also have some examples of eval("/"+pattern+"/g")?

I do agree that we hardly expect control characters being passed to RegExp.escape, and iff the result is directly passed to new RegExp then there is no harm, but it still might be worth guarding against.

@benjamingr
Copy link
Collaborator

This is a bug in V8 though (other browsers work correctly). I'm not sure why we would want to specify a workaround for a current (and interim) bug in V8.

@bergus
Copy link
Contributor Author

bergus commented Jun 29, 2015

I mean that this kind of bug might happen in user code as well, in cases where new RegExp is not used (but eval or some custom pattern processing).

Indeed, the question boils down to How much safety do want in RegExp.escape? and Do we want to support buggy usage scenarios? (#29). As I'd answer the second question with No, we probably don't need this, it's just a "nice to have" at the expense of added complexity. Still, in contrast to the other discussed safety features, this one wouldn't even be detrimetal to readability of the RegExp.escape output.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants