Rewrite expression parser to a state machine in optional mixin #4768

Merged: 22 commits merged on Mar 31, 2018

@TimvdLippe (Contributor) commented Aug 3, 2017

Implement a new binding expression parser as a state machine instead of a regex. This removes the need to maintain a whitelist of characters allowed in bindings, so expressions can now contain, for example, Japanese (non-ASCII Unicode) characters and slashes. Because the parser keeps track of state, it also handles unbalanced delimiters such as {{asdf]] and [[asdf}} better, and escaping inside strings is more intuitive.
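A minimal sketch of the idea, assuming only the two Polymer delimiters ({{expr}} for two-way and [[expr]] for one-way bindings) and single- or double-quoted strings with backslash escapes. scanBindings and its return shape are hypothetical names for illustration, not the code in this PR. Because the scanner remembers which delimiter opened a binding and whether it is inside a string, it needs no character whitelist.

```javascript
// Hypothetical sketch (not the PR's code): a character-level state machine
// that splits template text into literal parts and binding parts.
const ScanState = Object.freeze({ TEXT: 0, BINDING: 1, STRING: 2 });

function scanBindings(text) {
  const parts = [];
  let state = ScanState.TEXT;
  let closer = ''; // closing delimiter we expect: '}}' or ']]'
  let quote = '';  // quote character of the string we are inside
  let start = 0;   // start index of the current literal or expression
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const pair = text.slice(i, i + 2);
    switch (state) {
      case ScanState.TEXT:
        if (pair === '{{' || pair === '[[') {
          if (i > start) parts.push({ literal: text.slice(start, i) });
          closer = pair === '{{' ? '}}' : ']]';
          state = ScanState.BINDING;
          start = i + 2;
          i++; // skip the second delimiter character
        }
        break;
      case ScanState.BINDING:
        if (ch === '"' || ch === "'") {
          quote = ch;
          state = ScanState.STRING;
        } else if (pair === closer) {
          parts.push({
            expression: text.slice(start, i).trim(),
            twoWay: closer === '}}',
          });
          state = ScanState.TEXT;
          start = i + 2;
          i++;
        } else if (pair === '}}' || pair === ']]') {
          // A closer that does not match the opener, e.g. {{asdf]]
          return { error: `mismatched delimiter "${pair}"`, index: i };
        }
        break;
      case ScanState.STRING:
        if (ch === '\\') {
          i++; // skip the escaped character, e.g. \' or \\
        } else if (ch === quote) {
          state = ScanState.BINDING;
        }
        break;
    }
  }
  if (state !== ScanState.TEXT) {
    return { error: 'unterminated binding', index: text.length };
  }
  if (start < text.length) parts.push({ literal: text.slice(start) });
  return { parts };
}
```

For example, scanBindings('Hi {{name}}!') yields the literal "Hi ", a two-way binding with expression "name", and the literal "!", while scanBindings('{{asdf]]') reports a mismatched delimiter at index 6 instead of producing a binding. A ]] inside a quoted string, as in [[fn('a]]b')]], does not end the binding.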

Regarding performance: I ran the supplied test on current master and on this branch. The new binding expression parser took 950 ms for 1000 templates, while current master took 700 ms. I still have to dig into what is taking so much time, but I would like to get some initial feedback first.
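To compare the two parsers on equal footing, both can be run through the same timing harness. This is a minimal sketch, assuming a runtime with a global performance.now() (modern browsers, Node.js 16+); benchmarkParser is a hypothetical helper, not the test supplied in the repository.

```javascript
// Hypothetical harness: times parse(template) over a list of templates.
// Assumes a global performance.now() (browsers, Node.js 16+).
function benchmarkParser(parse, templates, rounds = 5) {
  // Warm-up pass so that JIT compilation is not part of the measurement.
  for (const template of templates) parse(template);

  const timings = [];
  for (let r = 0; r < rounds; r++) {
    const start = performance.now();
    for (const template of templates) parse(template);
    timings.push(performance.now() - start);
  }

  // Middle value of the sorted rounds (the median for an odd round count);
  // less sensitive to GC pauses than the mean.
  timings.sort((a, b) => a - b);
  const medianMs = timings[Math.floor(timings.length / 2)];
  return { medianMs, perTemplateMs: medianMs / templates.length };
}
```

Applied to the figures above, 950 ms vs. 700 ms for 1000 templates is 0.95 ms vs. 0.70 ms per template, so the new parser was roughly 36% slower (250 / 700 ≈ 0.36).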

TODO:

  • Documentation
  • Performance improvements
  • Check compliance with Closure Compiler
  • Add mechanism to check if parsing succeeded, to improve error handling of incorrect bindings

Reference Issue

Fixes #1734
Fixes #1978
Fixes #3870
Fixes #4239
Fixes #4723
Supersedes and closes #3871

@TimvdLippe (Contributor Author)

Performance is now on-par with the old regex.

@sorvell (Contributor) commented Sep 15, 2017

It would be great to see if this PR could be done as an optional mixin. Hopefully there are enough hooks in the binding system to make this possible.

The concern is the size and speed of the additional code. That, plus the fact that these issues are mostly corner cases, makes me think an optional mixin might be a better fit.
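The opt-in approach can be sketched with the standard JavaScript class-expression mixin pattern. Everything here is hypothetical and simplified: BaseElement stands in for the element base class, parseBindings for whatever static hook the binding system exposes, and PermissiveParserMixin for the optional parser. Only elements that apply the mixin pay for the extra code; all others keep the default.

```javascript
// Hypothetical base class standing in for the element base class; its
// default parser uses a regex with a narrow character whitelist.
class BaseElement {
  static parseBindings(text) {
    // \w only matches ASCII letters, digits and '_', so '{{名前}}' is rejected.
    const match = /^\{\{([\w.]+)\}\}$/.exec(text);
    return match ? { expression: match[1] } : null;
  }
}

// Hypothetical opt-in mixin: overrides the static hook with a whitelist-free
// check and falls back to the default for anything it does not handle.
const PermissiveParserMixin = (superClass) =>
  class extends superClass {
    static parseBindings(text) {
      if (text.length > 4 && text.startsWith('{{') && text.endsWith('}}')) {
        return { expression: text.slice(2, -2).trim() };
      }
      return super.parseBindings(text);
    }
  };

// Only elements that apply the mixin pay for the extra code.
class DefaultElement extends BaseElement {}
class OptInElement extends PermissiveParserMixin(BaseElement) {}
```

Here DefaultElement.parseBindings('{{名前}}') returns null, while OptInElement.parseBindings('{{名前}}') returns { expression: '名前' }.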

@sorvell (Contributor) left a comment

@dfreedm (Member) commented Sep 15, 2017

Just a few eslint issues: https://travis-ci.org/Polymer/polymer/builds/265406074

@TimvdLippe (Contributor Author)

This is now extracted to a mixin and tested separately. We might still want to take a closer look at the performance.

@TimvdLippe changed the title from "Rewrite expression parser to a state machine" to "Rewrite expression parser to a state machine in optional mixin" on Sep 22, 2017

@sorvell (Contributor) left a comment

Let's give this mixin a slightly more specific name, perhaps StrictBindingParser?

@TimvdLippe (Contributor Author)

@sorvell renamed!

@@ -0,0 +1,405 @@
<!--
Contributor

Let's move this file to lib/mixins, since it's... a mixin.

Contributor Author

Done

storeMethodVariable(bindingData, text, i);
storeMethod(bindingData, templateInfo);
bindingData.startChar = i + 1;
storeMethod(bindingData, templateInfo);

Contributor

Is calling storeMethod a second time here a typo? If so, let's fix it and add a test to verify.

Contributor

@TimvdLippe acknowledged the issue and will fix it.

Contributor Author

I no longer have write access to this repository, so I submitted #4866 against this branch to fix the issue.
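A regression test for the duplicated storeMethod call, as requested above, only needs to assert that one method call produces exactly one stored entry. The real bindingData and templateInfo structures are not shown in this thread, so this is a self-contained miniature: parseMethodCall, storeArg and storeMethod are hypothetical stand-ins that mirror the shape of the snippet (store the last argument, then store the method once). It ignores quoted strings and nested calls.

```javascript
// Hypothetical miniature of the method-call branch: scans "name(arg1, arg2)",
// storing each argument, and the finished method exactly once on ')'.
// Simplified: ignores quoted strings and nested calls.
function parseMethodCall(expr) {
  const stored = [];
  const data = { name: '', args: [], startChar: 0 };

  const storeArg = (end) => {
    const arg = expr.slice(data.startChar, end).trim();
    if (arg) data.args.push(arg);
  };
  const storeMethod = () => {
    stored.push({ method: data.name, args: data.args.slice() });
  };

  for (let i = 0; i < expr.length; i++) {
    const ch = expr[i];
    if (ch === '(') {
      data.name = expr.slice(0, i).trim();
      data.startChar = i + 1;
    } else if (ch === ',') {
      storeArg(i);
      data.startChar = i + 1;
    } else if (ch === ')') {
      storeArg(i);    // last argument
      storeMethod();  // once; a second call here would duplicate the entry
      data.startChar = i + 1;
    }
  }
  return stored;
}
```

The test then asserts that parseMethodCall('fn(a, b)') has length 1; with a second storeMethod() call after data.startChar is updated, it would have length 2.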

@sorvell (Contributor) commented Nov 16, 2017

We need to fix the test issues before merging.

@justinfagnani (Contributor)

Hey @TimvdLippe, we talked about this a bit already, but there's a lot of overlap here with the polymer-expressions 2.0 branch: https://github.com/Polymer/polymer-expressions/tree/2.0-preview

I think recursive descent parsers are generally easier to reason about and maintain than hand-written state machines. It'd be interesting to see whether polymer-expressions would be suitable as the stricter parser if 1) it were stripped down to just the syntax and features that Polymer currently supports, or 2) we offered the full syntax as an opt-in for more powerful expressions.

@kevinpschaaf mentions that this PR also handles finding and matching the expression delimiters, which would need to be done for polymer-expressions as well. Neither he nor I have found the time to finish integrating it as a mixin. I wonder if you could investigate that and try to get it working, so we can compare it against this option.
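For comparison with the state machine, a recursive descent parser for a Polymer-sized grammar can stay fairly small. This is a hypothetical sketch, not polymer-expressions' code or API: parseExpression and the grammar in its header comment are assumptions covering only negation, paths, method calls, and string/number arguments. It expects the delimiters to be stripped already, so the delimiter matching mentioned above would still be needed on top.

```javascript
// Hypothetical grammar (delimiters already stripped):
//   expression := '!'? target
//   target     := path ( '(' args? ')' )?
//   args       := arg ( ',' arg )*
//   arg        := string | number | path
function parseExpression(src) {
  let pos = 0;

  const skipWs = () => {
    while (pos < src.length && /\s/.test(src[pos])) pos++;
  };
  const fail = (msg) => {
    throw new SyntaxError(`${msg} at index ${pos}`);
  };
  const expect = (ch) => {
    skipWs();
    if (src[pos] !== ch) fail(`expected "${ch}"`);
    pos++;
  };

  // path: any run of non-syntax characters, so no character whitelist is needed.
  function parsePath() {
    skipWs();
    const start = pos;
    while (pos < src.length && !/[\s(),'"!]/.test(src[pos])) pos++;
    if (pos === start) fail('expected a path');
    return { type: 'path', value: src.slice(start, pos) };
  }

  // string: quoted with ' or ", a backslash escapes the next character.
  function parseString() {
    const quote = src[pos++];
    let value = '';
    while (pos < src.length && src[pos] !== quote) {
      if (src[pos] === '\\' && pos + 1 < src.length) pos++;
      value += src[pos++];
    }
    expect(quote); // throws if the string is unterminated
    return { type: 'string', value };
  }

  function parseArg() {
    skipWs();
    if (src[pos] === '"' || src[pos] === "'") return parseString();
    const num = /^-?\d+(\.\d+)?/.exec(src.slice(pos));
    if (num) {
      pos += num[0].length;
      return { type: 'number', value: Number(num[0]) };
    }
    return parsePath();
  }

  function parseTarget() {
    const path = parsePath();
    skipWs();
    if (src[pos] !== '(') return path;
    pos++; // consume '('
    const args = [];
    skipWs();
    if (src[pos] !== ')') {
      args.push(parseArg());
      skipWs();
      while (src[pos] === ',') {
        pos++;
        args.push(parseArg());
        skipWs();
      }
    }
    expect(')');
    return { type: 'call', method: path.value, args };
  }

  skipWs();
  const negate = src[pos] === '!';
  if (negate) pos++;
  const node = parseTarget();
  skipWs();
  if (pos < src.length) fail('unexpected input');
  return { negate, ...node };
}
```

Each grammar rule maps to one function, which is the maintainability argument for this style; whether it can match the regex on speed is the open question raised in the reply.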

@TimvdLippe (Contributor Author)

@justinfagnani Yes, I did consider using polymer-expressions instead. However, the original bindings were detected by a regex, which is quite performant but also quite hard to maintain. I went through multiple iterations of my overall idea (removing the whitelist of accepted characters while still taking the delimiters into account). Most solutions were simply not fast enough to compete with the regex.

The pure state machine was the solution that came very close (though not quite the same speed), while being more maintainable than the original regex (imo). I fear that a full-blown general-purpose parser (like the one in polymer-expressions) is too big and slow for such a small syntax, which is why I did not go down that route.

Nonetheless, I will try to port the expressions in a similar manner to this PR and confirm/deny my suspicion.

@TimvdLippe (Contributor Author)

@sorvell Build is passing now and docs + types are updated.

@TimvdLippe (Contributor Author)

Note to self: update the docs to state the performance vs. correctness trade-off, fix the conflicts, and then merge. Verbal LGTM from @sorvell.

@TimvdLippe merged commit 569ff37 into master on Mar 31, 2018
@TimvdLippe deleted the rewrite-expression-parser branch on March 31, 2018 14:56
5 participants