Parser is too large/bloated #56

Open
calebdw opened this issue Feb 2, 2024 · 4 comments
Comments

@calebdw
Contributor

calebdw commented Feb 2, 2024

Hello!

You can determine the size of the parser by looking at the STATE_COUNT in parser.c. The current state count for this parser is 6740, which is extremely large. For comparison, the PHP parser's state count is ~2700, and the PHPDoc parser's state count is ~339.
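To make that check repeatable, here is a minimal sketch that pulls the number out of the generated source. It assumes the generated parser.c declares the count as a `#define STATE_COUNT <n>` line; the helper name `read_state_count` and the sample text are hypothetical and only for illustration.

```python
import re


def read_state_count(parser_c_text):
    """Return the STATE_COUNT value found in the text of a generated parser.c.

    Assumes a line of the form `#define STATE_COUNT 6740`.
    Returns None when no such line is present.
    """
    match = re.search(
        r"^\s*#define\s+STATE_COUNT\s+(\d+)", parser_c_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


# Hypothetical excerpt standing in for the top of a generated parser.c.
sample = "#define LANGUAGE_VERSION 14\n#define STATE_COUNT 6740\n"
state_count = read_state_count(sample)
```

For a real grammar, the text would be read from the parser.c file in the repository (for example with `pathlib.Path(...).read_text()`), and the same helper could be run over several parsers to compare them.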

The likely culprit here is the _text regex which results in a ton of fragments.

@EmranMR
Owner

EmranMR commented Feb 2, 2024

Yeah, I suspected the regex use was hurting performance, but I didn't know how to benchmark or analyse it! It's always good to have some numbers to compare against.

I'm really not sure about alternatives, to be honest. Any ideas? I believe you were working on a Blade parser yourself with scanners; any luck there? That _text node, and parsing text in general, has been a nightmare that keeps haunting me! See #55.

@calebdw
Contributor Author

calebdw commented Feb 2, 2024

Yeah, I have some ideas that I need to play around with, including:

  • rewriting the rules
  • extending from php or html (or even both)
  • external scanner

@EmranMR
Owner

EmranMR commented Feb 2, 2024

I am not sure how extending alone would work in terms of text. I suppose that + external scanner might work in harmony?

I actually got the _text inspiration from tree-sitter-html, but had to extensively modify it.

If there is anything at the bottom of the grammar.js that you need clarified, please feel free to let me know and I will try to elaborate a bit more.

I tried to document it as much as I could using comments, but yeah, it's some ugly code, especially the parenthesis-balancing recursion bit!
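For comparison, here is a rough sketch of how parenthesis balancing could be handled iteratively with a depth counter, which is the kind of logic an external scanner might use instead of recursive grammar rules. This is not the code from grammar.js, and a real tree-sitter scanner would be written in C; the function name `scan_balanced` is hypothetical, and the sketch deliberately ignores parentheses inside string literals or comments.

```python
def scan_balanced(text, start):
    """Return the index just past the ')' that closes the '(' at text[start].

    Returns None if text[start] is not '(' or the parentheses never balance.
    Simplifying assumption: quotes and comments are not treated specially.
    """
    if start >= len(text) or text[start] != "(":
        return None
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
    return None


# Hypothetical Blade-like input: the directive arguments start at index 3.
source = "@if(foo(bar(1), 2)) rest"
end = scan_balanced(source, 3)
arguments = source[3:end]
```

The single counter replaces the nested rule expansion, so the amount of work grows with the length of the input rather than with the number of grammar states.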

@calebdw
Contributor Author

calebdw commented Feb 2, 2024

Yes, but interestingly enough, the HTML parser's state count is only 94 (it makes heavy use of the scanner, though).
