Gutenberg still parsing blocks with a Regular Expression, instead of using a common parser #10353

Closed
gschoppe opened this issue Oct 5, 2018 · 3 comments
Labels: [Feature] Parsing, [Type] Feedback

Comments

@gschoppe

gschoppe commented Oct 5, 2018

One of the most important parts of providing a single source of truth is having a single language to interpret and parse data. Gutenberg chose a custom PEG parser as its means of interpreting HTML comments as blocks. As such, that PEG parser defines how blocks are to be written and parsed. It is the single source of truth on blocks.

However, on every page load, Gutenberg ignores this PEG parser and instead uses a regular expression that doesn't actually understand the format it parses. This is a fundamental design flaw that will lead to many future bugs and increasingly hacky patches. A regular expression is incapable of properly parsing a PEG language, because the two operate at fundamentally different levels of the Chomsky hierarchy: pairing arbitrarily nested opening and closing delimiters is beyond what a true regular expression can express. Small differences in interpretation between the two competing parsers will inevitably be a source of bugs.
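To make the nesting concern concrete, here is a minimal Python sketch. It is not Gutenberg's actual regex: the pattern `NAIVE_BLOCK` and the simplified `<!-- wp:name -->` delimiter syntax are assumptions for illustration. A lazy "match up to the first closer with the same name" pattern pairs the outer opener of a self-nested block with the inner closer:

```python
import re

# Hypothetical naive pattern (not Gutenberg's actual regex): an opener, then
# everything lazily up to the first closer with the same block name.
NAIVE_BLOCK = re.compile(
    r"<!-- wp:([a-z][a-z0-9/-]*) -->(.*?)<!-- /wp:\1 -->", re.DOTALL
)

# A column nested inside a column.
doc = (
    "<!-- wp:column -->"
    "<!-- wp:column -->inner<!-- /wp:column -->"
    "outer tail"
    "<!-- /wp:column -->"
)

m = NAIVE_BLOCK.search(doc)
# The outer opener is paired with the inner closer: the captured content
# holds a dangling opener, and "outer tail" is lost.
print(repr(m.group(2)))  # '<!-- wp:column -->inner'
```

Switching to a greedy `.*` happens to fix this input but breaks on two sibling blocks of the same name. A true regular expression (without extensions such as PCRE recursion) has no counter or stack, so it cannot track arbitrary nesting depth.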

If Gutenberg's block language is to be the source of truth, then a common parser needs to be used everywhere that language is parsed. This hacky solution just sets dynamic blocks up to become "Shortcodes 2", with all the baggage and limitations that come along with that.

Launching Gutenberg without a consistent PHP parser would be a huge mistake.

@designsimply added the [Type] Feedback and [Feature] Parsing labels Oct 15, 2018
@designsimply
Member

Related: #5967

@designsimply
Member

Thank you for the feedback! I work on testing and triage. I would like to close this as a duplicate of #5967 and also in favor of the work being done in #10463. Could you let me know if you think the issue is not a duplicate or if you have opened it as a duplicate intentionally?

@gschoppe
Author

gschoppe commented Oct 16, 2018

Thank you. It was opened as a separate ticket to draw attention to the outstanding issue on #5967, which had, at the time, been dormant for several months. After I posted this, #5967 was closed, citing the new fast parser work that was being done, but there was no sign on that ticket of actually replacing the dynamic block regex. That appears to finally be addressed in #10463.

I do believe one concern remains: the fast PHP parser does not interpret the spec through the same process as the PEG.js parser, so issues are still likely to arise over time from the lack of a common interpreter between PHP and JS. However, I have seen some discussion about creating a JS parser that follows the same approach as the fast PHP parser, which could in theory resolve that issue.

I'm looking forward to running the fast parser through some extensive tests, to make sure its regex tokenizer can actually handle the complexities that nested blocks produce, such as self-nesting (e.g., a column block nested within a column), but that would be a separate topic.
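One common way to get nesting right while still using a regex is to let the regex only find individual delimiter tokens and let an explicit stack do the pairing. The sketch below is hypothetical Python, not the code from #10463: `TOKEN`, `Block`, and `parse` are illustrative names, the delimiter grammar is simplified (attributes are recognized but not decoded, and freeform HTML between top-level blocks is ignored), and `inner_html` keeps the raw markup including nested delimiters. It handles the self-nesting case mentioned above:

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical, simplified delimiter grammar (an assumption, not the spec):
#   opener:     <!-- wp:name {json}? -->
#   closer:     <!-- /wp:name -->
#   void block: <!-- wp:name {json}? /-->
TOKEN = re.compile(
    r"<!--\s+(?P<closer>/)?wp:(?P<name>[a-z][a-z0-9_/-]*)\s+"
    r"(?P<attrs>\{.*?\}\s+)?(?P<void>/)?-->",
    re.DOTALL,
)


@dataclass
class Block:
    name: str
    inner_html: str = ""  # raw markup between opener and closer
    children: list[Block] = field(default_factory=list)


def parse(doc: str) -> list[Block]:
    """Pair delimiter tokens with an explicit stack instead of one big regex."""
    root: list[Block] = []
    stack: list[tuple[Block, int]] = []  # (open block, offset where its content starts)
    for tok in TOKEN.finditer(doc):
        name = tok.group("name")
        if tok.group("closer"):
            # A closer must match the innermost open block, whatever the depth.
            if not stack or stack[-1][0].name != name:
                raise ValueError(f"unexpected closer {name!r} at offset {tok.start()}")
            block, start = stack.pop()
            block.inner_html = doc[start:tok.start()]
        else:
            block = Block(name)
            # Attach to the innermost open block, or to the top level.
            (stack[-1][0].children if stack else root).append(block)
            if not tok.group("void"):
                stack.append((block, tok.end()))
    if stack:
        raise ValueError(f"unclosed block {stack[-1][0].name!r}")
    return root


# The self-nesting case: a column inside a column.
doc = (
    "<!-- wp:column -->"
    "<!-- wp:column -->inner<!-- /wp:column -->"
    "outer tail"
    "<!-- /wp:column -->"
)
outer = parse(doc)[0]
print(outer.children[0].inner_html)  # inner
print(outer.inner_html.endswith("outer tail"))  # True
```

The design choice here is that the regex never has to "know" about depth: each match is a single flat token, and depth lives entirely in the stack, so a mismatched or unclosed delimiter becomes an explicit error instead of a silently wrong pairing.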
