Gutenberg still parsing blocks with a Regular Expression, instead of using a common parser #10353

gschoppe · 2018-10-05T03:08:21Z

One of the most important parts of providing a single source of truth is having a single language to interpret and parse data. Gutenberg selected a custom PEG parser as its means of interpreting HTML comments into blocks. As such, that PEG parser defines how blocks are to be written and parsed. It is the single source of truth on blocks.

However, on every page load, Gutenberg ignores this PEG parser and instead uses a regular expression that doesn't actually understand the format it parses. This is a fundamental design flaw that will lead to many future bugs and increasingly hacky patches. A regular expression is incapable of properly parsing a language defined by a PEG, as the two operate at fundamentally different levels of the Chomsky hierarchy. Small differences in interpretation between the two competing parsers will inevitably be a source of bugs.
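To illustrate the kind of divergence meant here, the following is a minimal sketch. The pattern is hypothetical and is not the regex Gutenberg actually uses; it only assumes the `<!-- wp:name -->` / `<!-- /wp:name -->` comment delimiters. A single regex that pairs an opener with the nearest closer of the same name mis-pairs the delimiters as soon as a block is nested inside a block of the same type.

```python
import re

# Hypothetical single-regex block matcher (an assumption for illustration,
# NOT Gutenberg's actual pattern). It pairs each opening comment with the
# nearest closing comment that has the same block name.
NAIVE_BLOCK = re.compile(
    r"<!-- wp:(?P<name>[a-z][a-z0-9/-]*) -->(?P<inner>.*?)<!-- /wp:(?P=name) -->",
    re.DOTALL,
)

# A column block nested inside another column block.
doc = (
    "<!-- wp:column -->"
    "outer-start "
    "<!-- wp:column -->inner<!-- /wp:column -->"
    " outer-end"
    "<!-- /wp:column -->"
)

match = NAIVE_BLOCK.search(doc)

# The outer opener is paired with the INNER closer, so the captured content
# is cut short and still contains the inner block's opening comment.
inner = match.group("inner")

# Everything after the match is left over, including the outer closer,
# which now has no opener to belong to.
leftover = doc[match.end():]
```

Here `inner` ends right after the inner block's text, and `leftover` still holds the outer closing comment. A grammar-based parser that tracks nesting would pair the delimiters correctly, and that difference is exactly where two parsers of the same content can disagree.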

If Gutenberg's language is to be the source of truth, then a common parser needs to be used everywhere that language is parsed. This hacky solution just sets up dynamic blocks to become "Shortcodes 2", with all the baggage and limitations that come along with that.

Launching Gutenberg without a consistent PHP parser would be a huge mistake.

designsimply · 2018-10-15T21:52:21Z

Related: #5967

designsimply · 2018-10-15T23:18:14Z

Thank you for the feedback! I work on testing and triage. I would like to close this as a duplicate of #5967 and also in favor of the work being done in #10463. Could you let me know if you think the issue is not a duplicate or if you have opened it as a duplicate intentionally?

gschoppe · 2018-10-16T03:45:35Z

Thank you. It was opened as a separate ticket to draw attention to the outstanding issue on #5967, which had, at the time, been dormant for several months. After posting, #5967 was closed, citing the new fast parser work that was being done, but there was no sign on that ticket of actually replacing the dynamic block regex. That appears to finally be addressed in #10463.

I do believe one concern remains: the fast PHP parser does not follow the same process as the PEG.js parser to interpret the spec, so issues are still likely to arise over time from the lack of a common interpreter between PHP and JS. However, I have seen some conversation about creating a JS parser that follows the same spec as the fast PHP parser, which may resolve that issue.

I'm looking forward to running the fast parser through some extensive tests to make sure its regex tokenizer can actually handle the complexities that nested blocks produce, such as self-nesting (e.g. a column block nested within a column), but that would be a separate topic.
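For reference, a regex used only as a tokenizer can cope with self-nesting if the pairing is done by a stack rather than by the pattern itself. The following is a rough sketch under stated assumptions: the `TOKEN` pattern and `parse_blocks` helper are hypothetical names for illustration, not the fast parser's actual code, and block attributes and self-closing blocks are left out.

```python
import re

# Hypothetical token pattern (an assumption for illustration): matches one
# opening or closing block comment. Attributes and self-closing blocks are
# deliberately omitted from this sketch.
TOKEN = re.compile(r"<!-- (?P<closer>/)?wp:(?P<name>[a-z][a-z0-9/-]*) -->")


def parse_blocks(doc):
    """Return the top-level blocks of `doc` as nested dicts."""
    root = {"name": None, "children": []}
    stack = [root]  # the innermost open block is always stack[-1]
    for token in TOKEN.finditer(doc):
        name = token.group("name")
        if token.group("closer"):
            # A closer must match the most recently opened block.
            if len(stack) == 1 or stack[-1]["name"] != name:
                raise ValueError(f"unexpected closer for {name!r}")
            stack.pop()
        else:
            block = {"name": name, "children": []}
            stack[-1]["children"].append(block)
            stack.append(block)
    if len(stack) != 1:
        raise ValueError(f"unclosed block {stack[-1]['name']!r}")
    return root["children"]


nested = (
    "<!-- wp:column --><!-- wp:column -->inner"
    "<!-- /wp:column --><!-- /wp:column -->"
)
blocks = parse_blocks(nested)
```

With this input, `blocks` holds one outer column whose `children` list contains the inner column. A self-nesting test case of this shape is the sort of fixture that could be run against both the PHP and JS parsers to compare their output.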

designsimply added the [Type] Feedback and [Feature] Parsing labels Oct 15, 2018

designsimply closed this as completed Oct 15, 2018
