Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser: Replace dynamic-block regex in do_blocks #10463

Closed
wants to merge 7 commits into from

Conversation

dmsnell
Copy link
Member

@dmsnell dmsnell commented Oct 10, 2018

Description

Previously we have been using a simplified parse to grab dynamic
blocks and replace them with their rendered content.

Since #8083 we've had a fast default parser which removes the need
for a simplified parse here.

In this patch we're replacing the existing simplified parser in
do_blocks with the new default parser. This will open up new
opportunities for working with nested blocks on the server.

Resolves or addresses:

Status

  • it appears like the old parser relies on the raw content in the blocks but the parser spec doesn't provide this. this means that after this patch we might be wiping out all nested block content on render. this needs to be fixed 😉
  • I'm not sure what the mobile files are but they mirror this one. can someone explain what those are? are they auto-generated? do we have to update them?
  • this doesn't replace usage of the simplified parser in strip_dynamic_blocks(). we should probably revisit WIP: Add dynamic block #6170 to prevent introducing another tear-down-built-up double iteration in those simpler functions. for now, I've left in the regex solution built by @aduth

Checklist:

  • My code is tested.
  • My code follows the WordPress code style
  • My code follows the accessibility standards.
  • My code has proper inline documentation.

Previously we have been using a simplified parse to grab dynamic
blocks and replace them with their rendered content.

Since #8083 we've had a fast default parser which removes the need
for a simplified parse here.

In this patch we're replacing the existing simplified parser in
`do_blocks` with the new default parser. This will open up new
opportunities for working with nested blocks on the server.
@dmsnell dmsnell added the [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f label Oct 10, 2018
@dmsnell dmsnell requested review from mcsf, mtias and aduth October 10, 2018 01:56
@mtias mtias added this to the 4.1 milestone Oct 10, 2018
$rendered_content .= $content;

// Strip remaining block comment demarcations.
$rendered_content = preg_replace( '/<!--\s+\/?wp:.*?-->\r?\n?/m', '', $rendered_content );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think removing this line is causing the unit test failure in https://travis-ci.org/WordPress/gutenberg/jobs/439435606 as it's changing how some whitespace renders?

The only difference is an extra line break showing up in the output, so not a huge deal, but this may be accounting for it.

@mcsf
Copy link
Contributor

mcsf commented Oct 15, 2018

I'm not sure what the mobile files are but they mirror this one. can someone explain what those are? are they auto-generated? do we have to update them?

No need to worry about these. Context: #9883.

it appears like the old parser relies on the raw content in the blocks but the parser spec doesn't provide this. this means that after this patch we might be wiping out all nested block content on render. this needs to be fixed

What's a viable plan for this?

this doesn't replace usage of the simplified parser in strip_dynamic_blocks()

@dmsnell: Considering this and the tight timeline for 4.1, 4.2 and WordPress 5.0, how do you think this current PR should be prioritized?

This will open up new opportunities for working with nested blocks on the server.

I'm all for this, but, with prioritization in mind, we need to identify in which way the present work is a requirement for the MVP of #8352. Also, a note to @mtias and I that we should make sure that said issue, PHP APIs Overview, is up-to-date for 5.0.

@pento
Copy link
Member

pento commented Oct 18, 2018

d7811ca is kind of janky, but it fixes nested blocks. Instead of dumping the parent block HTML into its innerHTML, it creates new innerBlocks freeform children, with the HTML fragments interspersed in the right place with the actual child blocks.

@pento
Copy link
Member

pento commented Oct 18, 2018

I've manually added these changes to Core, we can sync up the exact behaviour later.

Copy link
Contributor

@georgestephanis georgestephanis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than initially rendering dynamic blocks with no concept of innerBlocks, it would probably be better to render any inner blocks first, then pass that as content to the dynamic block.

This would solve the issues w/r/t Jetpack Contact Forms not having field render in Automattic/jetpack#10256


// Replace dynamic block with server-rendered output.
$block_content = $block_type->render( (array) $block['attrs'], $block['innerHTML'] );
} elseif ( $block['innerBlocks'] ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way these conditionals are laid, blocks can either be Dynamic Blocks, OR they can have innerBlocks. It doesn't permit Dynamic Blocks to have innerBlocks. Which seems bad.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is true, hence the reason this PR exists.

I'd rather not render inner blocks first then pass them up because then that limits what we can do with inner blocks. it should be good enough to give blocks their inner blocks in structural form, let them decide if they want to modify or filter them, then render the document.

this PR is just one step in the process - we have lots to overhaul here and maybe I'm wrong but I thought that @mcsf decided to keep this as-is until post merge and leave dynamic blocks the way they are - as in, yesterday or the day before he decided. let's see what he says

@gziolo
Copy link
Member

gziolo commented Oct 19, 2018

I've manually added these changes to Core, we can sync up the exact behaviour later.

Let's move it to 4.2 and wait for feedback from Core in such case. We are wrapping up 4.1-RC today and it isn't approved so we really don't have a choice. We can still move it back if only it's ready to go before we do the release.

$dynamic_blocks = get_dynamic_block_names();

foreach ( $blocks as $block ) {
$block = (array) $block;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just do something like this, to allow individual plugins to override as necessary for more custom functionality that Core doesn't support w/r/t content, html, inner blocks, etc?

Suggested change
$block = (array) $block;
$block = (array) $block;
if ( $pre_rendered_content = apply_filters( 'pre_do_block', null, $block, $all_blocks ) ) {
$rendered_content .= $pre_rendered_content;
$post = $global_post;
continue;
}

or potentially even apply_filters( 'pre_do_block_' . $block['blockName'], null, $block, $all_blocks ) to narrow the filter instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be taken advantage of via something like this:

Automattic/jetpack@a12356b

@youknowriad
Copy link
Contributor

@mcsf @dmsnell What's the status of this PR. Is it still meant to be included in 4.2?

@georgestephanis
Copy link
Contributor

@youknowriad I believe it may have shifted over to #11082 and #11141 instead? Unsure.

@dmsnell
Copy link
Member Author

dmsnell commented Oct 29, 2018

@youknowriad what @georgestephanis said is true. this work has shifted into a more reliable approach dependent on the changes to the parse which provide "block markers" for where the inner blocks came from in a given block.

I'm closing this here in favor of #11141 and yeah I really want to get this in before the merge window closes.

@dmsnell dmsnell closed this Oct 29, 2018
@dmsnell dmsnell deleted the parser/do-blocks branch October 29, 2018 01:20
@pento pento removed this from the 4.2 milestone Oct 29, 2018
dmsnell added a commit that referenced this pull request Nov 7, 2018
Attempt three at including positional information from the parse to enable isomorphic reconstruction of the source `post_content` after parsing.

See alternate attempts: #11082, #11309
Motivated by: #7247, #8760, Automattic/jetpack#10256
Enables: #10463, #10108

## Abstract

Add new `innerContent` property to each block in parser output indicating where in the innerHTML each innerBlock was found.

## Status

 - will update fixtures after design review indicates this is the desired approach
 - all parsers passing new tests for fragment behavior

## Summary

Inner blocks, or nested blocks, or blocks-within-blocks, can exist in Gutenberg posts. They are serialized in `post_content` in place as normal blocks which exist in between another block's comment delimiters.

```html
<!-- wp:outerBlock -->
Check out my
<!-- wp:voidInnerBlock /-->
and my other
<!-- wp:innerBlock -->
with its own content.
<!-- /wp:innerBlock -->
<!-- /wp:outerBlock -->
```

The way this gets parsed leaves us in a quandary: we cannot reconstruct the original `post_content` after parsing because we lose the origin location information for each inner block since they are only passed as an array of inner blocks.

```json
{
	"blockName": "core/outerBlock",
	"attrs": {},
	"innerBlocks": [
		{
			"blockName": "core/voidInnerBlock",
			"attrs": {},
			"innerBlocks": [],
			"innerHTML": ""
		},
		{
			"blockName": "core/innerBlock",
			"attrs": {},
			"innerBlocks": [],
			"innerHTML": "\nwith its own content.\n"
		}
	],
	"innerHTML": "\nCheck out my\n\nand my other\n\n"
}
```

At this point we have parsed the blocks and prepared them for attaching into the JavaScript block code that interprets them but we have lost our reverse transformation.

In this PR I'd like to introduce a new mechanism which shouldn't break existing functionality but which will enable us to go back and forth isomorphically between the `post_content` and first stage of parsing. If we can tear apart a Gutenberg post and reassemble then it will let us to structurally-informed processing of the posts without needing to be aware of all the block JavaScript.

The proposed mechanism is a new property as a **list of HTML fragments with `null` values interspersed between those fragments where the blocks were found**.

```json
{
	"blockName": "core/outerBlock",
	"attrs": {},
	"innerBlocks": [
		{
			"blockName": "core/voidInnerBlock",
			"attrs": {},
			"innerBlocks": [],
			"blockMarkers": [],
			"innerHTML": ""
		},
		{
			"blockName": "core/innerBlock",
			"attrs": {},
			"innerBlocks": [],
			"blockMarkers": [],
			"innerHTML": "\nwith its own content.\n"
		}
	],
	"innerHTML": "\nCheck out my\n\nand my other\n\n",
	"innerContent": [ "\nCheck out my\n", null, "\n and my other\n", null, "\n" ],
}
```

Doing this allows us to replace those `null` values with their associated block (sequentially) from `innerBlocks`.

## Questions

 - Why not use a string token instead of an array?
    - See #11309. The fundamental problem with the token is that it could be valid content input from a person and so there's a probability that we would fail to split the content accurately.

 - Why add the `null` instead of leaving basic array splits like `[ 'before', 'after' ]`?
    - By inspection we can see that without an explicit marker we don't know if the block came before or after or between array elements. We could add empty strings `''` and say that blocks exist only _between_ array elements but the parser code would have to be more complicated to make sure we appropriately add those empty strings. The empty strings are a bit odd anyway.

 - Why add a new property?
    - Code already depends on `innerHTML` and `innerBlocks`; I don't want to break any existing behaviors and adding is less risky than changing.
@mcsf mcsf mentioned this pull request Nov 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants