Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser: Allow block nesting #2743

Merged
merged 1 commit into from
Nov 9, 2017
Merged

Parser: Allow block nesting #2743

merged 1 commit into from
Nov 9, 2017

Conversation

dmsnell
Copy link
Member

@dmsnell dmsnell commented Sep 17, 2017

In this patch we're opening up a new avenue for allowing nested blocks
in the data structure.

For each block:

  • Nested blocks appear as innerBlocks as a sequential list
  • The contained HTML without the nested blocks appear as a string
    property innerHtml which replaces rawContent (update it doesn't replace it yet)

Testing

Smoke test in the browser.
Load the grammar into the explorer (you can use the fetcher URL in the explorer for the grammar link) then load in some test content and play with nesting. You should be able to see how innerBlocks grows and when blocks do nest, that the snippet which represents them disappears from innerHtml.

Input

Front

<!-- wp:core/void /-->

<!-- wp:core/nest -->
Before
<!-- wp:core/inside -->
This is working
<!-- /wp:core/inside -->
Middle
<!-- wp:core/inside -->
Another in the <!-- wp:core/mark -->jump here<!-- /wp:core/mark -->list
<!-- /wp:core/inside -->
After
<!-- /wp:core/nest -->

End

Output

[
  {
    "blockName": "core/freeform",
    "rawContent": "Front\n\n"
  },
  {
    "blockName": "core/void",
    "attrs": null,
    "innerBlocks": [],
    "rawContent": ""
  },
  {
    "blockName": "core/freeform",
    "rawContent": "\n\n"
  },
  {
    "blockName": "core/nest",
    "attrs": null,
    "innerBlocks": [
      {
        "blockName": "core/inside",
        "attrs": null,
        "innerBlocks": [],
        "rawContent": "\nThis is working\n"
      },
      {
        "blockName": "core/inside",
        "attrs": null,
        "innerBlocks": [
          {
            "blockName": "core/mark",
            "attrs": null,
            "innerBlocks": [],
            "rawContent": "jump here"
          }
        ],
        "rawContent": "\nAnother in the list\n"
      }
    ],
    "rawContent": "\nBefore\n\nMiddle\n\nAfter\n"
  },
  {
    "blockName": "core/freeform",
    "rawContent": "\n\nEnd"
  }
]

Notes

  • This could end up going really slowly on account of how we are parsing a list of non-token characters and then eventually joining them.
  • I haven't tested any of this. Main purpose of the PR is to get something going to explore and see if it's good. We can update and fix all things later if we need to.

@dmsnell dmsnell added the [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f label Sep 17, 2017
@BE-Webdesign
Copy link
Contributor

BE-Webdesign commented Sep 17, 2017

dmsnell to the rescue! I think allowing nested blocks from the parser will be a good step forward! Awesome work so far!

Slightly off topic, but definitely tied to this, although they are more next step questions:

How do you think we will handle block normalization, or if we even want to do normalization?
How do we handle reordering of child blocks after parse ( tied to above )?

How do we parse this correctly?:

{
    "blockName": "core/inside",
    "attrs": null,
    "innerBlocks": [
        {
          "blockName": "core/mark",
          "attrs": null,
          "innerBlocks": [],
          "rawContent": "jump here"
        }
    ],
    "rawContent": "\nAnother in the list\n"
}

"jump here" is supposed to live in between the words "the" and "list", how do we denote that location in our block data? Do we not allow other blocks within text to avoid the need for cursor data or do we need to figure out a data model that can accommodate nested blocks better, or do we create some sort of template syntax to abstract our render tree?

I don't know if this is already a solved problem, but I think the above example, in my opinion, shows that we really want to create a data spec for blocks first, and then build our API around that. Maybe with the news of changing view libraries, this could be a good time to reflect on the current data model for blocks state.

@mtias
Copy link
Member

mtias commented Sep 17, 2017

"jump here" is supposed to live in between the words "the" and "list", how do we denote that location in our block data.

There are two aspects involved in determining how a block is rendered. The parsing step handled by the grammar breaks up the ingredients, but doesn't know how they are combined—that is determined by the block edit and save interfaces. Children are an inner boundary of a block, so they are extracted into innerBlocks. From there, the block author will received as part of edit arguments { attributes, innerBlocks ... }. This will be passed to a <BlockList children={ innerBlocks } /> sort of component that would handle the UI to manage this inner list of blocks.

@dmsnell
Copy link
Member Author

dmsnell commented Sep 17, 2017

thanks for jumping in @BE-Webdesign (sorry I should have added you to this one).

Awesome work so far!

much obliged 🙇

for starters I can say sorry for pushing out a PR without much description, explanation of motivation, or context. I've been talking in person with @mtias and some others and wanted to make sure that I could record in code some of the ideas we've spoken about. will add more of the written part later as we continue to think through the implications here.

How do you think we will handle block normalization, or if we even want to do normalization?
How do we handle reordering of child blocks after parse ( tied to above )?
How do we parse this correctly?:

mostly I'll just defer to the response that @mtias gave, but we had lots of discussion on this very point. this PR is the realization of the ideas we landed at.

@BE-Webdesign
Copy link
Contributor

There are two aspects involved in determining how a block is rendered. The parsing step handled by the grammar breaks up the ingredients, but doesn't know how they are combined—that is determined by the block edit and save interfaces. Children are an inner boundary of a block, so they are extracted into innerBlocks. From there, the block author will received as part of edit arguments { attributes, innerBlocks ... }. This will be passed to a <BlockList children={ innerBlocks } /> sort of component that would handle the UI to manage this inner list of blocks.

That sounds like a cool approach matias. What I am trying to figure out is how our block state will give those components enough information to render in the correct order. Maybe parsing each text node as a block would be a good solution?

@BE-Webdesign
Copy link
Contributor

BE-Webdesign commented Sep 18, 2017

(sorry I should have added you to this one)

No need to apologize. I don't expect to be added to anything. I am only a casual contributor. Handling the parsing/serialization and inbetween for nested blocks will be a challenge, and hopefully I can constructively aid that discussion and path forward.

@belcherj
Copy link
Contributor

belcherj commented Oct 3, 2017

@dmsnell

  • You have a merge conflict, it is much easier to resolve if you squash before rebasing
  • You have a lint issue: belcherj@e1ec93f
  • I think to fix the test all of the fixtures need to be updated due to the changes to the parser: belcherj@3f31d19

@dmsnell
Copy link
Member Author

dmsnell commented Oct 3, 2017

You have a merge conflict, it is much easier to resolve if you squash before rebasing

Thanks @belcherj - was from the work @aduth did with the more tag - I resolved manually from inside of GitHub

You have a lint issue: belcherj/gutenberg@e1ec93f

Best to defer until this merges - the innerBlocks is there to serve as a landmark that it's available. We need to add that data into the blocks inside the app as well.

I think to fix the test all of the fixtures need to be updated due to the changes to the parser:

Thanks! when I recreated the fixtures before they didn't all pass the tests and I didn't have the chance to dig in to find out why. Are they passing tests in your branch?

@dmsnell
Copy link
Member Author

dmsnell commented Oct 3, 2017

Thanks @belcherj! (oops look like tests are still failing)

Is anyone opposed to merging this PR? The ability to nest blocks doesn't force blocks to nest and as-is, this parser should be compliant with all existing posts.

Would anyone be interested in helping out by testing this against several different existing posts?

@belcherj
Copy link
Contributor

belcherj commented Oct 3, 2017

@dmsnell still a couple tests failing. Could not quickly determine how to fix.

@@ -220,7 +220,8 @@ export function createBlockWithFallback( name, rawContent, attributes ) {
*/
export function parseWithGrammar( content ) {
return grammarParse( content ).reduce( ( memo, blockNode ) => {
const { blockName, rawContent, attrs } = blockNode;
// eslint-disable-next-line no-unused-vars
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're truly not using innerBlocks, what's the intent with disabling the rule?

Noting your comment at #2743 (comment) , I could go either way on whether it's sensible to leave as a placeholder. It should be fine.

@@ -220,7 +220,8 @@ export function createBlockWithFallback( name, rawContent, attributes ) {
*/
export function parseWithGrammar( content ) {
return grammarParse( content ).reduce( ( memo, blockNode ) => {
const { blockName, rawContent, attrs } = blockNode;
// eslint-disable-next-line no-unused-vars
const { blockName, innerBlocks, innerHtml: rawContent, attrs } = blockNode;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we replace all references to rawContent as innerHTML ? Here or in a separate pull changeset?

Copy link
Member Author

@dmsnell dmsnell Oct 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or now I thought a second PR would be good so that this can be introduced in as minimal of a way as possible - I'm open to input

/ WP_Block_Html
return [
freeform( pre ),
...ts.reduce( ( out, [ t, h ] ) => [ ...out, t, freeform( h ) ], [] ),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall we had issues previously because we don't run Babel on the .pegjs JavaScript, so I have a feeling this spread syntax won't work in IE11 and we'd either need to port to ES5 equivalent or investigate transpiling the result of the file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 4e41437

ts:(t:Token html:$((!Token .)*) { /** <?php return $t; ?> **/ return [ t, html ] })*
post:$(.*)
{ /** <?php
$blocks = [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The square bracket array syntax was added in PHP 5.4. We need to use the more-verbose array() for support back to PHP 5.2.4

http://php.net/manual/en/language.types.array.php

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 4e41437

function freeform( s ) {
return s.length && {
blockName: 'core/freeform',
innerHtml: s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to consider the capitalization here. I created #2511 a while back to open discussion on a convention. I've started leaning more toward capitalizing abbreviations (innerHTML) but really just care to enforce some consistency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally I don't care. I'll use HTML

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed in 4e41437

{
/** <?php
$innerBlocks = array_filter( $children, function( $a ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anonymous functions are only available in PHP 5.3+. To support PHP 5.2.4+, we'd need this to be a named function, passing the string callable reference to array_filter.

http://php.net/manual/en/functions.anonymous.php

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 4e41437

} );

$innerHtml = array_filter( $children, function( $a ) {
return is_string( $a );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or here we could probably do array_filter( $children, 'is_string' );

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. I actually intended on maybe creating an array_partition function and maybe will still do that. I always get confused which core functions are functions and which are operators (like empty()) and cannot be used as Callables

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed in 4e41437

@dmsnell
Copy link
Member Author

dmsnell commented Oct 25, 2017

All, here are some big updates since the last time I addressed this: more complete PHP support, some big comments, and a few functions extracting out specific logic for return value shapes.

I've been inspired by some alternative parser generators in how they separate actions on a rule from the grammar itself, allowing the host language to define them separately.

For example, Canopy lets you call these actions on a rule by adding something like %combine_blocks and then in the language-specific file you supply an actions map like this:

const actions = {
	combine_blocks: function( pre, tokens, post ) {
		
	}
}

Ohm also has this notion and they consider it separating the syntax from the semantics of the language being parsed.

Anyway, something to think about as we start introducing more code. I'd love to have a grammar completely free from any code.

Copy link
Member

@aduth aduth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like fixtures need to be regenerated per innerHtml -> innerHTML change:

npm run fixtures:regenerate

Resolves some, but not all PHPUnit errors.

It looks like we no longer guarantee that attrs will exist on the returned parsed block object, so there are PHP errors occurring here where we assume to be able to access the property:

$attributes = is_array( $block['attrs'] ) ? $block['attrs'] : array();

1) Dynamic_Blocks_Render_Test::test_dynamic_block_rendering
Undefined index: attrs

?> **/

function freeform( s ) {
return s.length && {
blockName: 'core/freeform',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elsewhere we have made assumptions that freeform is the absence of a block name. Specifically, this test case now fails for a few reasons:

var truthy = [];
var falsey = [];

// nod to performance over a simpler reduce
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ A good place for micro-optimizations.

predicate( item )
? truthy.push( item )
: falsey.push( item );
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary semi-colon. Guessing it wouldn't be very easy for us to apply linting to this file, would it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it? I thought I would be following style guidelines by including it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, unnecessary: http://bit.ly/2y5BO9c

@dmsnell
Copy link
Member Author

dmsnell commented Oct 26, 2017

Thanks @aduth - I have updated with your feedback!

@codecov
Copy link

codecov bot commented Oct 26, 2017

Codecov Report

Merging #2743 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #2743   +/-   ##
=======================================
  Coverage   32.05%   32.05%           
=======================================
  Files         249      249           
  Lines        6885     6885           
  Branches     1249     1249           
=======================================
  Hits         2207     2207           
  Misses       3934     3934           
  Partials      744      744
Impacted Files Coverage Δ
blocks/api/parser.js 98.21% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1cbe5e6...4e0b589. Read the comment docs.

@@ -225,7 +225,8 @@ export function createBlockWithFallback( name, rawContent, attributes ) {
*/
export function parseWithGrammar( content ) {
return grammarParse( content ).reduce( ( memo, blockNode ) => {
const { blockName, rawContent, attrs } = blockNode;
// eslint-disable-next-line no-unused-vars
const { blockName, innerBlocks, innerHTML: rawContent, attrs } = blockNode;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to normalize this toward innerHTML and drop rawContent, for consistency, and also because objects of innerBlocks will keep their innerHTML property. Should be fine to do separately.

Copy link
Member

@aduth aduth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, looks like there's still an issue: Support for passing strings as callables in PHP was only added in PHP 5.4, which does not meet our support requirements of 5.2.4:

http://php.net/manual/en/language.types.callable.php
https://wordpress.org/about/requirements/

@aduth
Copy link
Member

aduth commented Nov 2, 2017

It was easy enough to drop the callable type hint and have it work just as well. See c344da6.

I'd personally like to see us push forward with block nesting. It would help in other cases such as #3047 and #2854 where I think we've had to make some compromises due to lack of proper nesting support.

@dmsnell
Copy link
Member Author

dmsnell commented Nov 2, 2017

@aduth what’s the point of removing the type hint?

@aduth
Copy link
Member

aduth commented Nov 2, 2017

@dmsnell See #2743 (review)

@dmsnell
Copy link
Member Author

dmsnell commented Nov 2, 2017

oh roger @aduth - I thought that the act itself of passing a string callable was the issue, not adding the type hint

@mtias mtias added this to the Roadmap milestone Nov 3, 2017
In this patch we're opening up a new avenue for allowing nested blocks
in the data structure.

For each block:
 - Nested blocks appear as `innerBlocks` as a sequential list
 - The contained HTML _without_ the nested blocks appear as a string
   property `innerHtml` which replaces `rawContent`

Also:
 - Remove `WP_` prefix on grammar terms - not needed - was there
   from the earliest iterations where different parts were prefixed
   by which spec they implemented, such as HTML_, URL_, etc...

Regenerates fixtures based on updated parser

Disable eslint for line so tests will run

Fix rebase issue

Update based on PR feedback

I'm breaking my own rules here by introducing more code into the
parser but I'm also not sure how we can escape this without
placing higher demands on some post-processing after the parse.

Changes:
 - Do away with non-supported language features
 - Abstract `joinBlocks( pre, tokens, post )` into a function
   basically just need to join non-empty items into a flat list

Tiny fix and big header comment

Added comment to start of rules

Actually return blocks from peg_join_blocks()

Minor adjustments to parser to preserve existing behavior

Update parser and fix bug in updates

When the `Balanced_Block` was rebuilt to be defined as a starting block,
some number of tokens and non-closing HTML, finished by a closing block,
I used a `+` to indicate that we needed at least _some_ content inside
of the block to be valid.

In some regards this is true because empty blocks should be void blocks.

On the other hand, it's very likely that we'll receive empty non-void
blocks in practice and the parser should not invalidate one because it
has chosen the wrong syntax.

This update replaces the `+` with a `*` such that we can have empty
blocks and they will be treated as normal.

Remove unnecessary semicolon

Blocks: Access block content by innerHTML

Parser: Drop callable type hint

Type hint supported only in PHP 5.4+, above support level. Only use is in call_user_func, supportable with expected string callable input prior to 5.4
@aduth aduth merged commit 48ccc03 into master Nov 9, 2017
@aduth aduth deleted the parser/allow-block-nesting branch November 9, 2017 15:50
@dmsnell
Copy link
Member Author

dmsnell commented Nov 9, 2017

Thanks for all your help on this one @aduth!

@nylen
Copy link
Member

nylen commented Nov 16, 2017

There are no test cases for this major and foundational change to the parser, even though there is a framework in place for creating them.

What is the plan for adding these test cases?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants