-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Table of contents: Avoid parsing the document using libxml in PHP #29560
Comments
I agree, this is a block that I strongly feel should be restricted to work with blocks alone, not attempt to work with freeform content. It's at the core of the value proposition of blocks to offer structured data that we can more easily reason about. We also have the right parsing tools for them, both on the client and the server. Ideally, for me, the headings are pulled from just heading blocks at render time and assembled using the block APIs. Freeform content is skipped. There are too many concerns with trying to parse arbitrary html in search of headings, and it can crumble down on any long enough post. |
You mirror my thoughts exactly. I commented in #22874 (comment) about the fact that there are trade-offs to be made with any implementation, so we need to ensure we value the right things — like a block-based approach instead of a DOM-based one. Which brings me to #22874 (comment), in which I recommended starting with just Heading and only thereafter designing a decent API for third-party heading-like blocks to feed the table of contents.
Yes, this is something that worries me too, not just from a computational standpoint, but also because of the reliance on libxml itself. It's heavy-handed, a bit fragile, not guaranteed to be supported* across an install base as large as WP's, and has opened the doors for multiple issues last year in the context of Block Supports. See the description (the diff itself is irrelevant) of #26111 for context. Since then we've moved away from libxml for that feature. There is a common learning between Block Supports and the Table of Content blocks, though: it's that we tend to come up with more robust solutions when we err on the side of doing less and relying on blocks as the default semantic unit. In the case of Block Supports, that meant inverting control in the API, raising restrictions on what Block Supports should do, and focusing on blocks' attribute schemas. So, for the ToC block, let's take it from the top with the idea of heading-like blocks. *: For instance, Jetpack checks for the existence of the |
Exactly, also because the right API for third party blocks might actually be to use heading block as children or variations. |
It's worth noting that I originally went with the dynamic rendering approach because altering the saved markup of the block in the editor created extra undo steps, since every change to a Heading would result in a subsequent change to the Table of Contents. However, I think the recently-introduced @mtias I can think of one use-case that might be difficult to accomplish if the core Heading block was the only supported source of headings: accordions. The proper semantic markup for an accordion is as follows: <div>
<h2>
<button>Accordion title... the button MUST be nested INSIDE the heading, and the button should contain the entire content of the title.</button>
</h2>
<div>Accordion content...</div>
</div> Can this be accomplished using block variations? If we decide to only support the core Heading block, we should probably make an announcement about it. There are several existing popular plugins that add their own heading blocks (e.g. Kadence Blocks and Getwid), and we should probably let them know that what they're doing isn't supported, if we choose to go that route. Because as things currently are in the editor, trying to use any non-core Heading block already results in a somewhat broken experience when trying to use the Document Outline tool. |
It would be very helpful to get some movement on this PR! |
What problem does this address?
Right now the TOC block parses the content of the post on render, dynamically generating the list every time.
It works, but is a resources-intensive process and could be optimized.
What is your proposed solution?
Instead of building the TOC dynamically on render, we could do it via JS in the editor itself. This way it will be saved as a list and rendered a lot faster and efficiently.
The headings structure can be saved in the block attributes and updated on the fly as the post gets edited.
This would mean that the classic block will be ignored by the TOC block, but that is to be expected and in fact something we may want to enforce since users should be using the heading blocks to structure their content, and classic blocks can be converted to block-based content.
The text was updated successfully, but these errors were encountered: