WP_HTML_Tag_Processor: Allow non-attribute lexical updates #47068

ockham · 2023-01-11T13:52:15Z

What?

Part of #44410.

Allow updating of arbitrary HTML (rather than just tag attributes) by derived classes by making $lexical_updates protected, and adding some required logic.

Why?

This is needed in order to be able to implement methods like set_content_inside_balanced_tags(). See #47036 and #46680 (comment) for more context.

How?

As noted elsewhere:

[W]hile $attribute_updates is keyed by ("comparable", i.e. lowercase) attribute name, those keys are only relevant for set_attribute() (to check if an attribute of the same name already exists). They are however entirely ignored by apply_attributes_updates -- which is why we can use that mechanism for set_content_inside_balanced_tags without much modification 🎉

This PR thus:

Re-introduces $attribute_updates (after WP_HTML_Tag_Processor: Rename attribute_updates to lexical_updates #47053 has renamed them to $lexical_updates);
Introduces a attribute_updates_to_lexical_updates() method (inspired by class_name_updates_to_attribute_updates()).
Runs that before apply_lexical_updates().
Makes $lexical_updates protected.

These changes only affect existing private methods and members, so we're not breaking any APIs.

TODO

With "arbitrary" lexical updates, there's more potential for collisions:

There's a chance that existing bookmarks become invalid -- e.g. when an update overwrites an existing tag that had a bookmark attached. We thus need to add some logic to apply_lexical_updates that invalidates bookmarks, if necessary.
Even worse, there's a chance that lexical updates themselves collide. Imagine that in the aforementioned case, we have an attribute update for the tag that gets overwritten by the lexical update. The only way to avoid this might be to flush attribute updates before adding the lexical update (see). Edit: Or maybe this isn't really a problem after all, see WP_HTML_Tag_Processor: Allow non-attribute lexical updates #47068 (comment).

Testing Instructions

See unit tests.

github-actions · 2023-01-11T14:25:42Z

Flaky tests detected in 41a3919.
Some tests passed with failed attempts. The failures may not be related to this commit but are still reported for visibility. See the documentation for more information.

🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/4007383888
📝 Reported issues:

[Flaky Test] should always expand single line selection #39787 in specs/editor/various/multi-block-selection.test.js
[Flaky Test] should move the blocks up #39782 in specs/editor/various/block-editor-keyboard-shortcuts.test.js

ockham · 2023-01-11T14:47:12Z

lib/experimental/html/class-wp-html-tag-processor.php

+	 * @param string $text  The replacement.
+	 * @return void
+	 */
+	protected function add_lexical_update( $start, $end, $text ) {


We probably need to flush updates here to avoid collisions.

We also might need to change the signature to use bookmarks instead of offsets for $start and $end, to guarantee that they're moved correctly upon update.

We probably need to flush updates here to avoid collisions.

I'd really like to avoid any possibility of adding colliding updates here. I think at that point we've exposed too much of the internal logic of the system or we have failed to maintain our internal consistency.

This is something I've worried about as we look at making sure get_attribute() returns the updated values for enqueued set_attribute() calls.

Put another way, the fact that we are raising these concerns might be more evidence that we're trying to stuff too much in when we could call get_updated_html() before some of these operations and then not worry about it.

Maybe I was being too pessimistic here. The example(s) I was thinking of all involved moving to a different tag (via next_tag() or seek()), but I had forgotten that we flush in those cases anyway, since our updates are per-tag.

Is it really that simple? Does the updating mechanism Just Work™️, even for more generic changes? 🤔

Does the updating mechanism Just Work™️, even for more generic changes?

Right now attribute updates have a name corresponding to the attribute name, so successive attribute updates overwrite previous ones. Other textual updates are appended without a name (so probably get some generic integer key).

If we add overlapping textual updates we could collide. If we add successive attribute updates they replace the earlier ones and things are safe.

On this note we might be able to run a variant of parse_next_attribute() on get_attribute() if we have updates enqueued for the given attribute. That could save us the complexity of tracking attribute updates separately in #46680

Right now attribute updates have a name corresponding to the attribute name, so successive attribute updates overwrite previous ones.

Yeah. With the changes in this PR, we'd be retaining this behavior (which is happening entirely inside set_attribute). Eventually, when they're carried over to $lexical_updates by attribute_updates_to_lexical_updates, those keys are dismissed.

Other textual updates are appended without a name (so probably get some generic integer key).

Yeah -- I'm implementing this behavior more explicitly in this PR, where other textual updates go straight to $lexical_updates, which is a non-associative array.

If we add overlapping textual updates we could collide. If we add successive attribute updates they replace the earlier ones and things are safe.

I'm starting to think that things really might work in our favor -- at least as long as we only support set_content_inside_balanced_tags: By its nature, that's an operation that shouldn't collide with set_attribute when applied to the same tag, e.g. the section in the following:

<section id="foobar"> <div>Text</div> </section>

Right now I'm thinking to maybe just throw a ton of test coverage at the problem with every possible scenario I can think of, to see if I'm missing something 😅

One thing I'm pretty sure is that we will need to invalidate (i.e. release?) bookmarks. Anything that points to a tag that's overwritten by set_content_inside_balanced_tags is going to get lost.

Right now I'm thinking to maybe just throw a ton of test coverage at the problem with every possible scenario I can think of, to see if I'm missing something 😅

Adding those to #47036, since that PR has the actual, public set_content_inside_balanced_tags method. (I've updated that PR to implement that method now using add_lexical_update from this PR.)

I'd remove this method – $this->lexical_updates[] = new WP_HTML_Text_Replacement( $start, $end, $text ); can be done in-place where its needed.

I'd really like to avoid any possibility of adding colliding updates here. I think at that point we've exposed too much of the internal logic of the system or we have failed to maintain our internal consistency.

+1 to that.

I'd remove this method – $this->lexical_updates[] = new WP_HTML_Text_Replacement( $start, $end, $text ); can be done in-place where its needed.

I'm exposing it (protected) for use by set_content_inside_bookmarks (which is a method of WP_HTML_Processor, which in turn is derived from WP_HTML_Tag_Processor). I don't mind removing the method, but it means I'll have to make $lexical_updates protected.

lib/experimental/html/class-wp-html-tag-processor.php

ockham · 2023-01-19T17:58:15Z

As I've now implemented bookmark invalidation for bookmarks that are "overwritten", I'll tentatively open this for review.

Note that since the method itself is private, I haven't added any unit test coverage here. Instead, please refer to #47036, where set_content_inside_balanced_tags is implemented using the add_lexical_update method from this PR, and I've added some unit test coverage for set_content_inside_balanced_tags in that PR.

lib/experimental/html/class-wp-html-tag-processor.php

adamziel · 2023-01-20T12:19:53Z

The changes look good to me. Let's observe how we end up using this API – same as Dennis I am worried about the possibility of overlapping updates. I'll leave the final approval to @dmsnell as he clearly gave this code more though.

ockham · 2023-01-23T14:23:59Z

The changes look good to me. Let's observe how we end up using this API – same as Dennis I am worried about the possibility of overlapping updates.

Thank you, Adam! Yeah, that was my major point of concern as well -- and the ultimate goal for this PR to avoid them! 😅 Per this convo, I've tried to ascertain that mostly by adding test coverage to #47036 😊 LMK if you can think of any other scenarios that aren't covered by those tests 😄

dmsnell · 2023-01-26T18:41:13Z

What does the addition of lexical_updates and attribute_updates buy us? We can already distinguish these because attribute updates have a string array key while non-attribute updates are appended with [] = , giving them numeric keys. I find the distinction a bit confusing.

It seems like we now have two PRs here: one which unlocks a private method to subclassing and one which addresses some accounting issues for bookmarks that are eliminated by updates. Would it take much work to extract the bookmarking PR? I understand that before this PR the bookmark issue isn't really an issue at all, but I don't think it would be wrong to add the safety into the system before it's needed, and the additional integer comparison checks shouldn't appreciably slow anything down, particularly since we're only ever theoretically taking the same branch until more work is made available, such as proposed in this PR

ockham · 2023-01-30T13:39:23Z

What does the addition of lexical_updates and attribute_updates buy us? We can already distinguish these because attribute updates have a string array key while non-attribute updates are appended with [] = , giving them numeric keys. I find the distinction a bit confusing.

For context, this PR originally also contained a add_lexical_update() method, which I removed per Adam's suggestion 😊

It seems like we now have two PRs here: one which unlocks a private method to subclassing and one which addresses some accounting issues for bookmarks that are eliminated by updates. Would it take much work to extract the bookmarking PR? I understand that before this PR the bookmark issue isn't really an issue at all, but I don't think it would be wrong to add the safety into the system before it's needed, and the additional integer comparison checks shouldn't appreciably slow anything down, particularly since we're only ever theoretically taking the same branch until more work is made available, such as proposed in this PR

I can file a PR to extract the bookmarking logic 👍

ockham · 2023-02-21T16:07:40Z

Rebased, now that the bookmark invalidation PR (#47559) has been merged.

dmsnell · 2023-02-21T21:02:15Z

with $this->lexical_updates already being protected, is this required now? I'm still wondering what separating attribute from lexical updates gain us that we can't already do, apart from needing to check in two places for an update rather than one.

also, how do you feel about moving this PR into WordPress/WordPress-develop?

adamziel · 2023-03-02T11:38:15Z

Now I came around again @ockham @dmsnell .

The name lexical_update suggests the ability to apply a diff to any part of the markup. However, WP_HTML_Tag_Processor assumes that diffs are applied only to the currently processed tag.

I've been using this PR for the last week and struggled to:

Insert a tag or text before the currently processed tag
Replace the current tag's outer HTML

I'd add support for these use-cases before merging this PR. Otherwise, we're replacing a name that reflects the limitations reasonably well (attribute updates) with one that doesn't.

dmsnell · 2023-03-02T18:34:40Z

suggests the ability to apply a diff to any part of the markup

It shouldn't suggest this. It should only suggest that at the point we have the diff it's purely lexical, based on the text and void of semantic information about the structure of the document.

WP_HTML_Tag_Processor assumes that diffs are applied only to the currently processed tag

Does it though? It doesn't in apply_attribute_updates()? It does overlook diffs inside of seek(), which seems like it would be a problem, but one we could also resolve with the same bookmark delta/shifting algorithm employed in apply_attribute_updates().

For both of the situations you listed I kind of envisioned we could create a zero-width bookmark before a tag starts and use that as a reference. I have not tested this out in practice, but that bookmark wouldn't interfere with any syntax and it wouldn't be affected by set_outer upates.

ockham · 2023-03-06T10:30:21Z

with $this->lexical_updates already being protected, is this required now?

Probably not anymore!

I'm still wondering what separating attribute from lexical updates gain us that we can't already do, apart from needing to check in two places for an update rather than one.

Yeah, it would've been needed mostly if $lexical_updates were private.

There's one more reason I find somewhat compelling: For the update handling logic, the update array keys (which happen to be attribute names) are purely circumstantial. This PR basically reflects that updates can exist without such keys, and that attribute updates are basically just one class of such updates.

also, how do you feel about moving this PR into WordPress/WordPress-develop?

Happy to -- but would you say it's even still worthwhile pursuing? 😊

ockham · 2023-03-06T10:36:19Z

suggests the ability to apply a diff to any part of the markup

It shouldn't suggest this. It should only suggest that at the point we have the diff it's purely lexical, based on the text and void of semantic information about the structure of the document.

Yeah, that's how I read "lexical" as well 👍😊

WP_HTML_Tag_Processor assumes that diffs are applied only to the currently processed tag

Does it though? It doesn't in apply_attribute_updates()? It does overlook diffs inside of seek(), which seems like it would be a problem, but one we could also resolve with the same bookmark delta/shifting algorithm employed in apply_attribute_updates().

For both of the situations you listed I kind of envisioned we could create a zero-width bookmark before a tag starts and use that as a reference. I have not tested this out in practice, but that bookmark wouldn't interfere with any syntax and it wouldn't be affected by set_outer upates.

Seems like a reasonable approach to me 👍

dmsnell · 2023-03-06T19:13:41Z

the update array keys (which happen to be attribute names) are purely circumstantial

this wasn't circumstantial, because those keys were chosen so that subsequent updates would automatically replace earlier updates without us needing to track them.

that we can use numeric keys for non-attribute updates isn't entirely chance either. 😄

read "lexical"

the difference is "the ability to apply a diff to any part of the markup"

the rules are still supposed to be maintained about where an update can occur to an input document. that we've stripped those semantic rules away at the point we enqueue an update doesn't imply that those rules don't exist.

my point is that there's nothing in this class at all that should suggest that we can or should do arbitrary edits to the HTML. opening up the ability to do this is an unfortunate requirement of extending the class, and we should be on the lookout for code trying to apply diffs to "any part of the markup" and learn how we can better enforce the rules that it shouldn't be able to. never should it suggest you can change <div><p>Test</p></div> into <diTes>.

adamziel · 2023-03-08T18:44:28Z

my point is that there's nothing in this class at all that should suggest that we can or should do arbitrary edits to the HTML.

To me, there still is. Lexical update only has a start and an end; nothing about it communicates these semantic rules. An "attribute update" may be just a WP_HTML_Text_Replacement as well, but the name tells me specifically what it is about. With a lexical update, I need to dig in the docstrings. That's not a deal breaker, but I find it a worse DX.

That being said, we need more than attribute updates here or else we'll reimplement the same logic in Directive processor, HTML processor, etc. Perhaps this PR is for the best then – we'll likely find ourselves adjusting the flushing logic to cater more use-cases anyway. In other words, I'm fine with this name change that I think is a bit misleading because I believe the behavior will change in a way I'll find more intuitive.

never should it suggest you can change
Test
into .

Oh I didn't mean that. I just meant changing the current <div> into <span>. I like the zero-width bookmark idea!

dmsnell · 2023-03-09T00:48:15Z

if we're getting "this allows arbitrary changes" out of "lexical updates" then I think we should scrap lexical updates, but find something better than "attribute updates"

"semantic_patches" ?

I'm still struggling with why having text-based diffs implies the rules go out the window. these aren't and never were semantic updates, and from the day we created attribute updates we said "it's up to anyone creating these to maintain the semantic rules" and clearly we haven't found a good name yet to communicate that.

$lexical_updates_that_must_respect_html_syntax_boundaries

ockham · 2023-07-10T12:11:53Z

Closing this PR as obsolete, now that we have WordPress/wordpress-develop#4519.

ockham added the [Type] Experimental Experimental feature or API. label Jan 11, 2023

ockham requested review from adamziel and dmsnell January 11, 2023 13:52

ockham self-assigned this Jan 11, 2023

ockham mentioned this pull request Jan 11, 2023

WP_HTML_Processor: Add set_content_inside_balanced_tags #47036

Closed

2 tasks

ockham commented Jan 11, 2023

View reviewed changes

dmsnell reviewed Jan 11, 2023

View reviewed changes

lib/experimental/html/class-wp-html-tag-processor.php Outdated Show resolved Hide resolved

ockham mentioned this pull request Jan 11, 2023

WP_HTML_Tag_Processor: Rename attribute_updates to lexical_updates #47053

Merged

ockham force-pushed the rename/attribute-updates-to-lexical-updates branch from 08d1380 to 4ea8fa6 Compare January 12, 2023 13:03

ockham force-pushed the add/add-lexical-update branch from f6581ac to 8ec0306 Compare January 12, 2023 13:04

ockham force-pushed the rename/attribute-updates-to-lexical-updates branch from 7f84941 to b24d6c7 Compare January 16, 2023 13:35

Base automatically changed from rename/attribute-updates-to-lexical-updates to trunk January 16, 2023 18:54

ockham force-pushed the add/add-lexical-update branch from 8ec0306 to 7abe25d Compare January 16, 2023 18:57

ockham marked this pull request as ready for review January 19, 2023 17:58

ockham requested a review from spacedmonkey as a code owner January 19, 2023 17:58

adamziel reviewed Jan 20, 2023

View reviewed changes

lib/experimental/html/class-wp-html-tag-processor.php Outdated Show resolved Hide resolved

ockham force-pushed the add/add-lexical-update branch 2 times, most recently from 05bc7fe to 41a3919 Compare January 25, 2023 15:25

ockham changed the title ~~WP_HTML_Tag_Processor: Add add_lexical_update() method~~ WP_HTML_Tag_Processor: Allow non-attribute lexical updates Jan 26, 2023

ockham mentioned this pull request Jan 30, 2023

Tag Processor: Add bookmark invalidation logic #47559

Merged

adamziel mentioned this pull request Feb 9, 2023

WP_HTML_Tag_Processor overview issue #44410

Closed

26 tasks

gziolo added the [Feature] Block API API that allows to express the block paradigm. label Feb 21, 2023

ockham force-pushed the add/add-lexical-update branch from 8b92ec6 to 48f430d Compare February 21, 2023 14:51

Allow handling of non-attribute lexical updates

3985596

ockham force-pushed the add/add-lexical-update branch from 48f430d to 3985596 Compare February 21, 2023 15:31

gziolo added [Feature] HTML API An API for updating HTML attributes in markup and removed [Feature] Block API API that allows to express the block paradigm. labels Apr 18, 2023

ockham closed this Jul 10, 2023

ockham deleted the add/add-lexical-update branch July 10, 2023 12:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WP_HTML_Tag_Processor: Allow non-attribute lexical updates #47068

WP_HTML_Tag_Processor: Allow non-attribute lexical updates #47068

ockham commented Jan 11, 2023 •

edited by gziolo

Loading

github-actions bot commented Jan 11, 2023 •

edited

Loading

ockham Jan 11, 2023

ockham Jan 11, 2023

dmsnell Jan 11, 2023

ockham Jan 11, 2023

dmsnell Jan 12, 2023

ockham Jan 12, 2023

ockham Jan 12, 2023

adamziel Jan 20, 2023

ockham Jan 23, 2023

ockham Jan 23, 2023

ockham commented Jan 19, 2023

adamziel commented Jan 20, 2023

ockham commented Jan 23, 2023

dmsnell commented Jan 26, 2023

ockham commented Jan 30, 2023

ockham commented Feb 21, 2023

dmsnell commented Feb 21, 2023

adamziel commented Mar 2, 2023

dmsnell commented Mar 2, 2023

ockham commented Mar 6, 2023

ockham commented Mar 6, 2023

dmsnell commented Mar 6, 2023

adamziel commented Mar 8, 2023 •

edited

Loading

dmsnell commented Mar 9, 2023

ockham commented Jul 10, 2023

WP_HTML_Tag_Processor: Allow non-attribute lexical updates #47068

WP_HTML_Tag_Processor: Allow non-attribute lexical updates #47068

Conversation

ockham commented Jan 11, 2023 • edited by gziolo Loading

What?

Why?

How?

TODO

Testing Instructions

github-actions bot commented Jan 11, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ockham commented Jan 19, 2023

adamziel commented Jan 20, 2023

ockham commented Jan 23, 2023

dmsnell commented Jan 26, 2023

ockham commented Jan 30, 2023

ockham commented Feb 21, 2023

dmsnell commented Feb 21, 2023

adamziel commented Mar 2, 2023

dmsnell commented Mar 2, 2023

ockham commented Mar 6, 2023

ockham commented Mar 6, 2023

dmsnell commented Mar 6, 2023

adamziel commented Mar 8, 2023 • edited Loading

dmsnell commented Mar 9, 2023

ockham commented Jul 10, 2023

ockham commented Jan 11, 2023 •

edited by gziolo

Loading

github-actions bot commented Jan 11, 2023 •

edited

Loading

adamziel commented Mar 8, 2023 •

edited

Loading