-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WP_HTML_Tag_Processor: Allow non-attribute lexical updates #47068
Conversation
Flaky tests detected in 41a3919. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/4007383888
|
* @param string $text The replacement. | ||
* @return void | ||
*/ | ||
protected function add_lexical_update( $start, $end, $text ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need to flush updates here to avoid collisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also might need to change the signature to use bookmarks instead of offsets for $start
and $end
, to guarantee that they're moved correctly upon update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need to flush updates here to avoid collisions.
I'd really like to avoid any possibility of adding colliding updates here. I think at that point we've exposed too much of the internal logic of the system or we have failed to maintain our internal consistency.
This is something I've worried about as we look at making sure get_attribute()
returns the updated values for enqueued set_attribute()
calls.
Put another way, the fact that we are raising these concerns might be more evidence that we're trying to stuff too much in when we could call get_updated_html()
before some of these operations and then not worry about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I was being too pessimistic here. The example(s) I was thinking of all involved moving to a different tag (via next_tag()
or seek()
), but I had forgotten that we flush in those cases anyway, since our updates are per-tag.
Is it really that simple? Does the updating mechanism Just Work™️, even for more generic changes? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the updating mechanism Just Work™️, even for more generic changes?
Right now attribute updates have a name corresponding to the attribute name, so successive attribute updates overwrite previous ones. Other textual updates are appended without a name (so probably get some generic integer key).
If we add overlapping textual updates we could collide. If we add successive attribute updates they replace the earlier ones and things are safe.
On this note we might be able to run a variant of parse_next_attribute()
on get_attribute()
if we have updates enqueued for the given attribute. That could save us the complexity of tracking attribute updates separately in #46680
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now attribute updates have a name corresponding to the attribute name, so successive attribute updates overwrite previous ones.
Yeah. With the changes in this PR, we'd be retaining this behavior (which is happening entirely inside set_attribute
). Eventually, when they're carried over to $lexical_updates
by attribute_updates_to_lexical_updates
, those keys are dismissed.
Other textual updates are appended without a name (so probably get some generic integer key).
Yeah -- I'm implementing this behavior more explicitly in this PR, where other textual updates go straight to $lexical_updates
, which is a non-associative array.
If we add overlapping textual updates we could collide. If we add successive attribute updates they replace the earlier ones and things are safe.
I'm starting to think that things really might work in our favor -- at least as long as we only support set_content_inside_balanced_tags
: By its nature, that's an operation that shouldn't collide with set_attribute
when applied to the same tag, e.g. the section
in the following:
<section id="foobar">
<div>Text</div>
</section>
Right now I'm thinking to maybe just throw a ton of test coverage at the problem with every possible scenario I can think of, to see if I'm missing something 😅
One thing I'm pretty sure is that we will need to invalidate (i.e. release?) bookmarks. Anything that points to a tag that's overwritten by set_content_inside_balanced_tags
is going to get lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now I'm thinking to maybe just throw a ton of test coverage at the problem with every possible scenario I can think of, to see if I'm missing something 😅
Adding those to #47036, since that PR has the actual, public set_content_inside_balanced_tags
method. (I've updated that PR to implement that method now using add_lexical_update
from this PR.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove this method – $this->lexical_updates[] = new WP_HTML_Text_Replacement( $start, $end, $text );
can be done in-place where its needed.
I'd really like to avoid any possibility of adding colliding updates here. I think at that point we've exposed too much of the internal logic of the system or we have failed to maintain our internal consistency.
+1 to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove this method –
$this->lexical_updates[] = new WP_HTML_Text_Replacement( $start, $end, $text );
can be done in-place where its needed.
I'm exposing it (protected
) for use by set_content_inside_bookmarks
(which is a method of WP_HTML_Processor
, which in turn is derived from WP_HTML_Tag_Processor
). I don't mind removing the method, but it means I'll have to make $lexical_updates
protected
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
08d1380
to
4ea8fa6
Compare
f6581ac
to
8ec0306
Compare
7f84941
to
b24d6c7
Compare
8ec0306
to
7abe25d
Compare
As I've now implemented bookmark invalidation for bookmarks that are "overwritten", I'll tentatively open this for review. Note that since the method itself is |
The changes look good to me. Let's observe how we end up using this API – same as Dennis I am worried about the possibility of overlapping updates. I'll leave the final approval to @dmsnell as he clearly gave this code more though. |
Thank you, Adam! Yeah, that was my major point of concern as well -- and the ultimate goal for this PR to avoid them! 😅 Per this convo, I've tried to ascertain that mostly by adding test coverage to #47036 😊 LMK if you can think of any other scenarios that aren't covered by those tests 😄 |
05bc7fe
to
41a3919
Compare
What does the addition of It seems like we now have two PRs here: one which unlocks a private method to subclassing and one which addresses some accounting issues for bookmarks that are eliminated by updates. Would it take much work to extract the bookmarking PR? I understand that before this PR the bookmark issue isn't really an issue at all, but I don't think it would be wrong to add the safety into the system before it's needed, and the additional integer comparison checks shouldn't appreciably slow anything down, particularly since we're only ever theoretically taking the same branch until more work is made available, such as proposed in this PR |
For context, this PR originally also contained a
I can file a PR to extract the bookmarking logic 👍 |
8b92ec6
to
48f430d
Compare
48f430d
to
3985596
Compare
Rebased, now that the bookmark invalidation PR (#47559) has been merged. |
with also, how do you feel about moving this PR into WordPress/WordPress-develop? |
Now I came around again @ockham @dmsnell . The name I've been using this PR for the last week and struggled to:
I'd add support for these use-cases before merging this PR. Otherwise, we're replacing a name that reflects the limitations reasonably well (attribute updates) with one that doesn't. |
It shouldn't suggest this. It should only suggest that at the point we have the diff it's purely lexical, based on the text and void of semantic information about the structure of the document.
Does it though? It doesn't in For both of the situations you listed I kind of envisioned we could create a zero-width bookmark before a tag starts and use that as a reference. I have not tested this out in practice, but that bookmark wouldn't interfere with any syntax and it wouldn't be affected by |
Probably not anymore!
Yeah, it would've been needed mostly if There's one more reason I find somewhat compelling: For the update handling logic, the update array keys (which happen to be attribute names) are purely circumstantial. This PR basically reflects that updates can exist without such keys, and that attribute updates are basically just one class of such updates.
Happy to -- but would you say it's even still worthwhile pursuing? 😊 |
Yeah, that's how I read "lexical" as well 👍😊
Seems like a reasonable approach to me 👍 |
this wasn't circumstantial, because those keys were chosen so that subsequent updates would automatically replace earlier updates without us needing to track them. that we can use numeric keys for non-attribute updates isn't entirely chance either. 😄
the difference is "the ability to apply a diff to any part of the markup" the rules are still supposed to be maintained about where an update can occur to an input document. that we've stripped those semantic rules away at the point we enqueue an update doesn't imply that those rules don't exist. my point is that there's nothing in this class at all that should suggest that we can or should do arbitrary edits to the HTML. opening up the ability to do this is an unfortunate requirement of extending the class, and we should be on the lookout for code trying to apply diffs to "any part of the markup" and learn how we can better enforce the rules that it shouldn't be able to. never should it suggest you can change |
To me, there still is. Lexical update only has a start and an end; nothing about it communicates these semantic rules. An "attribute update" may be just a That being said, we need more than attribute updates here or else we'll reimplement the same logic in Directive processor, HTML processor, etc. Perhaps this PR is for the best then – we'll likely find ourselves adjusting the flushing logic to cater more use-cases anyway. In other words, I'm fine with this name change that I think is a bit misleading because I believe the behavior will change in a way I'll find more intuitive.
Oh I didn't mean that. I just meant changing the current |
if we're getting "this allows arbitrary changes" out of "lexical updates" then I think we should scrap lexical updates, but find something better than "attribute updates" "semantic_patches" ? I'm still struggling with why having text-based diffs implies the rules go out the window. these aren't and never were semantic updates, and from the day we created attribute updates we said "it's up to anyone creating these to maintain the semantic rules" and clearly we haven't found a good name yet to communicate that.
|
Closing this PR as obsolete, now that we have WordPress/wordpress-develop#4519. |
What?
Part of #44410.
Allow updating of arbitrary HTML (rather than just tag attributes) by derived classes by making
$lexical_updates
protected
, and adding some required logic.Why?
This is needed in order to be able to implement methods like
set_content_inside_balanced_tags()
. See #47036 and #46680 (comment) for more context.How?
As noted elsewhere:
This PR thus:
$attribute_updates
(after WP_HTML_Tag_Processor: Rename attribute_updates to lexical_updates #47053 has renamed them to$lexical_updates
);attribute_updates_to_lexical_updates()
method (inspired byclass_name_updates_to_attribute_updates()
).apply_lexical_updates()
.$lexical_updates
protected
.These changes only affect existing
private
methods and members, so we're not breaking any APIs.TODO
With "arbitrary" lexical updates, there's more potential for collisions:
apply_lexical_updates
that invalidates bookmarks, if necessary.Even worse, there's a chance that lexical updates themselves collide. Imagine that in the aforementioned case, we have an attribute update for the tag that gets overwritten by the lexical update. The only way to avoid this might be to flush attribute updates before adding the lexical update (see).Edit: Or maybe this isn't really a problem after all, see WP_HTML_Tag_Processor: Allow non-attribute lexical updates #47068 (comment).Testing Instructions
See unit tests.