HTML API: Handle parsing changes in foreign content. #6006
Force-pushed from 7934be0 to cfcd0cf.
	'A' === $html[ $at + 7 ] &&
	'[' === $html[ $at + 8 ]
) {
	$closer_at = strpos( $html, ']]>', $at + 1 );
This comment was marked as resolved.
Force-pushed from ba2c889 to 3e4bb4d.
As part of work to add more spec support to the HTML API, this patch adds support for SVG and MathML elements, or more generally, "foreign content." The rules in foreign content are a mix of XML and HTML parsing rules and introduce additional complexity into the processor, but they are important in order to avoid getting lost when inside these elements. Developed in WordPress#6006 Discussed in https://core.trac.wordpress.org/ticket/61576 Props: dmsnell, jonsurrell. See #61576.
Force-pushed from 3e4bb4d to 68a6b4a.
❤️
- in_array( $token_name, array( 'IFRAME', 'NOEMBED', 'NOFRAMES', 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'XMP' ), true )
+ in_array( $token_name, array( 'IFRAME', 'NOEMBED', 'NOFRAMES', 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'XMP' ), true ) ||
+ // Self-closing elements in foreign content.
+ ( isset( $node ) && 'html' !== $node->namespace && $node->has_self_closing_flag )
Unless there is a performance benefit to doing this, I think it's generally better to check for equality with null since the variable is set. But even better here to check whether it is an instance of the expected class, right?
- ( isset( $node ) && 'html' !== $node->namespace && $node->has_self_closing_flag )
+ ( $node instanceof WP_HTML_Token && 'html' !== $node->namespace && $node->has_self_closing_flag )
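To make the difference between the two guards concrete, here is a minimal sketch. `Token_Sketch` is a hypothetical stand-in for `WP_HTML_Token`, and both helper functions are illustrative names, not code from the patch.

```php
<?php
// Hypothetical stand-in for WP_HTML_Token with only the fields used in the guard.
final class Token_Sketch {
	public $namespace             = 'html';
	public $has_self_closing_flag = false;
}

// Guard as written in the patch: true for any value that is set and not null.
function guard_with_isset( $node ): bool {
	return isset( $node );
}

// Guard as suggested in review: true only for token instances.
function guard_with_instanceof( $node ): bool {
	return $node instanceof Token_Sketch;
}

// A plain string passes the isset() guard but not the instanceof guard,
// so only the second one protects the later property reads.
$string_passes_isset      = guard_with_isset( 'svg' );
$string_passes_instanceof = guard_with_instanceof( 'svg' );
```

With a type annotation on the parameter, as described later in the thread, the two guards would presumably behave the same, because a non-null argument could only be a token.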
yeah. I think I did this for a reason, but I can't remember. likely I had other code in transition and had non-null non-token instances passed around
Bookmarking this, the isset( $node ) seems like it would make this function return true under some circumstances where we would not expect that.
there was much discussion over this little function, and a lot of effort to "fix" it that only resulted in broken code.
I'd like to propose we continue with it as written and come back afterwards to clean it up.
right now, all of the "fixes" are only used in test code when expects_closer() is called without a $node passed in. the real problem is that there's confusion in what this function is communicating and how step() pops elements off of the stack of open elements.
we recently realized that we need to pop elements off not in step(), but in the parsing rules where they indicate as much. I'd rather we ship this odd bit now and create the more-sweeping fix in one go instead of stewing on code we think must be wrong but which we don't have a failing test case for.
I've added a type annotation to the function, ensuring that isset() is sufficient, and I've added a @todo annotation earmarking this for review. I think what's going on is a complicated interaction with step() where this looks wrong but isn't.
By the way, this function returns "expects closer" for all closing tags, which is "wrong" but also part of the bigger issue not created in this PR.
Thoughts?
PR to address this: #7162
@@ -2237,8 +2260,12 @@ private function step_in_body(): bool {
  * These ought to be handled in the attribute methods.
  */

- $this->bail( 'Cannot process MATH element, opening foreign content.' );
- break;
+ $this->change_parsing_namespace( 'math' );
Where is the parsing namespace changed back to html when encountering </SVG> or </MATH>?
I'm having trouble figuring out from the spec where the namespace is assigned for any other token besides SVG and MATH, as I'm having failures with a math ANNOTATION-XML tag
 * > An end tag whose name is "script", if the current node is an SVG script element.
 */
if ( $this->is_tag_closer() && 'SCRIPT' === $this->state->current_token->node_name && 'svg' === $this->state->current_token->namespace ) {
	$this->bail( 'Cannot parse SCRIPT tags inside SVG elements.' );
Can't this just be ignored since the scripting flag is not enabled?
it can't be ignored because there are rules that we would (might) need to follow and I didn't want to dive into that at this point. perhaps the rules aren't that complicated. it would require looking into the spec, determining what is required, and if we have enough state tracking already to handle it.
if ( 'svg' === $this->get_namespace() ) {
	return (
		'DESC' === $tag_name ||
		'FOREIGNoBJECT' === $tag_name ||
Why the lower-case o?
just a TyPO 😄
if ( is_string( $tag_name ) ) {
	$tag_name = strtoupper( $tag_name );
} else {
	$tag_name = 'html' === $tag_name->namespace
		? strtoupper( $tag_name->node_name )
		: "{$tag_name->namespace} {$tag_name->node_name}";
}
Since the normal case is now the object:
- if ( is_string( $tag_name ) ) {
- 	$tag_name = strtoupper( $tag_name );
- } else {
- 	$tag_name = 'html' === $tag_name->namespace
- 		? strtoupper( $tag_name->node_name )
- 		: "{$tag_name->namespace} {$tag_name->node_name}";
- }
+ if ( $tag_name instanceof WP_HTML_Token ) {
+ 	$tag_name = 'html' === $tag_name->namespace
+ 		? strtoupper( $tag_name->node_name )
+ 		: "{$tag_name->namespace} {$tag_name->node_name}";
+ } else {
+ 	$tag_name = strtoupper( $tag_name );
+ }
Also, should $tag_name be renamed $node?
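The normalization discussed in this thread can be sketched as a standalone function. `Named_Token_Sketch` and `comparable_name()` are hypothetical names used only for illustration; the real method and `WP_HTML_Token` carry more state than is shown here.

```php
<?php
// Hypothetical stand-in for WP_HTML_Token: a node name plus its namespace.
final class Named_Token_Sketch {
	public $node_name;
	public $namespace;

	public function __construct( string $node_name, string $namespace = 'html' ) {
		$this->node_name = $node_name;
		$this->namespace = $namespace;
	}
}

/**
 * Returns the name used for comparisons on the stack of open elements.
 *
 * HTML names are upper-cased; foreign names keep their case and are
 * prefixed with their namespace so they cannot collide with HTML names.
 */
function comparable_name( $node ): string {
	if ( $node instanceof Named_Token_Sketch ) {
		return 'html' === $node->namespace
			? strtoupper( $node->node_name )
			: "{$node->namespace} {$node->node_name}";
	}

	// Plain strings are treated as HTML tag names.
	return strtoupper( $node );
}
```

Under this scheme an SVG `title` becomes `svg title`, while an HTML `title` becomes `TITLE`, which keeps the two distinct.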
yeah probably. thanks for the review - there are still some logical errors in here I'm tracking, specifically, trying to figure out the right place to assign namespaces and avoid infinite loops when reprocessing nodes from the foreign content rules.
@sirreal @westonruter I've pushed some updates that incorporate
		break;
}

echo "\e[90mPopping a \e[35m{$current_node->namespace} \e[34m{$current_node->node_name}\e[90m from the stack.\e[m\n";
Debug code I'm presuming.
- echo "\e[90mPopping a \e[35m{$current_node->namespace} \e[34m{$current_node->node_name}\e[90m from the stack.\e[m\n";
echo "\e[90mACI is a \e[35m{$this->state->current_token->namespace}\e[90m \e[34m{$this->state->current_token->node_name}\e[m\n";
echo "\e[90mInserted a \e[35m{$this->state->current_token->namespace}\e[90m \e[34m{$this->state->current_token->node_name}\e[m\n";
Debug code I'm presuming.
- echo "\e[90mACI is a \e[35m{$this->state->current_token->namespace}\e[90m \e[34m{$this->state->current_token->node_name}\e[m\n";
- echo "\e[90mInserted a \e[35m{$this->state->current_token->namespace}\e[90m \e[34m{$this->state->current_token->node_name}\e[m\n";
This is very tricky. I'll share some cases that are problematic right now:
Update: I've debugged all of these cases and pushed some changes to fix them.
Fixed
The push/pop handlers are a good opportunity to change parsing namespace as elements are encountered and moved onto the stack. This helps to keep the parsing namespace up to date without requiring state changes spread all over parsing code.
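A minimal sketch of that idea follows, assuming a simplified stack that stores only a namespace and a name per element. `Open_Elements_Sketch` and its callback are hypothetical, and the sketch deliberately ignores integration points, which the real rules also have to consider.

```php
<?php
// Hypothetical, simplified stack of open elements; not the WordPress class.
final class Open_Elements_Sketch {
	private $stack = array();
	private $on_change;

	public function __construct( callable $on_change ) {
		$this->on_change = $on_change;
	}

	public function push( string $namespace, string $name ): void {
		$this->stack[] = array( $namespace, $name );
		( $this->on_change )( $this->current_namespace() );
	}

	public function pop(): void {
		array_pop( $this->stack );
		( $this->on_change )( $this->current_namespace() );
	}

	// Namespace of the current node, or 'html' when the stack is empty.
	private function current_namespace(): string {
		$top = end( $this->stack );
		return false === $top ? 'html' : $top[0];
	}
}

$parsing_namespace = 'html';
$stack             = new Open_Elements_Sketch(
	function ( string $namespace ) use ( &$parsing_namespace ) {
		$parsing_namespace = $namespace;
	}
);

$stack->push( 'html', 'DIV' );
$stack->push( 'svg', 'svg' );
$namespace_inside_svg = $parsing_namespace;

$stack->pop();
$namespace_after_svg = $parsing_namespace;
```

Because the handler runs on every push and pop, closing the SVG element restores the HTML namespace without any extra bookkeeping in the parsing rules themselves.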
Force-pushed from f4373ff to a1e48e3.
As part of work to add more spec support to the HTML API, this patch adds support for the relevant foreign elements in the HTML algorithms within the stack of open elements. Although the HTML Processor cannot yet step into these elements, the format of how they will be represented was determined in the related work from which this patch is extracted. This patch extracted from WordPress#6006. Developed in https://github.com/wordpress/wordpress-develop/pull/ Discussed in https://core.trac.wordpress.org/ticket/61576 Props: dmsnell, jonsurrell. See #61576.
As part of work to add more spec support to the HTML API, this patch adds support for the relevant foreign elements in the HTML algorithms within the stack of open elements. Although the HTML Processor cannot yet step into these elements, the format of how they will be represented was determined in the related work from which this patch is extracted. This patch extracted from WordPress#6006. Developed in WordPress#7157 Discussed in https://core.trac.wordpress.org/ticket/61576 Props: dmsnell, jonsurrell. See #61576.
The specification says to run the "any other end tag" steps under certain other conditions. A goto and label are used for this. When moving to the label, the "any other end tag" condition does not apply; the goal is to run the steps regardless. Move the label into the conditional block so that the condition is not erroneously checked. This caused a failure to return a value when stepping on `<svg><script/>`.
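The label placement described in this commit message can be illustrated with a small sketch. The function, its parameters, and the label name are hypothetical; only the control-flow pattern is taken from the message. PHP allows jumping into an `if` block, though not into a loop or `switch`.

```php
<?php
/**
 * Sketch of running shared steps either because a condition holds
 * or because another rule jumps to them directly.
 */
function step_sketch( bool $is_closer, bool $force_end_tag_steps ): string {
	if ( $force_end_tag_steps ) {
		// Another rule says "run the any-other-end-tag steps" unconditionally.
		goto any_other_end_tag;
	}

	if ( $is_closer ) {
		// The label sits inside the block, so a jump skips the condition above.
		any_other_end_tag:
		return 'end-tag steps';
	}

	return 'other steps';
}
```

If the label were placed immediately before the `if`, a jump with `$is_closer` set to `false` would re-check the condition and fall through to the final return instead of running the end-tag steps.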
This looks ready to me, I'm very excited to add this support!
The number of html5lib-tests that run (and all pass) gives me confidence in this work:
OK, but incomplete, skipped, or risky tests!
Tests: 1498, Assertions: 930, Skipped: 568.
As part of work to add more spec support to the HTML API, this patch adds support for SVG and MathML elements, or more generally, "foreign content." The rules in foreign content are a mix of XML and HTML parsing rules and introduce additional complexity into the processor, but they are important in order to avoid getting lost when inside these elements. Developed in #6006 Discussed in https://core.trac.wordpress.org/ticket/61576 Props: dmsnell, jonsurrell, westonruter. See #61576. git-svn-id: https://develop.svn.wordpress.org/trunk@58867 602fd350-edb4-49c9-b593-d223f7449a82
As part of work to add more spec support to the HTML API, this patch adds support for SVG and MathML elements, or more generally, "foreign content." The rules in foreign content are a mix of XML and HTML parsing rules and introduce additional complexity into the processor, but they are important in order to avoid getting lost when inside these elements. Developed in WordPress/wordpress-develop#6006 Discussed in https://core.trac.wordpress.org/ticket/61576 Props: dmsnell, jonsurrell, westonruter. See #61576. Built from https://develop.svn.wordpress.org/trunk@58867 git-svn-id: http://core.svn.wordpress.org/trunk@58263 1a063a9b-81f0-0310-95a4-ce76da25c4cd
As part of work to add more spec support to the HTML API, this patch adds support for SVG and MathML elements, or more generally, "foreign content." The rules in foreign content are a mix of XML and HTML parsing rules and introduce additional complexity into the processor, but they are important in order to avoid getting lost when inside these elements. This patch follows the first by deleting the empty files, which were mistakenly left in during the initial merge. Developed in #6006 Discussed in https://core.trac.wordpress.org/ticket/61576 Follow-up to [58867]. Props: dmsnell, jonsurrell, westonruter. See #61576. git-svn-id: https://develop.svn.wordpress.org/trunk@58868 602fd350-edb4-49c9-b593-d223f7449a82
As part of work to add more spec support to the HTML API, this patch adds support for SVG and MathML elements, or more generally, "foreign content." The rules in foreign content are a mix of XML and HTML parsing rules and introduce additional complexity into the processor, but they are important in order to avoid getting lost when inside these elements. This patch follows the first by deleting the empty files, which were mistakenly left in during the initial merge. Developed in WordPress/wordpress-develop#6006 Discussed in https://core.trac.wordpress.org/ticket/61576 Follow-up to [58867]. Props: dmsnell, jonsurrell, westonruter. See #61576. Built from https://develop.svn.wordpress.org/trunk@58868 git-svn-id: http://core.svn.wordpress.org/trunk@58264 1a063a9b-81f0-0310-95a4-ce76da25c4cd
Trac ticket: Core-61576
Status
This is undergoing final review and preparation.
Earlier versions
First draft (ba2c889)
$skip_next_foreign_content_processing, because we pop nodes off of the stack of open elements until we're back at an html element, or an HTML integration point, or a MathML integration point. Returning to step() will choose insertion mode instead of foreign content and prevent an infinite loop.
Second draft (3e4bb4d)
get_token_display_name() would provide the remapping.
Third draft (@sirreal's work)
Notes
<circle clippath=1 clipPath=2 CLIPPATH=3 cliPPath=4> has a single attribute named clipPath and whose value is 1. This is convenient because we don't have to change the Tag Processor, but inconvenient when things are mostly based on lower-case attribute names.
math:mi or math:MI is in the math namespace? There can be no HTML tag named math:mi (it would be MATH:MI).
base_class_get_tag() or private function comparison_tag_name() etc… to report an upper-case tag name, while preserving the case-variants required in foreign content to outside calls.
html namespace, that it should change the namespace to svg or math, respectively, and lower-case the tag names. However, the role of integration points and parsing things in the insertion mode is still vague.
get_modifiable_text() which doesn't know if a text node inside foreign content is being processed as foreign content or in the insertion mode, where it determines if NULL bytes should be replaced or removed.
Description
We should reliably detect foreign content, and we need to do it in the Tag Processor, specifically because of the rules for CDATA sections. The HTML Processor needs this as well to determine whether things like self-closing flags for HTML elements should be respected.
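As a rough illustration of why the namespace matters for CDATA, the sketch below only recognizes a CDATA section when the parser is outside the html namespace, which is how the HTML specification treats it. `cdata_contents_sketch()` and its `$namespace` parameter are hypothetical, not the Tag Processor's API, and unlike the snippet under review it searches for the closer from just past the opener.

```php
<?php
/**
 * Returns the text inside a CDATA section that starts at byte offset $at,
 * or null when no CDATA section is recognized there.
 */
function cdata_contents_sketch( string $html, int $at, string $namespace ): ?string {
	// In the html namespace, "<![CDATA[" does not open a CDATA section.
	if ( 'html' === $namespace ) {
		return null;
	}

	$opener = '<![CDATA[';
	if ( substr( $html, $at, strlen( $opener ) ) !== $opener ) {
		return null;
	}

	$text_at   = $at + strlen( $opener );
	$closer_at = strpos( $html, ']]>', $text_at );
	if ( false === $closer_at ) {
		// Unclosed section; a real parser would have to decide what to do here.
		return null;
	}

	return substr( $html, $text_at, $closer_at - $text_at );
}
```

The same input therefore yields different results depending on the namespace passed in, which is the reason detection has to happen at the tokenizing level.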
Unlocks: