-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sanitization reporting #912
Conversation
…hitelist sanitizer
In Development Hi @westonruter, |
Building upon Weston's work and solution design, Add a class to track whenever a node or attribute is removed. And a method to get whether a node was removed. The format of the stored nodes and attributes might change. This will probably depend on the error reporting needed in the REST API and GET request response.
There was an error: Class file names should be based on the class name with 'class-' But the format of the other test files is different. So use that format, and exclude this rule for test files.
The 'mutation_callback' will then track removed nodes and attributes. Also, change the way in which we pass the 'mutation_callback.' Before, it was part of the constructor of: AMP_Tag_And_Attribute_Sanitizer. Instead, move it to the $args of: AMP_Content_Sanitizer::sanitize(). This will pass it to all of the sanitizer/* files when they're instantiated. @todo: look at whether to call the callback for all node removals.
Before, there were 3 places in the file that called removeChild(). This was fine, but they now need to call the mutation callback. So abstract these into remove_child(). Also, call the mutation callback in AMP_Video_Sanitizer.
Per Weston's description in PR #912, It allows sending a POST with markup for validation. The headers should have 'Content-Type' of 'application/json.' And it should pass the markup in the param 'markup.' The current response only has 'is_error.' @todo: look at returning more in the response, like the stripped tags and attributes. Also, add nonce verification.
There's an existing handler to create 'amp-carousel' elements: class AMP_Gallery_Embed_Handler. So override the 'Gallery' widget class. And use that in render_media(). Otherwise, that function is copied from the parent. It calls gallery_shortcode() at the end. Which doesn't have a filter for the markup.
This is only one approach. But for now, the response has counts for: 'removed_nodes' and 'removed_attributes'. If a <script> is removed, 'removed_nodes' will be: {"script":1}. The count will increment every time the same node type is removed. There is a similar histogram for 'removed_attributes'.
This reverts commit aed76c6.
Abstract the logic for the response into get_response(). This enables using it for the existing REST API logic, And the new use-case of full-page GET requests.
Screencast Of REST API Responses Here's a screencast of the current validation via the REST API. The schema of the response will probably change as I look at outputting it in a frontend GET request. |
In a frontend GET request, add a header: 'AMP-Validation-Error'. This outputs whether the sanitizers stripped nodes or tags. A possible output is: '{"has_error":true,"removed_nodes":{"script":1},"removed_attributes":{"async":1}}'
includes/class-amp-theme-support.php
Outdated
*/ | ||
public static function finish_buffer_add_header( $output ) { | ||
$markup = self::finish_output_buffering( $output ); | ||
AMP_Mutation_Utils::add_header(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll want to limit this to only be added when a user specifically requests this additional information. Like there should be a nonce that must be present to authorize the reporting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @westonruter. That's a good idea to add the header only for users with a nonce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied with a nonce, details to come shortly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AMP_Mutation_Utils::add_header()
now has a check for authorization via self::is_authorized()
. That verifies the nonce and capability.
includes/class-amp-theme-support.php
Outdated
@@ -523,6 +535,7 @@ public static function finish_output_buffering( $output ) { | |||
$dom = AMP_DOM_Utils::get_dom( $output ); | |||
$args = array( | |||
'content_max_width' => ! empty( $content_width ) ? $content_width : AMP_Post_Template::CONTENT_MAX_WIDTH, // Back-compat. | |||
'mutation_callback' => 'AMP_Mutation_Utils::track_removed', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per above, this should be conditional based on whether an authorized nonce is present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. This is now only added if AMP_Mutation_Utils::is_authorized()
. That checks for the nonce and capability.
includes/class-amp-theme-support.php
Outdated
@@ -506,13 +506,25 @@ public static function get_amp_component_scripts() { | |||
* Start output buffering. | |||
*/ | |||
public static function start_output_buffering() { | |||
ob_start( array( __CLASS__, 'finish_output_buffering' ) ); | |||
ob_start( array( __CLASS__, 'finish_buffer_add_header' ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can leave this as finish_output_buffering
I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this commit changes the name back to finish_output_buffering()
.
Example Of GET Response Headers Hi @westonruter, |
@kienstra thanks for that great video. Really cool to see. Aside: you can still use Postman with authenticated requests if you just install the basic auth plugin. |
* @since 0.7 | ||
* | ||
* @param object $child The node to remove. | ||
* @param object $parent The parent node for which to remove the child (optional). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of object
these should probably be DOMNode
.
Also, is the $parent
even needed since it will always be available via $child->nodeParent
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @westonruter. This commit changes the @param
type to DOMNode
, and removes the second parameter. Thanks for pointing out that the second isn't needed.
$parent->removeChild( $child ); | ||
} | ||
|
||
if ( isset( $this->args['mutation_callback'] ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible that remove_child
could be called without any node being removed, per the above if
/elseif
conditions. Should this check to see if something was actually removed? I'm curious as to the scenarios when the else
condition would be met.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that's a good point. This commit from above has the if
conditional nested, to ensure there was a removal before calling the 'mutation_callback'
.
As Weston mentioned, the child could get the parentNode. So there's no reason for the elseif. Also, this makes it possible to nest the 'mutation_callback.' So it's only called if there's a removal.
$response = array( | ||
'has_error' => self::was_node_removed(), | ||
'removed_nodes' => self::$removed_nodes, | ||
'removed_attributes' => self::$removed_attributes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be useful if the processed markup were also returned here. It could then be used for previewing, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @westonruter, that would help. Do you think we should return the processed markup only if we're validating a limited amount of markup, like a single Gutenberg block? We could detect this by whether get_response()
is called with a $markup
argument.
if ( isset( $markup ) ) {
$response['processed_markup'] = $markup;
};
If get_response()
is called without a $markup
argument, it's validating the entire document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if markup is supplied to validate, then it should get returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @westonruter.
These commits output the 'processed_markup'
, when $markup
is passed to the function.
This function has the same logic as the current get_buffer(). But the name is more descriptive.
Respond with the markup that is submitted in the request, In the value 'processed_markup'. Full-page requests won't have the markup in the response. esc_html() might not be the best way to escape the markup. But it doesn't display properly without escaping.
The function prepare_response() doesn't exist. Instead, use get_buffer().
Before, the error message always appeared. This is because it only checked that the response had a value for 'has_error'. But this needs to be true in order for there to be a reported error.
Fixing My Mistakes In Resolving Merge Conflicts Hi @ThierryA, |
I had renamed some methods in this branch: add/sanitization-reporting. Also, remove the parameter from finish_output_buffering(). That function in the 'develop' branch no longer has as parameter.
* | ||
* @return void. | ||
*/ | ||
public static function add_header() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't actually used yet. I think it was a foundation for validating plugins on activation (#842)
@@ -11,6 +11,9 @@ | |||
</properties> | |||
</rule> | |||
|
|||
<rule ref="WordPress.Files.FileName.InvalidClassFileName"> | |||
<exclude-pattern>tests/*</exclude-pattern> | |||
</rule> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost all of the test files begin with test-
, while this rule states that they should begin with class-
.
There were different characters in prepare_response(), Mabye from copying from GitHub. Also, adjust documentation, and add a @codingStandardsIgnoreEnd.
* | ||
* @return void. | ||
*/ | ||
public static function amp_rest_validation() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't currently used, though it would be helpful for Gutenberg block validation.
Ready For Review Hi @ThierryA, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank @kienstra, nice an clean coding here. I left my CR below.
@@ -704,6 +704,9 @@ private function sanitize_disallowed_attributes_in_node( $node, $attr_spec_list | |||
|
|||
foreach ( $attrs_to_remove as $attr ) { | |||
$node->removeAttributeNode( $attr ); | |||
if ( isset( $this->args['mutation_callback'], $attr->name ) ) { | |||
call_user_func( $this->args['mutation_callback'], $node, 'removed_attr', $attr->name ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe AMP_Mutation_Util::ATTRIBUTE_REMOVED
could be used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this commit from above substitutes the constant.
if ( method_exists( $node, 'removeAttribute' ) ) { | ||
$node->removeAttribute( $attribute ); | ||
if ( isset( $this->args['mutation_callback'] ) ) { | ||
call_user_func( $this->args['mutation_callback'], $node, 'removed_attr', $attribute ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe AMP_Mutation_Util::ATTRIBUTE_REMOVED
could be used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this commit from above substitutes the constant.
if ( method_exists( $child->parentNode, 'removeChild' ) ) { // phpcs:ignore WordPress.NamingConventions.ValidVariableName.NotSnakeCaseMemberVar. | ||
$child->parentNode->removeChild( $child ); // phpcs:ignore WordPress.NamingConventions.ValidVariableName.NotSnakeCaseMemberVar. | ||
if ( isset( $this->args['mutation_callback'] ) ) { | ||
call_user_func( $this->args['mutation_callback'], $child, 'removed' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe AMP_Mutation_Util::NODE_REMOVED
could be used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this commit uses that constant instead of the string.
/** | ||
* The argument if an attribute was removed. | ||
* | ||
* @const array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For constants the @var
tag must be used for the PHPDoc to be valid. This comment applies in multiple constants in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this commit changes the tags to @var
.
* Tracks when a sanitizer removes an attribute or node. | ||
* | ||
* @param array $histogram The count of attributes or nodes removed. | ||
* @param string $key The attribute or node name removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a wonder Travis didn't pickup alignment comments miss alignment, it might only sniff the variables alignment actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also align the comments, like:
@param array $histogram The count of attributes or nodes removed.
@param string $key The attribute or node name removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* @return void. | ||
*/ | ||
public static function display_error() { | ||
$error = isset( $_GET[ self::ERROR_QUERY_KEY ] ) ? sanitize_text_field( wp_unslash( $_GET[ self::ERROR_QUERY_KEY ] ) ) : ''; // WPCS: CSRF ok. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For compliance sake, it would be good to check nonce here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* | ||
* @since 0.7 | ||
*/ | ||
class AMP_Mutation_Utils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does Mutation stands for here, it could because I am native French but AMP_Validation_Utils
would make more sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right that AMP_Validation_Utils
would be better. Mutation refers to the mutation_callback
, which tracks when the sanitizer removes an attribute or node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commit changes the name of the classes.
* | ||
* @since 0.7 | ||
*/ | ||
class AMP_Mutation_Utils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From an architectural perspective, it is interesting that all methods are public static
. Are they really all meant to but used in that purposes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be much better to not have all of these methods static
. This might mean bootstrapping the class somewhere else.
It's currently calling AMP_Validation_Utils::init();
here in amp.php. It might help if we could instantiate the class somewhere, like:
$validation_utils = new AMP_Validation_Utils();
validation_utils->init();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amp_post_meta_box() might be a good model for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, this is not really a pressing need or a blocker though, it doesn't have to be addressed in this PR.
This was previously 'AMP_Mutation_Utils' The new name describes better what this does.
On Thierry's suggestion, As these were already stored in constants.
Use check_admin_referer(), as this will display the 'are you sure' message. Also , update the test.
In PHPDoc blocks, most of the comments weren't aligned. The types aren't aligned.
* Skip reporting iframe removal when merely being moved * Skip reporting removal of form[action] attribute when transformed to action-xhr. * Rename sanitizer base methods to make explicit they are for removal of invalid nodes.
* | ||
* @var array. | ||
*/ | ||
public static $removed_nodes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's rename this to removed_elements
since that is what it contains.
*/ | ||
public static function init() { | ||
add_action( 'rest_api_init', array( __CLASS__, 'amp_rest_validation' ) ); | ||
add_action( 'save_post', array( __CLASS__, 'validate_content' ), 10, 2 ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of doing this in the save_post
action, let's do this right inside the edit_form_top
so that when the post is loaded they will see the notification. In this way they will be able to activate the AMP plugin and then be informed before they even start editing a post that it has invalid elements.
This would then eliminate the need for the nonce in the request.
*/ | ||
public static function validate_content( $post_id, $post ) { | ||
unset( $post_id ); | ||
if ( ! self::is_authorized() ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also needs to check if post_supports_amp( $post )
. Because if not, the user wouldn't want a warning.
includes/class-amp-theme-support.php
Outdated
@@ -752,7 +752,9 @@ public static function start_output_buffering() { | |||
* @see AMP_Theme_Support::start_output_buffering() | |||
*/ | |||
public static function finish_output_buffering() { | |||
echo self::prepare_response( ob_get_clean() ); // WPCS: xss ok. | |||
$output = self::prepare_response( ob_get_clean() ); | |||
AMP_Validation_Utils::add_header(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to remove thus until we have it being utilized in the plugin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved
Hi @westonruter,
This pull request is approved. Thanks a lot for improving this so much. The notices of which tags and attributes were removed really helps.
POST
'ed and get back results for what changes were made. An element removal could be considered a sanitization error. The request would require anedit_post
cap and require a nonce (see this screencast)Relates to content validation (#843) and also determining when a theme or plugin does something invalid (#842).
Fixes #843.