Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize entire HTML output when theme support is present #888

Merged
merged 9 commits into from
Jan 23, 2018

Conversation

westonruter
Copy link
Member

@westonruter westonruter commented Jan 23, 2018

See #875.

This largely eliminates the need for amp_component_scripts entirely because the AMP component scripts are discovered via the whitelist sanitizer in #882. So for example, the Form sanitizer in #871 can be used without needing to define its get_scripts() method.

For another PR:

self::$amp_scripts = array_merge( self::$amp_scripts, $scripts );
self::$amp_styles = array_merge( self::$amp_styles, $styles );

$output = preg_replace( '#(<body.*?>)(.+)(</body>)#si', '$1' . $sanitized_inner_body . '$3', $output );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter here we're replacing the body with the sanitised body. This is because it's returning AMP_DOM_Utils::get_content_from_dom( $dom ) from the sanitize method. Wouldn't this be easier if the method returned $dom->saveHTML() if the content passed is a complete document? This would stop the need to replace the body after it was sanitized.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, potentially. The sanitize method is better called sanitize_content. Better handling of sanitizing entire documents should be done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DavidCramer how about this: 876c22f

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter Thats nice, makes way for a full doc later on. I see that you made a comment about the head needs updating to mandatory_parent_blacklist so that makes sense now.

Copy link
Contributor

@DavidCramer DavidCramer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than that I'm happy with with this.

* as otherwise elements from the HEAD could get added to the BODY.
*/
$sanitized_inner_body = AMP_DOM_Utils::get_content_from_dom( $dom );
$output = preg_replace( '#(<body.*?>)(.+)(</body>)#si', '$1' . $sanitized_inner_body . '$3', $output );

// Inject required scripts.
$output = preg_replace(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter get_amp_component_scripts doesn't require the argument anymore.

Copy link
Contributor

@DavidCramer DavidCramer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work @westonruter. This will help to move forward, thanks!

@westonruter
Copy link
Member Author

Hummm, something isn't working quite right.

If I add this to a template:

<script type="text/javascript">bad</script>

It is not getting stripped out as expected.

@DavidCramer
Copy link
Contributor

hmm. testing as well.

@westonruter
Copy link
Member Author

😅 whew, it's just because #884 wasn't merged yet from the 0.6 branch. Fixed with merging in ef69f2d.

Copy link
Collaborator

@ThierryA ThierryA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work @westonruter @DavidCramer. This is really exciting!

$original_html = trim( ob_get_clean() );
$sanitized_html = AMP_Theme_Support::finish_output_buffering( $original_html );

$this->assertContains( '<meta charset="utf-8">', $sanitized_html );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter @DavidCramer these tests could have a bit more coverage here. I believe it is ok to do that at a later stage since the other components sanitization (form etc.) are still WIP.

@ThierryA ThierryA merged commit b8af86a into develop Jan 23, 2018
@ThierryA ThierryA deleted the add/output-buffer-sanitization branch January 23, 2018 13:54
@westonruter westonruter added this to the v0.7 milestone Jan 23, 2018
kienstra pushed a commit that referenced this pull request Jan 23, 2018
That function has been removed in PR #888.
It looks like that uses full-page buffering.
So remove the widget subclasses that use that.
@todo: look at regressions in the 'Archives' and 'Categories' dropdowns.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants