Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The HTML4 meta[http-equiv=Content-Type] causes validation error to leak and should convert automatically to HTML5 meta[charset] #3469

Closed
westonruter opened this issue Oct 8, 2019 · 4 comments · Fixed by #3758
Labels
Bug Something isn't working QA passed Has passed QA and is done
Milestone

Comments

@westonruter
Copy link
Member

westonruter commented Oct 8, 2019

Bug Description

As reported in a support forum topic, if a theme like Koji declares the Content-Type using the HTML4 method like this:

<meta http-equiv="content-type" content="<?php bloginfo( 'html_type' ); ?>" charset="<?php bloginfo( 'charset' ); ?>" />

Then the result the AMP plugin generating:

<meta content="text/html; charset=UTF-8" charset="UTF-8" http-equiv="Content-Type">

Somehow the validating sanitizer is leaking the http-equiv and content attributes here, and this causes an error in the validator:

image

This is even though the sanitizer is reporting the former as being invalid:

image

Even though it is accepted, it is not getting sanitized. So that's confusing.

The AMP plugin should transparently convert any such meta Content-Type tag into the HTML meta charset that AMP requires:

<meta charset=utf-8>

See also #2970. This would have avoided an error in adding support for Twenty Twenty (#3342 (comment)) which was fixed in Twenty Twenty via WordPress/twentytwenty#640.

Here is a workaround plugin for the issue: https://gist.github.com/westonruter/d18fd37fb9a58a26ae375a0cb9570831

There should not be any need to show a validation error as this should automatically get converted. If the site is using an encoding other than UTF-8, then technically the encoding should be getting converted to UTF-8 before being parsed. See:

// @todo If 'utf-8' is not the blog charset, then we'll need to do some character encoding conversation or "entityification".
if ( 'utf-8' !== strtolower( get_bloginfo( 'charset' ) ) ) {
/* translators: %s: the charset of the current site. */
trigger_error( esc_html( sprintf( __( 'The database has the %s encoding when it needs to be utf-8 to work with AMP.', 'amp' ), get_bloginfo( 'charset' ) ) ), E_USER_WARNING ); // phpcs:ignore WordPress.PHP.DevelopmentFunctions.error_log_trigger_error
}

This would fix #855.

Expected Behaviour

An AMP-valid HTML5 meta charset should only be output.

Steps to reproduce

  1. Install and activate Koji theme.
  2. Activate Standard or Transitional mode.
  3. See the the AMP Validator report a validation error for http-equiv attribute.
  4. See the admin bar report a validation error for the same.

Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

  • The http-equiv attribute should not be leaked to the frontend.
  • Convert HTML4 meta[http-equiv] to HTML5 meta[charset].
  • (Nice to have) Convert encoding of input documents to UTF-8.

Implementation brief

QA testing instructions

Demo

Changelog entry

@westonruter westonruter added the Bug Something isn't working label Oct 8, 2019
@westonruter westonruter added this to the v1.3.1 milestone Oct 8, 2019
@swissspidy swissspidy modified the milestones: v1.3.1, v1.4 Oct 17, 2019
@westonruter westonruter removed this from the v1.4 milestone Oct 17, 2019
@schlessera schlessera self-assigned this Nov 14, 2019
@westonruter
Copy link
Member Author

Another support topic that may be resolved by this issue: https://wordpress.org/support/topic/amp-errors-on-search-console-after-updating-to-v1-2-0/

@schlessera
Copy link
Collaborator

This should be fixed via #3758.

@westonruter
Copy link
Member Author

Confirmed that the above build prevents there being a validation error for an http-equiv attribute, in the Koji theme.

Before

image

The page's meta charset is an unexpected:

<meta content="text/html" charset="UTF-8">

After

No such validation error present and the meta charset is as expected:

<meta charset="utf-8">

@westonruter westonruter added this to the v1.5 milestone Dec 19, 2019
@csossi
Copy link

csossi commented Feb 13, 2020

Verified in QA

@csossi csossi added the QA passed Has passed QA and is done label Feb 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working QA passed Has passed QA and is done
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants