Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add site test for page cache #6456

Merged
merged 56 commits into from
Sep 18, 2021
Merged

Conversation

dhaval-parekh
Copy link
Collaborator

@dhaval-parekh dhaval-parekh commented Jul 15, 2021

Summary

Fixes #4386

Page Caching Detected

image

Page Caching Not Detected

No error scenario:

image

Error scenario:

image

Absent Persistent Object Caching when Page Caching Present

As outlined in #5780 (comment), when page caching is detected, the persistent object caching will no longer have a status of recommended but will rather be good.

image

Checklist

  • My code is tested and passes existing tests.
  • My code follows the Engineering Guidelines (updates are often made to the guidelines, check it out periodically).

@github-actions
Copy link
Contributor

github-actions bot commented Jul 15, 2021

Plugin builds for 4d905f2 are ready 🛎️!

Comment on lines 238 to 246
$intl_idna_variant = defined( 'INTL_IDNA_VARIANT_UTS46' ) ? INTL_IDNA_VARIANT_UTS46 : false;

// phpcs:ignore PHPCompatibility.Constants.RemovedConstants.intl_idna_variant_2003Deprecated
$domain = idn_to_utf8( $domain, IDNA_DEFAULT, defined( 'INTL_IDNA_VARIANT_UTS46' ) ? INTL_IDNA_VARIANT_UTS46 : INTL_IDNA_VARIANT_2003 );
$intl_idna_variant = ( empty( $intl_idna_variant ) && defined( 'INTL_IDNA_VARIANT_2003' ) ) ? INTL_IDNA_VARIANT_2003 : false;

// The third parameter is set explicitly to prevent issues with newer PHP versions compiled with an old ICU version.
if ( ! empty( $intl_idna_variant ) ) {
$domain = idn_to_utf8( $domain, IDNA_DEFAULT, $intl_idna_variant );
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you noted in chat @dhaval-parekh this change was necessary due to phpstan throwing the error:

 ------ --------------------------------------------------------------------
  Line   includes/class-amp-http.php
 ------ --------------------------------------------------------------------
  240    Constant INTL_IDNA_VARIANT_2003 not found.
         💡 Learn more at https://phpstan.org/user-guide/discovering-symbols
 ------ --------------------------------------------------------------------

This is not being logged as an error currently as this only occurs on PHP 8. The PHP static analysis GHA currently runs on PHP 7.4 so we'll have to bump that.

Copy link
Contributor

@pierlon pierlon Jul 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened #6466 to bump the PHP version to 8.0, amongst other related changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this start failing in this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall correctly the INTL_IDNA_VARIANT_2003 constant did not have a is_defined check and so phpstan flagged it as it would be undefined in PHP 8.

Copy link
Member

@westonruter westonruter Sep 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, so because Dhaval was doing development locally using PHP 8. It wasn't failing on GHA (yet).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.

Comment on lines +328 to +330
if ( null === $type ||
( is_a( $type, 'ReflectionType' ) && method_exists( $type, 'isBuiltin' ) && $type->isBuiltin() )
) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this change was also necessary due to phpstan on PHP 8 failing with:

 ------ ----------------------------------------------------------
  Line   src/Infrastructure/Injector/SimpleInjector.php
 ------ ----------------------------------------------------------
  328    Call to an undefined method ReflectionType::isBuiltin().
 ------ ----------------------------------------------------------

According to the PHP docs the isBuiltin() method has been moved to the ReflectionNamedType class in PHP 8:

ReflectionType has become abstract and ReflectionType::isBuiltin() has been moved to ReflectionNamedType::isBuiltin().

Copy link
Contributor

@pierlon pierlon Jul 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened #6466 to bump the PHP version to 8.0, amongst other related changes.

@dhaval-parekh dhaval-parekh requested a review from pierlon July 21, 2021 10:22
add_action( 'admin_print_styles-site-health.php', [ $this, 'add_styles' ] );
}

add_action( 'template_redirect', [ $this, 'render_random_string_page' ] );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be limited to only users who have the ability to view Site Health checks:

Suggested change
add_action( 'template_redirect', [ $this, 'render_random_string_page' ] );
if ( current_user_can( 'view_site_health_checks' ) ) {
add_action( 'template_redirect', [ $this, 'render_random_string_page' ] );
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pierlon We should not add that check. Since idea is to check if page cache is enabled for end-user or not. And for that reason, we make a plain request (without any cookies). If we add that check then when we make a request that won't execute.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only time we would need this request to execute is to check if page caching works, no?

If we add that check then when we make a request that won't execute.

It won't execute for users who can't view the Site Health checks, which I think would be valid in this case. One problem I did not recognize at the time was that if page caching was enabled it would be possible for a non-authenticated user to request that URL and cache the home page instead of the random number. That could be resolved by using a random query param value (like home_url( '?amp_page_cache=' . wp_rand( 1000, 10000 ) ) instead of home_url( '?amp_page_cache=1' ).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pierlon I think this does need to be added for unauthenticated users. The site health check is run by an authenticated user, but the requests are made in an unauthenticated context. So the template_redirect needs to be added for unauthenticated users.


@dhaval-parekh After thinking about this some more, I think the use of a query parameter will be problematic a some caching plugins prevent all caching altogether when query parameters are present.

So here's my suggestion:

  1. Use a HTTP request header to request the random string, and send that random string back via an HTTP response header rather than dying.
  2. Even better, perhaps the entire random variable in the response was a bad idea from me to begin with. To detect page caching, should we not rather be checking the Expires response header that in the future or that Cache-Control that has a non-zero max-age? Alongside Expires/Cache-Control we can also check if the response includes an ETag header and/or a Last-Modified header, which are also signals that the response is cacheable.

If doing the latter, we don't need to do multiple requests. We can just do one request to the homepage, without any cookies and without any custom request headers or query parameters. If that one request returns with Expires/Cache-Control/Last-Modified/ETag that indicates caching is page enabled, then that all we need.

This presumes that page caching always will include HTTP response headers that tell the client they can cache the pages as well, which I think is a fair assumption. Caching plugins are “behooved” to send such response headers as it allows other caching layers to cache and serve back responses without requiring hits to origin.

Nevertheless, if it turns out that caching plugins don't send back response headers for client-cacheability, then there will still be the need to do multiple requests as in option 1 above, where the client can send a X-AMP-Request-Random-Number: 1 request header, and then the response can include an X-AMP-Random-Number: ... response header. By using HTTP headers, we can avoid false negatives for detection while also avoiding accidentally polluting the cache by using harmless HTTP headers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both of these approaches are needed actually. The Site Health test can have two degrees of passing:

  1. Server-caching: Repeated unauthenticated requests to the server return the same response (by checking the random response header).
  2. Client-caching: Server sends back Cache-Control/Expires/Last-Modified/ETag headers.

Both of these should be checked because both will result in a better page experience. For 1, this reduces the TTFB for users requesting a page even if they have not visited before. For 2, this allows browsers to serve from disk cache a page that they previously visited, either without hitting the server at all (if within the max-age) or by making a conditional request (sending If-Modified-Since or If-None-Match requests headers).

The first check should be critical in terms of priority. Otherwise, if that passes, the second one would be recommended.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: If repeated requests with an X-AMP-Request-Random-Number request header result in responses without a X-AMP-Random-Number response header, it could be that there is a caching layer that is stripping that out. In such case, that could indicate that caching is working and the client-caching check should be used to confirm that server-caching is indeed working.

Copy link
Contributor

@pierlon pierlon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides my suggestion in #6456 (comment), this looks great ✅.

Passing unto @westonruter for review.

'label' => $this->get_badge_label(),
'color' => $is_using_page_cache ? 'green' : 'orange',
],
'description' => esc_html__( 'The AMP plugin performs at its best when page object cache is enabled.', 'amp' ),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'description' => esc_html__( 'The AMP plugin performs at its best when page object cache is enabled.', 'amp' ),
'description' => esc_html__( 'The AMP plugin performs at its best when page caching is enabled.', 'amp' ),

add_action( 'admin_print_styles-site-health.php', [ $this, 'add_styles' ] );
}

add_action( 'template_redirect', [ $this, 'render_random_string_page' ] );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only time we would need this request to execute is to check if page caching works, no?

If we add that check then when we make a request that won't execute.

It won't execute for users who can't view the Site Health checks, which I think would be valid in this case. One problem I did not recognize at the time was that if page caching was enabled it would be possible for a non-authenticated user to request that URL and cache the home page instead of the random number. That could be resolved by using a random query param value (like home_url( '?amp_page_cache=' . wp_rand( 1000, 10000 ) ) instead of home_url( '?amp_page_cache=1' ).

Comment on lines 252 to 254
if ( ! empty( $query_var ) ) {
die( esc_html( wp_rand( 1000, 10000 ) ) );
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem: some page caching plugins may ignore custom query parameters from impacting the cache, and this could result in the random number page being cached for responses without the query parameter.

add_action( 'admin_print_styles-site-health.php', [ $this, 'add_styles' ] );
}

add_action( 'template_redirect', [ $this, 'render_random_string_page' ] );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pierlon I think this does need to be added for unauthenticated users. The site health check is run by an authenticated user, but the requests are made in an unauthenticated context. So the template_redirect needs to be added for unauthenticated users.


@dhaval-parekh After thinking about this some more, I think the use of a query parameter will be problematic a some caching plugins prevent all caching altogether when query parameters are present.

So here's my suggestion:

  1. Use a HTTP request header to request the random string, and send that random string back via an HTTP response header rather than dying.
  2. Even better, perhaps the entire random variable in the response was a bad idea from me to begin with. To detect page caching, should we not rather be checking the Expires response header that in the future or that Cache-Control that has a non-zero max-age? Alongside Expires/Cache-Control we can also check if the response includes an ETag header and/or a Last-Modified header, which are also signals that the response is cacheable.

If doing the latter, we don't need to do multiple requests. We can just do one request to the homepage, without any cookies and without any custom request headers or query parameters. If that one request returns with Expires/Cache-Control/Last-Modified/ETag that indicates caching is page enabled, then that all we need.

This presumes that page caching always will include HTTP response headers that tell the client they can cache the pages as well, which I think is a fair assumption. Caching plugins are “behooved” to send such response headers as it allows other caching layers to cache and serve back responses without requiring hits to origin.

Nevertheless, if it turns out that caching plugins don't send back response headers for client-cacheability, then there will still be the need to do multiple requests as in option 1 above, where the client can send a X-AMP-Request-Random-Number: 1 request header, and then the response can include an X-AMP-Random-Number: ... response header. By using HTTP headers, we can avoid false negatives for detection while also avoiding accidentally polluting the cache by using harmless HTTP headers.

add_action( 'admin_print_styles-site-health.php', [ $this, 'add_styles' ] );
}

add_action( 'template_redirect', [ $this, 'render_random_string_page' ] );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both of these approaches are needed actually. The Site Health test can have two degrees of passing:

  1. Server-caching: Repeated unauthenticated requests to the server return the same response (by checking the random response header).
  2. Client-caching: Server sends back Cache-Control/Expires/Last-Modified/ETag headers.

Both of these should be checked because both will result in a better page experience. For 1, this reduces the TTFB for users requesting a page even if they have not visited before. For 2, this allows browsers to serve from disk cache a page that they previously visited, either without hitting the server at all (if within the max-age) or by making a conditional request (sending If-Modified-Since or If-None-Match requests headers).

The first check should be critical in terms of priority. Otherwise, if that passes, the second one would be recommended.

*/
private function is_site_has_page_cache() {

$request_url = home_url( '?amp_page_cache=1' );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See notes above about how an HTTP request header will likely be a better approach.

add_action( 'admin_print_styles-site-health.php', [ $this, 'add_styles' ] );
}

add_action( 'template_redirect', [ $this, 'render_random_string_page' ] );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: If repeated requests with an X-AMP-Request-Random-Number request header result in responses without a X-AMP-Random-Number response header, it could be that there is a caching layer that is stripping that out. In such case, that could indicate that caching is working and the client-caching check should be used to confirm that server-caching is indeed working.

@dhaval-parekh dhaval-parekh requested a review from pierlon August 10, 2021 07:28
}

if ( wp_doing_ajax() ) {
add_action( 'wp_ajax_health-check-site-status', [ $this, 'ajax_site_status' ], 11 );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could instead run at the wp_ajax_health-check-amp-page_cache action, which would simplify the ajax_site_status method and could be renamed to ajax_check_page_cache, for example.

Suggested change
add_action( 'wp_ajax_health-check-site-status', [ $this, 'ajax_site_status' ], 11 );
add_action( 'ajax_site_status', [ $this, 'ajax_check_page_cache' ] );

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @pierlon

We can not change the action name, since it triggers from the "Health Check" plugin.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh OK I see. However, this test is not running whenever the Health Check plugin is disabled though, as the AJAX action is now health-check-amp-page_cache instead of wp_ajax_health-check-site-status:

Health Check plugin disabled:

image

Health Check plugin enabled:

image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, this test is not running whenever the Health Check plugin is disabled though

This test will only be on the "Health Check" page if the health-check plugin is active. Here we are just adding one more callback in already registered ajax call specifically for amp_page_cache.

Copy link
Contributor

@pierlon pierlon Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but on the Site Health page the page cache test is not running because there is no action registered to return the test result. With the Health Check plugin enabled it works just fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of WordPress 5.2 and r49154, the Site Health tests can now be invoked via the REST API . I'll try reworking this to make use of that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eww, but the Site Health plugin doesn't support the REST API async tests 😦

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, the issue is that the Health Check plugin is way out of date. The last release was 2 years ago. There are various issues reported in the GitHub repo about incompatibilities with current tests. So I'm going to add logic to prevent this REST-based async test from being added when on WP<5.6 and when this stale version of Health Check is active.

@dhaval-parekh dhaval-parekh requested a review from pierlon August 13, 2021 12:17
@westonruter
Copy link
Member

I just tested this with WP Super Cache and… it doesn't work. Looks like it's a fatal flaw in my proposed solution. For one, by default HTTP headers are not cached with the page content:

image

Even when that is enabled, the plugin will only cache the known headers, which naturally does not include any X-AMP... request header.

So yeah, my whole random number in a request header as I proposed in #6456 (comment) was wrong.

We'll instead need to only look at response headers to see if caching is enabled.

@pierlon
Copy link
Contributor

pierlon commented Sep 17, 2021

Can confirm I had to fine-tune W3 Total Cache to work as well. I had to tell it to manually cache the header:

image

@westonruter
Copy link
Member

The first request made to a page when WP Super Cache is enabled will return no Cache-Control response header (which seems like an omission), whereas the second request when a page has been cached will return: Cache-Control: max-age=3, must-revalidate. So at least for WP Super Cache we can check for page caching by making 2 requests to the homepage, and if the second one includes a Cache-Control response header with a max-age value greater than 0, then that would be a pass.

@pierlon What does W3 Total Cache return for repeated requests to the homepage?

@westonruter
Copy link
Member

For Pantheon with Varnish, the relevant response headers (on amp-wp.org):

age: 10
cache-control: public, max-age=600

We could continue to check for a non-zero age as well as a max-age in Cache-Control.

Looking at a post from WP Rocket blog, they return:

last-modified: Fri, 17 Sep 2021 13:56:37 GMT
etag: "61449e95-18219"
expires: Sun, 17 Oct 2021 19:33:07 GMT
cache-control: max-age=2592000

The 1-month expiration on the document is quite surprising to me. Nevertheless, it is a number greater than zero, so it would pass our page cache test.

@pierlon
Copy link
Contributor

pierlon commented Sep 17, 2021

@pierlon What does W3 Total Cache return for repeated requests to the homepage?

By default it sends:

cache-control: max-age=3600, public

@westonruter
Copy link
Member

Looking at WordPress.com for an example post, Batcache sends:

last-modified: Fri, 17 Sep 2021 19:38:00 GMT
cache-control: max-age=300, must-revalidate
x-nananana: Batcache

For Batcache, by default it will only cache a page after 2 visits.

Copy link
Member

@westonruter westonruter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whew, this was another journey!

@westonruter westonruter requested a review from pierlon September 17, 2021 23:09

if ( ! $is_using_object_cache ) {
if ( $has_page_caching ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the page cache test is async and this is direct, there would be a chance of this being a false positive right?

Copy link
Member

@westonruter westonruter Sep 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the first time the Site Health tests run then this will always be false. I don't see a way around that so I think it's acceptable for the first time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yea, seems like an acceptable trade off.

}

$expires_header = wp_remote_retrieve_header( $http_response, 'expires' );
if ( $expires_header && strtotime( $expires_header ) > time() ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

Copy link
Contributor

@pierlon pierlon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for you hard work on this @dhaval-parekh and kudos to @westonruter for getting across the finish line. LGTM!

@westonruter westonruter merged commit 77ff4c4 into develop Sep 18, 2021
@westonruter westonruter deleted the add/4386-detect-page-caching branch September 18, 2021 05:48
@westonruter westonruter added the Changelogged Whether the issue/PR has been added to release notes. label Dec 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelogged Whether the issue/PR has been added to release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Encourage use of page caching
3 participants