Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use WP Cron to periodically check site URLs for AMP validation #1756

Closed
westonruter opened this issue Dec 19, 2018 · 12 comments · Fixed by #5515
Closed

Use WP Cron to periodically check site URLs for AMP validation #1756

westonruter opened this issue Dec 19, 2018 · 12 comments · Fixed by #5515
Labels
Developer Tools Enhancement New feature or improvement of an existing one Groomed P0 High priority Site Scanning Validation WS:Core Work stream for Plugin core
Milestone

Comments

@westonruter
Copy link
Member

Breaking out from #1007 which focused specifically on using WP-CLI to do validation of an entire site.

At the moment AMP validation is only performed in response to editing a post, or activating a plugin. There should be passive validation checks performed so that the user can be informed of issues that arise from adding widgets or making other site changes that do not result in validation being performed. WP Cron can be used to initiate revalidation requests for random URLs of enabled templates. This will make the "At a glance" Dashboard widget much more useful on admin as well, as it will actually highlight new issues that the user isn't already immediately aware of:

image

Relates to #1755 which is for adding a UI in the admin for bulk validation.

@westonruter westonruter added the Enhancement New feature or improvement of an existing one label Dec 19, 2018
@swissspidy
Copy link
Collaborator

Integrating with cron shouldn't be too hard I think. Just need to figure out which pages to scan. Instead of (or in addition to) random URLs, I'd love to see periodical checks of the most popular posts.

@swissspidy
Copy link
Collaborator

I just did the following:

  1. Install and activate the plugin
  2. Go to settings and select native mode (because the current theme works well with it)
  3. Wait for page to reload. Receive error message:

Native mode activated! However, there was an error when checking the AMP validity for your site. cURL error 28: Operation timed out after 15001 milliseconds with 0 bytes received.

The error message is probably because of my Lando setup, but it's certainly a bit odd to wait that long for the page to reload because the plugin checks the validity of the site right after changing the settings. This would be a great opportunity to use cron as well. It could simply schedule an event 1min in the future or so.

@westonruter
Copy link
Member Author

Yes, I've faced difficulty with timeouts in Lando for the validation requests as well. In particular, it seems the PHP-CSS-Parser performs extremely slowly in the Docker environment. Any intensive PHP computations seem to be very slow.

The reason for checking validity synchronously during the settings save is so that any success message can be served back in PHP. It would be better indeed to do it asynchronously after saving and show a spinner on the resulting page once the results are back. For that matter, the whole screen needs to be redone to use the REST API as opposed to the classic full-page-reload from the legacy admin screens.

@swissspidy
Copy link
Collaborator

There are some small UX things on the settings screen I'd love to improve anyway, so this sounds like a good excuse to do that 😀

@westonruter westonruter added this to the v2.0 milestone Jul 15, 2019
@amedina amedina removed this from the v2.0 milestone Mar 31, 2020
@westonruter
Copy link
Member Author

westonruter commented May 5, 2020

This is going to be of critical importance once we allow Developer Tools to be turned off. When that happens, validation will no longer be done synchronously when a post is updated. We need for validation errors to still be gathered in an automated way and reported to the administrator in the life of a site, across posts being published and plugins being updated. So here's what I propose:

  1. On a daily basis, update the validation results for the most recently-published posts in each post type, the homepage, and the other templates (using similar logic to what WP-CLI is doing).
  2. Schedule a cron event every time a published post is updated to re-check for validity.
  3. Introduce AMP Validation Site Health test which is shown as a warning whenever there are validation errors that are unreviewed. The Site Health test can link off to the Validated URLs screen even if the current user has Developer Tools turned off.

@westonruter
Copy link
Member Author

The site health test message can not only flag the count of URLs that have unreviewed validation errors, but also the list of plugins that are causing them. Then the CTA can be to go to the Plugin Suppression section on the admin screen.

@amedina
Copy link
Member

amedina commented Jun 30, 2020 via email

@westonruter
Copy link
Member Author

Yeah, we could do a query for the other administrators and list those who have Developer Tools turned on as good candidates to reach out to.

@kmyram kmyram modified the milestones: v2.0, v2.1 Jul 29, 2020
@westonruter
Copy link
Member Author

Depends on #4779.

@westonruter
Copy link
Member Author

Note: the same underlying mechanisms here should also be site scanning (#4795 and #4719)

@johnwatkins0
Copy link
Contributor

johnwatkins0 commented Aug 13, 2020

re: The discussion we had earlier today about the site scan REST endpoint. I was thinking that if all validation uses the same underlying mechanism, we could potentially solve some problems by working on the cron along with the site scanning for the wizard, etc. in tandem.

Maybe we can do something along these lines:

  1. The cron task kicks off on plugin activation and runs with a limit of, say, five URLs per content type (could be fewer).
  2. Save the result for each URL as it's validated in the cron task.
  3. In the REST endpoint, when requesting results for a URL, signal that we prefer the saved result over a fresh validation. With the REST endpoint and the cron action sharing the same underlying code, it's likely that all of the requested URLs will have already been saved (depending on how quickly the user reaches the wizard after activating the plugin).
  4. In the REST endpoint we can batch the query in some way (I have to work through this) so that if all the queried URLs have saved data, then only one request will be needed, but if there are URLs without saved data, they'll be sent back one at a time (for a progress bar or similar UI).

A lot of the underlying code for this already exists, and I've already begun moving the pieces around in #5228 (WIP).

@johnwatkins0 johnwatkins0 self-assigned this Aug 13, 2020
@westonruter
Copy link
Member Author

  1. The cron task kicks off on plugin activation and runs with a limit of, say, five URLs per content type (could be fewer).

Yes, this could reduce the perceived time involved to perform the site scan step, if they are happening in the background. We'd need to make sure that WP Cron gets spawned immediately after activation and not wait around for another request, as if the user is activating the plugin for the first time, they may cause WP Cron to spawn when they click the banner to access the onboarding wizard, and the scan results will not yet be available.

  1. In the REST endpoint, when requesting results for a URL, signal that we prefer the saved result over a fresh validation. With the REST endpoint and the cron action sharing the same underlying code, it's likely that all of the requested URLs will have already been saved (depending on how quickly the user reaches the wizard after activating the plugin).

The way to determine if a given URL is fresh or not is via AMP_Validated_URL_Post_Type::get_post_staleness(). So yeah, we could avoid re-scanning a URL if it's results are fresh. This would be useful when on the settings screen in particular and we want to show an overview of the last results, in addition to providing a way to re-scan. One complication is that gathering posts for checking may change each time a scan is done, if we use the most recent post for example. So a way to mitigate that could be to always get the oldest ID (or smallest object ID when datetimes are not available).

4. In the REST endpoint we can batch the query in some way (I have to work through this) so that if all the queried URLs have saved data, then only one request will be needed, but if there are URLs without saved data, they'll be sent back one at a time (for a progress bar or similar UI).

This relates to what @schlessera mentioned in regards to retaining the loopback requests. If you go to the Validated URLs post table screen and select multiple validated URLs, you can do a bulk re-check of those URLs. This will be a batch operation. A problem with doing this, however, is that there can be timeouts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Developer Tools Enhancement New feature or improvement of an existing one Groomed P0 High priority Site Scanning Validation WS:Core Work stream for Plugin core
Projects
None yet
5 participants