Introduce garbage collection for validation data #6763
b50939d to fe878a7
Plugin builds for 0e6c4e6 are ready 🛎️!
Before deploying the changes here to a site, there were 55 Validated URLs and 647 Validation Errors. After deployment, the numbers were reduced to 14 and 148, respectively.
Changes look good to me.
 * @return void
 */
public function process( ...$args ) { // phpcs:ignore VariableAnalysis.CodeAnalysis.VariableAnalysis.UnusedVariable
	AMP_Validated_URL_Post_Type::garbage_collect_validated_urls( 100, '1 week ago' );
Should we maybe provide filters for both of these values (or a single combined one) so that sites can adapt this for optimization or debugging purposes?
Filters added in cc5f4db
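For illustration, here is a minimal sketch of how a site might override the two values discussed in this thread (the count of 100 and the '1 week ago' cutoff). The filter names example_gc_url_count and example_gc_before are hypothetical placeholders, since the names actually added in cc5f4db are not shown in this conversation. The stand-in add_filter() and apply_filters() definitions are a simplification that only exists so the snippet runs outside WordPress.

```php
<?php
// Simplified stand-ins so this sketch runs outside WordPress.
// Inside WordPress, the real add_filter() and apply_filters() are used instead.
if ( ! function_exists( 'add_filter' ) ) {
	$GLOBALS['example_filters'] = [];

	function add_filter( $hook, $callback ) {
		$GLOBALS['example_filters'][ $hook ][] = $callback;
	}

	function apply_filters( $hook, $value ) {
		foreach ( $GLOBALS['example_filters'][ $hook ] ?? [] as $callback ) {
			$value = $callback( $value );
		}
		return $value;
	}
}

// Site-side overrides. Both filter names are hypothetical.
add_filter(
	'example_gc_url_count',
	function ( $count ) {
		return 25; // Examine fewer URLs per run, e.g. while debugging.
	}
);
add_filter(
	'example_gc_before',
	function ( $before ) {
		return '1 day ago'; // Treat URLs as old sooner than the default.
	}
);

// How the cron task could read the values, falling back to the defaults
// that are hard-coded in the reviewed diff.
$count  = (int) apply_filters( 'example_gc_url_count', 100 );
$before = (string) apply_filters( 'example_gc_before', '1 week ago' );
```

Two separate filters keep each value independently adjustable; a single combined filter, as also suggested above, would instead pass both values in one array.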
Co-authored-by: Alain Schlesser <[email protected]>
…-toolbox-php<0.9.0" This reverts commit b305db3.
…ation-garbage-collection
* 'develop' of github.com:ampproject/amp-wp:
  Update Composer lock file
  Update to amp-toolbox 0.9.2
  Update Gutenberg package dependencies
  Update unit test case
  Use AMP_Validated_URL_Post_Type::get_url_from_post() instead of ->post_title
Summary
Fixes #4779
This introduces garbage collection for validation data (amp_validated_url posts and amp_validation_error terms).
With Site Scanning in v2.2, the most recently published post is validated on a weekly basis. If the user never views the list of Validated URLs, such as when the user doesn't have DevTools turned on, the number of validated URLs grows without bound, and over time validation data takes up more and more of the database. When all of the validation errors associated with a validated URL are unreviewed, or when all of them are also associated with other validated URLs, there is no need to keep the old validated URL in perpetuity. It should be garbage-collected.
This PR introduces a new cron task which runs on a daily basis. It obtains a random set of amp_validated_url posts that are older than 1 week. For each post which is stale, it checks whether the post has associated amp_validation_error taxonomy terms. The validated URL is garbage-collected if it does not have a unique validation error (one not associated with any other URL) which has been marked as reviewed or has a non-default removed state.
After the validated URLs have been garbage-collected, the cron task finally calls AMP_Validation_Error_Taxonomy::delete_empty_terms() to remove any amp_validation_error taxonomy terms which no longer have any associated validated URLs. This is the same action that previously required the user to click the “Clear Empty” button on the Error Index screen.
The logic described above prevents deleting validated URLs when doing so would cause validation error terms to become empty, unless those terms are unreviewed and still have the default removed state.
Checklist