Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce garbage collection for validation data #6763

Merged
merged 8 commits into from
Dec 7, 2021

Conversation

westonruter
Copy link
Member

@westonruter westonruter commented Dec 6, 2021

Summary

Fixes #4779

This introduces garbage collection for validation data (amp_validated_url posts and amp_validation_error terms).

With Site Scanning in v2.2, the most recently-published post will be validated on a weekly basis. If the user never sees the list of Validated URLs—such as when the user doesn't have DevTools turned on—the end result is a perpetual increase in the number of validated URLs. Over time this will result in validation data taking up more and more of the database. When all of the validation errors associated with a validated URL are unreviewed, or if all of the validation errors are related to other validated URLs as well, then there is no need to keep the old validated URLs in perpetuity. They should be garbage-collected.

This PR introduces a new cron task which runs on a daily basis. It obtains a random set of amp_validated_url posts that are older than 1 week. For each post which is stale, it checks to see if it has associated amp_validation_error taxonomy terms. The validated URL garbage collected if it does not have a unique validation error (not associated with any other URL) which has been been marked as reviewed or has a non-default removed state.

After the validated URLs have been garbage-collected, the cron task finally calls AMP_Validation_Error_Taxonomy::delete_empty_terms() to remove any amp_validation_error taxonomy terms which no longer have any associated validated URLs. This is the same action that previously required the user to click the “Clear Empty” button on the Error Index screen:

image

The logic described above prevents deleting validated URLs that would cause validation error terms to become empty unless they are not reviewed and not in the default removed state.

Checklist

  • My code is tested and passes existing tests.
  • My code follows the Engineering Guidelines (updates are often made to the guidelines, check it out periodically).

@westonruter westonruter added this to the v2.2 milestone Dec 6, 2021
@westonruter westonruter force-pushed the add/validation-garbage-collection branch from b50939d to fe878a7 Compare December 7, 2021 01:01
@westonruter westonruter marked this pull request as ready for review December 7, 2021 01:04
@github-actions
Copy link
Contributor

github-actions bot commented Dec 7, 2021

Plugin builds for 0e6c4e6 are ready 🛎️!

@westonruter
Copy link
Member Author

Before deploying the changes here to a site, there were 55 Validated URLs and 647 Validation Errors. After deployment, the numbers were reduced to 14 and 148, respectively.

Copy link
Collaborator

@dhaval-parekh dhaval-parekh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me.

* @return void
*/
public function process( ...$args ) { // phpcs:ignore VariableAnalysis.CodeAnalysis.VariableAnalysis.UnusedVariable
AMP_Validated_URL_Post_Type::garbage_collect_validated_urls( 100, '1 week ago' );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe provide filters for both of these values (or a single combined one) so that sites can adapt this for optimization or debugging purposes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filters added in cc5f4db

westonruter and others added 4 commits December 7, 2021 08:28
Co-authored-by: Alain Schlesser <[email protected]>
…ation-garbage-collection

* 'develop' of github.com:ampproject/amp-wp:
  Update Composer lock file
  Update to amp-toolbox 0.9.2
  Update Gutenberg package dependencies
  Update unit test case
  Use AMP_Validated_URL_Post_Type::get_url_from_post() instead of ->post_title
@westonruter westonruter merged commit 7d4346c into develop Dec 7, 2021
@westonruter westonruter deleted the add/validation-garbage-collection branch December 7, 2021 19:27
@westonruter westonruter added the Changelogged Whether the issue/PR has been added to release notes. label Dec 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelogged Whether the issue/PR has been added to release notes. Site Scanning
Projects
None yet
3 participants