implement CDN invalidation queue & background worker #1961
Conversation
r? @jsha
cc @rust-lang/docs-rs, could anyone help with reviewing this? I would love to finish up & activate full page caching.
To make sure I understand the design correctly:
There is a new table `cdn_invalidation_queue`. After a crate build, we append to that table. Periodically, a cron task fetches up to N items from the table, creates a CDN invalidation from them, and sets those items to `created_in_cdn = CURRENT_TIMESTAMP`.
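For illustration, a minimal sketch of what the enqueue step might look like, assuming the synchronous `postgres` crate and a `cdn_invalidation_queue` table with `crate`, `path_pattern`, `queued` and `created_in_cdn` columns; the `path_pattern` column and the concrete patterns are assumptions, not necessarily what the PR does:

```rust
use postgres::Client;

/// Sketch: queue wildcard invalidations for a crate after its build finished.
/// The path patterns below are illustrative assumptions.
fn queue_crate_invalidation(conn: &mut Client, krate: &str) -> Result<(), postgres::Error> {
    for pattern in &[format!("/{}*", krate), format!("/crate/{}*", krate)] {
        conn.execute(
            "INSERT INTO cdn_invalidation_queue (crate, path_pattern, queued)
             VALUES ($1, $2, CURRENT_TIMESTAMP)",
            &[&krate, pattern],
        )?;
    }
    Ok(())
}
```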
As part of the same cron task, we fetch the list of active invalidations from the CDN and delete from the database all items that have a non-null `created_in_cdn` but are no longer considered active invalidations by the CDN. This should effectively mean deleting all completed items, so in normal operation the database should always hold a smallish number of rows.
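A rough sketch of that cron task under the same schema assumptions; the `cdn_reference` column (used to match rows against the CDN's invalidation ids) and the two CloudFront helpers at the bottom are hypothetical stand-ins, not the actual implementation:

```rust
use postgres::Client;

/// Assumption: how many queued rows get folded into one CDN invalidation per run.
const BATCH_SIZE: i64 = 30;

/// Sketch of the periodic task: create one CDN invalidation from queued rows,
/// then drop rows whose invalidation the CDN no longer reports as active.
fn run_cdn_invalidation_task(conn: &mut Client) -> Result<(), Box<dyn std::error::Error>> {
    // 1. fetch up to N queued rows that were not yet sent to the CDN
    let rows = conn.query(
        "SELECT id, path_pattern FROM cdn_invalidation_queue
         WHERE created_in_cdn IS NULL
         ORDER BY queued
         LIMIT $1",
        &[&BATCH_SIZE],
    )?;

    if !rows.is_empty() {
        let ids: Vec<i64> = rows.iter().map(|row| row.get("id")).collect();
        let patterns: Vec<String> = rows.iter().map(|row| row.get("path_pattern")).collect();

        // 2. create a single CDN invalidation covering the whole batch
        let invalidation_id = create_cdn_invalidation(&patterns)?;

        // 3. mark the rows as sent and remember which invalidation covers them
        conn.execute(
            "UPDATE cdn_invalidation_queue
             SET created_in_cdn = CURRENT_TIMESTAMP, cdn_reference = $2
             WHERE id = ANY($1)",
            &[&ids, &invalidation_id],
        )?;
    }

    // 4. delete rows whose invalidation the CDN no longer lists as active,
    //    i.e. the completed ones
    let active: Vec<String> = active_cdn_invalidations()?;
    conn.execute(
        "DELETE FROM cdn_invalidation_queue
         WHERE created_in_cdn IS NOT NULL
           AND NOT (cdn_reference = ANY($1))",
        &[&active],
    )?;
    Ok(())
}

// Hypothetical stand-ins for the CloudFront API calls; real code would use the AWS SDK.
fn create_cdn_invalidation(patterns: &[String]) -> Result<String, Box<dyn std::error::Error>> {
    // would call CreateInvalidation with the batched paths and return its id
    Ok(format!("dummy-invalidation-for-{}-paths", patterns.len()))
}

fn active_cdn_invalidations() -> Result<Vec<String>, Box<dyn std::error::Error>> {
    // would call ListInvalidations and return the ids that are still "InProgress"
    Ok(Vec::new())
}
```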
What happens if we fall behind? It seems like there should be an additional query that deletes all rows from the database where `queued` is more than X minutes ago, regardless of whether we were able to create an invalidation for them. At some point they are old enough that they've become irrelevant.
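The suggested extra cleanup could be a single statement along these lines; the cutoff is only a placeholder:

```rust
use postgres::Client;

/// Sketch of the suggested safety valve: drop entries that have been queued for
/// longer than some cutoff, whether or not an invalidation was created for them.
fn purge_stale_queue_entries(conn: &mut Client) -> Result<u64, postgres::Error> {
    conn.execute(
        "DELETE FROM cdn_invalidation_queue
         WHERE queued < CURRENT_TIMESTAMP - INTERVAL '3 hours'",
        &[],
    )
}
```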
It's worth noting that databases-as-queues are usually not very efficient, but so long as we keep the table size and write volume small, this will probably be okay for now.
Yep, that's it.
If a queued invalidation were deleted, we would leave outdated content visible to users. IMO an invalidation only becomes irrelevant once it's older than the TTL on the CDN, and in the best case that TTL is forever.
Yeah, I'm aware of that and of the possible alternatives. Since we have the build queue in the DB too, I would postpone that change altogether to a later point.

Some more general notes: if we go live with this, the plan would be to keep the TTL short for some time to see how long the invalidations take and whether they fail. We could also think about adding some metrics once we know which ones would help us. I believe for the near future we would be fine with the 15-invalidation limit, and the invalidations would be fast enough for the amount of releases / yanks we get.

I definitely didn't expect this much complexity to work around CloudFront's limitations, and the downsides it still has. I'm still excited & motivated to move forward; web-speed-me would definitely enjoy browsing docs far more when it's done. My long-term wish / preference would be moving to another CDN like Fastly, which would make all of this queuing logic obsolete. But since we don't have that yet, I lean towards adding this queue here until then.
I think there are some unhandled cases where CloudFront's list of invalidations gets too big or the table of queued invalidations gets too big, but I guess as long as our cache time is low those aren't a big deal; it will just cause the cron task to get stuck somewhere, but builds should continue to happen and the cache will expire after N minutes. Let's try it!
To work around the CloudFront limit of at most 15 active wildcard invalidations, this implements a queue & batching for CDN invalidations. So when we get a bunch of changes at once (many small builds, many failed builds, ...), these would be queued and the invalidations executed over time.
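As a rough illustration of how the batching could respect that limit; the helpers are hypothetical stand-ins, as in the sketches further up:

```rust
/// CloudFront allows at most 15 wildcard invalidations to be in progress at once.
const MAX_ACTIVE_WILDCARD_INVALIDATIONS: usize = 15;

/// Sketch: only create a new batched invalidation when CloudFront still has room
/// below its limit; otherwise leave the entries queued for the next run.
fn try_send_next_batch(queued_patterns: &[String]) -> Result<(), Box<dyn std::error::Error>> {
    if queued_patterns.is_empty()
        || active_wildcard_invalidations()?.len() >= MAX_ACTIVE_WILDCARD_INVALIDATIONS
    {
        return Ok(()); // nothing to send, or the CDN is at its limit; retry later
    }
    // one invalidation covering the whole batch of queued path patterns
    create_cdn_invalidation(queued_patterns)?;
    Ok(())
}

// Hypothetical CloudFront helpers.
fn active_wildcard_invalidations() -> Result<Vec<String>, Box<dyn std::error::Error>> {
    Ok(Vec::new()) // would list the invalidations that are still "InProgress"
}
fn create_cdn_invalidation(_patterns: &[String]) -> Result<String, Box<dyn std::error::Error>> {
    Ok(String::from("dummy-invalidation-id")) // would call CreateInvalidation
}
```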
I was also thinking about adding invalidation-related metrics, but decided to postpone that until we have had this code running for some time.
Once the invalidations work reliably, we can finally start to slowly introduce full page caching for our docs.