
implement CDN invalidation queue & background worker #1961

Merged: 1 commit merged into rust-lang:master on Jan 14, 2023

Conversation


@syphar syphar commented Dec 17, 2022

To work around the CloudFront limit (max. 15 active wildcard invalidations), this implements a queue & batching for CDN invalidations.

So when we get a bunch of changes at once (many small builds, many failed builds, ...), these are queued and executed over time.

I was also thinking about adding invalidation-related metrics, but decided to postpone that until this code has been running for some time.

Once the invalidations work reliably, we can finally start to slowly introduce full page caching for our docs.
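To illustrate the queueing side, here is a minimal sketch (not the PR's actual code) using the `postgres` crate. The table name matches the one discussed in the review below; the `path_pattern` column and the URL patterns are assumptions for illustration only:

```rust
use postgres::{Client, Error};

/// Queue wildcard invalidations for a crate instead of sending them to
/// CloudFront directly, so a background worker can batch them later.
fn queue_crate_invalidation(conn: &mut Client, krate: &str) -> Result<(), Error> {
    // Hypothetical path patterns; the real ones depend on the docs.rs URL layout.
    for pattern in [format!("/{krate}*"), format!("/crate/{krate}*")] {
        conn.execute(
            "INSERT INTO cdn_invalidation_queue (path_pattern, queued)
             VALUES ($1, CURRENT_TIMESTAMP)",
            &[&pattern],
        )?;
    }
    Ok(())
}
```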


syphar commented Dec 17, 2022

r? @jsha

@github-actions github-actions bot added the S-waiting-on-review (Status: This pull request has been implemented and needs to be reviewed) label on Dec 17, 2022
@syphar syphar force-pushed the cdn-invalidation-batches branch 2 times, most recently from 21404b0 to 51f7367 on December 17, 2022 09:38

syphar commented Dec 17, 2022

(I'm debugging the failing test, which doesn't fail locally. Review is still possible IMO.)

Edit: solved the test failure.

@syphar syphar force-pushed the cdn-invalidation-batches branch 2 times, most recently from 0f99d9d to 3713623 on December 17, 2022 10:16
@syphar syphar force-pushed the cdn-invalidation-batches branch from 3713623 to 2ba21db on January 7, 2023 12:24

syphar commented Jan 7, 2023

cc @rust-lang/docs-rs, could anyone help with reviewing this?

I would love to finish this up & activate full page caching.

@jsha jsha left a comment (Contributor)

To make sure I understand the design correctly:

There is a new table cdn_invalidation_queue. After a crate build, we append to that table. Periodically, a cron task fetches up to N items from the table, creates a CDN invalidation from them, and sets those items to created_in_cdn = CURRENT_TIMESTAMP.

As part of the same cron task, we fetch the list of active invalidations from the CDN, and delete from the database all items that are not considered active invalidations by the CDN, but which do have a non-null created_in_cdn. This should effectively mean deleting all completed items. So the database should in normal operation always have a smallish number of rows.
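To make it concrete, here is a sketch of what I understand those two steps to be, assuming the `postgres` crate; `created_in_cdn` and `queued` are from the description above, while `id` and `path_pattern` are guessed column names:

```rust
use postgres::{Client, Error};

/// Step 1: pick up to `batch_size` items that have no invalidation yet and
/// mark them as sent to the CDN, returning the path patterns to invalidate.
fn pick_up_queued_invalidations(conn: &mut Client, batch_size: i64) -> Result<Vec<String>, Error> {
    let rows = conn.query(
        "UPDATE cdn_invalidation_queue
         SET created_in_cdn = CURRENT_TIMESTAMP
         WHERE id IN (
             SELECT id
             FROM cdn_invalidation_queue
             WHERE created_in_cdn IS NULL
             ORDER BY queued
             LIMIT $1
             FOR UPDATE SKIP LOCKED
         )
         RETURNING path_pattern",
        &[&batch_size],
    )?;
    Ok(rows.iter().map(|row| row.get::<_, String>(0)).collect())
}

/// Step 2: delete rows that were sent to the CDN but no longer show up in
/// the CDN's list of active invalidations, i.e. completed ones.
/// `active_paths` would come from a CloudFront list-invalidations call.
fn delete_completed(conn: &mut Client, active_paths: &[String]) -> Result<u64, Error> {
    conn.execute(
        "DELETE FROM cdn_invalidation_queue
         WHERE created_in_cdn IS NOT NULL
           AND NOT (path_pattern = ANY($1))",
        &[&active_paths],
    )
}
```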

What happens if we fall behind? It seems like there should be an additional query that deletes all rows from the database where queued is more than X minutes ago, regardless of whether we were able to create an invalidation for them. At some point they are old enough that they've become irrelevant.
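Something like this hypothetical query, run as part of the same cron task, with X minutes as a parameter and `queued` as in the sketch above:

```rust
use postgres::{Client, Error};

/// Hypothetical safety valve for the fall-behind case: drop entries queued
/// more than `max_age_minutes` ago, whether or not an invalidation was ever
/// created for them.
fn delete_stale_entries(conn: &mut Client, max_age_minutes: i32) -> Result<u64, Error> {
    conn.execute(
        "DELETE FROM cdn_invalidation_queue
         WHERE queued < CURRENT_TIMESTAMP - make_interval(mins => $1)",
        &[&max_age_minutes],
    )
}
```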

It's worth noting that databases-as-queues are usually not very efficient, but so long as we keep the table size and write volume small, this will probably be okay for now.


syphar commented Jan 11, 2023

> To make sure I understand the design correctly:
>
> There is a new table cdn_invalidation_queue. After a crate build, we append to that table. Periodically, a cron task fetches up to N items from the table, creates a CDN invalidation from them, and sets those items to created_in_cdn = CURRENT_TIMESTAMP.
>
> As part of the same cron task, we fetch the list of active invalidations from the CDN, and delete from the database all items that are not considered active invalidations by the CDN, but which do have a non-null created_in_cdn. This should effectively mean deleting all completed items. So the database should in normal operation always have a smallish number of rows.

Yep, that's it.

> What happens if we fall behind? It seems like there should be an additional query that deletes all rows from the database where queued is more than X minutes ago, regardless of whether we were able to create an invalidation for them. At some point they are old enough that they've become irrelevant.

If we deleted the queued invalidation, we would leave outdated content visible to users. IMO an invalidation only becomes irrelevant once it's older than the TTL on the CDN, and in the best case that TTL is forever.

> It's worth noting that databases-as-queues are usually not very efficient, but so long as we keep the table size and write volume small, this will probably be okay for now.

Yeah, I'm aware of that and the possible alternatives.

Since we have the build-queue in the DB too, I would postpone that change altogether to a later point.

Some more general notes:

If we go live with this, the plan would be to keep the TTL short for some time, to see how long the invalidations take and whether they fail. We could also think about adding some metrics once we know which ones would help us.

I believe for the near future we would be fine with the 15-invalidation limit, and the invalidations would be fast enough for the number of releases / yanks we get.
But of course this is a hard limit that won't move long-term; at some point we would have to either merge / escalate invalidations, move to a different CDN, or give up caching again.

I definitely didn't expect this much complexity working around CloudFront's limitations, and the approach still has downsides.

I'm still excited & motivated to move forward; web-speed-me would definitely enjoy browsing docs far more when it's done.

My long-term wish / preference would be moving to another CDN like Fastly, which would make all of this queuing logic obsolete. But since we don't have that yet, I lean towards adding this queue until then.

@jsha jsha left a comment (Contributor)

I think there are some unhandled cases where CloudFront's list of invalidations gets too big, or the table of queued invalidations gets too big. But I guess so long as our cache time is low, those aren't a big deal: the cron task will just get stuck somewhere, while builds continue to happen and the cache expires after N minutes. Let's try it!

@syphar syphar merged commit b50e43a into rust-lang:master Jan 14, 2023
@syphar syphar deleted the cdn-invalidation-batches branch January 14, 2023 08:14
@github-actions github-actions bot added the S-waiting-on-deploy (This PR is ready to be merged, but is waiting for an admin to have time to deploy it) label and removed the S-waiting-on-review (Status: This pull request has been implemented and needs to be reviewed) label on Jan 14, 2023
@syphar syphar removed the S-waiting-on-deploy (This PR is ready to be merged, but is waiting for an admin to have time to deploy it) label on Jan 19, 2023