full page caching for rustdoc pages in the CDN #1552
@jyn514 to start with I just wrote down my thoughts. I'll probably refine the text over time and add details. The quickest solution with the least amount of work would be:
The best solution would probably be going with Fastly (or alternatively Cloudflare, if the features match).
To add: for CloudFront there would be additional costs for Lambda@Edge.
Thanks for writing this up! Could you describe the current status quo? CloudFront, but with no caching? Or CloudFront, but without lots of PoPs all over the world? It would be great to have a summary of how bad the current situation is, and how much it would improve under a better CDN approach. Are you interested in collecting a bunch of samples from https://webpagetest.org/? It seems like Time To First Byte (TTFB) in particular would be the measure most improved by a better CDN. Also cross-linking some caching/performance related issues:
Another thing to consider: With Cloudflare Workers / Fastly Compute@Edge, we could do the unpacking of storage blobs inside the CDN. That would have the advantage that when someone requests 1 page of a crate's docs, their local PoP would have the whole blob of that crate's docs, so subsequent navigations would be very fast.
I'm getting 0.005 * 600 = $3. Presumably there is some other multiplier here that I'm missing?
Valid question, thanks for asking. The whole of docs.rs is behind CloudFront. All static assets, from rustdoc or docs.rs, are cached in the browser and the CDN. All other pages are uncached and just routed through the CDN; our webserver answers all of them. For most crates server-side response times are totally fine. From my perception the bottleneck is the request from Europe (for me) to the AWS datacenter in the US where docs.rs is hosted. Right now the pages are regenerated for every request, including fetching the original files from S3. While we could of course start caching files locally on our webserver, we could just skip this step and directly cache on the edge, helping not only US but worldwide users :)
Any modern CDN would be fine regarding performance and PoPs. The biggest differentiator is how we can selectively invalidate parts of the page, since every new (or re-) release also changes cached content for old releases, and we want the docs to be up-to-date. TTFB measurements could of course feed into a better CDN selection; we could start with CloudFront and already have a good solution without too much infrastructure effort.
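Selective invalidation of this kind is usually done with surrogate keys (Fastly) or cache tags (Cloudflare): every cached response is tagged with the crate it documents, and one purge-by-tag call drops all pages for that crate. A minimal in-memory sketch of the idea (the class and method names are illustrative, not a real CDN API):

```python
# Sketch of tag-based invalidation: map each cached URL to the crate
# it documents, so one purge call drops every page for that crate.
# All names here are illustrative, not a real CDN API.
from collections import defaultdict

class TaggedCache:
    def __init__(self):
        self.pages = {}                 # url -> body
        self.by_tag = defaultdict(set)  # tag -> set of urls

    def store(self, url, body, tags):
        self.pages[url] = body
        for tag in tags:
            self.by_tag[tag].add(url)

    def purge_tag(self, tag):
        for url in self.by_tag.pop(tag, set()):
            self.pages.pop(url, None)

cache = TaggedCache()
cache.store("/serde/1.0.0/serde/", "<html>…</html>", tags={"crate-serde"})
cache.store("/serde/latest/serde/", "<html>…</html>", tags={"crate-serde"})
cache.purge_tag("crate-serde")  # a new serde release invalidates both pages
```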
While that could be a next optimization step, it needs more design, since the documentation blobs are sometimes multiple gigabytes and contain millions of files.
Sorry I wasn't clear enough here, I was thinking about $90 monthly.
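For reference, the arithmetic behind these numbers, assuming (as the thread implies) roughly 600 invalidation paths per day at CloudFront's $0.005-per-path rate:

```python
# Back-of-the-envelope CloudFront invalidation cost. The ~600 paths
# per day figure is the thread's assumption, not a measured value.
PRICE_PER_PATH = 0.005  # USD, after the free monthly tier
PATHS_PER_DAY = 600     # assumed rebuild/release volume

daily = PRICE_PER_PATH * PATHS_PER_DAY
monthly = daily * 30
print(f"${daily:.2f}/day -> ${monthly:.2f}/month")  # $3.00/day -> $90.00/month
```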
I think this sounds exactly right. I was actually wondering if something like this (CDN-only caching for the HTML pages) was possible with CloudFront.
@jsha I actually removed the long writeup again because I forgot the main reason why full page caching is more work here: CSP 😄. We could handle this part with Lambda@Edge, but I would have to dig into that first.
I think it would be useful to bring back the long writeup, even if you caveat it with "I think this won't work because of CSP script-nonce." However, I think it will work with CSP script-nonce (and also we should move away from our plans to use CSP in this way, see #1853). Here's a Server Fault thread to back me up: https://serverfault.com/a/1064775/361298 (and I talk about this here: #1569 (comment)). The short version is covered there: what a CSP nonce does, what happens when a page with a CSP nonce is cached, and whether caching helps an attacker trying to defeat CSP.
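For context on why a script-nonce complicates full-page caching: the nonce in the CSP header has to match the nonce in the served HTML, and a cached response freezes both together. A small illustration (not docs.rs code; the header and markup are a generic example):

```python
# Why CSP script-nonce interacts badly with full-page caching: the
# origin mints a fresh nonce per response, but a CDN replaying a
# cached response serves the same header+html pair to every visitor.
import secrets

def render_page():
    nonce = secrets.token_urlsafe(16)  # fresh per response at the origin
    header = f"Content-Security-Policy: script-src 'nonce-{nonce}'"
    html = f'<script nonce="{nonce}">init()</script>'
    return header, html

h1, _ = render_page()
h2, _ = render_page()
assert h1 != h2  # two origin responses differ; two cache hits would not
```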
You're right, I'll re-add it from memory below. (If someone still has an email notification, I'll take it ;))
Related comment by @jsha: #1569 (comment). IMO this is the biggest question to be answered around the caching topic. I would love some more input from @rust-lang/docs-rs.

The new full page caching idea (from memory, in keywords): CloudFront is somewhat limited compared to Fastly or Cloudflare. For our caching we would need something that controls caching in the CDN without affecting caching in the browser (like `s-maxage` in `Cache-Control`, or a CDN-specific header such as Fastly's `Surrogate-Control`). With this control we could let the CDN cache for a long time, while actively invalidating the content we want to invalidate after we build a crate version. But there could be a workaround: looking at CloudFront's documentation on cache-control headers, we could use the default TTL for this.
Invalidation would be:
The only annoying part of this approach is that we have to explicitly set `Cache-Control` headers on the pages we don't want cached.
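One way to express this split with standard `Cache-Control` directives (a sketch, not docs.rs's actual header policy; the page types and TTL values here are made up):

```python
# Illustrative Cache-Control policy per page type. The directives are
# standard; the page-type names and TTLs are assumptions, not docs.rs code.
def cache_control(page_type):
    if page_type == "static-asset":   # hashed filename, safe to cache forever
        return "public, max-age=31536000, immutable"
    if page_type == "rustdoc-page":   # CDN caches a week, browser revalidates
        return "public, s-maxage=604800, max-age=0"
    return "no-store"                 # e.g. build queue, search results

print(cache_control("rustdoc-page"))  # public, s-maxage=604800, max-age=0
```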
I will do some reading & testing around this.
I had a copy in my email notifications. You reproduced it remarkably well from memory!

Original proposal:
We can use a CDN to improve worldwide speed for serving documentation. Even if we improved server-side response times (for example by caching S3 requests locally on the webserver), we would still be looking at the global latency between the EU/US and elsewhere (at least 100ms). If, after CDN caching, we still need to optimize server-side response times, we can do it then.
Documentation is mostly static and can only change with a build, so it's a nearly perfect candidate for CDN caching. It's only nearly perfect because we have rebuilds, and because we add a header & footer.
I think we can leverage CDN caching for most parts of the site. As long as we actively invalidate caches and use a good CDN, we would still always have up-to-date content.
page-types and invalidation events
cached forever, no invalidation needed:
can only change after any release for one specific crate
can only change when we release new code:
not really cachable:
requirements to the CDN
we need
Nice to have would be:
CloudFront
Fastly
CloudFlare
I didn't dig deeper yet on the feature set here.
browser caches
Since we want to actively invalidate certain caches, we won't cache these pages in the browser, and will limit browser caching to static assets with hashed filenames, as we do currently.
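The hashed-filename scheme is what makes aggressive browser caching safe for static assets: changed content produces a new URL, so a stale cache entry is simply never requested again. A sketch of the idea (the exact naming format is an assumption, not docs.rs's actual scheme):

```python
# Content-addressed asset names: the URL changes whenever the content
# changes, so browsers can cache the old URL forever without going stale.
import hashlib

def hashed_name(filename, content: bytes):
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, _, ext = filename.rpartition(".")
    return f"{stem}-{digest}.{ext}"

print(hashed_name("rustdoc.css", b"body { margin: 0 }"))
```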