full page caching for rustdoc pages in the CDN #1552
@jyn514 to start with I just wrote down my thoughts. I'll probably refine the text over time and add details. The quickest solution with the least amount of work would be:
The best solution would probably be going with Fastly (or alternatively Cloudflare, if the features match).
To add: for CloudFront there would be additional costs for Lambda@Edge.
Thanks for writing this up! Could you describe the current status quo? CloudFront, but with no caching? Or CloudFront, but without lots of PoPs all over the world? It would be great to have a summary of how bad the current situation is, and how much it would improve under a better CDN approach. Are you interested in collecting a bunch of samples from https://webpagetest.org/? It seems like Time To First Byte (TTFB) in particular would be the measure most improved by a better CDN. Also cross-linking some caching/performance related issues:
Another thing to consider: With Cloudflare Workers / Fastly Compute@Edge, we could do the unpacking of storage blobs inside the CDN. That would have the advantage that when someone requests 1 page of a crate's docs, their local PoP would have the whole blob of that crate's docs, so subsequent navigations would be very fast.
I'm getting 0.005 * 600 = $3. Presumably there is some other multiplier here that I'm missing?
Valid question, thanks for asking. The whole of docs.rs is behind CloudFront. All static assets, from rustdoc or docs.rs, are cached in the browser and the CDN. All other pages are uncached and just routed through the CDN; our webserver answers all of them. For most crates server-side response times are totally fine. From my perception the bottleneck is the request from Europe (for me) to the AWS datacenter in the US where docs.rs is hosted. Right now the pages are regenerated for every request, including fetching the original files from S3. While we could of course start caching files locally on our webserver, we could just skip this step and directly cache on the edge, helping not only US but worldwide users :)
Any modern CDN would be fine regarding performance and PoPs. The biggest differentiator is how we can selectively invalidate parts of the page, since every new (or re-) release also changes cached content for old releases, and we want the docs to be up-to-date. TTFB measurements could of course feed into a better CDN selection; we could start with CloudFront and already have a good solution without too much infrastructure effort.
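Selective invalidation of this kind is usually done with surrogate keys (Fastly) or cache tags (Cloudflare): every cached response is tagged with the crate it documents, and one purge-by-tag call drops all pages for that crate. A minimal in-memory sketch of the idea (the class and method names are illustrative, not a real CDN API):

```python
# Sketch of tag-based invalidation: map each cached URL to the crate
# it documents, so one purge call drops every page for that crate.
# All names here are illustrative, not a real CDN API.
from collections import defaultdict

class TaggedCache:
    def __init__(self):
        self.pages = {}                 # url -> body
        self.by_tag = defaultdict(set)  # tag -> set of urls

    def store(self, url, body, tags):
        self.pages[url] = body
        for tag in tags:
            self.by_tag[tag].add(url)

    def purge_tag(self, tag):
        for url in self.by_tag.pop(tag, set()):
            self.pages.pop(url, None)

cache = TaggedCache()
cache.store("/serde/1.0.0/serde/", "<html>…</html>", tags={"crate-serde"})
cache.store("/serde/latest/serde/", "<html>…</html>", tags={"crate-serde"})
cache.purge_tag("crate-serde")  # a new serde release invalidates both pages
```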
While that could be a next optimization step, it needs more design, since the documentation blobs are sometimes multiple gigabytes and contain millions of files.
Sorry I wasn't clear enough here, I was thinking about $90 monthly.
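For reference, the arithmetic behind these numbers, assuming (as the thread implies) roughly 600 invalidation paths per day at CloudFront's $0.005-per-path rate:

```python
# Back-of-the-envelope CloudFront invalidation cost. The ~600 paths
# per day figure is the thread's assumption, not a measured value.
PRICE_PER_PATH = 0.005  # USD, after the free monthly tier
PATHS_PER_DAY = 600     # assumed rebuild/release volume

daily = PRICE_PER_PATH * PATHS_PER_DAY
monthly = daily * 30
print(f"${daily:.2f}/day -> ${monthly:.2f}/month")  # $3.00/day -> $90.00/month
```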
I think this sounds exactly right. I was actually wondering if something like this (CDN-only caching for the HTML pages) was possible with CloudFront.
@jsha I actually removed the long writeup again because I forgot the main reason why full page caching is more work here: CSP 😄. We could handle this part with Lambda@Edge, but I would have to dig into that first.
I think it would be useful to bring back the long writeup, even if you caveat it with "I think this won't work because of CSP script-nonce." However, I think it will work with CSP script-nonce (and also we should move away from our plans to use CSP in this way, see #1853). Here's a Server Fault thread to back me up: https://serverfault.com/a/1064775/361298 (and I talk about this here: #1569 (comment)). The short version is covered there: what a CSP nonce does, what happens when a page with a CSP nonce is cached, and whether caching helps an attacker trying to defeat CSP.
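For context on why a script-nonce complicates full-page caching: the nonce in the CSP header has to match the nonce in the served HTML, and a cached response freezes both together. A small illustration (not docs.rs code; the header and markup are a generic example):

```python
# Why CSP script-nonce interacts badly with full-page caching: the
# origin mints a fresh nonce per response, but a CDN replaying a
# cached response serves the same header+html pair to every visitor.
import secrets

def render_page():
    nonce = secrets.token_urlsafe(16)  # fresh per response at the origin
    header = f"Content-Security-Policy: script-src 'nonce-{nonce}'"
    html = f'<script nonce="{nonce}">init()</script>'
    return header, html

h1, _ = render_page()
h2, _ = render_page()
assert h1 != h2  # two origin responses differ; two cache hits would not
```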
You're right, I'll re-add it from memory below. (If someone still has an email notification, I'll take it ;))
Related comment by @jsha: #1569 (comment). IMO this is the biggest question to be answered around the caching topic. I would love some more input from @rust-lang/docs-rs.

The new full page caching idea (from memory, in keywords): CloudFront is somewhat limited compared to Fastly or Cloudflare. For our caching we would need something that controls caching in the CDN without affecting caching in the browser (like `s-maxage` in `Cache-Control`, or a CDN-specific header such as Fastly's `Surrogate-Control`). With this control we could let the CDN cache for a long time, while actively invalidating the content we want to invalidate after we build a crate version. But there could be a workaround: looking at CloudFront's documentation on cache-control headers, we could use the default TTL for this.
Invalidation would be:
The only annoying part of this approach is that we have to explicitly set `Cache-Control` headers on the pages we don't want cached.
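One way to express this split with standard `Cache-Control` directives (a sketch, not docs.rs's actual header policy; the page types and TTL values here are made up):

```python
# Illustrative Cache-Control policy per page type. The directives are
# standard; the page-type names and TTLs are assumptions, not docs.rs code.
def cache_control(page_type):
    if page_type == "static-asset":   # hashed filename, safe to cache forever
        return "public, max-age=31536000, immutable"
    if page_type == "rustdoc-page":   # CDN caches a week, browser revalidates
        return "public, s-maxage=604800, max-age=0"
    return "no-store"                 # e.g. build queue, search results

print(cache_control("rustdoc-page"))  # public, s-maxage=604800, max-age=0
```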
I will do some reading & testing around this.
I had a copy in my email notifications. You reproduced it remarkably well from memory!

Original proposal:
We can use a CDN to improve worldwide speed for serving documentation. Even if we improved server-side response times (for example by caching S3 requests locally on the webserver), we would still be looking at the global latency between the EU/US and elsewhere (at least 100ms). If, after CDN caching, we still need to optimize server-side response times, we can do it then.
Documentation is mostly static and can only change with a build, so it's a nearly perfect candidate for CDN caching. It's only nearly perfect because we have rebuilds, and because we add a header & footer.
I think we can leverage CDN caching for most parts of the site. As long as we actively invalidate caches and use a good CDN, we would still always have up-to-date content.
page-types and invalidation events
cached forever, no invalidation needed:
can only change after any release for one specific crate
can only change when we release new code:
not really cachable:
requirements to the CDN
we need
Nice to have would be:
CloudFront
Fastly
CloudFlare
I didn't dig deeper yet on the feature set here.
browser caches
Since we want to actively invalidate certain caches, we won't cache these pages in the browser, and will limit browser caching to static assets with hashed filenames, as we do currently.
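The hashed-filename scheme is what makes aggressive browser caching safe for static assets: changed content produces a new URL, so a stale cache entry is simply never requested again. A sketch of the idea (the exact naming format is an assumption, not docs.rs's actual scheme):

```python
# Content-addressed asset names: the URL changes whenever the content
# changes, so browsers can cache the old URL forever without going stale.
import hashlib

def hashed_name(filename, content: bytes):
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, _, ext = filename.rpartition(".")
    return f"{stem}-{digest}.{ext}"

print(hashed_name("rustdoc.css", b"body { margin: 0 }"))
```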