rustdoc: adjust static file names for better cache configuration #98413
Comments
That's not quite right - docs.rs passes --resource-suffix to rustdoc to make it automatically include the suffix; the alternative would be parsing and rewriting thousands of html files at build time. It should be possible for doc.rust-lang.org to do the same pretty easily. I'm not opposed to adding a default for --resource-suffix on nightly, but I also don't think it's a particularly pressing problem. |
I guess adding |
Ah, thanks for the correction! I had taken the hex bytes in those filenames to be a hash, but I see now they are a commit id. Also, for some reason I had assumed that self-built docs had the version suffix by default (e.g. 1.61.0), but I see now they don't. That means people self-hosting their docs can't set caching headers for static files.
My main priority here is that http://doc.rust-lang.org/std (i.e. stable) doesn't have caching headers for the JS and CSS, which adds some delay to page loads. To add those I'd like to make it so that when content changes, filenames change (as suggested in rust-lang/simpleinfra#108 (comment)). I think the straightforward fix is to add a default value for |
Just the hash would make it a little harder to tell when the docs were published, but I guess we can find that out from our database records anyway. I don't think there's any special logic behind the current format other than being unambiguous. Do note though that the hash will have to be longer if you omit the channel and date - docs.rs stores enough years of docs that hash collisions are a real risk (especially since git is still using SHA1). |
Well, they can, they just have to know about resource-suffix. |
FWIW, it's not clear to me why we should be versioning these files along rustc's commit hash or similar: a hash of the file contents is relatively easy to compute and avoids any questions around needing git (e.g., when building from tarballs with patches applied) to prevent problems. It also gives us longer-lived caches on a more granular basis, and should work for all files that aren't user visible in links. (Even without resource suffix passed). Is there a reason we're not doing that by default? |
We do some loading of resource files from JS: rust/src/librustdoc/html/static/js/main.js Lines 52 to 56 in 3b0d481
Still, the benefits of using hashes might be worth it. |
Thinking about it some more, I'm convinced hashes are the right approach. My goal is to enable faster page loads in general. One way is by turning on caching of static files on doc.rust-lang.org. But another nice improvement would be to improve cache hit rates on docs.rs. Right now it's unlikely that any two crates have the same URL for, e.g. normalize.css, because it will always be specific to the rustdoc version, e.g. https://docs.rs/normalize-20220709-1.64.0-nightly-6dba4ed21.css. But that file basically never changes. If we used a hash as part of the URL, in theory docs.rs could stop setting --resource-suffix, and most page navigations would load a cached normalize.css. Even static files like main.js that change more frequently change less often than nightly and would have a better cache hit rate. |
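To make the content-hash idea concrete, here is a minimal sketch of deriving a file name from a file's bytes. The hash function (64-bit FNV-1a) and the names `fnv1a64` and `hashed_file_name` are assumptions for illustration only; nothing in this thread says which hash rustdoc would use.

```rust
/// 64-bit FNV-1a. A stand-in hash chosen for this sketch (assumption),
/// not necessarily what rustdoc would use.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Build `stem-<16 hex digits>.ext` from a file name and its contents.
/// Names without an extension get `name-<16 hex digits>`.
fn hashed_file_name(name: &str, contents: &[u8]) -> String {
    let digest = format!("{:016x}", fnv1a64(contents));
    match name.rsplit_once('.') {
        Some((stem, ext)) => format!("{stem}-{digest}.{ext}"),
        None => format!("{name}-{digest}"),
    }
}
```

Because the name depends only on the bytes, two toolchains that ship an identical normalize.css would produce the same URL, which is what gives the better cache hit rate described above.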
👍 for not having to keep track of this in docs.rs - ideally we could get rid of --print=invocation-specific at the same time |
I have a branch in progress and would love some eyes on it just to check I'm going in the right direction: https://rustdoc.crud.net/jsha/static-files/std/io/trait.Read.html Basically everything static (Unversioned + InvocationSpecific) now gets written to static.files/, which should make it simpler for docs.rs to figure out what should be copied out. The InvocationSpecific files get a hash in the filename based on their contents. I can also collapse the two categories so the fonts (Unversioned) get the hash treatment as well. Still to be done:
|
Your approach seems a bit complex. I would have done it as follows: generate static files and their hash first, then save this value into a struct stored somewhere and then use it when generating the template files. That way it wouldn't require too many changes and would still work out easily. The downside with the hash approach is that it's tricky to know what the new files are since you need to look at the file date, and it makes it impossible to use the resource suffix. I'd say we lose in predictability what we gain in simplicity for the source code. Not sure if it's important or not though. Also, changing/removing cli options is a breaking change I think. So maybe we need a transition period in-between. |
The complexity comes from the dependency graph between them. The You can work around this by generating a separate "file manifest", either inlined into all of the HTML files or stored as another JS file, but that causes its own complexity (at least, it adds more boilerplate to rustdoc's already heavy HTML output).
This is an unstable option anyway. A deprecation period might be a good idea, but it doesn't have to be very long.
|
The order of files is known ahead of time so there shouldn't be too much complexity normally? But maybe I'm missing something obvious. Also, do we know if there are users relying on the replaced cli options apart from docs.rs?
I guess it can be short if no one uses this option indeed. |
Docs.rs uses this option. It's ok if rustdoc removes it, but we need advance notice (cc @rust-lang/docs-rs) and to know what we should replace it with. |
The To make sure I understand how it works, is this summary accurate? The first time a new toolchain is installed, rustdoc is called on a dummy crate with For regular crate builds, rustdoc is called with With the proposal in my branch, everything emitted by
This is effectively what we do here, though instead of "a struct", I use a series of statics. However, having a single top-level static with fields would indeed make it easier to make these values available to templates. I could give that a shot. |
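A rough sketch of the "single top-level struct with fields" idea, so templates can look up hashed names by field. Everything here is hypothetical: the types `StaticFile` and `StaticFiles`, the function `render_head`, the placeholder file contents, and the FNV-1a hash are all assumptions for illustration and are not taken from the branch.

```rust
/// One embedded static file plus the hashed name it is written under.
struct StaticFile {
    filename: String,
    bytes: &'static [u8],
}

impl StaticFile {
    fn new(name: &str, bytes: &'static [u8]) -> StaticFile {
        // 64-bit FNV-1a as a placeholder content hash (assumption).
        let mut hash: u64 = 0xcbf29ce484222325;
        for &b in bytes {
            hash ^= u64::from(b);
            hash = hash.wrapping_mul(0x100000001b3);
        }
        let filename = match name.rsplit_once('.') {
            Some((stem, ext)) => format!("{stem}-{hash:016x}.{ext}"),
            None => format!("{name}-{hash:016x}"),
        };
        StaticFile { filename, bytes }
    }
}

/// Hashes are computed once, up front, and then shared with the templates.
struct StaticFiles {
    main_js: StaticFile,
    normalize_css: StaticFile,
}

impl StaticFiles {
    fn new() -> StaticFiles {
        StaticFiles {
            // Placeholder contents; real files would be embedded at build time.
            main_js: StaticFile::new("main.js", b"/* main */"),
            normalize_css: StaticFile::new("normalize.css", b"/* normalize */"),
        }
    }

    /// (file name, contents) pairs to write into static.files/.
    fn outputs(&self) -> Vec<(&str, &[u8])> {
        [&self.main_js, &self.normalize_css]
            .iter()
            .map(|f| (f.filename.as_str(), f.bytes))
            .collect()
    }
}

/// A template fragment that refers to files by field, not by literal name.
fn render_head(files: &StaticFiles, root: &str) -> String {
    format!(
        "<link rel=\"stylesheet\" href=\"{root}static.files/{css}\">\n<script defer src=\"{root}static.files/{js}\"></script>",
        css = files.normalize_css.filename,
        js = files.main_js.filename,
    )
}
```

The design point is ordering: every hashed name exists before any HTML is rendered, so templates never need to know how the names were derived.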
I was talking about the
👍 |
@GuillaumeGomez said:
Indeed, earlier in this thread I talked about potentially deprecating We could eventually talk about changing the naming scheme for our dynamic JS: instead of being named based on |
PR is up: #101702 |
…illaumeGomez rustdoc: add hash to filename of toolchain files

All static files used by rustdoc are now stored in static.files/ and their filenames include a hash of their contents. Their filenames no longer include the contents of the --resource-suffix flag.

This clarifies caching semantics. Anything in static.files can use Cache-Control: immutable because any updates will show up as a new URL.

Invocation-specific files like crates-NN.js, search-index-NN.js, and sidebar-items-NN.js still get the resource suffix.

This has a useful side effect: once toolchain files aren't affected by resource suffix, it will become possible for docs.rs to include crate version in the resource suffix. That should fix a caching issue with `/latest/` URLs: rust-lang/docs.rs#1593. My goal is that it should be safe to serve all rustdoc JS, CSS, and fonts with infinite caching headers, even when new versions of a crate are uploaded in the same place as old versions.

The --disable-minification flag is removed because it would vary the output of static files based on invocation flags. Instead, for rustdoc development purposes it's preferable to symlink static files to a non-minified copy for quick iteration.

Example listing:

```
$ cd build/x86_64-unknown-linux-gnu/doc/ && find . | egrep 'js$|css$' | egrep -v 'sidebar-items|implementors' | sort
./crates1.65.0.js
./rust.css
./search-index1.65.0.js
./source-files1.65.0.js
./static.files/ayu-2bfd0af01c176fd5.css
./static.files/dark-95d11b5416841799.css
./static.files/light-c83a97e93a11f15a.css
./static.files/main-efc63f77fb116394.js
./static.files/normalize-76eba96aa4d2e634.css
./static.files/noscript-5bf457055038775c.css
./static.files/rustdoc-7a422337900fa894.css
./static.files/scrape-examples-3dd10048bcead3a4.js
./static.files/search-47f3c289722672cf.js
./static.files/settings-17b08337296ac774.js
./static.files/settings-3f95eacb845293c0.css
./static.files/source-script-215e9db86679192e.js
./static.files/storage-26d846fcae82ff09.js
```

Fixes rust-lang#98413
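For anyone self-hosting docs, the caching rule described in the PR could be sketched as a small path check. The function name `cache_control_for` and the exact header value (in particular the one-year `max-age`) are assumptions for illustration; the PR only says that files under static.files/ can be served as immutable.

```rust
/// Header value for long-lived, content-addressed files.
/// The one-year max-age is an illustrative choice (assumption).
const IMMUTABLE: &str = "public, max-age=31536000, immutable";

/// Return a Cache-Control value for files that live inside a
/// `static.files` directory, and `None` for everything else
/// (leaving those to the server's default policy).
fn cache_control_for(url_path: &str) -> Option<&'static str> {
    match url_path.rsplit_once('/') {
        // Only directory components count; a file merely named
        // `static.files` does not qualify.
        Some((dir, _file)) if dir.split('/').any(|seg| seg == "static.files") => Some(IMMUTABLE),
        _ => None,
    }
}
```

Invocation-specific files such as the search index deliberately fall through to `None` here, since their names carry the resource suffix rather than a content hash.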
rustdoc has a number of static files that should really be long-cached by the browser for best loading performance: the fonts, the CSS, storage.js, main.js, and so on. We should add a hash to their names so that services can be more confident in setting long cache headers for them.
Right now we categorize things into Unversioned, ToolchainSpecific, and InvocationSpecific:
rust/src/librustdoc/html/render/write_shared.rs
Lines 44 to 57 in 10f4ce3
Unversioned is used just for the font files. ToolchainSpecific is used for the CSS, the images, and most of the JS. InvocationSpecific is used for `search-indexN.NN.N.js`, `source-filesN.NN.N.js`, `cratesN.NN.N.js`, the JS that contains the list of implementors on trait pages, and the JS that contains the list of additional sidebar items (siblings in a module).

Unversioned gets no infix. ToolchainSpecific gets a version suffix, like `main1.63.0.js` (from `main.js`). InvocationSpecific gets the same version suffix.

Unversioned and ToolchainSpecific files should be infinitely cacheable. Right now, that's not the case for ToolchainSpecific, because multiple toolchains have the same version infix. For instance, every nightly build right now creates a `main1.63.0.js`, but it's potentially different each night. That means https://doc.rust-lang.org/nightly/main1.63.0.js potentially changes every night, and can't be long-cached. Since `docs.rs` uses the nightly toolchain, the `main1.63.0.js` it produces for a crate today may be different than the one it produces for a crate it builds tomorrow. `docs.rs` has special code to recognize the ToolchainSpecific files and rename them to contain a date and a hash, like https://docs.rs/main-20220517-1.63.0-nightly-4c5f6e627.js. But `doc.rust-lang.org` doesn't have that code, and as a result is less able to cache things that should be cached. And anyone who self-hosts docs is on their own.

I propose that we change our file naming scheme. All Unversioned and ToolchainSpecific files should be emitted to a subdirectory `s/<hash>/`, where `<hash>` is calculated over the contents of all of those files together. This makes it easy to configure a web server to set Cache-Control headers for everything under that subdirectory.

Advantage: this makes calculating URLs for such resources easy, especially when the calculation is done in JS. Disadvantage: if one file changes, the whole hash changes, potentially requiring the user to load more files when navigating between crates generated with different rustdoc versions.
Alternately, we could add a hash of each individual file to that file's name. That makes calculating URLs harder, but means better reuse of cached data across different nightly versions.
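The trade-off between the two schemes can be sketched as follows. This is only an illustration under stated assumptions: the hash is 64-bit FNV-1a as a placeholder, and the names `fnv1a_update`, `shared_dir_urls`, and `per_file_urls` are hypothetical.

```rust
const FNV_OFFSET: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

/// Fold more bytes into a running 64-bit FNV-1a state (placeholder hash).
fn fnv1a_update(mut state: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        state ^= u64::from(b);
        state = state.wrapping_mul(FNV_PRIME);
    }
    state
}

/// Scheme 1: one directory `s/<hash>/`, hashed over every file together.
/// Callers are assumed to pass files in a fixed (e.g. sorted) order.
fn shared_dir_urls(files: &[(&str, &[u8])]) -> Vec<String> {
    let mut state = FNV_OFFSET;
    for (name, contents) in files {
        state = fnv1a_update(state, name.as_bytes());
        state = fnv1a_update(state, &[0]); // separator after the name
        state = fnv1a_update(state, &(contents.len() as u64).to_le_bytes());
        state = fnv1a_update(state, contents);
    }
    files.iter().map(|(name, _)| format!("s/{state:016x}/{name}")).collect()
}

/// Scheme 2: each file name carries a hash of that file's own contents.
fn per_file_urls(files: &[(&str, &[u8])]) -> Vec<String> {
    files
        .iter()
        .map(|(name, contents)| {
            let digest = fnv1a_update(FNV_OFFSET, contents);
            match name.rsplit_once('.') {
                Some((stem, ext)) => format!("{stem}-{digest:016x}.{ext}"),
                None => format!("{name}-{digest:016x}"),
            }
        })
        .collect()
}
```

If only main.js changes between two toolchains, scheme 1 gives normalize.css a new URL as well, while scheme 2 leaves its URL (and any cached copy) untouched.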
/cc @rust-lang/rustdoc @rust-lang/docs-rs