Add Last-Modified and ETag headers on HTML responses #1560
Comments
The only easy implementation here would be doing that for static assets. Implementing a last-modified date for rustdoc pages is harder, since every new release changes the content for the old releases. Also, an easy ETag calculation (based on MD5, for example) might have a performance impact on the webserver for bigger rustdoc files. The performance advantage would likely only be perceivable for US users, since for other users most of the time is spent in the roundtrip, and ETag/Last-Modified caching still does the roundtrip. |
We could use the crate version + rustdoc version as the ETag.
This suggests perhaps an ETag of crate version + rustdoc version + latest crate version. All that said, it seems like Last-Modified would also work pretty well and would be simple to calculate.
That's only true if you assume a full download happens in a single roundtrip. If (at a guess) a typical page is 150 kB, and the starting receive window size is 16 kB (the Windows default), there will be at least a few roundtrips beyond the first. We could measure this with Wireshark! Another potential benefit, besides end-user speed, is that this could reduce bandwidth costs. |
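As a rough check on the roundtrip estimate above, here is a minimal sketch. It assumes a simplified model in which the amount of data in flight starts at the initial window and doubles on every roundtrip; real TCP behaviour depends on congestion control and window scaling, so treat the result as an order-of-magnitude figure. The function name `round_trips` is made up for illustration.

```rust
/// Count the roundtrips needed to deliver `page_bytes`, ASSUMING the sender
/// starts with `initial_window` bytes in flight and doubles that amount on
/// every roundtrip (a simplification, not a model of any real TCP stack).
pub fn round_trips(page_bytes: u64, initial_window: u64) -> u32 {
    assert!(initial_window > 0, "window must be non-zero");
    let mut sent = 0u64;
    let mut window = initial_window;
    let mut trips = 0u32;
    while sent < page_bytes {
        // One roundtrip delivers at most `window` more bytes.
        sent = sent.saturating_add(window);
        window = window.saturating_mul(2);
        trips += 1;
    }
    trips
}
```

Under this model, `round_trips(150_000, 16_000)` gives 4 (16 kB, then 48 kB, 112 kB and 240 kB cumulative), so three roundtrips beyond the first. A 304 response would avoid those extra roundtrips on repeat loads.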
This needs to include the docs.rs version too, in case the header has changed. Actually that reminds me, we also need to include the latest version in the ETag, so the drop-down gets updated with all newer versions. |
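To make the idea concrete, here is a minimal sketch of an ETag derived only from the version strings mentioned so far, so it never has to hash the page body. Everything in it is an assumption for illustration: the struct, its field names and `build_etag` are hypothetical and not part of docs.rs, and FNV-1a is used only because it is short and deterministic, not because anyone in this thread proposed it.

```rust
/// Inputs that, per the discussion, should change the ETag when they change.
/// Hypothetical type; field names are illustrative only.
pub struct EtagInputs<'a> {
    pub crate_version: &'a str,
    pub rustdoc_version: &'a str,
    pub docsrs_version: &'a str,
    pub latest_crate_version: &'a str,
}

const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Fold `bytes` into a running 64-bit FNV-1a hash.
fn fnv1a(state: u64, bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(state, |h, &b| (h ^ u64::from(b)).wrapping_mul(FNV_PRIME))
}

/// Build a quoted ETag value from the version strings alone.
/// Hashing keeps the value free of spaces or quotes that raw version
/// strings might contain; a zero byte separates the fields.
pub fn build_etag(inputs: &EtagInputs) -> String {
    let fields = [
        inputs.crate_version,
        inputs.rustdoc_version,
        inputs.docsrs_version,
        inputs.latest_crate_version,
    ];
    let mut hash = FNV_OFFSET;
    for field in fields.iter() {
        hash = fnv1a(hash, field.as_bytes());
        hash = fnv1a(hash, &[0]);
    }
    format!("\"{:016x}\"", hash)
}
```

Because the inputs are a handful of short strings, computing this is cheap compared with hashing a large rustdoc page, and further inputs can be added as extra fields.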
Here's a webpagetest result for https://docs.rs/serde_json/1.0.72/serde_json/struct.Deserializer.html, from Milan, on Chrome, with Cable internet speed: You can click on the waterfall for a detailed view, but I'll copy the details of the first request here for convenience:
The most relevant bit here is "Content Download: 446ms", which is 56% of the total time for that request (790ms). The total page load was ~1900ms. That suggests to me that we could save a good amount of time with this technique. If you click over to the Response tab for that request you can see the details. The "Time to First Byte: 233 ms" probably represents a combination of roundtrip from Milan to the US, and internal processing in the docs.rs webserver. I think that's the amount that would be reduced by #1552. Of course, #1552 would improve performance for first load and subsequent loads alike, while this would only improve performance for repeat loads. |
btw, if we can calculate the etag and last-updated without generating the page, that would also save processing time on the server because the CDN can directly return the cached page. Both E-tag and last-modified would have to change when any of these change:
|
Yeah, I don't think this needs to be tracked explicitly, even if we do a rebuild it shouldn't change the page unless the docs.rs or rustdoc version have changed. |
Only implementing a valid E-tag is probably easier than trying to make up a usable last-modified timestamp based on these inputs. And for caching in the browser it shouldn't matter.
Valid points. This could be a real improvement if we can keep the ETag calculation simple, and if we check whether CloudFront caches these too.
I'm not sure why I didn't think about this earlier, but when we factor CloudFront into the ETag handling, then theoretically subsequent requests from any other browser to the CDN should also be reduced to just the roundtrip to check the latest ETag. |
We now have control over the cache and we could continue on this story. |
This needs all versions btw, the latest semver version is not necessarily the most recently published crate. |
two points I want to add:
|
Right now, HTML pages on docs.rs get no caching headers at all:
That's in part because they could be updated at any time. However, there's no need for the user's browser to load the whole thing every time. If we set Last-Modified and/or ETag, the browser can send a request with the If-None-Match and/or If-Modified-Since headers. In the common case when the document hasn't been updated, the server can reply with 304 and the browser will use what it has stored locally, saving a lot of bytes downloaded.
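The server-side half of that exchange can be sketched as follows. This is a framework-independent illustration, not docs.rs code: `Reply` and `decide` are hypothetical names, and splitting the header on commas is a simplification that assumes the entity tags themselves contain no commas. `If-None-Match` uses weak comparison, so a leading `W/` is ignored on both sides.

```rust
/// What the server should send back. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
pub enum Reply {
    /// 304 Not Modified: the browser reuses its stored copy.
    NotModified,
    /// 200 OK with the full body.
    Full,
}

/// Drop the weak-validator prefix so tags can be compared weakly.
fn strip_weak(tag: &str) -> &str {
    tag.strip_prefix("W/").unwrap_or(tag)
}

/// Decide between 304 and 200 from the request's `If-None-Match` header
/// value (if any) and the page's current ETag (quotes included).
pub fn decide(if_none_match: Option<&str>, current_etag: &str) -> Reply {
    let header = match if_none_match {
        Some(value) => value,
        // No validator sent: this is a first load, send everything.
        None => return Reply::Full,
    };
    let current = strip_weak(current_etag);
    let matches = header
        .split(',') // simplification: assumes no commas inside tags
        .map(str::trim)
        .any(|candidate| candidate == "*" || strip_weak(candidate) == current);
    if matches {
        Reply::NotModified
    } else {
        Reply::Full
    }
}
```

For example, `decide(Some("\"abc\""), "\"abc\"")` returns `Reply::NotModified`, while a missing or stale validator returns `Reply::Full`.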
Note that doc.rust-lang.org already serves both ETag and Last-Modified, thanks to S3: