Make it possible to use docs.rs offline for pages that have been visited at least once #845
Comments
You can open an issue on https://github.com/rust-lang/www.rust-lang.org for that site; it's managed by a different team. I imagine they would be very receptive since it's a completely static site.
Hmm, this is an interesting idea. I don't think it would work with relative links though.
The whole point of docs.rs is that you don't have to build the docs yourself, so while a useful tip, I don't think it should replace being able to use docs.rs offline. I don't know very much about PWAs. If we set pages to be cached for a longer time, would that meet this use case? That way you could visit the cached page even when you lost internet.
In regards to relative links, I wouldn't be sad if they went away, as they're not really a great thing in the first place. Replacing relative links would probably help simplify a good portion of code while also being less finicky/more difficult to mess up.
I strongly disagree. Without relative links we'd have to hardcode absolute URLs.
Also, rustdoc heavily uses relative links for documentation; I don't see a good way to change that since it doesn't know the absolute URL it will be used with.
Oh, I thought you meant relative links in reference to how we do our own source browser, with …
Changing the cache expiry would help; however, that requires the user to manually toggle offline mode in their browser (which is a very hidden thing nowadays, if not impossible altogether...).
That seems to defeat the point of caching :( Glancing through the page you linked, it seems like the main idea is to have some JavaScript that checks if the page is cached before making a network request. I agree that should be the behavior, but I'm not comfortable enough with JavaScript to implement it / don't have the time. If someone is interested in working on this I'd be happy to mentor though :) Almost all of the site can be cached except the home page, /releases, and redirects.
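A minimal sketch of that idea as a Service Worker (hypothetical: the cache name is made up, and the worker would need to be registered from each page with `navigator.serviceWorker.register('/sw.js')`). It checks the cache before going to the network, and remembers what it fetches:

```js
// sw.js (sketch only): serve cached pages when we have them,
// and cache successful responses for future offline visits.
const CACHE = 'docsrs-pages-v1'; // hypothetical cache name

self.addEventListener('fetch', (event) => {
  if (event.request.method !== 'GET') return;

  event.respondWith(
    caches.match(event.request).then((cached) => {
      if (cached) return cached; // repeat visit or offline: answer from cache
      return fetch(event.request).then((response) => {
        if (response.ok) {
          const copy = response.clone();
          caches.open(CACHE).then((cache) => cache.put(event.request, copy));
        }
        return response;
      });
    })
  );
});
```

Users with JavaScript disabled never register the worker, so nothing changes for them.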
Angular apps have service workers built into them implicitly, so if you guys are willing to upgrade this from a Python Jinja-like Tera front-end (https://crates.io/crates/tera) to an Angular front-end then you can get the Service Worker caching for free. Here's some more info: https://angular.io/guide/service-worker-intro

As for the rust-lang website itself, it has a Handlebars front-end (https://github.com/rust-lang/www.rust-lang.org/blob/master/templates/index.hbs), which could also be replaced with an Angular front-end. However, I think it'd probably be more on-brand for these Rust websites to have a Rust-based front-end that compiles to WebAssembly rather than be JavaScript-based. The only such crate I'm aware of that might do this is Yew, but it doesn't have Service Workers built into it as far as I know. It's not "production-ready", but since these websites are just static pages I don't think that's a concern.

Angular could potentially be overkill since these sites are just static pages, but just because it has a bunch of bells and whistles doesn't mean you have to use them.
I'd strongly prefer for docs.rs to remain a static site first and foremost, and especially remain usable with JavaScript disabled. I'm fine with JS adding features on top, but the JS shouldn't be necessary just to use the site. That said I don't know much about frontend, so maybe Angular can do that?
Service workers themselves are implemented on the front-end via JavaScript, so I'm not sure that we can have our cake and eat it too in this situation. With that design constraint, I'm not sure we can make this website offline-first. All we could do is just ask users to use their browser's "make available offline" feature if they want to use the site while offline.

Edit: Even WebAssembly requires JavaScript to be enabled, so I'm not sure that any Rust-based WASM solution would work either.
Let me approach this from a different angle (I really like the framing in https://internals.rust-lang.org/t/pre-rfc-user-namespaces-on-crates-io/12851/96 to discuss things as problems to solve and not solutions to implement). docs.rs currently is a dynamic site which serves static HTML. It does not have caching for rustdoc pages, which means the site is not available when you're offline. The goal of this issue is to be able to use docs.rs offline if you've already visited the relevant pages at least once. If I'd never heard of PWAs, the way I'd imagine implementing this is something like the following: store a copy of each page locally when it's visited, and serve that local copy whenever the network is unavailable.
What this gets docs.rs is three things: pages you've already visited keep working offline, repeat visits load faster, and cached pages save bandwidth for both the user and the server.
Regardless of the technologies or frameworks used, does that basic idea sound feasible?
Won't this be an issue for pages like /latest/?
@GuillaumeGomez are you saying that this breaks once a new version of the crate is published?
Yes, that's what I meant.
I think this is probably feasible. Some questions to figure out: should all of docs.rs be one big PWA, which manages a cache of all the various docs you've visited? Or should each crate's docs be a separate PWA? Ideally we'd like the same behavior on doc.rust-lang.org, which means the functionality should be in rustdoc, which argues for a PWA per crate. Also, it looks like Service Workers allow us to actually prefetch resources that the user hasn't visited yet. So for instance if you visit one page of a crate's docs, it could download all the pages of that crate's docs. The storage could add up fast, though, so we'd need heuristics about when or if to do that.
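The prefetching idea could look something like this sketch (the message protocol, cache name, and page list are all hypothetical; a real version would need to discover the crate's page list from some index the server exposes):

```js
// Sketch: prefetch every page of a crate once the user shows interest.
const CRATE_CACHE = 'docsrs-crate-rand-0.8.5'; // hypothetical per-crate cache
const CRATE_PAGES = [
  '/rand/0.8.5/rand/index.html',
  '/rand/0.8.5/rand/trait.Rng.html',
  // ...the rest of the crate's pages
];

self.addEventListener('message', (event) => {
  if (event.data === 'prefetch-crate') {
    // waitUntil keeps the worker alive until the downloads finish.
    event.waitUntil(
      caches.open(CRATE_CACHE).then((cache) => cache.addAll(CRATE_PAGES))
    );
  }
});
```

The page would trigger it with something like `navigator.serviceWorker.controller.postMessage('prefetch-crate')`, and this message handler is also where the when-to-prefetch heuristics would live.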
I have a local prototype of this that's kinda neat, and plan to work on it some more; I'll share results when they're good enough. I had high hopes of precaching a whole crate / the whole stdlib, but fetching that many files individually (30,847 for the stdlib) was prohibitively slow. And users probably wouldn't thank us for using that much data without a more explicit opt-in anyhow. Here's my current thinking: serve pages from the Service Worker's cache immediately when we have them, and fetch a fresh copy in the background so the cache is up to date for the next visit.
Note that in this scenario, nothing changes for users without JS; they never load the Service Worker. Alternately, we could prefer freshness: try the network first, and fall back to the cached copy only when the request fails (for instance, when you're offline).
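In Service Worker terms, the two options differ only in the fetch handler. A sketch of both, under the same assumptions as the earlier snippet (hypothetical cache name, cache populated as you browse):

```js
// Option 1: cache-first with background revalidation
// (roughly what stale-while-revalidate does at the HTTP layer).
async function cacheFirstRevalidate(event) {
  const cached = await caches.match(event.request);
  const network = fetch(event.request).then(async (response) => {
    if (response.ok) {
      const cache = await caches.open('docsrs-pages-v1'); // hypothetical name
      await cache.put(event.request, response.clone());
    }
    return response;
  });
  // Refresh the cache in the background even when we answer from it.
  event.waitUntil(network.catch(() => {}));
  return cached || network;
}

// Option 2: prefer freshness. Network first; cache only as an offline fallback.
async function networkFirst(event) {
  try {
    return await fetch(event.request);
  } catch (err) {
    const cached = await caches.match(event.request);
    if (cached) return cached;
    throw err;
  }
}

self.addEventListener('fetch', (event) => {
  event.respondWith(cacheFirstRevalidate(event)); // or networkFirst(event)
});
```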
The first approach is quite similar to the Cache-Control stale-while-revalidate directive. As a simpler approach, we could try changing the headers on HTML pages. Right now they have no Cache-Control header; we could add one with a stale-while-revalidate directive.

Advantage for the Cache-Control approach: much easier to deploy and reason about.

Advantages of the Service Worker approach: …
One of the exciting things about both approaches is that they have the potential to dramatically speed up repeat visits even when online.
By the way, to be able to readily experiment with this without the possibility of breaking docs.rs, it should be possible to run some totally third-party site that has a Service Worker and fetches / serves pages from docs.rs as if those pages were on its own origin. But that would require setting Access-Control-Allow-Origin on all/most docs.rs pages. Is that reasonable to do?
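The experiment described there might look roughly like this (a sketch; it assumes docs.rs actually sends Access-Control-Allow-Origin, which is exactly the open question):

```js
// Service Worker on the third-party test origin: answer same-origin page
// requests with the corresponding page fetched from docs.rs itself.
self.addEventListener('fetch', (event) => {
  const url = new URL(event.request.url);
  if (url.origin === self.location.origin) {
    // The cross-origin fetch only yields a usable (non-opaque) response
    // if docs.rs sets Access-Control-Allow-Origin for this origin.
    event.respondWith(
      fetch('https://docs.rs' + url.pathname, { mode: 'cors' })
    );
  }
});
```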
This should be possible once we finally implement downloadable docs :) which would serve the docs as one big zipfile for the whole crate.
I would be worried about doing this on docs.rs in prod, but it shouldn't be terribly difficult to run a fork of docs.rs somewhere and add the Access-Control-Allow-Origin header there. Hmm, I guess that doesn't let you test how it interacts with CloudFront though.
This is very tempting 😆 it sounds like you're volunteering to do much of the work, which I really appreciate ❤️ but simpler to write also means simpler to review. How hard would it be to switch between the two ideas at a later time? It sounds like a lot of the work is hooking the service worker up to the Cache API and actually changing the page, which is the same between both, right?
Switching at any point would be the same work as doing either change from scratch. If we use the …

The thing I worry about with stale-while-revalidate is this: you can be looking at a stale copy of a page whose links point at items that were renamed or removed in a newer release, so clicking through takes you somewhere that no longer exists.
Of course, now that I write these out I see these are also a problem for the /latest/ change in general. For instance, you could have /latest/ (version 1.0) loaded in your browser when 2.0 is released, and click a link to one of the now-renamed structs. The problem also exists for versioned URLs. For instance, visit https://docs.rs/rustls/0.19.0/rustls/trait.Session.html and click "Go to latest version" (Session was renamed to Connection in 0.20). I see somebody has already thought of the problem, and that link takes you to a search page across 0.20. That's pretty neat! Maybe that's adequate?

The other problem with stale-while-revalidate is: say you load the root page, see it's outdated, and reload. Then you click to another page you've visited before. That's also outdated. You have to reload that too. It would get frustrating pretty fast.
Haha, yeah I spent a while on that :)
Hmm, this should only be a problem if you have the page open for a long time, right? Because (with caching as current, but with #1527) the second you reload the page you'll get the newer version. I think the combination of open for a long time + an intervening release + the struct was renamed is low enough that just having search is fine.
Yeah, that seems confusing. I'm not sure that "if you reload you'll get 2.0" is true though - don't you need to do a hard refresh to ignore the cache directive? I don't think we should do that for the /latest/ page. It seems ok for pages other than /latest/ though; they should only change if a bug in rustdoc itself was fixed and the crate was rebuilt.
That said, I'm fairly familiar with service workers from working at Cloudflare, so if that sounds fun I say go for it 😁
Wouldn't it require a lot of CPU and storage to store all the crates?

I'm thinking of something that would exist for a period of months, where we'd invite testers to try using it as their daily driver version of docs.rs, to see what weird cases would come out of real-life browsing patterns.
Ahh, that makes sense, I didn't realize that's what the directive did.
I don't see a realistic way to do this. Either we experiment with it in prod (maybe with a feature flag?) or we can write more tests; it's just not feasible to replicate docs.rs at scale.
I admit I never worked with this kind of frontend caching, but I'm excited to see it if it works. Since caching is hard, this feels like there might be edge-cases with confusing mixtures of cached and uncached pages (and assets), so IMHO having an (even user-visible) feature flag / testing phase would be a great idea. Or building a second setup. I mean, having a staging platform is not a terrible idea :)
Yes, I definitely want to set up a staging server at some point where people can try things out interactively. I just want to set reasonable expectations for it; it's going to end up like staging.crates.io where maybe 5 people a week visit; it won't let us see problems that only appear at scale.
I just tested stale-while-revalidate, and it does make the page nicely available when the network is offline, at least in Chrome. Proposal: let's add a Cache-Control header with a stale-while-revalidate directive to the HTML pages.
Sounds like a plan! :) |
A little hiccup: Iron doesn't seem to support stale-while-revalidate, and doesn't allow setting custom strings for the cache-control header: https://docs.rs/iron/0.6.1/iron/headers/enum.CacheDirective.html
Note that the axum migration has been done for some time now.
It'd be great to turn docs.rs into an offline-first PWA (Progressive Web App), so that users could still browse the docs they have already visited even when offline, without having to use a separate website or app.
The same could be done for doc.rust-lang.org.
Originally posted by @teohhanhui in #174 (comment)