Downloadable docs #174

Closed
Kapeli opened this issue Jan 23, 2018 · 73 comments · Fixed by #1865
Labels
A-backend: Area: Webserver backend
E-easy: Effort: Should be easy to implement and would make a good first PR
mentor: This has instructions for getting started
P-high: High priority
S-blocked: Status: marked as blocked ❌ on something else such as an RFC or other implementation work.

Comments

@Kapeli

Kapeli commented Jan 23, 2018

I'd like to integrate docs.rs inside Dash.

To achieve this, I need a way to download the docs for a package as HTML files. Please consider supporting this.

edit(@jyn514): see #174 (comment) for mentoring instructions.

@njskalski

How about just sharing the database dump, like a backup.torrent or whatever? Crates.io surely has a backup; this way some users will co-host it for free.

@ketsuban

That's not an acceptable solution - note that @Kapeli isn't just asking for the database to be tossed over the wall; they specifically want to integrate docs.rs into an offline documentation viewer so that users can download the documentation for specific crates as needed. The docset format isn't exclusive to Dash, either - other projects such as Zeal also make use of it.

@Kapeli
Author

Kapeli commented Sep 20, 2018

One note: I'm not asking for docs.rs to generate Dash docsets. I'm asking for it to provide downloadable docs.

@lilyball

I'd really love to see this done. I use Dash for everything else and it's really annoying dealing with third-party Rust crate docs since I can't view them anywhere outside a web browser.

@reeze

reeze commented Dec 14, 2018

I am really looking forward to this integration.

@dmilith

dmilith commented Feb 4, 2019

Can't wait for both integration and Rust syntax coloring support for Dash snippets :)

@libratiger

Is there any progress or any new plan?

@nhynes

nhynes commented Aug 16, 2019

It'd also be nice to get the nightly rustc docs without having to build the rustc. Recursive wget is a(n inefficient) workaround, I suppose.

@brinsche

https://github.com/Robzz/cargo-docset was recently released. Maybe that's enough for some people here, or maybe some of the code can be reused and integrated into docs.rs somehow.

@nanne007

Any update on this?

@pietroalbini
Member

It's not too practical to generate a downloadable archive of a crate's documentation, as each file is stored individually on S3.

We'd need to fetch all the files individually and generate an archive of that on the fly, which is not practical for large crates. Preparing an archive at build time and storing it separately would increase our storage costs, and due to the unbounded nature of docs.rs we should try avoiding that.

If y'all have better implementation ideas I'd love to read them.

@lilyball

Could we prepare an archive at build time but only for crates that are opted in to this using some notion of "important to the community"? For example I'd love to see docs.rs provide a docset for tokei that Dash can keep automatically up-to-date. I don't know who'd provide that curation though. There could be some way to nominate crates and leave it up to the docs.rs maintainers to approve it, or maybe it could be based on traffic to a particular crate's documentation.

@Kapeli
Author

Kapeli commented Sep 26, 2019

Could the archive be generated only when requested and have a fixed-size cache where older archives get removed?

Would the archives really be that big though? Docs are generally just text, which compresses very well. You could have a separate archive for the common resources (CSS, images, fonts and so on) and then the docs archives would just be compressed HTML files.

@pietroalbini
Member

Could the archive be generated only when requested and have a fixed-size cache where older archives get removed?

That's not really feasible, as some crates (like stm32f0) have ~200k HTML files in them. They're all stored on S3, and just listing them took awscli 2 minutes and 43 seconds from the docs.rs server.

Would the archives really be that big though? Docs are generally just text, which compresses very well. You could have a separate archive for the common resources (CSS, images, fonts and so on) and then the docs archives would just be compressed HTML files.

Resources are already deduplicated on S3, and all files will be compressed soon™. Once we do that, storing the prebuilt archives will double our storage requirements. Today we can afford that, but thinking long term we'll want to avoid using too much storage.

@Kapeli
Author

Kapeli commented Sep 26, 2019

just listing them took awscli 2 minutes and 43 seconds from the docs.rs server

Taking a long time is fine. For API access, you can return a message saying the docs archive isn't ready yet and to try again later. For users trying to download the docs from their browser, show a page saying the same thing, maybe a bit nicer, with automatic refresh and so on. With a big enough cache size, you could optimise both CPU and disk space needs.
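
To make the on-demand idea concrete, here is a minimal sketch in Python of a fixed-size cache that evicts the least recently used archive, plus a handler that answers "try again later" while an archive is still being built. This is only an illustration under stated assumptions, not how docs.rs works: `ArchiveCache`, `handle_download`, the `enqueue_build` hook, and the 200/202 status convention are all hypothetical names and choices.

```python
from collections import OrderedDict


class ArchiveCache:
    """Size-bounded cache that evicts the least recently used archive first."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # key -> archive bytes, oldest first

    def get(self, key):
        """Return the cached archive, or None if it still has to be built."""
        data = self._entries.get(key)
        if data is not None:
            self._entries.move_to_end(key)  # mark as recently used
        return data

    def put(self, key, data):
        """Store an archive, evicting older ones until it fits."""
        if key in self._entries:
            self.used_bytes -= len(self._entries.pop(key))
        if len(data) > self.max_bytes:
            return  # too large to cache at all
        while self._entries and self.used_bytes + len(data) > self.max_bytes:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)
        self._entries[key] = data
        self.used_bytes += len(data)


def handle_download(cache, key, enqueue_build):
    """Return (status, body): 200 with the archive, or 202 'try again later'."""
    data = cache.get(key)
    if data is not None:
        return 200, data
    enqueue_build(key)  # hypothetical hook: build the archive in the background
    return 202, b"archive is not ready yet, try again later"
```

A browser-facing page could wrap the 202 case in an auto-refreshing "please wait" page, while API clients simply retry.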

@jyn514
Member

jyn514 commented Jan 9, 2020

It'd also be nice to get the nightly rustc docs without having to build the rustc

@nhynes this is out of scope for docs.rs; we only build user documentation. I'm not sure of the right place to open a new issue, maybe https://github.com/rust-lang/www.rust-lang.org/issues ?

@jyn514
Member

jyn514 commented Jan 9, 2020

We discussed this internally and this probably won't see action at least until Rust All Hands in March.

Personally, I would like to see #379 implemented and #532 merged before we make any decisions, which would let us see how much storage we'll be using in the future.

@jyn514 added the uses-more-storage label (This is a reasonable request, but will require significantly more storage costs) on Feb 24, 2020

@mabbamOG

any updates? this is a trivial issue ongoing for 4 years now...

@jsha
Contributor

jsha commented Sep 30, 2022

(3) having the single archive for static files would mean we have to update the archive every day, and re-upload it with the whole history. Also the performance-improvement would be very small since these static files are cached in the CDN anyways.

Is the goal of downloadable docs to improve performance, or to ensure docs are available when offline? I'm assuming the latter.

In that case, it's important for tools that want to download docs to be able to enumerate all the static files that might be needed by a bundle of docs. It's not trivial to enumerate these just by processing HTML, because some are loaded by JS (e.g. search-index).

I think we probably need to start recording a mapping of rustdoc release -> list of static files, and provide that listing as part of the bundle for crate docs built with that release.
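
A minimal sketch of what such a mapping and per-bundle listing could look like, in Python. Everything here is hypothetical: the release identifiers are placeholders, the file lists only reuse file names mentioned in this thread, and `build_manifest` is an illustrative helper, not an existing docs.rs or rustdoc API.

```python
# Hypothetical data: the release identifiers are placeholders, and the file
# lists only reuse names mentioned in this thread.
STATIC_FILES_BY_RUSTDOC_RELEASE = {
    "rustdoc-release-A": ["settings.js", "settings.css", "search.js"],
    "rustdoc-release-B": ["settings.js", "search.js"],
}


def build_manifest(crate, version, rustdoc_release):
    """Return a manifest listing the static files a docs bundle needs."""
    files = STATIC_FILES_BY_RUSTDOC_RELEASE.get(rustdoc_release)
    if files is None:
        raise KeyError(f"no static file listing recorded for {rustdoc_release!r}")
    return {
        "crate": crate,
        "version": version,
        "rustdoc_release": rustdoc_release,
        "static_files": sorted(files),
    }
```

A manifest like this could be serialized (for example as JSON) and shipped inside the archive, so a consumer would not need to execute or parse JS to discover runtime-loaded files.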

@syphar
Member

syphar commented Sep 30, 2022

Is the goal of downloadable docs to improve performance, or to ensure docs are available when offline? I'm assuming the latter.

yes, the latter. More specifically this issue here is about offline doc readers that have to process the docs anyways to make them usable in their docsets.

In that case, it's important for tools that want to download docs to be able to enumerate all the static files that might be needed by a bundle of docs. It's not trivial to enumerate these just by processing HTML, because some are loaded by JS (e.g. search-index).

Since processing and HTML rewriting is needed anyways right now the idea was to download the missing assets when needed, where needed. The search-index is invocation specific and will be in the archive, while I think the offline doc readers wouldn't use our internal search. But that's up to them.

@jsha
Contributor

jsha commented Sep 30, 2022

Ah, I misspoke about search-index. Good catch. But the problem exists for settings.js, settings.css, and search.js. They are loaded at runtime by other JS that uses rustdoc-vars to figure out their paths. Perhaps it's true that these pieces of functionality aren't needed by offline doc readers, but it seems like a potential source of fragility / worrying future bug.

For other, more typical <script> and <link> tags: do we know it's definitely the case that Dash and other offline doc readers will process all downloaded files to find such files and predownload them?
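
For illustration, this is roughly what processing a downloaded page for typical `<script>` and `<link>` references could look like, sketched with Python's standard `html.parser`. `AssetCollector` and `find_assets` are hypothetical names, and nothing here reflects how Dash actually processes docs. Note that this approach only sees statically referenced files; anything loaded at runtime by JS would still be missed.

```python
from html.parser import HTMLParser


class AssetCollector(HTMLParser):
    """Collect URLs referenced by <script src=...> and <link href=...> tags."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.assets.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.assets.append(attrs["href"])


def find_assets(html):
    """Return the asset URLs referenced in one HTML document, in order."""
    collector = AssetCollector()
    collector.feed(html)
    collector.close()
    return collector.assets
```

Running `find_assets` over every HTML file in an archive and de-duplicating the results would give the set of assets to predownload.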

@syphar
Member

syphar commented Sep 30, 2022

Ah, I misspoke about search-index. Good catch. But the problem exists for settings.js, settings.css, and search.js. They are loaded at runtime by other JS that uses rustdoc-vars to figure out their paths. Perhaps it's true that these pieces of functionality aren't needed by offline doc readers, but it seems like a potential source of fragility / worrying future bug.

The idea is to start simple, following the comment from above by Kapeli:

I prefer it if you just archive what you have now and I'll fix/rewrite/clean any issues I encounter.

Currently we're "just" exposing the internal archive for everyone who wants to work with the build output in a programmatic manner.
When we have the first actual consumers, we will see what's missing and whether it's even possible for us to expose it.

@syphar
Member

syphar commented Sep 30, 2022

@jsha your point around assets that are not referenced directly in HTML sounds of course like a valid one, but I don't know enough about how the docsets would be processed to know what is actually needed.

IMO any more sophisticated approach would be based on this archive, so this is a valid first step.

Further steps could be to provide archives for the toolchain specific static files in a sensible way.

@Kapeli
Author

Kapeli commented Oct 4, 2022

Thanks for working on this @syphar! I've had a look over the docs archive for verify-call mentioned in #1865 (comment) and I can make it work for my needs.

The CSS and JS are missing, but I can make Dash fetch & save them.

Please let me know when this gets deployed and archives are available for all crates.

@syphar
Member

syphar commented Oct 4, 2022

Thanks for working on this @syphar! I've had a look over the docs archive for verify-call mentioned in #1865 (comment) and I can make it work for my needs.

The CSS and JS are missing, but I can make Dash fetch & save them.

That's good to hear, thanks for checking this @Kapeli !

Please let me know when this gets deployed and archives are available for all crates.

Will try to; this issue will definitely be closed then.

To keep in mind: The archive only exists for all releases built since #1342 (sep 2021). Only looking at the latest version you will have an archive for ~40k crates out of ~92k. We're planning a rebuild for older releases (see #464 ), but this needs some infrastructure work we're also working on.

IMO most popular crates will be fine, and I hope this will be added to Dash anyways.

@Kapeli
Author

Kapeli commented Oct 4, 2022

Can you provide info on how to access archives for other crates? For example, I'm not able to get the archive for the sql crate at https://static.docs.rs/rustdoc/sql/0.4.3.zip

@syphar
Member

syphar commented Oct 4, 2022

Can you provide info on how to access archives for other crates? For example, I'm not able to get the archive for the sql crate at https://static.docs.rs/rustdoc/sql/0.4.3.zip

This will be possible after #1865 is deployed, via the endpoint directly on docs.rs. It will give you a redirect to static.docs.rs while taking care of the necessary permissions.

@Nemo157
Member

Nemo157 commented Oct 4, 2022

And please don't depend on static.docs.rs directly, there are some changes that may require that to change (e.g. if we do #1853).

@syphar
Member

syphar commented Oct 15, 2022

While this is merged, closed & deployed, there are some open permissions that need to be set.

I'll update you all here when that's done.

@syphar
Member

syphar commented Nov 5, 2022

This change is live & deployed right now.

https://docs.rs/about/download

@malaire

malaire commented Nov 5, 2022

Is there any crawling policy for these downloads? For example would it be ok to download all docs with 1 request per second speed?

@jyn514
Member

jyn514 commented Nov 5, 2022

@malaire we haven't yet made a policy, but I'm curious why you would want to do that? It's several terabytes of data, mostly for crates that have been used less than a dozen times or aren't the latest version.

@malaire

malaire commented Nov 5, 2022

I didn't realize it's that much data.

I don't want to rely on internet and want to keep important data locally, but as I already have all crates downloaded, in this case it probably makes more sense to just generate docs locally as-needed instead of storing everything I might someday need.

@syphar
Member

syphar commented Nov 5, 2022

I didn't realize it's that much data.

I don't want to rely on internet and want to keep important data locally, but as I already have all crates downloaded, in this case it probably makes more sense to just generate docs locally as-needed instead of storing everything I might someday need.

The main point of the downloadable docs archive is to give offline docs readers like Dash the data to process so they can generate the docsets, which can then be used by Dash or Zeal (on Linux).

So if you wait a little longer, you'll have it.

@Kapeli
Author

Kapeli commented Nov 9, 2022

Thanks a lot for your work on this @syphar.

I'll integrate it into Dash ASAP.

@wdanilo

wdanilo commented Jan 31, 2023

@Kapeli do you have any estimate of when it will be available? Dash without docs for packages on crates.io is not very useful, and this is an amazing app <3

@Kapeli
Author

Kapeli commented Feb 1, 2023

@syphar I'm almost done adding support for this in Dash.

I've found a crate for which downloadable docs are not available: cfg-if: https://docs.rs/crate/cfg-if/1.0.0/download is a 404. Is this intentional (i.e. archived docs are not available for some crates yet and I should throw an error when this happens)?

@jyn514
Member

jyn514 commented Feb 1, 2023

@Kapeli

To keep in mind: The archive only exists for all releases built since #1342 (sep 2021). Only looking at the latest version you will have an archive for ~40k crates out of ~92k. We're planning a rebuild for older releases (see #464 ), but this needs some infrastructure work we're also working on.

@syphar
Member

syphar commented Feb 1, 2023

@syphar I'm almost done adding support for this in Dash.

this is awesome!

I've found a crate for which downloadable docs are not available: cfg-if: https://docs.rs/crate/cfg-if/1.0.0/download is a 404. Is this intentional (i.e. archived docs are not available for some crates yet and I should throw an error when this happens)?

I didn't check this specific release, but see my comment from above:

To keep in mind: The archive only exists for all releases built since #1342 (sep 2021). Only looking at the latest version you will have an archive for ~40k crates out of ~92k. We're planning a rebuild for older releases (see #464 ), but this needs some infrastructure work we're also working on.

IMO most popular crates will be fine, and I hope this will be added to Dash anyways.

So yes, for now this is expected.

archived docs are not available for some crates yet and I should throw an error when this happens)?

Yes, the simplest solution would be to show an error stating that the archive is not available or something like that. Crate authors can always request a rebuild with us if they want.
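
As a rough sketch of the client side, the following Python uses the URL form quoted above (`https://docs.rs/crate/<name>/<version>/download`) and treats a 404 as "no archive available". `download_url`, `fetch_archive`, and the injectable `opener` parameter are hypothetical names for illustration; `urllib.request.urlopen` follows redirects by default, so the client never has to reference static.docs.rs directly.

```python
import urllib.error
import urllib.request


def download_url(crate, version):
    """Build the download endpoint URL in the form quoted in this thread."""
    return f"https://docs.rs/crate/{crate}/{version}/download"


def fetch_archive(crate, version, opener=urllib.request.urlopen):
    """Return the archive bytes, or None if no archive exists (HTTP 404)."""
    try:
        with opener(download_url(crate, version)) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # caller shows an "archive not available" message
        raise  # other errors are unexpected and left to the caller
```

A caller would show the "archive is not available" error whenever `fetch_archive` returns `None`, and let other HTTP errors surface separately.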

When I'm back home I can also check our database and see whether we can proactively trigger a rebuild for all latest releases with >1k downloads or something like that; this depends on the numbers. But we still need the error message for the user.

I assume you mostly (only?) show latest versions?

@Kapeli
Author

Kapeli commented Feb 2, 2023

Thanks for the clarification. I'll add the error message.

I assume you mostly (only?) show latest versions?

Pressing the Download button installs the latest stable version, but users can also install older versions by opening the popover.


@thallada

Has this been added to Dash yet? I don't see the Rust Docsets Third-party source in my app.

@syphar
Member

syphar commented May 26, 2023

Has this been added to Dash yet? I don't see the Rust Docsets Third-party source in my app.

To my knowledge there is nothing released.

@Kapeli
Author

Kapeli commented Jun 30, 2023

I'm sorry, I've been having some health issues, mostly due to burnout. I'm currently working on Dash 7, which will include Rust Docsets support. It's almost done, it just needs some polish, but I'm not sure when I'll actually release it.

@nathany

nathany commented Jun 30, 2023

@Kapeli Thank you for the update. I hope you find the rest you need.
