Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation size can grow significantly due to documentation of impls on type aliases #115718

Closed
steffahn opened this issue Sep 9, 2023 · 20 comments · Fixed by #116471
Closed
Labels
A-rustdoc-ui Area: Rustdoc UI (generated HTML) C-bug Category: This is a bug. I-heavy Issue: Problems and improvements with respect to binary size of generated code. P-high High priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.

Comments

@steffahn
Copy link
Member

steffahn commented Sep 9, 2023

In the latest nightly (I’ve tested 62ebe3a2b 2023-09-08), from #115201 we get better/more documentation on type aliases, including inherent and trait impls of the aliased type.

For some crates, this can have a severe effect on the size of the generated documentation (and the time it takes to generate it).

I’ve pointed this out on the PR after merging, too ( #115201 (comment) ), but it’s better tracked as a separate issue, since the change is overall really nice, and even in some of these cases, the larger documentation is a tradeoff against arguably better documentation.

The crates I’ve tested are nalgebra and typenum. The nalgebra crate features a single and very impl-rich Matrix type together with a very large number of type aliases for Matrix. These aliases at the moment contain notices like

Because this is an alias, not all its methods are listed here. See the Matrix type too.

which indicates, a nice inlined listing of (an appropriately filtered list of) relevant impls is a good addition. However, I’m also noticing the generated docs grow significantly in size. Using default features, I see between 1.72 and the abovementioned nightly, the size grows from 27MB to 515MB.

At the very least, it’s probably quite desired that the significant number of deprecated aliases don’t create extra HTML bloat.

Looking at typenum, it features a handful of types like in particular UInt, NInt, and PInt for type-level unsigned, and signed (positive and negative) integers, each of which come with a significant amount of trait impls, though by far not as large as nalgebra’s Matrix’s impls list. However, the number of type aliases involving these is absolutely huge, and the end result for my test was a size increase from 44MB to 523MB for the generated documentation. For the case of typenum, all this extra HTML content seems – at least to me – pretty much unnecessary.

How to address this?

It seems to me that only a handful of crates like this are really negatively affected. While for the case of deprecated items, we could decide from the side of rustdoc to just not include the impls automatically, for a case like typenum it seems to me like an opt-out might be appropriate.

Right now, a way typenum could avoid the issue would be to use #[doc(hidden)] to entirely hide the alias. (Then, it could still describe in prose which numbers are supported by aliases.) But that seems like it has unnecessary downsides in reader experience. One opt-in solution for hiding impls on an alias could thus be an alternative attribute, similar to existing #[doc(hidden)] and #[doc(inline)] attributes, e.g. it could be named #[doc(no_impls)] or #[doc(hidden_impls)] (and [at least initially] only be supported only for type aliases).

Further evaluation

I should probably also look into the effects on ndarray. And I might be unaware – off the top of my head – of other crates that could be significantly affected. Feel free to point out any crates that have similarly significant additional “bloated” documentation from this change, so we can evaluate whether the discussed or to-be-discussed approaches help in those cases as well.

@steffahn steffahn added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Sep 9, 2023
@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Sep 9, 2023
@steffahn steffahn changed the title Documentation size can grow significantly due documentation of impls on type aliases Documentation size can grow significantly due to documentation of impls on type aliases Sep 9, 2023
@steffahn
Copy link
Member Author

steffahn commented Sep 9, 2023

For completeness, let me include the answer ( #115201 (comment) ) from @GuillaumeGomez on the PR thread where I first pointed this out:

I think the concern is valid, and it might be possible to add a mechanism to allow to disable this behaviour, but it needs to be discussed first about:

  • Do we want to allow people to opt out of this feature?
  • How do we disable this feature?
    • A global attribute?
    • A command line option?
    • An attribute to be used on each type alias?
    • An attribute to be used on module?

Of course, any of the item above can be combined with others. Lot of possibilities. ;)

@steffahn
Copy link
Member Author

steffahn commented Sep 9, 2023

@rustbot label regression-from-stable-to-nightly, -regression-untriaged, T-rustdoc, A-rustoc-ui

edit why did this silently do nothing?

@GuillaumeGomez GuillaumeGomez added the T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. label Sep 9, 2023
@GuillaumeGomez
Copy link
Member

I added it to rustdoc team agenda to be discussed in next meeting.

@saethlin saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Sep 11, 2023
@the8472
Copy link
Member

the8472 commented Sep 11, 2023

Would the type-definitions in those cases be mergeable or do the type aliases add constraints that make the impl docs unique?

@steffahn
Copy link
Member Author

@the8472 What would you mean by “mergeable”? Like including html from multiple files on the same page? Each impl block gets its own html file and is included into all places where it shall appear? Or do you mean something entirely different?

Is this even possible with HTML? (I’m quite the HTML noob myself.) Or would we like an approach to only include impls on type alias pages if Javascript is enabled?

@the8472
Copy link
Member

the8472 commented Sep 11, 2023

What would you mean by “mergeable”? Like including html from multiple files on the same page?

No, just generating one page for a set of ~similar type aliases. E.g. if a type-alias sets a generic to a specific value and that makes some impls non-applicable then those impls shouldn't be listed on that page. But that only works most of the type-aliases set the same value.

I haven't actually seen an example what these changes look like (the PR didn't have an example)... so maybe my suggestion doesn't make sense.

Is this even possible with HTML?

Kinda, kinda not. iframes could be used but they have lots of limitations. There also are some legacy features around XML/XSLT that did including but that's not relevant to HTML.
Anyway, not what I was suggesting.

@steffahn
Copy link
Member Author

steffahn commented Sep 11, 2023

No, just generating one page for a set of ~similar type aliases. E.g. if a type-alias sets a generic to a specific value and that makes some impls non-applicable then those impls shouldn't be listed on that page. But that only works most of the type-aliases set the same value.

Thanks for the clarification. I think that might be hard to do nicely. I do believe that for most use-cases the current (new) approach of listing relevant impls on the type alias page itself, is a very good and useful approach … my only concern really is only with the memory and time overhead for the more extreme cases of many (typically macro-generated) aliases of a complex (many impls) type with various parameters.

I haven't actually seen an example what these changes look like (the PR didn't have an example)... so maybe my suggestion doesn't make sense.

To look at the documentation in question here, all that’s necessary is:

cargo new test_new_aliases_doc
cd test_new_aliases_doc
cargo add nalgebra
cargo doc --open

and then navigate to …/test_new_aliases_doc/target/doc/nalgebra/base/index.html#types (i.e. module base in crate nalgebra) and choose any of the type synonyms to look at.

A few type aliases can also be found in std, e.g. std::io::Result. You could also look into recently published crates on docs.rs e.g. this one is one I could find that has a few type aliases where you can look at what the docs look like.

@the8472
Copy link
Member

the8472 commented Sep 11, 2023

Then maybe a general heuristic would then make sense to generate a N-similar-item page instead of 1-item alias page once the number of similar aliases in a module exceeds some threshold.

@GuillaumeGomez
Copy link
Member

That seems "too clever" to sound like a good idea to be honest.

@the8472
Copy link
Member

the8472 commented Sep 11, 2023

Nono, it's quite obvious. Multiple Items, Single Documentation (MISD, SIMD's strange cousin) 😉

@GuillaumeGomez
Copy link
Member

Yes, but I don't see how the documentation rendering, especially if there is added documentation on the type alias, will be done nicely and won't be like super confusing.

@the8472
Copy link
Member

the8472 commented Sep 11, 2023

Type aliases for Foo<A=CommonValue, B>

Bar = Foo<A=CommonValue, B=OtherValue> [+]

Description

Baz<T> = Foo<A=CommonValue, T> [+]

Description

Implementations

Note: All the above aliases have the same set of implementations

Impl

...

@GuillaumeGomez
Copy link
Member

It would also mean displaying the "Aliased type" for each too. For small ones, it's fine, for big ones, not so much.

@the8472
Copy link
Member

the8472 commented Sep 11, 2023

Well you can move that down to the collapsible block and just list the name in the headline. I'm gesturing at the general area in the solution space here, not claiming to have solved every issue.

@GuillaumeGomez
Copy link
Member

Sorry if my comments seemed a bit harsh. It's just that UI/UX problems are actually often very complicated and I tried for each of your suggestions to show a potential problem. 😅

@the8472
Copy link
Member

the8472 commented Sep 11, 2023

Yeah I'll acknowledge that such a batch-page wouldn't be ideal and that an actual implementation attempt may run into more serious problems. It is a suggestion to solve the footprint issue and work around the limitation of HTML at potentially non-zero loss of readability but without completely removing the pages which might be a bigger loss.

@GuillaumeGomez
Copy link
Member

The only way I can see for now would be to use JS, but I'm not a big fan of anything JS-related as it increases the code quantity quite a lot (and also doesn't work if you disable JS).

@the8472
Copy link
Member

the8472 commented Sep 11, 2023

If you mean including via JS then we could have a fallback by showing a link to the shared impls page when JS is off.

An even simpler heuristic is to stop rendering the impl blocks above some threshold of ~similar type aliases and link to the original type instead

@apiraino
Copy link
Contributor

apiraino commented Sep 12, 2023

@rustbot label regression-from-stable-to-nightly, -regression-untriaged, T-rustdoc, A-rustoc-ui

edit why did this silently do nothing?

without commas (I think) and fixed one typo

@rustbot label +regression-from-stable-to-nightly -regression-untriaged +T-rustdoc +A-rustdoc-ui

@rustbot rustbot added A-rustdoc-ui Area: Rustdoc UI (generated HTML) regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. and removed regression-untriaged Untriaged performance or correctness regression. labels Sep 12, 2023
@apiraino apiraino removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Sep 12, 2023
@fmease fmease added the I-heavy Issue: Problems and improvements with respect to binary size of generated code. label Sep 25, 2023
@camelid
Copy link
Member

camelid commented Oct 2, 2023

We discussed this in the t-rustdoc meeting today, and we felt that the feature should be kept but we need to change the implementation to reduce the page sizes. Most people were on board with using JS to share the HTML across type aliases, though there's some concern about JS maintainability.

@camelid camelid added the P-high High priority label Oct 2, 2023
bors added a commit to rust-lang-ci/rust that referenced this issue Oct 7, 2023
…r=<try>

rustdoc: use JS to inline target type impl docs into alias

Preview docs:

- https://notriddle.com/rustdoc-html-demo-5/js-trait-alias/std/io/type.Result.html

- https://notriddle.com/rustdoc-html-demo-5/js-trait-alias-compiler/rustc_middle/ty/type.PolyTraitRef.html

*Review note: This is mostly just reverting rust-lang#115201. The last commit has the new work in it.*

Fixes rust-lang#115718

This is an attempt to balance three problems, each of which would
be violated by a simpler implementation:

- A type alias should show all the `impl` blocks for the target
  type, and vice versa, if they're applicable. If nothing was
  done, and rustdoc continues to match them up in HIR, this
  would not work.

- Copying the target type's docs into its aliases' HTML pages
  directly causes far too much redundant HTML text to be generated
  when a crate has large numbers of methods and large numbers
  of type aliases.

- Using JavaScript exclusively for type alias impl docs would
  be a functional regression, and could make some docs very hard
  to find for non-JS readers.

- Making sure that only applicable docs are show in the
  resulting page requires a type checkers. Do not reimplement
  the type checker in JavaScript.

So, to make it work, rustdoc stashes these type-alias-inlined docs
in a JSONP "database-lite". The file is generated in `write_shared.rs`,
included in a `<script>` tag added in `print_item.rs`, and `main.js`
takes care of patching the additional docs into the DOM.

The format of `trait.impl` and `type.impl` JS files are superficially
similar. Each line, except the JSONP wrapper itself, belongs to a crate,
and they are otherwise separate (rustdoc should be idempotent). The
"meat" of the file is HTML strings, so the frontend code is very simple.
Links are relative to the doc root, though, so the frontend needs to fix
that up, and inlined docs can reuse these files.

However, there are a few differences, caused by the sophisticated
features that type aliases have. Consider this crate graph:

```text
 ---------------------------------
 | crate A: struct Foo<T>        |
 |          type Bar = Foo<i32>  |
 |          impl X for Foo<i8>   |
 |          impl Y for Foo<i32>  |
 ---------------------------------
     |
 ----------------------------------
 | crate B: type Baz = A::Foo<i8> |
 |          type Xyy = A::Foo<i8> |
 |          impl Z for Xyy        |
 ----------------------------------
```

The type.impl/A/struct.Foo.js JS file has a structure kinda like this:

```js
JSONP({
"A": [["impl Y for Foo<i32>", "Y", "A::Bar"]],
"B": [["impl X for Foo<i8>", "X", "B::Baz", "B::Xyy"], ["impl Z for Xyy", "Z", "B::Baz"]],
});
```

When the type.impl file is loaded, only the current crate's docs are
actually used. The main reason to bundle them together is that there's
enough duplication in them for DEFLATE to remove the redundancy.

The contents of a crate are a list of impl blocks, themselves
represented as lists. The first item in the sublist is the HTML block,
the second item is the name of the trait (which goes in the sidebar),
and all others are the names of type aliases that successfully match.

This way:

- There's no need to generate these files for types that have no aliases
  in the current crate. If a dependent crate makes a type alias, it'll
  take care of generating its own docs.
- There's no need to reimplement parts of the type checker in
  JavaScript. The Rust backend does the checking, and includes its
  results in the file.
- Docs defined directly on the type alias are dropped directly in the
  HTML by `render_assoc_items`, and are accessible without JavaScript.
  The JSONP file will not list impl items that are known to be part
  of the main HTML file already.

[JSONP]: https://en.wikipedia.org/wiki/JSONP
bors added a commit to rust-lang-ci/rust that referenced this issue Oct 23, 2023
…r=<try>

rustdoc: use JS to inline target type impl docs into alias

Preview docs:

- https://notriddle.com/rustdoc-html-demo-5/js-trait-alias/std/io/type.Result.html

- https://notriddle.com/rustdoc-html-demo-5/js-trait-alias-compiler/rustc_middle/ty/type.PolyTraitRef.html

*Review note: This is mostly just reverting rust-lang#115201. The last commit has the new work in it.*

Fixes rust-lang#115718

This is an attempt to balance three problems, each of which would
be violated by a simpler implementation:

- A type alias should show all the `impl` blocks for the target
  type, and vice versa, if they're applicable. If nothing was
  done, and rustdoc continues to match them up in HIR, this
  would not work.

- Copying the target type's docs into its aliases' HTML pages
  directly causes far too much redundant HTML text to be generated
  when a crate has large numbers of methods and large numbers
  of type aliases.

- Using JavaScript exclusively for type alias impl docs would
  be a functional regression, and could make some docs very hard
  to find for non-JS readers.

- Making sure that only applicable docs are show in the
  resulting page requires a type checkers. Do not reimplement
  the type checker in JavaScript.

So, to make it work, rustdoc stashes these type-alias-inlined docs
in a JSONP "database-lite". The file is generated in `write_shared.rs`,
included in a `<script>` tag added in `print_item.rs`, and `main.js`
takes care of patching the additional docs into the DOM.

The format of `trait.impl` and `type.impl` JS files are superficially
similar. Each line, except the JSONP wrapper itself, belongs to a crate,
and they are otherwise separate (rustdoc should be idempotent). The
"meat" of the file is HTML strings, so the frontend code is very simple.
Links are relative to the doc root, though, so the frontend needs to fix
that up, and inlined docs can reuse these files.

However, there are a few differences, caused by the sophisticated
features that type aliases have. Consider this crate graph:

```text
 ---------------------------------
 | crate A: struct Foo<T>        |
 |          type Bar = Foo<i32>  |
 |          impl X for Foo<i8>   |
 |          impl Y for Foo<i32>  |
 ---------------------------------
     |
 ----------------------------------
 | crate B: type Baz = A::Foo<i8> |
 |          type Xyy = A::Foo<i8> |
 |          impl Z for Xyy        |
 ----------------------------------
```

The type.impl/A/struct.Foo.js JS file has a structure kinda like this:

```js
JSONP({
"A": [["impl Y for Foo<i32>", "Y", "A::Bar"]],
"B": [["impl X for Foo<i8>", "X", "B::Baz", "B::Xyy"], ["impl Z for Xyy", "Z", "B::Baz"]],
});
```

When the type.impl file is loaded, only the current crate's docs are
actually used. The main reason to bundle them together is that there's
enough duplication in them for DEFLATE to remove the redundancy.

The contents of a crate are a list of impl blocks, themselves
represented as lists. The first item in the sublist is the HTML block,
the second item is the name of the trait (which goes in the sidebar),
and all others are the names of type aliases that successfully match.

This way:

- There's no need to generate these files for types that have no aliases
  in the current crate. If a dependent crate makes a type alias, it'll
  take care of generating its own docs.
- There's no need to reimplement parts of the type checker in
  JavaScript. The Rust backend does the checking, and includes its
  results in the file.
- Docs defined directly on the type alias are dropped directly in the
  HTML by `render_assoc_items`, and are accessible without JavaScript.
  The JSONP file will not list impl items that are known to be part
  of the main HTML file already.

[JSONP]: https://en.wikipedia.org/wiki/JSONP
@bors bors closed this as completed in 6f349cd Oct 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rustdoc-ui Area: Rustdoc UI (generated HTML) C-bug Category: This is a bug. I-heavy Issue: Problems and improvements with respect to binary size of generated code. P-high High priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants