
Rust platform size #61978

Closed
ghost opened this issue Jun 20, 2019 · 18 comments · Fixed by #64823
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@ghost

ghost commented Jun 20, 2019

I went to this page:

https://forge.rust-lang.org/other-installation-methods

and I discovered that the download is 203 MB. This was surprising to me, so I
looked at other languages:

Language  Size    Link
Go        135 MB  https://golang.org/dl
Perl      102 MB  http://strawberryperl.com/releases.html
Julia     50 MB   https://julialang.org/downloads
Python    25 MB   https://python.org/downloads/release/python-373
PHP       25 MB   https://windows.php.net/download
D         23 MB   https://dlang.org/download.html
Nim       19 MB   https://nim-lang.org/install_windows.html
Ruby      11 MB   https://rubyinstaller.org/downloads

So Rust is about 50% larger than Go. Or, to put it another way, Rust is larger than Julia,
Python, PHP, D, Nim and Ruby combined. Can anything be done about this or is
the large size unavoidable?
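A quick check of the size arithmetic, using only the figures from the table: 203 MB against Go's 135 MB works out to roughly 50% larger, and the six smaller downloads add up to 153 MB. The constants are copied from the table; the helper names are illustrative only.

```rust
/// Installer sizes in MB, copied from the table above (Windows downloads, June 2019).
const RUST_MB: f64 = 203.0;
const GO_MB: f64 = 135.0;
const OTHERS_MB: [(&str, f64); 6] = [
    ("Julia", 50.0),
    ("Python", 25.0),
    ("PHP", 25.0),
    ("D", 23.0),
    ("Nim", 19.0),
    ("Ruby", 11.0),
];

/// Percentage by which `a` exceeds `b`.
fn percent_larger(a: f64, b: f64) -> f64 {
    (a / b - 1.0) * 100.0
}

/// Combined size of the six smaller installers.
fn combined_others() -> f64 {
    OTHERS_MB.iter().map(|&(_, mb)| mb).sum()
}
```

`percent_larger(RUST_MB, GO_MB)` comes out at about 50.4, and `combined_others()` is 153.0, which is still below Rust's 203 MB.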

@jonas-schievink
Contributor

IIUC #59800 will reduce the size, potentially by a lot

@jonas-schievink jonas-schievink added C-enhancement Category: An issue proposing an enhancement or a PR with one. I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 20, 2019
@jonas-schievink
Contributor

Also the comparison might want to include Haskell, whose installer is 268 MB (https://www.haskell.org/platform/windows.html).

@Centril
Contributor

Centril commented Jun 20, 2019

These comparisons are probably missing various aspects; e.g. Haskell includes the full Haskell Platform with a list of packages as well as a runtime. Rust might include other things... We should compare with our past selves first and foremost.

@est31
Member

est31 commented Jun 20, 2019

@cup where are you getting that 203 MB number from? When I try to download https://static.rust-lang.org/dist/rust-1.35.0-x86_64-unknown-linux-gnu.tar.gz it tells me 252 MB, so the problem is even worse than you say it is... Maybe Linux vs. Windows? Some points on where the potential bloat could come from:

  • licenses are stored multiple times, once for each package. This costs a few KB, but I guess it can't be avoided. It would probably be useful to have a compression format that can do content-based deduplication, but I guess there is none out there; tar.gz/zip is what we need to use.

  • save-analysis data for all std crates is being stored. Being JSON, the save-analysis format is extremely verbose: it contains repetitive things like references to the same source file over and over. It could probably be improved a lot by a) interning things like source file names and b) using a binary format. IIRC serde is already used for JSON output and input, so adopting bincode would be an easy first step, followed by interning (which can be layered on transparently, independent of bincode). The save-analysis format is unstable and probably always will be, given that its replacement, rust-analyzer, is on its way. If there is ever a desire to stabilize it, one can switch back to JSON. As for the point that we are already using deflate compression: sure, it helps, but a more compact native format would help even more.

  • rust's copy of LLVM is 76 MiB while julia's copy of LLVM is 48 MB (both uncompressed). Why is that the case? Rust has LLVM 8.0 and julia 6.0 but that can't be the reason, has LLVM grown this much in size between two versions?

  • there is a documentation directory that contains rustdoc/mdbook of various components. Each component has its very own copy of fonts (each almost half of a MB large). It's repeated multiple times. I'm sure much can be saved by using a shared copy of those fonts.
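The interning idea from the save-analysis point can be sketched with the standard library alone: each distinct file name is stored once and references carry a small integer ID instead of the repeated path string. `Interner`, `Ref` and `intern_refs` are hypothetical types for illustration, not rustc's actual save-analysis data structures, and the binary (e.g. bincode) encoding step is not shown.

```rust
use std::collections::HashMap;

/// A minimal string interner: each distinct string is stored once and
/// referred to by a small integer ID afterwards.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the ID for `s`, allocating a new one the first time `s` is seen.
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.strings.len() as u32;
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), id);
        id
    }

    /// Looks an ID back up; returns `None` for IDs that were never issued.
    fn resolve(&self, id: u32) -> Option<&str> {
        self.strings.get(id as usize).map(String::as_str)
    }
}

/// Hypothetical reference record: a line in some source file.
struct Ref {
    file: u32, // interned file name instead of a repeated path string
    line: u32,
}

/// Converts (path, line) pairs into interned references plus the string table.
fn intern_refs(raw: &[(&str, u32)]) -> (Interner, Vec<Ref>) {
    let mut interner = Interner::default();
    let refs: Vec<Ref> = raw
        .iter()
        .map(|&(path, line)| Ref { file: interner.intern(path), line })
        .collect();
    (interner, refs)
}
```

With thousands of references into the same handful of files, the path strings end up stored once in `strings` rather than once per reference.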

@mati865
Contributor

mati865 commented Jun 21, 2019

> rust's copy of LLVM is 76 MiB while julia's copy of LLVM is 48 MB (both uncompressed). Why is that the case?

At least for Linux a lot of dependencies are linked statically (even libstdc++) to make it work on old Linux releases.

@est31
Member

est31 commented Jul 7, 2019

@cup no need to wait for nightlies, the built artifacts are already being uploaded to public servers. This gives us the following size impact of #59800 :

File name (bytes)                                 dbeed58 (prior to #59800 merge)  dd2e804 (after #59800 merge)
rustc-nightly-x86_64-unknown-linux-gnu.tar.gz     122 073 207                      100 939 501
rust-std-nightly-x86_64-unknown-linux-gnu.tar.gz  82 951 040                       211 427 679
cargo-nightly-x86_64-unknown-linux-gnu.tar.gz     6 751 312                        6 752 518
rustc-nightly-x86_64-pc-windows-msvc.tar.gz       82 391 063                       59 805 292
rust-std-nightly-x86_64-pc-windows-msvc.tar.gz    74 663 690                       206 604 497
cargo-nightly-x86_64-pc-windows-msvc.tar.gz       4 450 362                        4 443 326

Numbers collected via commands like:

    export COMMIT=dd2e8040a35883574ae0c4cc7a4e887ecb66469c
    export TOOL=rustc-nightly-x86_64-unknown-linux-gnu.tar.gz
    curl -I https://s3-us-west-1.amazonaws.com/rust-lang-ci2/rustc-builds/${COMMIT}/${TOOL} | rg Content-Length
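To put the table's byte counts into relative terms, here is a small sketch that computes per-artifact deltas and per-target totals. The sizes are copied from the table; `change` and `totals` are illustrative helpers. For the three linux-gnu tarballs combined, the total grows from about 202 MiB to about 304 MiB, an increase of roughly 102 MiB.

```rust
/// Artifact sizes in bytes from the table above:
/// (file name, dbeed58 before the #59800 merge, dd2e804 after it).
const SIZES: [(&str, u64, u64); 6] = [
    ("rustc-nightly-x86_64-unknown-linux-gnu.tar.gz", 122_073_207, 100_939_501),
    ("rust-std-nightly-x86_64-unknown-linux-gnu.tar.gz", 82_951_040, 211_427_679),
    ("cargo-nightly-x86_64-unknown-linux-gnu.tar.gz", 6_751_312, 6_752_518),
    ("rustc-nightly-x86_64-pc-windows-msvc.tar.gz", 82_391_063, 59_805_292),
    ("rust-std-nightly-x86_64-pc-windows-msvc.tar.gz", 74_663_690, 206_604_497),
    ("cargo-nightly-x86_64-pc-windows-msvc.tar.gz", 4_450_362, 4_443_326),
];

const MIB: f64 = 1024.0 * 1024.0;

/// Signed change in bytes, and that change as a percentage of the "before" size.
fn change(before: u64, after: u64) -> (i64, f64) {
    let delta = after as i64 - before as i64;
    (delta, delta as f64 / before as f64 * 100.0)
}

/// Sums the before/after sizes of every artifact whose name contains `target`.
fn totals(target: &str) -> (u64, u64) {
    SIZES
        .iter()
        .filter(|(name, _, _)| name.contains(target))
        .fold((0, 0), |(b, a), &(_, before, after)| (b + before, a + after))
}
```

By this arithmetic the linux-gnu rustc tarball shrinks by about 17%, while the linux-gnu rust-std tarball grows by about 155%.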

So for rustc there is a nice sweet reduction but std has a massive size increase. @Zoxc why is this the case? Maybe some cache not being emptied?

@Zoxc
Contributor

Zoxc commented Jul 7, 2019

@est31 rust-std also includes the compiler crates, and with #59800 there's duplication for these as the same code is included in the rlibs and rustc_driver's dylib.

@lilydjwg

lilydjwg commented Jul 8, 2019

Hi, I have a question. What needs those rlib files in rust-std? Can I remove those rlib files from rust-std and have most crates build as normal? (I'm repackaging rust-nightly for Arch Linux.)

It's 109 MiB larger now, and it'll take me four more minutes to download (what's worse, I can't watch online videos in the meantime)...

@est31
Member

est31 commented Jul 8, 2019

Rust has some generic bloat problems, like #46477, that increase binary size and probably also contribute to Rust's large platform size.

@mati865
Contributor

mati865 commented Jul 8, 2019

Before #59800 a lot of symbols were placed into shared libs and duplication wasn't an issue.
Many of those shared libs were replaced by static ones (.rlib) which has benefits (better performance, simplification of Rust internals) and downsides (duplicated symbols, bigger size).

Using .tar.xz archives instead of .tar.gz reduces download size of linux-gnu std by 30 MiB at the expense of extraction time. It can help with slow connections.

> Hi, I have a question. What needs those rlib files in rust-std?

Rust itself and the crates.

> Can I remove those rlib files from rust-std and have most crates build as normal?

Some of the libs aren't necessary for every use case but they are relatively small. Big libs which are the issue here cannot be removed.


It doesn't mean nothing can be improved.

Several dependencies are built more than once; this is clearly visible with libc. In linux-gnu std from commit dd2e804 there are two liblibc-*.rlib files, 2 MiB each.
Most of the duplicates are much smaller but also harder to find because they don't have their own libs and can be found only inside other libs.

EDIT: cc #57076
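Spotting the standalone duplicates (like the two liblibc rlibs) can be automated by grouping rlib file names by crate name. This sketch assumes the usual `lib<crate>-<hash>.rlib` naming visible in the rust-std tarball; `crate_name` and `duplicated_crates` are hypothetical helpers. It cannot find the smaller duplicates that only exist as code embedded inside other libs.

```rust
use std::collections::BTreeMap;

/// Extracts the crate name from an rlib file name such as `liblibc-0123abcd.rlib`,
/// assuming the `lib<crate>-<hash>.rlib` layout. Returns `None` for anything else.
fn crate_name(file_name: &str) -> Option<&str> {
    let stem = file_name.strip_prefix("lib")?.strip_suffix(".rlib")?;
    // Crate names use underscores, so the hash follows the last '-'.
    let (name, hash) = stem.rsplit_once('-')?;
    if name.is_empty() || hash.is_empty() {
        return None;
    }
    Some(name)
}

/// Groups rlib file names by crate and keeps only crates that appear more than once.
fn duplicated_crates<'a>(files: &[&'a str]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut groups: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &f in files {
        if let Some(name) = crate_name(f) {
            groups.entry(name).or_default().push(f);
        }
    }
    groups.retain(|_, v| v.len() > 1);
    groups
}
```

Feeding it the file names from `lib/rustlib/<target>/lib` would list each crate that was built more than once, together with its copies.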

@ghost
Copy link
Author

ghost commented Jul 8, 2019

@mati865 thank you for the detailed response. However this part is concerning:

> Using .tar.xz archives instead of .tar.gz reduces download size of linux-gnu
> std by 30 MiB at the expense of extraction time. It can help with slow
> connections.

That seems to be "cutting at the branches rather than cutting at the root". For
comparison, Go 1.12.6 is 118 MB:

https://dl.google.com/go/go1.12.6.windows-amd64.msi

while Rust nightly is 294 MB:

https://static.rust-lang.org/dist/rust-nightly-x86_64-pc-windows-gnu.msi

That's nearly 2.5 times larger. I know every language is different but this
seems too much.

@mati865
Contributor

mati865 commented Jul 8, 2019

> that seems to be "cutting at the branches rather than cutting at the root"

It's not the solution by any means, I just wanted to make people aware of better compressed archives.

> I know every language is different but this
> seems too much.

Yes, it's concerning. For the sake of completeness, there is another big compiler:
the official LLVM .tar.xz Linux archive is 325 MiB, and for Rust commit dd2e8040a35883574ae0c4cc7a4e887ecb66469c it's 262 MiB.
It also means Rust is now bigger than GHC.

That said, nightly to nightly it's a download-size regression of over 100 MiB, and there should be an effort to reduce it, but there is no immediate solution.

@cuviper
Member

cuviper commented Jul 25, 2019

> Can I remove those rlib files from rust-std and have most crates build as normal?
>
> Some of the libs aren't necessary for every use case but they are relatively small. Big libs which are the issue here cannot be removed.

The two biggest are librustc.rlib (85M) and librustc_mir.rlib (42M) -- are you sure those cannot be removed? I suppose nightly/unstable users may need those, but for stable ISTM we only need the artifacts from x.py build libstd and libtest (which includes the proc-macro shim).

@eddyb
Member

eddyb commented Aug 16, 2019

We could potentially replace the .rlibs with just .rmetas, which would still allow using those crates; we'd just need a way to force linking against librustc_driver.so (which contains the actual machine code from those crates).

I've previously suggested this in #59800 (comment).

@ehuss
Contributor

ehuss commented Sep 26, 2019

I'm sorry for being dense, but what is the reason for having two copies of the compiler? That is, each distribution contains the duplicates lib/librustc_driver-7f45b60a9f549617.dylib and lib/rustlib/x86_64-apple-darwin/lib/librustc_driver-7f45b60a9f549617.dylib (among many others).

@cuviper
Member

cuviper commented Sep 26, 2019

lib/* are host libraries installed by the rustc component, and lib/rustlib/$target/lib/* are target libraries installed by the rust-std component. The latter may be used for cross compilation too.

But yes, that duplication is unfortunate.
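The two install locations described above can be expressed as paths under the sysroot, and files that are byte-identical in both can be flagged as the duplication in question. This is a sketch: the directory layout follows the description in this comment, the helper names are hypothetical, and directory listings are passed in as in-memory name-to-contents maps so the example runs without touching disk (a real tool would read them with `std::fs::read_dir` and `std::fs::read`).

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Host libraries installed by the rustc component: `<sysroot>/lib`.
fn host_lib_dir(sysroot: &Path) -> PathBuf {
    sysroot.join("lib")
}

/// Target libraries installed by rust-std: `<sysroot>/lib/rustlib/<target>/lib`.
fn target_lib_dir(sysroot: &Path, target: &str) -> PathBuf {
    sysroot.join("lib").join("rustlib").join(target).join("lib")
}

/// Sorted names of files present in both listings with byte-identical contents.
fn identical_in_both(
    host: &HashMap<String, Vec<u8>>,
    target: &HashMap<String, Vec<u8>>,
) -> Vec<String> {
    let mut dups: Vec<String> = host
        .iter()
        .filter(|(name, bytes)| target.get(*name) == Some(*bytes))
        .map(|(name, _)| name.clone())
        .collect();
    dups.sort();
    dups
}
```

Run against a real toolchain, a check like this would report files such as the `librustc_driver-*.dylib` pair mentioned above.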

@ehuss
Contributor

ehuss commented Sep 26, 2019

Can you explain more what purpose they serve for cross compilation? Targets which do not have host builds do not include a second copy of the compiler. Is it to support projects that link the compiler directly with extern crate rustc? I'm not sure I see how the target's librustc_driver.so is relevant for cross compiling.

@cuviper
Member

cuviper commented Sep 26, 2019

> Is it to support projects that link the compiler directly with extern crate rustc?

Yes, you could cross compile something using rustc libs -- unstable, of course. But if #64823 goes through, we won't include this in rust-std anymore, probably just another optional component.


9 participants