Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an option to omit LLVM bitcode from rlibs. #66598

Closed
wants to merge 1 commit into from

Conversation

nnethercote
Copy link
Contributor

Because it's only needed when doing LTO, so including it all the time
wastes time and disk space.

r? @michaelwoerister

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 21, 2019
@nnethercote
Copy link
Contributor Author

I originally tried disabling the compression of LLVM bitcode in rlibs. It gave a 1-5% speed-up for debug builds but increased rlib size by about 25%.

@alexcrichton then observed that bitcode simply isn't necessary unless you are building with LTO. (And libstd is a special case.) He said if I implement a "no rlib bitcode" flag he could modify Cargo's invocations to use it where appropriate.

I'm not 100% certain I've done everything correctly. In particular I left allocator_config unaffected by this because Alex said libstd needed to be treated differently... but I'm not sure if allocator_config actually corresponds to libstd. But all the tests pass, which is good. Perhaps there should be a new test added, too?

I measured a debug build of https://github.com/mozilla/fix-stacks, which uses 85 crates. The total size of the .rlib files drops from 276.6 MB to 219.2 MB, a 20.8% reduction!

I also measured the speed of a debug build of the regex benchmark. Omitting the bitcode gives a 13% reduction in instruction counts! I will do fuller measurements of all the rustc-perf benchmarks later today.

@nnethercote nnethercote changed the title Add an option to omit LLBM bitcode from rlibs. Add an option to omit LLVM bitcode from rlibs. Nov 21, 2019
Because it's only needed when doing LTO, so including it all the time
wastes time and disk space.
@nnethercote
Copy link
Contributor Author

Local performance measurements are excellent. Here are instruction count results for "clean" debug and opt builds:

issue-46449-debug
        avg: -18.4%     min: -18.4%     max: -18.4%
regression-31157-debug
        avg: -17.9%     min: -17.9%     max: -17.9%
webrender-debug
        avg: -15.9%     min: -15.9%     max: -15.9%
syn-debug
        avg: -14.9%     min: -14.9%     max: -14.9%
cargo-debug
        avg: -14.7%     min: -14.7%     max: -14.7%
piston-image-debug
        avg: -13.8%     min: -13.8%     max: -13.8%
regex-debug
        avg: -13.4%     min: -13.4%     max: -13.4%
style-servo-debug
        avg: -12.4%     min: -12.4%     max: -12.4%
encoding-debug
        avg: -12.3%     min: -12.3%     max: -12.3%
cranelift-codegen-debug
        avg: -12.2%     min: -12.2%     max: -12.2%
clap-rs-debug
        avg: -11.9%     min: -11.9%     max: -11.9%
hyper-2-debug
        avg: -11.4%     min: -11.4%     max: -11.4%
html5ever-debug
        avg: -10.9%     min: -10.9%     max: -10.9%
deeply-nested-debug
        avg: -10.9%     min: -10.9%     max: -10.9%
webr_api-debug
        avg: -10.3%     min: -10.3%     max: -10.3%
futures-debug
        avg: -5.7%      min: -5.7%      max: -5.7%
inflate-debug
        avg: -4.8%      min: -4.8%      max: -4.8%
ucd-debug
        avg: -3.6%      min: -3.6%      max: -3.6%
html5ever-opt
        avg: -3.6%      min: -3.6%      max: -3.6%
webrender-opt
        avg: -3.3%      min: -3.3%      max: -3.3%
cranelift-codegen-opt
        avg: -3.2%      min: -3.2%      max: -3.2%
ucd-opt
        avg: -2.7%      min: -2.7%      max: -2.7%
keccak-debug
        avg: -2.7%      min: -2.7%      max: -2.7%
encoding-opt
        avg: -2.3%      min: -2.3%      max: -2.3%
ctfe-stress-4-opt
        avg: -2.3%?     min: -2.3%?     max: -2.3%?
ctfe-stress-4-debug
        avg: -1.7%?     min: -1.7%?     max: -1.7%?
issue-46449-opt
        avg: -1.5%      min: -1.5%      max: -1.5%
style-servo-opt
        avg: -1.4%      min: -1.4%      max: -1.4%
deeply-nested-opt
        avg: -1.4%      min: -1.4%      max: -1.4%
hyper-2-opt
        avg: -1.3%      min: -1.3%      max: -1.3%
tokio-webpush-simple-debug
        avg: -1.1%      min: -1.1%      max: -1.1%
syn-opt
        avg: -1.1%?     min: -1.1%?     max: -1.1%?
ripgrep-debug
        avg: -1.1%      min: -1.1%      max: -1.1%
regression-31157-opt
        avg: -1.1%      min: -1.1%      max: -1.1%
cargo-opt
        avg: -1.0%      min: -1.0%      max: -1.0%

The wall-time improvements were not as high, but still good, up to 9% improvements.

Unfortunately we can't do a CI perf run because the comparison requires changing the rustc-perf harness to set RUSTFLAGS="-C no-rlib-bitcode" for the second run.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice wins!

As proposed this is an instantly stable option, which means that I think we'll want to get buy-in from the compiler team before landing. As a mild amount of bikeshedding, how about -C rlib-bitcode=no instead of -C no-rlib-bitcode? That way if we ever wanted we'd have a way to to -C rlib-bitcode=uncompressed or something like that.

Also can you add a test or two exercising this? It might be good to sanity check what the error message looks like for LTO as well, for example if you comiple an rlib without bitcode, and then attempt to LTO with it later, does rustc ICE or have an otherwise intelligible error message?

if need_crate_bitcode_for_rlib(sess) {
modules_config.emit_bc_compressed = true;
if !sess.opts.cg.no_rlib_bitcode {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this extra conditional be folded into the need_crate_bitcode_for_rlib function? I think that would still work, although it'd be good to check.

@nnethercote
Copy link
Contributor Author

I am happy to change the option name. I am also happy to write tests, but I need some help on that. What existing tests should I look at for guidance? Thanks.

@nnethercote
Copy link
Contributor Author

Oh, and how should I gain buy-in from the compiler team?

@alexcrichton
Copy link
Member

For tests I think it'd be fine to add something in src/test/ui which just has a few compile-flags directives. For the compiler team, let's try a cc @rust-lang/compiler to see if they're interested :)

@eddyb
Copy link
Member

eddyb commented Nov 21, 2019

I would suggest referring to bitcode by its --emit name, i.e. llvm-bc.
Together with @alexcrichton's suggestion, it'd be -C rlib-llvm-bc=no.

Bikeshedding aside, this is an impressive win, I'm a bit shocked we still had chunks this large of work that could be omitted (even if only in most cases, not all).

@Centril
Copy link
Contributor

Centril commented Nov 21, 2019

As proposed this is an instantly stable option, which means that I think we'll want to get buy-in from the compiler team before landing. As a mild amount of bikeshedding, how about -C rlib-bitcode=no instead of -C no-rlib-bitcode? That way if we ever wanted we'd have a way to to -C rlib-bitcode=uncompressed or something like that.

Shouldn't this start as a -Z flag first just to make sure there are no unforeseen issues in the implementation?

@eddyb
Copy link
Member

eddyb commented Nov 21, 2019

Another idea I just had: since Cargo shouldn't care about LLVM being the backend, but rather is interested in controlling "does this rlib need to be LTO-enabled?", we could tailor the flag to that.

That is, something like -C rlib-supports-lto=no (more bikeshed material I suppose).

@michaelwoerister
Copy link
Member

In general, I am very much for getting rid of LLVM bitcode in rlibs where possible!
I think the main issue that needs resolving is interaction with other features (especially xLTO) so we don't end up with a weird set of commandline options that kind of overlap although they shouldn't.

@@ -1222,6 +1222,8 @@ options! {CodegenOptions, CodegenSetter, basic_codegen_options,
"compile the program with profiling instrumentation"),
profile_use: Option<PathBuf> = (None, parse_opt_pathbuf, [TRACKED],
"use the given `.profdata` file for profile-guided optimization"),
no_rlib_bitcode: bool = (false, parse_bool, [TRACKED],
"don't put LLVM bitcode in rlib files"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enabling this only requires the rlib itself to be rebuild, instead of throwing away the whole incr cache, right?

@michaelwoerister
Copy link
Member

This might also interact with #64124.

@michaelwoerister
Copy link
Member

michaelwoerister commented Nov 25, 2019

Before we go ahead and merge this I'd like to discuss a different approach:

Clang handles LTO differently from rustc in that it just emits LLVM bitcode instead of machine code into object files. However, it also has a fallback option of adding LLVM bitcode to the object file. As far as I know, the LLVM linker plugin supports this second format too and it would allow us to ship a fat libstd that can be used with and without LTO (with the added bonus of making libstd available to cross-language LTO (EDIT: fixed link)). rustc actually already supports this via -Zembed-bitcode.

So my proposal would be to make rustc's behavior match Clang's:

  • An rlib compiled for whole-crate-graph (Thin)-LTO does not contain any machine code. It contains .o files that are actually (uncompressed) LLVM bitcode.
  • An rlib not compiled with explicit LTO support contains only machine code and no LLVM bitcode.
  • The standard library is compiled with -Zembed-bitcode so it is available for LTO.
  • The Rust compiler is updated to read bitcode from object-files instead of extra RLIB entries.
  • Cargo compiles rlibs with the appropriate -Clto flags.

I think it would be great if we followed Clang's model here in order to tap into existing toolchain support. It also does not waste space. And LLVM actually wants to do some things differently depending on whether a module is intended for being using with thin, fat, or no LTO.

The only downsides I see are:

  • This approach inherits the somewhat confusing fact that .o files are sometimes machine code and sometimes LLVM machine code. I think this is a minor issue.
  • I'm not sure if cargo can do all of this in a backwards compatible way. With this approach we need to tell rustc if an RLIB is intended to be used for LTO. We don't need to do that currently, I think.

@alexcrichton
Copy link
Member

I'm definitely on board for your plan @michaelwoerister, that all sounds great to me. Cargo is paired with only one version of rustc, so we can handle any Cargo changes on our side of things. It may be necessary to do a small dance though to phase this in. For now I'd just focus on landing rustc changes and then we can fix any issues in Cargo if they arise.

@bjorn3
Copy link
Member

bjorn3 commented Nov 25, 2019

What about non cargo users of rustc? If I understand correctly you can currently compile using LTO by only compiling the executable using -Clto, while removing bitcode from rlibs by default would prevent that. Maybe only remove bitcode from rlibs when explicitely told to do so? For example using -Clto=no.

@nnethercote
Copy link
Contributor Author

@michaelwoerister, the first two points of your proposal match something @alexcrichton and I had discussed on Zulip, which I was calling "bitcode XOR machine code". Your entire proposal sounds reasonable to me (a non-expert on this stuff). But it's also a much bigger change, and the next steps aren't clear to me, including who should work on this.

(BTW, your "making libstd available to cross-language LTO" link is incorrect above; I'm not sure what the correct address should be.)

What about non cargo users of rustc? If I understand correctly you can currently compile using LTO by only compiling the executable using -Clto, while removing bitcode from rlibs by default would prevent that. Maybe only remove bitcode from rlibs when explicitely told to do so? For example using -Clto=no.

I'm not sure about the exact details and possibilities here. In general, I strongly suggest that the common case should be as optimized (in terms of speed and disk space) as possible. Which isn't currently the case -- non-LTO users have to endure the significant cost of adding bitcode even though they don't use it.

@Centril
Copy link
Contributor

Centril commented Nov 25, 2019

(How) does @michaelwoerister's proposal impact having e.g. Cranelift as an alternative backend?

@bjorn3
Copy link
Member

bjorn3 commented Nov 25, 2019

When LTO is disabled nothing changes, as the bitcode wasn't used anyway. When LTO is enabled other backends can just ignore the fact and produce object files containing machine code like normally. In the worst case linking to rlibs generated by cg_llvm with LTO enabled is not possible. When using linker plugin LTO I think even that should work though.

@michaelwoerister
Copy link
Member

@bjorn3

What about non cargo users of rustc? If I understand correctly you can currently compile using LTO by only compiling the executable using -Clto, while removing bitcode from rlibs by default would prevent that.

Yes, with my proposal one would already have to compile rlibs for use with LTO (like is the case with Clang).

Maybe only remove bitcode from rlibs when explicitely told to do so? For example using -Clto=no.

The case where one forgets to compile RLIBs with -Clto would still compile successfully, it's just that the RLIBs would not participate in LTO. Since this is a harmless failure scenario, I'm not sure if it justifies defaulting to the less efficient case. If this were a new feature, I'd definitely opt for omitting bitcode by default.

@nnethercote I fixed the link.

Off the top of my head, I'd say the next steps are:

  • Take a look at what the -Clinker-plugin-lto flag does. It already causes rustc to emit bitcode into object files. Make rustc match that behavior for when an RLIB is compiled with -Clto, except for the things that are -Clinker-plugin-lto specific workarounds (which should be identifiable by comments).
  • Take a look at where the backend gets LLVM IR from and make that read the corresponding object file instead. This might entail checking if the object file is a regular object, LLVM bitcode, or a fat object file with bitcode in a special section. (I'm afraid that some of this is in a messy part of the codebase).
  • Bonus points for making rustc able to handle LTO dependencies in the non-LTO case, i.e. falling back to compiling their bitcode instead of failing. That's not strictly necessary though.

Much of the infrastructure should already be in place. I'm not quite sure how to handle Rust dylibs. Probably make them "fat" by default.

@Mark-Simulacrum
Copy link
Member

The case where one forgets to compile RLIBs with -Clto would still compile successfully, it's just that the RLIBs would not participate in LTO. Since this is a harmless failure scenario, I'm not sure if it justifies defaulting to the less efficient case. If this were a new feature, I'd definitely opt for omitting bitcode by default.

If I'm understanding you correctly, this doesn't sound "harmless" to me -- if I specify lto to Cargo/rustc, I would prefer that rustc or Cargo tell me that this isn't possible, rather than silently not perform it for some code in my dependency tree.

@michaelwoerister
Copy link
Member

If you go through Cargo then you wouldn't hit this problem because Cargo invokes rustc appropriately. But yes, "harmless" might be too strong.

@nnethercote
Copy link
Contributor Author

This PR won't land. We should file a new issue for the new proposal. @michaelwoerister, could you do that? (I could do it, but I think you'll get the details more correct than I would.)

@nnethercote nnethercote deleted the no-rlib-bitcode branch November 26, 2019 22:13
@michaelwoerister
Copy link
Member

We should file a new issue for the new proposal.

I'll do that next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants