-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rustbuild: Compile rustc with ThinLTO #45400
Conversation
@@ -175,6 +175,9 @@ fn main() { | |||
if let Ok(s) = env::var("RUSTC_CODEGEN_UNITS") { | |||
cmd.arg("-C").arg(format!("codegen-units={}", s)); | |||
} | |||
if stage != "0" && env::var("RUSTC_THINLTO").is_ok() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this skips stage0 as the current stage0 doesn't have ThinLTO (but that should be coming soon) but despite that the ThinLTO implementation on beta is probably too buggy to use.
cc @michaelwoerister |
src/bootstrap/builder.rs
Outdated
@@ -618,6 +618,10 @@ impl<'a> Builder<'a> { | |||
// FIXME: cargo bench does not accept `--release` | |||
if self.config.rust_optimize && cmd != "bench" { | |||
cargo.arg("--release"); | |||
|
|||
if mode != Mode::Libstd { | |||
cargo.env("RUSTC_THINLTO", "1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we not want this for std?
This will (afaict) override the codegen-units setting in config.toml unconditionally, and isn't gated either, both of which seem somewhat odd. I think I'm okay with the latter (we expect ThinLTO to work reliably and not lose performance) but I'd prefer that we then remove codegen-units from the configuration.
OTOH, if we are disabling this for std due to performance concerns, then I feel like it should be behind a toggle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment I'm just trying to be conservative, my plan is to, after this lands, enable this for libstd. I just suspect that'll have more bugs/regressions associated with it (for whatever reason) and wanted to get the rustc wins ASAP
Once Travis is fixed, I'd also like to run a perf try on this to make sure it's not a regression in performance (don't want to eat compiler dev cycles for no reason). |
@@ -175,6 +175,9 @@ fn main() { | |||
if let Ok(s) = env::var("RUSTC_CODEGEN_UNITS") { | |||
cmd.arg("-C").arg(format!("codegen-units={}", s)); | |||
} | |||
if stage != "0" && env::var("RUSTC_THINLTO").is_ok() { | |||
cmd.arg("-Ccodegen-units=16").arg("-Zthinlto"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we respect the rust.codegen-units
settings in config.toml
here instead of hard-coding "16"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all intended to be pretty temporary, and for now I'd prefer to just keep this easy to remove later as I don't think anyone's really configuring codegen units anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexcrichton Ok, please add a FIXME.
(I did set the CGU to 3 😆)
Awesome |
b00770b
to
115d187
Compare
@bors: try @Mark-Simulacrum should be fixed now I think |
⌛ Trying commit 115d187eaed09cff8a60dacf5694d237f9b8b15f with merge 4306d764ad37fe6df6c145c2790541f26e6134aa... |
☀️ Test successful - status-travis |
Looks good to me. I'll go ahead and run perf once I can and if that doesn't show any major regressions r+. Excited to improve rustc compile times! |
Hm, so this looks like it regresses compiler perf by quite a bit, so I don't think we'd be ready to do this quite yet: http://perf.rust-lang.org/compare.html?start=95272a07f1fe3a82497225a4cd0d67080b8ffebf&end=4306d764ad37fe6df6c145c2790541f26e6134aa&stat=instructions%3Au. Do you think we can do something within this PR to improve that? Otherwise this seems like it'd be a net loss to compiler devs. |
Thanks for collecting the data @Mark-Simulacrum! I do sort of disagree with the conclusion I think though. The best thing to look at is the timing comparison rather than the instruction comparison (aka instructions mostly just show where else to look). Here we still see a "wall of red" but everything is quite modest. It looks like most incremental benchmarks fared particulary bad, but again losing at most hundreds of milliseconds. I fear that if we accept zero runtime regressions that we will never make this change. I consider turning ThinLTO and multiple CGUs on for rustc to be an important step #45320, and if we never do this we'll never actually close #45320 as finished. The data on #45320 shows massive wins across the ecosystem. I think it's inevitable that enabling ThinLTO and multiple CGUs is going to come at a slight performance loss. It's simply architecturally different and less optimized than today's setup. On the other hand, however, I'd like to consider the other extreme for a moment. Let's say that we had LTO enabled unconditionally for all Rust crates at 1.0.0. Clearly the compile times are sub-par and have easy wins (turn LTO off). If you disable LTO, however, you're very likely to regress on almost all benchmarks slightly. Despite this I don't think we'd ever consider enabling LTO by default! What we are doing, however, is effectively enabling "whole crate LTO" by default today. This compilation model is mostly unprecedented in C/C++ and is often why Rust code performs so well despite not paying much attention to it. That we blocked on enabling mulitple CGUs in release mode on ThinLTO was surprising enough to me originally, but if we don't enable multiple CGUs at this juncture I fear that we never will. We should of course be honest in that ThinLTO and multiple CGUs will come at a performance regression, but the "regression" here, in my opinion, is very much in the "lost in the noise" territory. For example all of these 1-5% regressions could indicating timing differences or immediately get swallowed up by other improvements. Furthermore we continue to have an escape hatch to get localized equivalences to today's codegen, I don't think a policy of "zero regressions tolerated" is too helpful in this situation, and in this situation I think not merging this could be actively harmful to compile times in rustc in the long run actually. Here we see some modest regressions which may or may not improve as ThinLTO is refined upstream over time. If this actually affects code in practice there's an escape hatch through With my own personal opinions out of the way, I did indeed dig more into the differences between these two compilers. I chose the syntex_syntax benchmark as it showed the largest cpu-clock regression (~10%), and this benchmark is specifically for a building the syntex_syntax crate in incremental mode where the incremental cache is 100% full and has 100% cache hits (no source changes). I was indeed able to reproduce the regression, where compile times increased from ~13s to ~14s. Taking a look at the profile data from before this change and then after this change I noticed a few things:
So my conclusion here is that the performance difference mostly likely does indeed just come down to inlining decisions. I'm not sure why ThinLTO didn't inline this function whereas one CGU did indeed inline it. I personally think we should move forward regardless with this patch no matter what. I think multiple CGUs has the potential for a huge speedup of rustc compile times (trans/LLVM is an absolutely huge portion of bootstrap, parallelizing anything would be a huge boon). Despite this, though, I'm curious to evaluate the impact of |
115d187
to
2997c03
Compare
@bors: try |
⌛ Trying commit 2997c03c831acbbb94c2943fd70e1e33c89e0aaa with merge 57a4ae1e8edd91e10f5d7865423b81f79e22adb6... |
2997c03
to
8327f1c
Compare
This commit enables ThinLTO for the compiler as well as multiple codegen units. This is intended to get the benefits of parallel codegen while also avoiding any major loss of perf. Finally this commit is also intended as further testing for rust-lang#45320 and shaking out bugs.
8327f1c
to
d01427f
Compare
@bors: try |
rustbuild: Compile rustc with ThinLTO This commit enables ThinLTO for the compiler as well as multiple codegen units. This is intended to get the benefits of parallel codegen while also avoiding any major loss of perf. Finally this commit is also intended as further testing for #45320 and shaking out bugs.
☀️ Test successful - status-travis |
Ok the new comparison I believe is less worse than the old comparison, and sure enough the before/after of the |
I'm happy with your explanation, thanks! Seems like this overall should be an improvement in compile times for the rustc itself, and only a marginal loss for the ecosystem, so I'm happy with that. |
📌 Commit d01427f has been approved by |
I guess it's okay for development builds, but it looks like this affects compilers released on stable channel as well. |
We can disable the ThinLTO on dist builds -- but then again, I think the argument @alexcrichton makes is that we don't enable LTO either (which would net us perf wins, I'd imagine). I'm okay with disabling ThinLTO on dist builds, but I do think it should be by default enabled for developers working on rustc, since we expect it to improve compile times of rustc. In the long run, I expect (perhaps wrongly) that ThinLTO + codgen-units will eventually be equivalent to a non-parallel optimizer, and perhaps even slightly better. Since next beta will dist without this though I think we're okay landing this for now and we can iterate: I don't think it's worth not landing this, but if you or others disagree, feel free to r-. |
For stable releases it would be nice to enable it as well. |
rustbuild: Compile rustc with ThinLTO This commit enables ThinLTO for the compiler as well as multiple codegen units. This is intended to get the benefits of parallel codegen while also avoiding any major loss of perf. Finally this commit is also intended as further testing for #45320 and shaking out bugs.
☀️ Test successful - status-appveyor, status-travis |
So here are some thoughts from me:
Overall I'm in favor of keeping this enabled for nightly builds. |
@petrochenkov sorry didn't get a chance to comment on this before it merged, will continue on #45444 |
This commit enables ThinLTO for the compiler as well as multiple codegen units.
This is intended to get the benefits of parallel codegen while also avoiding
any major loss of perf. Finally this commit is also intended as further testing
for #45320 and shaking out bugs.