Compile time+memory regression between 1.49.0 and 1.50.0 #84873
Comments
@olix0r can you paste the output of |
If you can come up with a smaller example than "linkerd" that would also be helpful, but it will be more difficult than just running time-passes. |
time-passes
|
We're going to try to see if we can avoid this with boxing, which may help us identify a smaller repro, but this may take some time... |
Wow, that is a lot of memory in LLVM and a lot of time in `rust/compiler/rustc_mir/src/monomorphize/partitioning/mod.rs` (lines 350 to 362 in 716394d).
Not sure who to ask about that - maybe @wesleywiser has ideas what's going on? |
Maybe a bisection could shed some light onto the regression causing change? https://github.com/rust-lang/cargo-bisect-rustc/blob/master/TUTORIAL.md |
I was able to identify the nightly date that goes bad:
My system is having trouble running the git-based bisect, but I'll report back when that completes... For reference, I created a cgroup with limited memory so I could make the build fail without waiting ~40 minutes (RSS is reliably under 2GB for good versions). |
Commit range is eb4fc71...f745834 |
searched nightlies: from nightly-2020-12-18 to nightly-2020-12-19
bisected with cargo-bisect-rustc v0.6.0
Host triple: x86_64-unknown-linux-gnu
cargo bisect-rustc --start=2020-12-18 --end=2020-12-19 --preserve --with-src |
Oh interesting, this is related to symbol names like you suspected: #80122. |
#80122 reverted part of #76030 but that had only landed a few months prior. Does this also fail to build prior to say 1.47 before the original PR was merged? Edit: 1.47 => killed after a few minutes @ 20gb of memory usage |
FWIW, on 1.46.0, we hit a type length limit after 20 minutes, though the heap only makes it to around 600MB.
|
With v0 mangling, the longest mangled symbol is 11981 bytes long and, in the non-verbose mode of c++filt, demangles to 1111905465 bytes, which should give an idea of the length of the type names being fed into LLVM ( This particular symbol contains a number of BoxFuture types, so if possible, additionally erasing the type with dyn might be helpful (it seems like |
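To give a feel for how a short mangled symbol can expand to such a large demangled name, here is a minimal sketch, assuming a hypothetical `Either<A, B>` combinator that mentions its inner type twice. It is not taken from linkerd or rustc; it only uses `std::any::type_name` to show that each extra level of nesting more than doubles the length of the printed type name.

```rust
use std::any::type_name;

// Hypothetical combinator (not from linkerd): it names its inner type twice,
// loosely mimicking a future type that embeds two copies of a nested type.
#[allow(dead_code)]
struct Either<A, B>(Option<A>, Option<B>);

type L0 = u8;
type L1 = Either<L0, L0>;
type L2 = Either<L1, L1>;
type L3 = Either<L2, L2>;
type L4 = Either<L3, L3>;

// Length in bytes of the compiler-provided type name for `T`.
fn name_len<T>() -> usize {
    type_name::<T>().len()
}

// Type name lengths for nesting depths 0 through 4.
fn name_lengths() -> Vec<usize> {
    vec![
        name_len::<L0>(),
        name_len::<L1>(),
        name_len::<L2>(),
        name_len::<L3>(),
        name_len::<L4>(),
    ]
}

fn main() {
    for (depth, len) in name_lengths().iter().enumerate() {
        println!("depth {}: type name is {} bytes", depth, len);
    }
}
```

The exact strings returned by `type_name` are not guaranteed to be stable, so the sketch only relies on their relative growth. v0 mangling can keep the mangled form short through back-references, while the fully expanded name grows with every repeated mention.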
Assigning priority as discussed in the Zulip thread of the Prioritization Working Group. @rustbot label -I-prioritize +P-high |
Version 1.50 of the Rust compiler introduced a regression (rust-lang/rust#84873) that results in the compiler using extremely large amounts of memory (and eventually getting OOM killed) when compiling code involving very large nested types. This regression is triggered by a number of future types in the proxy. This branch introduces several `BoxService` layers, primarily around `Switch` layers, to reduce the size of future types, so that the proxy can successfully be compiled on Rust 1.50+. This branch does *not* update the toolchain version. The Rust 1.51 toolchain also introduces some new clippy lints, which trigger on some code patterns that are very common in the current proxy. Therefore, the diff from a compiler update will be much larger than just adding additional boxing. I'll land the compiler update in a separate branch after this merges.
Version 1.50 of the Rust compiler introduced a regression (rust-lang/rust#84873) that results in the compiler using extremely large amounts of memory (and eventually getting OOM killed) when compiling code involving very large nested types. This regression is triggered by a number of future types in the proxy. This change adds several `BoxService` layers--primarily within `switch` layers--to reduce the size of future types so that the proxy can successfully be compiled on Rust 1.50+. This change does not update the Rust toolchain version. Co-authored-by: Oliver Gould <[email protected]>
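The boxing workaround described in these commit messages can be sketched roughly as follows. This is an illustration under stated assumptions, not the proxy's code: `Service`, `Switch`, and `BoxService` here are simplified, hypothetical stand-ins defined in the snippet itself (the real types come from tower and linkerd and are asynchronous). The point is only that erasing the inner type behind `Box<dyn Service>` keeps the outer type name the same size no matter how many layers are stacked.

```rust
use std::any::type_name;

// Hypothetical, synchronous stand-in for a service trait (not tower's `Service`).
trait Service {
    fn call(&self, req: u32) -> u32;
}

#[derive(Clone)]
struct Base;

impl Service for Base {
    fn call(&self, req: u32) -> u32 {
        req
    }
}

// Hypothetical switch-like layer: it carries two inner service types.
#[derive(Clone)]
struct Switch<A, B> {
    primary: A,
    fallback: B,
}

impl<A: Service, B: Service> Service for Switch<A, B> {
    fn call(&self, req: u32) -> u32 {
        let inner = if req % 2 == 0 { &self.primary as &dyn Service } else { &self.fallback };
        inner.call(req) + 1
    }
}

// Type-erased service: the concrete inner type no longer appears in the name.
type BoxService = Box<dyn Service>;

impl Service for BoxService {
    fn call(&self, req: u32) -> u32 {
        (**self).call(req)
    }
}

fn type_name_of<T: ?Sized>(_: &T) -> &'static str {
    type_name::<T>()
}

// Fully generic layering: the type name grows with every layer.
fn layer<S: Service + Clone>(inner: S) -> Switch<S, S> {
    Switch { primary: inner.clone(), fallback: inner }
}

// Boxed layering: every level is erased to `BoxService`.
fn boxed_stack(depth: usize) -> BoxService {
    if depth == 0 {
        Box::new(Base)
    } else {
        Box::new(Switch { primary: boxed_stack(depth - 1), fallback: boxed_stack(depth - 1) })
    }
}

// (type name length, result of one call) for a three-layer unboxed stack.
fn unboxed_report(req: u32) -> (usize, u32) {
    let stack = layer(layer(layer(Base)));
    (type_name_of(&stack).len(), stack.call(req))
}

// Same report for a boxed stack of the given depth.
fn boxed_report(depth: usize, req: u32) -> (usize, u32) {
    let stack = boxed_stack(depth);
    (type_name_of(&stack).len(), stack.call(req))
}

fn main() {
    println!("unboxed: {:?}", unboxed_report(4));
    println!("boxed:   {:?}", boxed_report(3, 4));
}
```

The trade-off is a heap allocation and a dynamic dispatch per boxed layer in exchange for much smaller types for the compiler to name and monomorphize.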
I reported a bug against tokio-rs/tracing, which turned out to be because of this bug. My reproducing repo could maybe be helpful as a minimal reproducing example? |
…jackh726
Pretty print generator witness only in `-Zverbose` mode
In a release build of the deeply-nested-async benchmark, the size of the `no-opt.bc` file is reduced from 46MB to 62kB. Helps with rust-lang#84873, where in one of the reported test cases the size of the `no-opt.bc` file is reduced from 2.3GB to 799kB.
I got curious and tested my reproducing repo on nightly, and compile times turned normal again between June 12 and 13. So #86240 seems to have fixed it for me. |
This looks promising for us! We ended up adding cargo 1.54.0 (5ae8d74b3 2021-06-22)
cargo 1.56.0-nightly (cc17afbb0 2021-08-02)
Nightly uses less than half of the RSS of 1.54.0 and we shave 25% off of the compile time as well! |
I'm happy to hear progress has been observed here. I think the main task that remains is for someone on our team to go back and check how the newer versions of the compiler behave on the old versions of linkerd, prior to when they put all the extra boxes in to work around this issue. |
It seems the progress has been for the other reproduction thanks to #86240, and by the use of boxing in linkerd. The original issue looks to be still present when checking the old version of
However, the good news is that using v0 mangling on 1.65/nightly brings things back to 1.49 levels: around 2m15s wall-time, 1.35GB max-rss. |
Our compile times and memory footprint regressed substantially between Rust 1.49.0 and 1.50.0.
This regression is a hard blocker for us to be able to upgrade our project from 1.49.0. As this is a severe issue for us, please let us know if there's anything we can do to provide more diagnostics, test changes, etc.
This may be related to other recent issues that describe similar behavior:
Code
This regression is observed on recent versions of the Linkerd proxy. It's known that the proxy can manifest large type signatures, so builds disable debug symbols by default
This regression is not obvious with other, smaller projects that we maintain, so I'm unable to provide a smaller repro.
Version it worked on
With Rust 1.49.0, the binary compiles in a little over two minutes, using a little over 1GB of memory:
cargo clean && /usr/bin/time -v cargo +1.49.0 build -p linkerd2-proxy
Version with regression
Using Rust 1.50.0, rustc runs for about 40 minutes before exhausting the system's memory:
cargo clean && /usr/bin/time -v cargo +1.50.0 build -p linkerd2-proxy
`rustc --version --verbose`:
We see similar behavior with more recent versions of Rust, including 1.51.0 and nightly (`cargo 1.53.0-nightly (0ed318d18 2021-04-23)`).
cc @hawkw, who can provide some more details. I believe we've tested with `lto=off` without any changes in behavior.
@rustbot modify labels: +regression-from-stable-to-stable -regression-untriaged