-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
monolithic lto link time very slow when building with several codegen units #48025
Comments
Or maybe just don't use multiple codegen units by default when monolithic LTO is enabled. |
I think the optimal "balance" for linktime/buildtime with monolithic lto would be to
However this will probably introduce indeterministic builds depending on the number of threads a host machine has. So a single core machine would optimize better than a 8 core machine when building the same crate with monolithic lto. :/ |
I made some benchmarks (cpu: i5-2540M dual core), rustc 1.25.0-nightly (b8398d9 2018-02-11) [profile.release]
lto = true
opt-level = 3
codegen-units = x
|
I believe this is fixed in #48163 |
er, didn't mean to close yet |
This commit updates our Fat LTO logic to tweak our custom wrapper around LLVM's "link modules" functionality. Previously whenever the `LLVMRustLinkInExternalBitcode` function was called it would call LLVM's `Linker::linkModules` wrapper. Internally this would crate an instance of a `Linker` which internally creates an instance of an `IRMover`. Unfortunately for us the creation of `IRMover` is somewhat O(n) with the input module. This means that every time we linked a module it was O(n) with respect to the entire module we had built up! Now the modules we build up during LTO are quite large, so this quickly started creating an O(n^2) problem for us! Discovered in rust-lang#48025 it turns out this has always been a problem and we just haven't noticed it. It became particularly worse recently though due to most libraries having 16x more object files than they previously did (1 -> 16). This commit fixes this performance issue by preserving the `Linker` instance across all links into the main LLVM module. This means we only create one `IRMover` and allows LTO to progress much speedier. From the `cargo-cache` project in rust-lang#48025 a **full build** locally when from 5m15s to 2m24s. Looking at the timing logs each object file was linked in in single-digit millisecond rather than hundreds, clearly being a nice improvement! Closes rust-lang#48025
…kruppe rustc: Persist LLVM's `Linker` in Fat LTO This commit updates our Fat LTO logic to tweak our custom wrapper around LLVM's "link modules" functionality. Previously whenever the `LLVMRustLinkInExternalBitcode` function was called it would call LLVM's `Linker::linkModules` wrapper. Internally this would crate an instance of a `Linker` which internally creates an instance of an `IRMover`. Unfortunately for us the creation of `IRMover` is somewhat O(n) with the input module. This means that every time we linked a module it was O(n) with respect to the entire module we had built up! Now the modules we build up during LTO are quite large, so this quickly started creating an O(n^2) problem for us! Discovered in rust-lang#48025 it turns out this has always been a problem and we just haven't noticed it. It became particularly worse recently though due to most libraries having 16x more object files than they previously did (1 -> 16). This commit fixes this performance issue by preserving the `Linker` instance across all links into the main LLVM module. This means we only create one `IRMover` and allows LTO to progress much speedier. From the `cargo-cache` project in rust-lang#48025 a **full build** locally went from 5m15s to 2m24s. Looking at the timing logs each object file was linked in in single-digit millisecond rather than hundreds, clearly being a nice improvement! Closes rust-lang#48025
…kruppe rustc: Persist LLVM's `Linker` in Fat LTO This commit updates our Fat LTO logic to tweak our custom wrapper around LLVM's "link modules" functionality. Previously whenever the `LLVMRustLinkInExternalBitcode` function was called it would call LLVM's `Linker::linkModules` wrapper. Internally this would crate an instance of a `Linker` which internally creates an instance of an `IRMover`. Unfortunately for us the creation of `IRMover` is somewhat O(n) with the input module. This means that every time we linked a module it was O(n) with respect to the entire module we had built up! Now the modules we build up during LTO are quite large, so this quickly started creating an O(n^2) problem for us! Discovered in rust-lang#48025 it turns out this has always been a problem and we just haven't noticed it. It became particularly worse recently though due to most libraries having 16x more object files than they previously did (1 -> 16). This commit fixes this performance issue by preserving the `Linker` instance across all links into the main LLVM module. This means we only create one `IRMover` and allows LTO to progress much speedier. From the `cargo-cache` project in rust-lang#48025 a **full build** locally went from 5m15s to 2m24s. Looking at the timing logs each object file was linked in in single-digit millisecond rather than hundreds, clearly being a nice improvement! Closes rust-lang#48025
I noticed my project https://github.com/matthiaskrgr/cargo-cache took noticeably longer to build with "lto=true" at some point.
I benchmarked link time (touch src/main.rs; cargo build --release) which appear to blow up as soon as we have several codegen units.
link times:
So it seem codegen-units=1 makes for bad parallelism while building but speeds up lto because less translation units need to be merged.
Maybe it is possible to dynamically use as little codegen units as possible (but still enough to utilize all cores) when building? //cc #46910
The text was updated successfully, but these errors were encountered: