-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize librustc_driver.so
with BOLT
#116352
Conversation
@@ -904,6 +904,11 @@ impl Step for Rustc { | |||
cargo.arg("-p").arg(krate); | |||
} | |||
|
|||
if compiler.stage == 1 { | |||
// Relocations are required for BOLT to work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would've expected this if to gate on the usage of BOLT somehow -- do we have a config flag for that? I think just a generic config flag is fine ("enable-bolt-settings") is ok, we don't need it to actually do optimizations. Maybe even "opt-dist" is the right name for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I have discussed in the PR description and on Zulip. @onur-ozkan has expressed a desire to avoid a BOLT-specific flag, although it seems like the most straightforward solution to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have problem with doing this with using flags/config.toml. My point was more about its behaviour.
Add something like --bolt-flags, which would set the flag only for the cases where it's needed (and thus tying its behavior to opt-dist)
Like here, "which would set the flag only for the cases where it's needed" this seems very limited experience to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the interaction of the flag seems very specific. However, I feel like that's also true in general of several other similar flags. For example, the PGO generate/use flags are also only performed in stage 2, and basically I don't think that they are very useful outside of opt-dist
.
The fact that we pass a specific linker flag is tied to BOLT - that won't really change. The fact that we only pass it to librustc_driver
is tied to the way the compiler is distributed, and unless that changes, it also don't make sense to do it in any other way. And performing BOLT on earlier stage than 2 also doesn't make sense. So from this point of view, I think that the behavior of the flag is not that limited, because it's pretty much the only way we could do it that currently makes sense.
bootstrap and opt-dist
are intrinsically intertwined, for better or worse, so I don't think that this flag will change that or create some unique, "anti-systemic" solution :D
Given the desire to pass it only to one specific invocation of rustc, my inclination is to use an environment variable set by opt-dist and read by the custom rustc wrapper (src/bootstrap/bin/rustc.rs). |
☔ The latest upstream changes (presumably #114623) made this pull request unmergeable. Please resolve the merge conflicts. |
5eb9a9a
to
9a0e90f
Compare
This PR changes |
I added the @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Optimize `librustc_driver.so` with BOLT This PR optimizes `librustc_driver.so` on 64-bit Linux CI with BOLT. ### Code One thing that's not clear yet to me how to resolve is how to best pass a linker flag that we need for BOLT (the second commit). It is currently passed unconditionally, which is not a good idea. We somehow have to: 1) Only pass it when we actually plan to use BOLT. How to best do that? `config.toml` entry? Environment variable? CLI flag for bootstrap? BOLT optimization is done by `opt-dist`, therefore bootstrap doesn't know about it by default. 2) Only pass it to `librustc_driver.so` (see performance below). Some discussion of this flag already happened on [Zulip](https://rust-lang.zulipchat.com/#narrow/stream/326414-t-infra.2Fbootstrap/topic/Adding.20a.20one-off.20linker.20flag). ### Performance Latest perf. results can be found [here](rust-lang#102487 (comment)). Note that instruction counts are not very interesting here, there are only regressions on hello world programs. Probably caused by a larger C++ libstd (?). Summary: - ✔️ `-1.8%` mean improvement in cycle counts across many primary benchmarks. - ✔️ `-1.8%` mean Max-RSS improvement. - ✖️ 34 MiB (+48%) artifact size regression of `librustc_driver.so`. - This is caused by building `librustc_driver.so` with relocations (which are required for BOLT). Hopefully, it will be [fixed](https://discourse.llvm.org/t/bolt-rfc-a-new-mode-to-rewrite-entire-binary/68674) in the future with BOLT improvements, but now trying to reduce this size increase is [tricky](rust-lang#114649). - Note that the size of this file was recently reduced in rust-lang#115554 by pretty much the same amount (33 MiB). So the size after this PR is basically the same as it was for the last ~year. - ✖️ 1.4 MiB (+53%) artifact size regression of `rustc`. - This is annoying and pretty much unnecessary. It is caused by the way relocations are currently applied in this PR, because they are applied both to `librustc_driver.so` (where they are needed) and for `rustc` (where they aren't needed), since both are built with a single cargo invocation. We might need e.g. some tricks in the bootstrap `rustc` shim to only apply the relocation flag for the shared library and not for `rustc`. ### CI time CI (try build) got slower by ~5 minutes, which is fine, IMO. It can be further reduced by running LLVM and `librustc_driver` BOLT profile gathering at the same time (now they are gathered separately for LLVM and `librustc_driver`). r? `@Mark-Simulacrum` Also CC `@onur-ozkan,` primarily for the bootstrap linker flag issue.
This comment has been minimized.
This comment has been minimized.
💔 Test failed - checks-actions |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Optimize `librustc_driver.so` with BOLT This PR optimizes `librustc_driver.so` on 64-bit Linux CI with BOLT. ### Code One thing that's not clear yet to me how to resolve is how to best pass a linker flag that we need for BOLT (the second commit). It is currently passed unconditionally, which is not a good idea. We somehow have to: 1) Only pass it when we actually plan to use BOLT. How to best do that? `config.toml` entry? Environment variable? CLI flag for bootstrap? BOLT optimization is done by `opt-dist`, therefore bootstrap doesn't know about it by default. 2) Only pass it to `librustc_driver.so` (see performance below). Some discussion of this flag already happened on [Zulip](https://rust-lang.zulipchat.com/#narrow/stream/326414-t-infra.2Fbootstrap/topic/Adding.20a.20one-off.20linker.20flag). ### Performance Latest perf. results can be found [here](rust-lang#102487 (comment)). Note that instruction counts are not very interesting here, there are only regressions on hello world programs. Probably caused by a larger C++ libstd (?). Summary: - ✔️ `-1.8%` mean improvement in cycle counts across many primary benchmarks. - ✔️ `-1.8%` mean Max-RSS improvement. - ✖️ 34 MiB (+48%) artifact size regression of `librustc_driver.so`. - This is caused by building `librustc_driver.so` with relocations (which are required for BOLT). Hopefully, it will be [fixed](https://discourse.llvm.org/t/bolt-rfc-a-new-mode-to-rewrite-entire-binary/68674) in the future with BOLT improvements, but now trying to reduce this size increase is [tricky](rust-lang#114649). - Note that the size of this file was recently reduced in rust-lang#115554 by pretty much the same amount (33 MiB). So the size after this PR is basically the same as it was for the last ~year. - ✖️ 1.4 MiB (+53%) artifact size regression of `rustc`. - This is annoying and pretty much unnecessary. It is caused by the way relocations are currently applied in this PR, because they are applied both to `librustc_driver.so` (where they are needed) and for `rustc` (where they aren't needed), since both are built with a single cargo invocation. We might need e.g. some tricks in the bootstrap `rustc` shim to only apply the relocation flag for the shared library and not for `rustc`. ### CI time CI (try build) got slower by ~5 minutes, which is fine, IMO. It can be further reduced by running LLVM and `librustc_driver` BOLT profile gathering at the same time (now they are gathered separately for LLVM and `librustc_driver`). r? `@Mark-Simulacrum` Also CC `@onur-ozkan,` primarily for the bootstrap linker flag issue.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (c543b6f): comparison URL. Overall result: ❌ regressions - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 628.052s -> 623.517s (-0.72%) |
The interesting metrics here are cycle counts (-2% mean improvement amongst primary benchmarks) and Max-RSS (-2% mean improvement amongst primary benchmarks). Bootstrap time has also been reduced by ~5s (usually 2-3s tends to be noise for bootstrap timings, so 5s seems like a statistically significant improvement to me). The instruction count regressions stem happens mostly for "hello-world-like" programs that contain tiny amounts of code. It's basically a fixed startup cost regression that happens because the compiler library is now ~30 MiB larger. The size of @rustbot label: +perf-regression-triaged |
54: Pull upstream master 2023 10 17 r=pietroalbini a=Veykril * rust-lang/rust#116196 * rust-lang/rust#116824 * rust-lang/rust#116822 * rust-lang/rust#116477 * rust-lang/rust#116826 * rust-lang/rust#116820 * rust-lang/rust#116811 * rust-lang/rust#116808 * rust-lang/rust#116805 * rust-lang/rust#116800 * rust-lang/rust#116798 * rust-lang/rust#116754 * rust-lang/rust#114370 * rust-lang/rust#116804 * rust-lang/rust#116802 * rust-lang/rust#116790 * rust-lang/rust#116786 * rust-lang/rust#116709 * rust-lang/rust#116430 * rust-lang/rust#116257 * rust-lang/rust#114157 * rust-lang/rust#116731 * rust-lang/rust#116550 * rust-lang/rust#114330 * rust-lang/rust#116724 * rust-lang/rust#116782 * rust-lang/rust#116776 * rust-lang/rust#115955 * rust-lang/rust#115196 * rust-lang/rust#116775 * rust-lang/rust#114589 * rust-lang/rust#113747 * rust-lang/rust#116772 * rust-lang/rust#116771 * rust-lang/rust#116760 * rust-lang/rust#116755 * rust-lang/rust#116732 * rust-lang/rust#116522 * rust-lang/rust#116341 * rust-lang/rust#116172 * rust-lang/rust#110604 * rust-lang/rust#110729 * rust-lang/rust#116527 * rust-lang/rust#116688 * rust-lang/rust#116757 * rust-lang/rust#116753 * rust-lang/rust#116748 * rust-lang/rust#116741 * rust-lang/rust#116594 * rust-lang/rust#116691 * rust-lang/rust#116643 * rust-lang/rust#116683 * rust-lang/rust#116635 * rust-lang/rust#115515 * rust-lang/rust#116742 * rust-lang/rust#116661 * rust-lang/rust#116576 * rust-lang/rust#116540 * rust-lang/rust#116352 * rust-lang/rust#116737 * rust-lang/rust#116730 * rust-lang/rust#116723 * rust-lang/rust#116715 * rust-lang/rust#116603 * rust-lang/rust#116591 * rust-lang/rust#115439 * rust-lang/rust#116264 * rust-lang/rust#116727 * rust-lang/rust#116704 * rust-lang/rust#116696 * rust-lang/rust#116695 * rust-lang/rust#116644 * rust-lang/rust#116630 * rust-lang/rust#116728 * rust-lang/rust#116689 * rust-lang/rust#116679 * rust-lang/rust#116618 * rust-lang/rust#116577 * rust-lang/rust#115653 * rust-lang/rust#116702 * rust-lang/rust#116015 * rust-lang/rust#115822 * rust-lang/rust#116407 * rust-lang/rust#115719 * rust-lang/rust#115524 * rust-lang/rust#116705 * rust-lang/rust#116645 * rust-lang/rust#116233 * rust-lang/rust#115108 * rust-lang/rust#116670 * rust-lang/rust#116676 * rust-lang/rust#116666 Co-authored-by: Benoît du Garreau <[email protected]> Co-authored-by: Colin Finck <[email protected]> Co-authored-by: Ian Jackson <[email protected]> Co-authored-by: Joshua Liebow-Feeser <[email protected]> Co-authored-by: León Orell Valerian Liehr <[email protected]> Co-authored-by: Trevor Gross <[email protected]> Co-authored-by: Evan Merlock <[email protected]> Co-authored-by: joboet <[email protected]> Co-authored-by: Ralf Jung <[email protected]> Co-authored-by: DaniPopes <[email protected]> Co-authored-by: Mark Rousskov <[email protected]> Co-authored-by: onur-ozkan <[email protected]> Co-authored-by: Nicholas Nethercote <[email protected]> Co-authored-by: The 8472 <[email protected]> Co-authored-by: Samuel Thibault <[email protected]> Co-authored-by: reez12g <[email protected]> Co-authored-by: Jakub Beránek <[email protected]>
if let Some("rustc_driver") = crate_name { | ||
cmd.arg("-Clink-args=-Wl,-q"); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added after the debug printing, so the extra flags won't show up when the command is printed. Was that deliberate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely not, good catch, thanks! #123189
Log BOLT args in bootstrap `rustc` shim Before, the BOLT argument would not be logged, because it was only added after the logging has happened. Found by `@RalfJung` [here](rust-lang#116352 (comment)).
Rollup merge of rust-lang#123189 - Kobzol:rustc-shim-log, r=onur-ozkan Log BOLT args in bootstrap `rustc` shim Before, the BOLT argument would not be logged, because it was only added after the logging has happened. Found by `@RalfJung` [here](rust-lang#116352 (comment)).
Log BOLT args in bootstrap `rustc` shim Before, the BOLT argument would not be logged, because it was only added after the logging has happened. Found by `@RalfJung` [here](rust-lang/rust#116352 (comment)).
This PR optimizes
librustc_driver.so
on 64-bit Linux CI with BOLT.Code
One thing that's not clear yet to me how to resolve is how to best pass a linker flag that we need for BOLT (the second commit). It is currently passed unconditionally, which is not a good idea. We somehow have to:
config.toml
entry? Environment variable? CLI flag for bootstrap? BOLT optimization is done byopt-dist
, therefore bootstrap doesn't know about it by default.librustc_driver.so
(see performance below).Some discussion of this flag already happened on Zulip.
Performance
Latest perf. results can be found here. Note that instruction counts are not very interesting here, there are only regressions on hello world programs. Probably caused by a larger C++ libstd (?).
Summary:
-1.8%
mean improvement in cycle counts across many primary benchmarks.-1.8%
mean Max-RSS improvement.librustc_driver.so
.librustc_driver.so
with relocations (which are required for BOLT). Hopefully, it will be fixed in the future with BOLT improvements, but now trying to reduce this size increase is tricky.rustc
with a single CGU on x64 Linux #115554 by pretty much the same amount (33 MiB). So the size after this PR is basically the same as it was for the last ~year.rustc
.librustc_driver.so
(where they are needed) and forrustc
(where they aren't needed), since both are built with a single cargo invocation. We might need e.g. some tricks in the bootstraprustc
shim to only apply the relocation flag for the shared library and not forrustc
.CI time
CI (try build) got slower by ~5 minutes, which is fine, IMO. It can be further reduced by running LLVM and
librustc_driver
BOLT profile gathering at the same time (now they are gathered separately for LLVM andlibrustc_driver
).r? @Mark-Simulacrum
Also CC @onur-ozkan, primarily for the bootstrap linker flag issue.