
Compile time regression introduced in r322401 #37436

Open
alexcrichton opened this issue Jul 7, 2018 · 7 comments
@alexcrichton (Contributor)

Bugzilla Link 38088
Version trunk
OS All
CC @aelovikov-intel, @bmrzycki, @dwblaikie, @orivej, @sebpop

Extended Description

We're currently attempting to upgrade LLVM in the Rust project [1], but we're unfortunately losing performance from this upgrade on a number of our benchmarks [2]. I've been doing some bisection, and it's difficult to pin the loss on any single commit because each regression to track down is smaller and smaller.

In any case, I've tracked down one regression to r322401, i.e. https://reviews.llvm.org/D40146. The IR file [3] takes a while to pass through the opt tool: before this commit it took 16.8s locally, and afterwards it takes 17.9s.

I haven't dug much further into the patch, but let me know if I can help out!

@sebpop (mannequin) commented Jul 8, 2018

Thanks for reporting this issue.

Is this performance degradation due to a slower compiler, or due to the quality of the code generated by LLVM?
In other words, is Rust using LLVM during the execution of Rust programs, or is Rust using LLVM ahead of time?

@alexcrichton (Contributor, Author)

Ah yes, a good clarification to make! The site at perf.rust-lang.org tracks the compile time of various benchmarks; that is, it tests how long it takes rustc to compile various crates. This execution time, however, reflects both LLVM's generated code and LLVM's own execution time. The Rust compiler itself is compiled by LLVM, so some of the runtime is code generated by LLVM's master branch (the Rust compiler pieces) and some of the runtime is LLVM itself (optimizations and whatnot).

In this particular case, though, I believe the regression is specifically in LLVM's execution time, not in the generated code. In the timings I was looking at, the Rust-specific pieces were comparable both before and after (i.e., testing LLVM's code generation); what varied was the time spent in LLVM's optimization passes.

@bmrzycki (Contributor) commented Jul 9, 2018

Hello Alex, thank you for the detailed report.

The commit you referenced adds a feature where one kind of analysis (dominators) is now kept up-to-date across one of the larger passes within LLVM. Due to the design of LLVM, this analysis information is kept separate from the actual IR, which means that whenever one changes, the other must be updated to match.
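To illustrate the coupling being described (a toy sketch, not LLVM's implementation): dominator information is derived from the CFG, so any transform that rewires edges either invalidates it or must update it. The classic iterative data-flow computation of dominator sets makes the dependency visible:

```python
def dominators(cfg, entry):
    """cfg: dict node -> list of successors. Returns node -> set of dominators."""
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if preds[n]:
                new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            else:
                new = {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

# Diamond CFG: A -> {B, C}, both -> D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
assert dominators(cfg, "A")["D"] == {"A", "D"}

# A transform that deletes the edge C -> D changes D's dominators:
cfg["C"].remove("D")
assert dominators(cfg, "A")["D"] == {"A", "B", "D"}
```

Here removing one edge changes D's dominator set, which is exactly the kind of side effect a pass like JumpThreading must propagate into the analysis rather than recomputing from scratch.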

The initial application of this patch was much worse, and we were forced to batch up the Dominator updates to minimize the performance impact. I am a bit surprised by your loss of around 6.5%, though. It may be that rustc emits IR that is highly branchy and requires many more iterations inside JumpThreading to stabilize than code emitted by clang. In my testing of the batched mode, most compile-time-sensitive tests were around the 1% mark, with 3 outliers:
https://reviews.llvm.org/feed/6519861976962257235/

There is work ongoing to revise this interface again:
https://reviews.llvm.org/D48967

It is unclear if this patch will improve performance, or possibly degrade further since it's attempting to preserve the analysis of two data structures instead of just one (PDT and DT).

With all of this said, have you tried the latest tip of the tree? There is a new patch in JumpThreading that improves its performance and may help offset some of the loss you're seeing in keeping this analysis up-to-date. The patch is the following:

https://reviews.llvm.org/D44282

@alexcrichton (Contributor, Author)

OK, thanks for the information! Unfortunately, the patch you mentioned was actually the source of an even bigger regression [1] in our testing, which I reported as bug 38047 and which was reverted in r336393.

I'd definitely believe though that rustc's LLVM IR generation is subpar in various areas, and this is also just one particular benchmark in rustc rather than a massive regression across everything, so it's certainly not the end of the world.

Thanks for taking the time to review!

@bmrzycki (Contributor) commented Jul 9, 2018

> Ok thanks for the information! Unfortunately though the patch you mentioned was actually a source of an even bigger regression [1] in our testing which I reported as bug 38047 and was reverted in r336393.

Interesting, I had not seen that bug. Thanks for pointing it out and I'll watch it too.

It's possible you're seeing the regression here due to that change: it added a synchronization point (flush()) to every call where JumpThreading rewires the edges of the IR, which is similar to the immediate-update method I originally abandoned. See here:

https://reviews.llvm.org/D44282?id=138097#inline-392430

I was hoping that, with the change, the time saved by updating the IR directly would offset the cost and result in a net win.
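The trade-off being described can be sketched in a few lines (the names here are illustrative; this is not LLVM's DomTreeUpdater API). An eager scheme pays the expensive synchronization on every CFG edit, while a lazy scheme queues the edits and pays once:

```python
# Sketch of lazy/batched analysis updates: queue edge changes and apply
# them in one flush instead of synchronizing after every single edit.
class LazyUpdater:
    def __init__(self):
        self.pending = []
        self.flush_count = 0

    def insert_edge(self, a, b):
        self.pending.append(("insert", a, b))

    def delete_edge(self, a, b):
        self.pending.append(("delete", a, b))

    def flush(self):
        # In LLVM this is where the dominator tree would actually be
        # brought up to date; here we just count how often the
        # expensive step runs.
        self.flush_count += 1
        applied, self.pending = self.pending, []
        return applied

updater = LazyUpdater()
for i in range(100):           # many small CFG edits from one transform
    updater.insert_edge(i, i + 1)
applied = updater.flush()      # one expensive update instead of 100
assert len(applied) == 100
assert updater.flush_count == 1
```

A flush() on every rewiring call, as described above, collapses this back to one expensive step per edit, which is the behavior the batching was introduced to avoid.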

> I'd definitely believe though that rustc's LLVM IR generation is subpar in various areas, and this is also just one particular benchmark in rustc rather than a massive regression across everything, so it's certainly not the end of the world.

Yes, there are some input IR cases that unfortunately take more time, and I doubt they can be easily refactored. The way JumpThreading works is the following:

```
for BB in Function:
    do:
        Changed = Process(BB)
    while Changed
```

Process(BB) is a complex function that attempts several changes and transforms to the IR, especially to a block's incoming and outgoing edges. Every change to these needs to be tracked and synchronized.

The only way I can see this becoming more efficient is to overhaul Process(BB) to be more intelligent about the opportunities it pursues, rather than just retrying until everything stabilizes.
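A runnable version of that driver loop (a sketch: `process` here is a stand-in for JumpThreading's per-block transform) shows why the cost scales with how often a block keeps reporting changes:

```python
def run_to_fixed_point(blocks, process):
    """Reprocess each block until it stops changing; return total calls."""
    iterations = 0
    for bb in blocks:
        changed = True
        while changed:
            changed = process(bb)
            iterations += 1
    return iterations

# Toy transform: each block needs `work` successful steps before it
# stabilizes, mimicking a pass that keeps finding opportunities.
def process(bb):
    if bb["work"] > 0:
        bb["work"] -= 1
        return True
    return False

blocks = [{"work": 3}, {"work": 0}, {"work": 5}]
total = run_to_fixed_point(blocks, process)
assert total == 11          # (3+1) + (0+1) + (5+1) calls to process
assert all(bb["work"] == 0 for bb in blocks)
```

Even a stable block costs one call just to confirm the fixed point, and "bouncy" blocks that keep changing state multiply the work, which is why block count in a single function matters so much.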

That foo.ll file is huge at 60MB! Keep in mind that JumpThreading is a function-level scalar transform: the more basic blocks you have in a single function, the more work JumpThreading has to do. One way you can reduce this in the code generated from Rust is to either reduce the number of basic blocks or strike a balance between calling external functions and forcing all the code inline.

This code reminds me quite a bit of the amalgamated SQLite source, which also performed worse with the change adding dominator preservation.

@alexcrichton (Contributor, Author)

Ah yeah, and sorry, I don't mean this as a "hair on fire" bug at all! I figured it would basically be good hygiene to report it and make sure there was no obvious mistake (which it sounds like there definitely isn't). In that sense I think we're likely to merge our LLVM upgrade in Rust with this hit to performance on a few benchmarks.

FWIW, I didn't try to reduce the huge IR file at all; it was basically just the first one that showed a perceptible-to-me difference in performance that I could bisect on. AFAIK we haven't spent a ton of time in Rust improving the LLVM IR we hand to the backend, but that's definitely a possible optimization for us if it comes to it!

Y'all can definitely feel free to do what you'd like with this bug, if you'd prefer to close it as "working as intended for now" that's totally fine!

@bmrzycki (Contributor) commented Jul 9, 2018

> Ah yeah and sorry I don't mean this as a "hair on fire" bug at all! I figured it would basically be good hygiene to report this and make sure there was no obvious mistake (which it sounds like there definitely isn't). In that sense I think we're likely to merge our LLVM upgrade in Rust with this hit to perf on a few benchmarks.

I definitely appreciate the bug report; thank you for taking the time to file it. At the very least it helps gauge how common large functions are in the real world. Dominator analysis is valuable for analyzing further run-time enhancements to JumpThreading, and that was the original reason for adding the code.

> FWIW I didn't try to reduce the huge IR file at all; it was basically just the first one that showed a perceptible-by-me difference in performance that I could bisect on. AFAIK we haven't spent a ton of time in Rust improving our LLVM IR generation that we hand to the backend, but that's definitely a possible optimization for us if it comes to it!

No worries. :) I just saw the size, and it didn't surprise me that JumpThreading slowed down on such a large file. The larger the function, the greater the tendency for JumpThreading to "bounce" between a few intermediate states, adding and then removing a potential IR change to improve the chance of finding a jump-threading opportunity. It's not the most elegant pass.

Any attempt by Rust to reduce the number of blocks emitted in a single function will help reduce the JumpThreading processing time.

There are also a few tuning parameters to JumpThreading you can play with (each cl::opt below is exposed as an opt flag, e.g. -jump-threading-threshold=N) to see if it helps strike a compromise between the quality of the emitted code and the time it takes to compile:

```cpp
static cl::opt<unsigned>
    BBDuplicateThreshold("jump-threading-threshold",
                         cl::desc("Max block size to duplicate for jump threading"),
                         cl::init(6), cl::Hidden);

static cl::opt<unsigned> ImplicationSearchThreshold(
    "jump-threading-implication-search-threshold",
    cl::desc("The number of predecessors to search for a stronger "
             "condition to use to thread over a weaker condition"),
    cl::init(3), cl::Hidden);
```

> Y'all can definitely feel free to do what you'd like with this bug, if you'd prefer to close it as "working as intended for now" that's totally fine!

This ticket is similar to bug 37929. I plan on leaving both open to see if others with more clout in the community decide whether the performance loss is more important than the added functionality of having dominance analysis within JumpThreading.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021