-
In the "Threats to Validity" section, the authors state that it is possible to have a transformation with a counterexample that is wider than 64 bits, but Alive will verify its correctness because of its bounded verification. I have two questions about this:
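For reference, here is a minimal sketch of what bounded, per-bitwidth checking looks like, written with z3's Python bindings (an illustration of the idea, not Alive's actual implementation). The loop never tries widths above 64, which is why a counterexample that only exists at a wider type would go unnoticed.

```python
# Illustrative sketch of bounded verification (not Alive's implementation):
# re-check a rewrite at every bitwidth from 1 to 64 and look for a
# counterexample at each width.
from z3 import BitVec, Not, Solver, sat

def has_counterexample(width):
    """Check the rewrite `x - (x & y)  ==>  x & ~y` at one fixed width."""
    x, y = BitVec('x', width), BitVec('y', width)
    src = x - (x & y)
    tgt = x & ~y
    s = Solver()
    s.add(Not(src == tgt))       # satisfiable => a counterexample exists
    return s.check() == sat

# Only widths up to 64 are ever examined, so a transformation whose only
# counterexamples need a wider type (say i128) would still pass.
assert not any(has_counterexample(w) for w in range(1, 65))
```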
-
We can use the Alive DSL to write a peephole optimization and formally prove it correct, and Alive can then turn it into C++ code that is linked into LLVM and used as an optimization pass. My question is: how does the efficiency of the C++ code generated by Alive compare with a manually written and hand-optimized LLVM pass? As optimizations get more complex, might the generated C++ contain a lot of redundant and useless code? In Section 4, the authors mention that some cleanup is left to the subsequent dead-code elimination pass, but can it remove all of that redundancy effectively? It might be better to evaluate compilation time more thoroughly (LLVM+Alive vs. LLVM 3.6).
-
From a philosophical view, this work seems like a perfect application of formal verification to reduce bugs in real-world programs. It will likely always be challenging to create fully formally verified software, but the approach used in this paper, formally verifying common sub-components of a program, seems like a good way to introduce some level of formal verification. This paper also shows the importance of a good DSL for making the tool accessible. Tailoring the DSL to its potential users within the LLVM project probably made it much more likely to be used on real-world code. I'm confused about how memory is encoded into SMT without the theory of arrays, and why that would be faster than using the theory of arrays.
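I don't know the details of Alive's encoding either, but here is a small z3py sketch of the tradeoff I think this question is about (purely illustrative, with made-up names, not Alive's actual memory model): with the theory of arrays, memory is an uninterpreted array and loads/stores become select/store terms; if the verifier instead knows the small, fixed set of locations a peephole pattern can touch, it can model each location as a plain bitvector and lower loads to if-then-else over addresses, keeping the whole query in the bitvector fragment.

```python
# Illustrative contrast only -- not Alive's actual memory encoding.
from z3 import Array, BitVec, BitVecSort, If, Implies, Select, Store, prove

ADDR, BYTE = BitVecSort(32), BitVecSort(8)

# (1) Theory-of-arrays encoding: memory is an uninterpreted array,
#     and a load after a store is a Select over a Store term.
mem = Array('mem', ADDR, BYTE)
p, q = BitVec('p', 32), BitVec('q', 32)
v = BitVec('v', 8)
array_load = Select(Store(mem, p, v), q)

# (2) Flattened encoding: if only locations p and q can be touched,
#     model their contents as plain bitvectors and turn the load into
#     an if-then-else over addresses -- no array theory needed.
cell_q = BitVec('cell_q', 8)
flat_load = If(q == p, v, cell_q)

# Sanity check: both encodings agree whenever q aliases p.
prove(Implies(p == q, array_load == v))
prove(Implies(p == q, flat_load == v))
```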
-
Overall, I agree with Andrew: I really liked how they picked a problem narrow enough to formally verify while targeting a widely used tool, so that it would actually be useful to people. In the paper they mention rewriting an LLVM pass themselves, as well as reviewing proposed patches for errors. Impressively, it looks like the project is still going, and was even updated in the last week! I think it's worth considering which factors made this project easy for people to adopt and kept people motivated enough to continue working on it.
-
I found the technique of reducing the verification problem down to an SMT query quite interesting, and I also enjoyed the generality Alive provides in matching the LLVM IR code to optimize. I'm curious about the efficiency of Alive's verification with the SMT solver. In particular, the paper notes that larger bitwidths can push verification time to several hours. I imagine the method also scales poorly once a peephole optimization involves more than a few instructions. I wonder if this could be partially ameliorated by splitting the work into multiple smaller peephole optimizations, which might speed up verification.
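To make the reduction concrete, here is a rough z3py sketch of the kind of query a single optimization boils down to at one fixed bitwidth (Alive's real encoding also has to handle undef, poison, and type constraints, so treat this as a simplification): the precondition and the source/target expressions become one validity check, and that check is repeated for each feasible type assignment, which is part of why more instructions and wider types cost more.

```python
# Simplified sketch of the per-bitwidth SMT query (ignores undef/poison).
# Example rewrite: (x & C1) | (x & C2)  ==>  x,  given  Pre: C1 ^ C2 == -1
from z3 import BitVec, BitVecVal, Not, Solver, unsat

WIDTH = 32
x, C1, C2 = BitVec('x', WIDTH), BitVec('C1', WIDTH), BitVec('C2', WIDTH)

pre = (C1 ^ C2) == BitVecVal(-1, WIDTH)   # C2 is the bitwise complement of C1
src = (x & C1) | (x & C2)
tgt = x

s = Solver()
# The rewrite is correct at this width iff no assignment satisfies the
# precondition while making source and target differ.
s.add(pre, Not(src == tgt))
assert s.check() == unsat
```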
-
Finding 8 bugs over the course of a research project aimed at reducing bugs sounds like a big win to me. I wonder what they would find if they applied it to more parts of LLVM. They argue Alive is practical, and I'd be curious to know whether there are many more than 8 bugs left to fix. If there aren't many more, how practical is it to keep using Alive? What has happened since this paper?
-
I find it very interesting that there are three types of undefined behaviours in LLVM. I'm a bit surprised about two things:
-
I am a little concerned (or confused) by the mention that Alive does not support branching. Isn't that very limiting? Furthermore, how does Alive relate to formal specification methods/languages like TLA+?
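For what it's worth, my understanding (which could be wrong) is that Alive targets straight-line peephole patterns, and that LLVM's select instruction is supported, so a fair amount of "branchy" logic can still be expressed in branch-free form. A quick z3py illustration of that style of rewrite, using a hypothetical abs pattern as the example:

```python
# Sketch: a conditional computation written as a straight-line select,
# the form a peephole pattern takes, and a branch-free rewrite of it.
from z3 import BitVec, If, prove

WIDTH = 32
x = BitVec('x', WIDTH)

# Source: abs(x) as a select (z3's <, -, >> are signed/arithmetic here).
src = If(x < 0, -x, x)

# Target: the classic shift/xor formulation of abs.
sign = x >> (WIDTH - 1)      # arithmetic shift: 0 or all ones
tgt = (x ^ sign) - sign

prove(src == tgt)            # holds for every 32-bit x, including INT_MIN
```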
-
The paper mentions that Alive is parametric over types, so for an optimization to be correct it must be correct for all feasible type assignments of the given pattern. In particular, Alive uses an SMT solver to enumerate the satisfiable type assignments and then checks correctness for each of them. I wonder if there are better ways to accomplish the same goal. For example, could we require that optimizations come with some type annotations? Or could we do something like ML-style type inference, where the weakest type assignments are selected, and then prove that correctness propagates from weaker type assignments to stronger ones?
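As a sketch of the enumeration step (my guess at its general shape, not Alive's actual encoding): the bitwidths become integer variables constrained by the instructions that use them, and an all-SAT-style loop pulls out one satisfying assignment at a time, blocking each one before asking for the next; every assignment then gets its own correctness query.

```python
# Rough sketch of enumerating feasible type assignments (not Alive's code).
from z3 import And, Int, Not, Or, Solver, sat

w_x, w_y, w_r = Int('w_x'), Int('w_y'), Int('w_r')   # widths of %x, %y, result

s = Solver()
# Hypothetical constraints: a binary op forces equal operand/result widths,
# and widths are drawn from a small candidate set for illustration.
s.add(w_x == w_y, w_r == w_x)
s.add(Or([w_x == w for w in (1, 8, 16, 32, 64)]))

assignments = []
while s.check() == sat:
    m = s.model()
    a = (m[w_x].as_long(), m[w_y].as_long(), m[w_r].as_long())
    assignments.append(a)
    # Block this assignment so the next check finds a different one.
    s.add(Not(And(w_x == a[0], w_y == a[1], w_r == a[2])))

# One per-type correctness query would then be run for each assignment.
print(sorted(assignments))
```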
-
I think this paper shows that combining a DSL and an SMT solver can be very powerful. Having a DSL to describe both source and target semantics gives a clearly defined scope of verifiable optimizations. One thing I really like about this tool is that it is not overly complex yet is actually practical in real-world cases; compared to previous work such as CompCert, Alive is reasonably lightweight. My main concern is that LLVM contains so many different abstractions: given that, how can the DSL have enough expressiveness for things like memory models or uncommon data types? I feel the synthesis part is a nice addition but not a necessary/key part of the paper (having practical optimizations properly specified and verified is already very useful), so I am not too concerned with any issues from that part.
-
Thanks for the wonderful discussion yesterday---especially to @atucker for leading us! Here are a few follow-up links & notes based on stuff that came up in the discussion:
-
Thanks everyone for the great discussion in class! I wrote up a blog post for the paper, with a pull request here, and a reasonably readable version of the page here. Let me know what you all think!
-
This is the discussion thread for Provably Correct Peephole Optimizations with Alive
Nuno P. Lopes, David Menendez, Santosh Nagarakatte, and John Regehr. PLDI 2015!
Hosted by Aaron Tucker.