Document building Clang with BOLT #65010

Open
zamazan4ik opened this issue Aug 26, 2023 · 6 comments
Labels
cmake Build system in general and CMake in particular documentation

Comments

@zamazan4ik

There is a great article in the official LLVM documentation "How to build Clang and LLVM with Profile-Guided Optimization".

I suggest adding another article (or extending the existing one) that explains how to build Clang and LLVM with LLVM BOLT. Clang's CMake scripts already support building with BOLT.

We need to add the following information to the documentation:

  • What benefits are expected from BOLT for Clang? Some benchmark numbers would be great (as it's already done for Clang and PGO). These numbers can be found here.
  • How to build Clang with BOLT. A step-by-step guide would be nice to see since it allows users/maintainers to apply BOLT to Clang more easily.

Having this information in the official documentation would raise the visibility of this additional way to improve Clang performance with LLVM-native optimization tooling.

@EugeneZelenko EugeneZelenko added documentation cmake Build system in general and CMake in particular and removed new issue labels Aug 26, 2023
@boomanaiden154
Contributor

Note that there is documentation here on how to use the CMake caches. It isn't as detailed as the PGO page, but it shows how to build clang/LLVM with BOLT, and the CMake invocation using the caches handles most details automatically.

Following those steps should get you a BOLT-optimized clang, or even a PGO+ThinLTO+BOLT-optimized clang, without too much work.
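
A rough sketch of the cache-based flow mentioned above. It only assembles and prints the two command lines; nothing is executed. The cache file names (`BOLT.cmake`, `BOLT-PGO.cmake`) and target names (`clang-bolt`, `stage2-clang-bolt`) are assumptions based on the LLVM advanced-builds documentation and should be checked against the tree in use. The `llvm-project` and `build` paths and the helper name are hypothetical.

```python
import shlex
from pathlib import Path


def bolt_build_commands(src: Path, build: Path, with_pgo: bool = False) -> list[list[str]]:
    """Return the configure and build commands for a BOLT-optimized clang.

    Assumption: the cache files and target names below exist in the
    LLVM checkout being built; verify them before relying on this.
    """
    cache = "BOLT-PGO.cmake" if with_pgo else "BOLT.cmake"
    target = "stage2-clang-bolt" if with_pgo else "clang-bolt"
    configure = [
        "cmake", "-G", "Ninja",
        "-S", str(src / "llvm"),
        "-B", str(build),
        # -C preloads the cache script that sets up the BOLT build.
        "-C", str(src / "clang" / "cmake" / "caches" / cache),
    ]
    return [configure, ["ninja", "-C", str(build), target]]


if __name__ == "__main__":
    # Placeholder paths; the commands are printed, not run.
    for cmd in bolt_build_commands(Path("llvm-project"), Path("build")):
        print(shlex.join(cmd))
```

The PGO variant may need further options (for example linker selection) that this sketch leaves out.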

@zamazan4ik
Author

@boomanaiden154 Thank you for the link! I think one thing is missing from the page you mentioned: the performance benefits of BOLT on Clang. Right now it isn't clear why I should try to optimize Clang with BOLT. I think it would be great to add some performance numbers directly to this guide (as is already done for the PGO instructions for Clang).

If you don't have numbers from your own measurements, I guess they could be taken from these slides. That would give users/maintainers more motivation to use BOLT during the Clang build process.

@boomanaiden154
Contributor

That's a good point. I'd probably prefer to do fresh measurements, since they are liable to change, and I'd want to get them on the exact configuration under consideration. Things like the specific tasks used for performance training, instrumentation vs. sampling for profile collection, and the specific versions of the code can make fairly big differences.

It shouldn't take too much effort to generate some performance numbers, just a little bit of time to go through all the benchmarking.

@ptr1337
Contributor

ptr1337 commented Aug 30, 2023

A while ago I collected some results with a static llvm/clang build here:
https://github.com/ptr1337/llvm-bolt-scripts/blob/master/results.md

They could vary a bit, since they were tested against llvm 15 and with instrumentation.
The performance benefit with LBR should be bigger.

Also a small note:
BOLTing clang in a shared build (as Arch Linux and other distributions ship it) does not make much sense, since in shared builds it is "libLLVM.so" that needs to be optimized.
I also talked with aapuov a bit, and he said that optimizing shared builds makes "not much sense" since they don't focus on performance.
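
One way to tell which kind of build you have is to look at the `ldd` output for the clang binary. The sketch below only inspects text that has already been captured; the helper name and the sample output are hypothetical and the sample is made up for illustration, not taken from a real system.

```python
def links_shared_libllvm(ldd_output: str) -> bool:
    """Return True if `ldd` output lists a libLLVM shared object."""
    for line in ldd_output.splitlines():
        # Each dependency line starts with the library name.
        name = line.strip().split(" ", 1)[0]
        if name.startswith("libLLVM") and ".so" in name:
            return True
    return False


if __name__ == "__main__":
    # Illustrative sample only; capture real output with `ldd $(which clang)`.
    sample = (
        "\tlibLLVM-16.so => /usr/lib/libLLVM-16.so (0x00007f0000000000)\n"
        "\tlibc.so.6 => /usr/lib/libc.so.6 (0x00007f0000100000)\n"
    )
    print(links_shared_libllvm(sample))
```

If this reports a shared libLLVM, most of the compiler's code lives outside the clang executable, which is the situation the note above describes.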

@zamazan4ik
Author

More results on BOLTing Clang, from the Android project: link.

@ptr1337
Contributor

ptr1337 commented Aug 31, 2023

Here are some new results on a Zen 4 7950X3D:
Clang 16 ThinLTO + PGO:

real    3m1.740s
user    87m25.048s
sys    5m31.901s

Clang 16 ThinLTO + PGO + BOLT:

real    3m13.146s
user    92m23.847s
sys    5m37.822s

So roughly a 7.5% improvement.
The times are for building the clang and llvm projects.
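
For anyone who wants to recompute the difference, here is a minimal sketch that converts bash `time` values to seconds and reports the relative change. The helper names are hypothetical. Note that in the figures as posted the run labelled "+ BOLT" has the larger wall-clock time, so it is worth confirming which set of timings belongs to which build before quoting a percentage.

```python
import re


def parse_time(value: str) -> float:
    """Convert a bash `time` value such as '3m1.740s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def relative_change(baseline: str, candidate: str) -> float:
    """Percent change of candidate vs. baseline; negative means faster."""
    base = parse_time(baseline)
    return (parse_time(candidate) - base) / base * 100.0


if __name__ == "__main__":
    # Wall-clock ("real") values copied from the comment, in the order posted.
    print(f"{relative_change('3m1.740s', '3m13.146s'):+.1f}%")
```

The same function can be applied to the `user` values to compare total CPU time instead of wall-clock time.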


4 participants