Evaluate using additional optimizations like LTO and PGO #22
Comments
Thanks for this, it is very interesting. I will try to replicate these results myself and will definitely consider including this in release binaries if it makes a noticeable improvement.
Working on this in the
I have implemented this in
Sorry for the very late response, and thank you so much for implementing it in the scripts! Yes, that's exactly what I had in mind. I have several small pieces of advice:
I guess it's just a hardware difference. Anyway, we still get a nice improvement in user time (system time cannot be improved with PGO).
Thanks for the response!
Oh, I see. No worries: in your case, you can safely ignore it. This recommendation applies to larger applications. For example, suppose we want to optimize a large application such as a database with many internal subsystems. Running the training workload (for example, database benchmarks) for about a minute increases the chance that all, or almost all, of those subsystems are executed at least once.
Yep, sounds good.
There have been some substantial improvements to the performance of tex-fmt over the last few releases, and I'm no longer seeing any advantages when using PGO. As such, I'm going to close this issue for now. Thank you very much for your help and for discussing this; I am more than happy to reopen the issue in the future if necessary.
Hi!
As I have done many times before, I decided to test the Profile-Guided Optimization (PGO) technique to improve the application's performance (see https://github.com/zamazan4ik/awesome-pgo for results on other projects). Since PGO has helped many other apps considerably, I decided to apply it to this project to see whether a performance win can be achieved. Here are my benchmark results.
This information may be interesting for anyone who wants to get more performance out of the project in their use cases.
Test environment
tex-fmt version: main branch, commit f2689ac7e2c713cfb6106220c09a44141770a638
Benchmark
For benchmarking, I use the project's built-in benchmarks. For PGO optimization, I use the cargo-pgo tool. For all measurements, I used the same command with different binaries:
taskset -c 0 tex_fmt tests/source/* tests/target/*
taskset -c 0 is used to reduce the OS scheduler's influence on the results. All measurements were done on the same machine, with the same background "noise" (as much as I can guarantee).

Results
I got the following results in hyperfine's format:
where (binary sizes are included, since they matter in some cases too):
- tex_fmt_release: default Release profile, 2.6 MiB
- tex_fmt_lto: default Release profile + LTO, 2.4 MiB
- tex_fmt_optimized: default Release profile + LTO + PGO optimized, 2.4 MiB
- tex_fmt_instrumented: default Release profile + LTO + PGO instrumented, 4.5 MiB

According to the results, LTO and PGO measurably improve the application's performance.
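The four variants above could be reproduced roughly as follows. This is a sketch, not the exact script used for these measurements. It assumes the crate's binary is called tex-fmt, that cargo-pgo (with the llvm-tools-preview rustup component) and hyperfine are installed, and that LTO is switched on through Cargo's CARGO_PROFILE_RELEASE_LTO environment variable instead of editing Cargo.toml. The bench-bins directory and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: build the four variants listed above and compare them with hyperfine.
# Assumptions: the binary is named "tex-fmt"; cargo-pgo (plus the
# llvm-tools-preview rustup component) and hyperfine are installed.
set -eu

BIN=tex-fmt        # assumed binary name; adjust if it differs
OUT=bench-bins     # hypothetical directory for the copied binaries
WORKLOAD='tests/source/* tests/target/*'   # same inputs as the benchmark command

build_variants() {
    mkdir -p "$OUT"
    # cargo-pgo builds for the host triple, so its output lands in target/<triple>/release.
    host=$(rustc -vV | sed -n 's/^host: //p')

    # 1. Default release profile.
    cargo build --release
    cp "target/release/$BIN" "$OUT/tex_fmt_release"

    # 2. Release + LTO, enabled through Cargo's profile environment variable.
    CARGO_PROFILE_RELEASE_LTO=true cargo build --release
    cp "target/release/$BIN" "$OUT/tex_fmt_lto"

    # 3. Release + LTO + PGO instrumentation.
    CARGO_PROFILE_RELEASE_LTO=true cargo pgo build
    cp "target/$host/release/$BIN" "$OUT/tex_fmt_instrumented"

    # Training run: the instrumented binary writes profiles for cargo-pgo to collect.
    # $WORKLOAD is left unquoted on purpose so the globs expand.
    "target/$host/release/$BIN" $WORKLOAD

    # 4. Release + LTO + PGO-optimized, built from the collected profiles.
    CARGO_PROFILE_RELEASE_LTO=true cargo pgo optimize
    cp "target/$host/release/$BIN" "$OUT/tex_fmt_optimized"
}

bench_variants() {
    # Pin every run to CPU 0, as in the original measurements.
    hyperfine --warmup 3 \
        "taskset -c 0 $OUT/tex_fmt_release $WORKLOAD" \
        "taskset -c 0 $OUT/tex_fmt_lto $WORKLOAD" \
        "taskset -c 0 $OUT/tex_fmt_optimized $WORKLOAD" \
        "taskset -c 0 $OUT/tex_fmt_instrumented $WORKLOAD"
}
```

From the repository root, run build_variants and then bench_variants. The instrumented binary is copied before cargo pgo optimize runs because, under the assumption above, both PGO builds write to the same target/<host triple>/release directory.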
Further steps
As a first easy step, I suggest enabling LTO only for Release builds, so as not to hurt the developer experience while working on the project, since LTO adds extra compilation time. If you think a regular Release build should not be affected by this change either, I suggest adding a separate release-lto profile that enables LTO on top of the regular release optimizations. This makes life easier for maintainers and for anyone else who wants to build the most performant version of the application. (Using ThinLTO should also help keep the extra build time down.)

Also, Post-Link Optimization (PLO) can be tested after PGO, by applying tools like LLVM BOLT to tex-fmt.

Thank you.
P.S. This is just an idea, not an actual issue. The Ideas category in GitHub Discussions may be a better place to discuss such proposals.
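For the opt-in release-lto profile suggested under Further steps, a minimal sketch could append a profile like the following to Cargo.toml. The profile name comes from the suggestion itself. Choosing ThinLTO (lto = "thin") over full LTO (lto = "fat") is an assumption, and the helper function name is hypothetical.

```shell
#!/bin/sh
# Sketch: add an opt-in "release-lto" profile to a Cargo manifest.
# Assumption: ThinLTO is used; switch to lto = "fat" for full LTO.
set -eu

add_release_lto_profile() {
    manifest=${1:-Cargo.toml}
    # Append only if the profile is not already defined.
    if ! grep -qxF '[profile.release-lto]' "$manifest" 2>/dev/null; then
        cat >> "$manifest" <<'EOF'

[profile.release-lto]
inherits = "release"
lto = "thin"
EOF
    fi
}
```

After running add_release_lto_profile once, cargo build --profile release-lto places the binary under target/release-lto/, while cargo build --release keeps its current behaviour and compile time. Custom profiles require Cargo 1.57 or newer.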