Profile-Guided Optimization (PGO) benchmark results #118
Thanks for looking into this. It's interesting that roxmltree had the least improvement from PGO. I guess it's already well optimized, then? Suggesting that users can get a 5-10% improvement with PGO is probably unnecessary, given how troublesome the setup is. Can the trained PGO data be preserved somehow, or does it have to be gathered each time? We could include it in the repo if that is possible.
I would say "is better optimized than other XML libraries" ;) Great job!
I would politely argue with that. How important a "5-10% performance improvement" is compared to the cost of "enabling a PGO build for an application" depends on multiple things: how difficult the application is to build, how much the developers care about performance, the project roadmap, etc. Documenting such an improvement can be important for the library's users: if an application spends enough time (from the application devs' perspective, of course) parsing XML documents, they can consider enabling PGO once they see the benchmark numbers. It's a good thing to have - you don't need to tweak your dependency to improve its performance; you can do it via an additional set of compiler switches. How troublesome PGO enablement is would be a good topic to discuss. However, from my experience, the Rust ecosystem has one of the most straightforward (if not the most straightforward!) and easy-to-use ways to enable a PGO build for an application - via `cargo-pgo`.
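For reference, a minimal sketch of what that "additional compiler switch set" workflow looks like with `cargo-pgo` (the binary name, target triple and workload below are placeholders, not from this repo):

```bash
# One-time setup (cargo-pgo also needs the LLVM tools component):
cargo install cargo-pgo
rustup component add llvm-tools-preview

# 1. Build the application with PGO instrumentation enabled.
cargo pgo build

# 2. Run the instrumented binary on a representative workload so it
#    writes profile data (binary path and input are hypothetical).
./target/x86_64-unknown-linux-gnu/release/my-app --input sample.xml

# 3. Rebuild the application with the gathered profiles applied.
cargo pgo optimize build
```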
If you were already running the library under various profilers, then I guess the only working option here will be comparing "PGOed" vs "non-PGOed" assembly for different functions.
Short answer: yes, they can be preserved, but with some limitations. Long answer: PGO profiles are compiler-dependent things, since they have an internal structure. Your PGO profiles can become stale (read that as "no longer useful for performance optimization") for multiple reasons: a compiler update (with almost 100% certainty you need to regenerate PGO profiles after a compiler update), changing compiler flags, and so on. Instead, in such situations, I suggest you prepare a PGO training scenario (like a bash script plus sample XML files - you already have those in the benchmarks) or something like that. It will allow you to not think about such things and simply rerun the training each time you want fresh profiles, e.g. on each build. A nice side effect: the library's users will also be able to use these scripts to generate their own PGO profiles for the library if they have a different target workload.
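To make that concrete, here is one hypothetical shape such a training script could take (the `parse` example binary and the `training-data` directory are made up for illustration; the `cargo pgo` subcommands come from `cargo-pgo`):

```bash
#!/usr/bin/env bash
# pgo-train.sh - a reusable PGO training scenario sketch.
set -euo pipefail

# Build an instrumented driver; "parse" is a hypothetical example binary
# that parses the XML file given as its argument.
cargo pgo build -- --example parse

# Replay the training workload to generate fresh, non-stale profiles.
for f in training-data/*.xml; do
    ./target/x86_64-unknown-linux-gnu/release/examples/parse "$f"
done

# Rebuild with the collected profiles applied.
cargo pgo optimize build -- --example parse
```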
I don't think directly applying PGO to non-binary crates is useful beyond investigations towards optimization, as you already discuss. For applications using the library, you will definitely need profiles collected by running that application, not profiles based on one of your dependencies' micro-benchmarks. Real applications usually do more than just parse XML, and their performance profiles might therefore look remarkably different even within the XML parser code itself, for example due to contention for shared resources like caches. Sadly, this insolent complexity of the real world also makes PGO significantly harder than just applying `cargo pgo bench`. This is further complicated by the fact that a lot of applications, like network services or control systems, do not perform mostly pure computations (as parsers, compilers, etc. do) but react to outside events, which can be difficult to capture in synthetic benchmarks.
Actually, I agree with you that doing some PGO pre-optimization for libraries (especially if we are talking about libraries distributed mostly as sources - exactly the Rust ecosystem's case) is a bit of an overhead. I haven't found any library yet that does such things. So documenting/mentioning the performance benefits somewhere visible to users may be all that can be done for the library. I have several examples of how it can be done in practice - but they are all for applications, not libraries.
It depends. If the library's PGO profile is "real-world enough", it's actually fine to use it as part of your application's build even if the application itself doesn't use PGO for optimizing the whole program (there are many reasons not to enable PGO for an application - you already mentioned several of them). However, I agree that collecting PGO profiles at the application level should be the default strategy. Of course, it isn't required to use the micro-benchmarks above for creating library-specific PGO profiles. E.g. you can prepare a dedicated "PGO training scenario" that performs "default" operations with the library - it will work as well.
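As an illustration, such a training driver could be as small as this (a sketch; the `training-data` directory is a placeholder, and only `roxmltree`'s public parsing API is used):

```rust
// A tiny PGO training driver: parse a set of sample documents and walk
// the whole tree so the hot parsing/traversal paths end up in the profile.
use std::fs;

fn main() {
    for entry in fs::read_dir("training-data").expect("training-data dir") {
        let path = entry.expect("directory entry").path();
        let text = fs::read_to_string(&path).expect("readable file");
        let doc = roxmltree::Document::parse(&text).expect("valid XML");

        // Touch every node and count elements to exercise traversal code.
        let elements = doc
            .root()
            .descendants()
            .filter(|n| n.is_element())
            .count();
        println!("{}: {} elements", path.display(), elements);
    }
}
```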
Yep, but how much time an application spends parsing XML, and how important that is for the application, you don't know - only the application developers can know it. At least in my experience, I have had multiple issues with parsing libraries' performance (JSON, XML). The README file also shows some numbers about performance improvements for the library - so at least somewhere it's quite important ;) So optimizing only a part of the application - the XML parsing library in this case - can still be worthwhile.
It depends on the application. For some applications you are right - the PGO process can be complicated. For others - yep, it really could be done just by calling a couple of `cargo-pgo` commands.
Yep, it can happen in some cases. Good compilers like GCC have dedicated options for such situations (e.g. `-fprofile-partial-training`).
Yep, you are right. For such systems, PGO profiles are collected directly from a "production" environment and then used during the compilation phase. There are multiple issues with such an approach, like PGO profile delivery and storage, and Instrumentation PGO overhead (that's why Sampling PGO was implemented - to allow collecting PGO profiles from production environments). Companies like Google, Facebook, and Datadog (AFAIK) do exactly this. However, these details are not so important for `roxmltree`.
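Just for illustration, the collection side of Sampling PGO on Linux can look roughly like this (binary name and workload are placeholders; consuming such profiles from rustc is, as far as I know, still an unstable story):

```bash
# Record hardware samples with branch records (needed for AutoFDO-style
# profiles) while the binary handles a production-like workload.
perf record -b -- ./my-app --serve

# Convert the perf data into an LLVM sample profile.
llvm-profgen --perfdata=perf.data --binary=./my-app --output=my-app.prof
```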
I will agree with @adamreichold here. PGO is an interesting tool for applications, but not for libraries. And there is not much we can do to improve `roxmltree` using it. Speaking of applications, you can try PGO on resvg. It has ~1600 test files to use for training.
Hi!
Recently I tested Profile-Guided Optimization (PGO) on different projects in different software domains - all the results are available at https://github.com/zamazan4ik/awesome-pgo . Since PGO shows measurable improvements in many cases (especially for different parsers - check the repo above for `xml-rs`, `quick-xml` and other similar libraries), I decided to perform PGO benchmarks on this library too. Here are my results - I hope they will be helpful for someone.

Test environment
`roxmltree` version: the latest for now from the `master` branch, at commit `0a6e7afce2e05d7bc4108c675b3f14f0f2640e45`
Benchmark
For benchmark purposes, I use the benchmarks built into the project. For PGO optimization I use the cargo-pgo tool. The Release bench results were collected with the `taskset -c 0 cargo bench` command, the PGO training phase was done with `taskset -c 0 cargo pgo bench`, and the PGO optimization phase with `taskset -c 0 cargo pgo optimize bench`.
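Putting it all together, the full measurement sequence was:

```bash
taskset -c 0 cargo bench                # baseline Release results
taskset -c 0 cargo pgo bench            # PGO training (instrumented) run
taskset -c 0 cargo pgo optimize bench   # PGO-optimized run
```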
`taskset -c 0` is used to reduce the OS scheduler's influence on the results. All measurements were done on the same machine, with the same background "noise" (as much as I can guarantee).

Results
I got the following results:
Release:
PGO optimized compared to Release:
(just for reference) PGO instrumentation compared to Release:
According to the results, PGO measurably improves the library's performance in many cases. I expected such results, since PGO tends to be especially effective for things like parsers.
Further steps
I can suggest the following action points:
- Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT in addition to PGO). However, I recommend starting with the usual PGO, since it's a much more mature technology with far fewer limitations; see the sketch below.
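For the curious, the classic llvm-bolt flow looks roughly like this (a sketch with placeholder binary/workload names; exact flags vary between BOLT versions):

```bash
# The binary should be linked with relocations so BOLT can rearrange it,
# e.g. RUSTFLAGS="-C link-args=-Wl,--emit-relocs".

# 1. Collect branch samples while running a representative workload.
perf record -e cycles:u -j any,u -- ./my-app --input workload.xml

# 2. Convert the samples into a BOLT profile.
perf2bolt -p perf.data -o perf.fdata ./my-app

# 3. Produce the post-link-optimized binary.
llvm-bolt ./my-app -o ./my-app.bolt -data=perf.fdata \
    -reorder-blocks=ext-tsp -reorder-functions=hfsort
```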
I would be happy to answer your questions about PGO.
P.S. Please do not treat this issue as a bug report or anything like that - it's just a benchmark report. Since the "Discussions" functionality is disabled in this repo, I created an Issue instead.