Profile-Guided Optimization (PGO) benchmark report #152

zamazan4ik · 2024-09-17T18:26:24Z

Hi!

As I have done many times before, I decided to test Profile-Guided Optimization (PGO) to see how it affects the library's performance. For reference, results for other projects are available at https://github.com/zamazan4ik/awesome-pgo . Since PGO has helped a lot with many libraries, I applied it to rust-lexical to see whether a performance win (or loss) can be achieved. Here are my benchmark results.

This information may be of interest to anyone who wants to get more performance out of the library in their use cases.

Test environment

Fedora 40
Linux kernel 6.10.9
AMD Ryzen 9 5900X
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Compiler - rustc 1.81.0
rust-lexical version: main branch on commit aeab32205c18c4569402f83817eeb9fb983ee64b
Turbo Boost disabled

Benchmark

For benchmarking, I use the benchmarks built into the project. For PGO, I use the cargo-pgo tool. The Release results were obtained with the taskset -c 0 cargo bench command. The PGO training phase is done with taskset -c 0 cargo pgo bench, and the PGO optimization phase with taskset -c 0 cargo pgo optimize bench.

taskset -c 0 is used for reducing the OS scheduler's influence on the results. All measurements are done on the same machine, with the same background "noise" (as much as I can guarantee).
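For anyone who wants to reproduce the three runs above, here is a minimal sketch of them as one script. It assumes cargo-pgo is already installed (for example with cargo install cargo-pgo, plus the llvm-tools-preview rustup component it relies on) and that it is run from the rust-lexical benchmark directory. The run helper and the DRY_RUN variable are my own illustrative additions, not part of cargo-pgo; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the Release / PGO-instrumented / PGO-optimized benchmark runs.
# Assumption: DRY_RUN and run() are illustrative helpers, not cargo-pgo features.
set -eu

# Print a command, and execute it only when DRY_RUN=0 is set explicitly.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

run_benchmarks() {
    # Baseline: plain Release benchmarks, pinned to CPU core 0.
    run taskset -c 0 cargo bench
    # PGO training phase: instrumented benchmarks that collect profiles.
    run taskset -c 0 cargo pgo bench
    # PGO optimization phase: rebuild with the collected profiles and re-run.
    run taskset -c 0 cargo pgo optimize bench
}

run_benchmarks
```

Running it with DRY_RUN=0 executes the commands instead of only printing them. Keeping all three runs pinned to the same core is what makes the Release and PGO numbers comparable.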

However, I don't know how good the project benchmarks are right now because of #148 - I don't have enough knowledge about it (yet).

Results

I got the following results:

Release: https://gist.github.com/zamazan4ik/56f7aaf405116f136acbfc5744c3622f
PGO optimized compared to Release: https://gist.github.com/zamazan4ik/d043e078410e748d97a582b7cb3abc01
(just for reference) PGO instrumented compared to Release: https://gist.github.com/zamazan4ik/acc060af1465000e5aa302214ceb9957

According to the results, we see improvements in many cases. However, some cases also regress. This can come from conflicting workloads in the bench suite, where an optimization in one place pessimizes other places - it sometimes happens with PGO. The information is still useful for users who have a specific rust-lexical workload without such conflicts.

Further steps

At the very least, the library's users can find this performance report and decide to enable PGO for their applications if they care about rust-lexical performance in their workloads. Maybe a small note somewhere in the documentation (the README file?) will be enough to raise awareness about this work. Another option is to figure out the root cause of the performance differences between the PGO and non-PGO library versions and, possibly, tweak the library sources a bit more - however, this approach also requires some time to analyze the resulting LLVM IR/assembly differences between them.
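For users who want to try PGO on their own application rather than on the benchmarks, the usual cargo-pgo flow is: build an instrumented binary, run it on a representative workload, then build the optimized binary. Below is a rough sketch under some assumptions: the binary name my-app and the target triple are hypothetical placeholders, and the run helper with DRY_RUN is again only an illustration (commands are printed, not executed, unless DRY_RUN=0).

```shell
#!/bin/sh
# Sketch of applying PGO to an application that depends on rust-lexical.
# Assumptions: APP_NAME and TARGET are hypothetical placeholders; adjust both.
set -eu

APP_NAME="${APP_NAME:-my-app}"
TARGET="${TARGET:-x86_64-unknown-linux-gnu}"

# Print a command, and execute it only when DRY_RUN=0 is set explicitly.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

pgo_build_app() {
    # 1. Build the instrumented binary.
    run cargo pgo build
    # 2. Run it on a workload that resembles production usage to gather profiles.
    run "./target/$TARGET/release/$APP_NAME"
    # 3. Rebuild using the gathered profiles.
    run cargo pgo optimize
}

pgo_build_app
```

The quality of step 2 matters most: profiles gathered from a workload unlike the real one can lead to the kind of regressions seen in some of the benchmark cases.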

Also, Post-Link Optimization (PLO) can be tested after PGO. It can be done by applying tools like LLVM BOLT to applications that use rust-lexical. However, it's a much less mature optimization technique compared to PGO.

Thank you.

P.S. It's just a benchmark report, not a bug. Discussions would probably be a better place for such things, but they are disabled for the repo for now.

Alexhuszagh · 2024-09-17T18:32:28Z

This is wonderful, I'll make this a priority and take a careful look at this.

Alexhuszagh added this to the 1.1 milestone Sep 17, 2024

Alexhuszagh added the enhancement and good first issue labels Sep 17, 2024

Alexhuszagh self-assigned this Sep 17, 2024
