Evaluate Profile-Guided Optimization (PGO) and LLVM BOLT #103

Open
zamazan4ik opened this issue Oct 7, 2023 · 6 comments

@zamazan4ik

Hi!

Recently I ran Profile-Guided Optimization (PGO) benchmarks on many projects (including interpreters and compilers such as Rustc, Clang, GCC, CPython, and others); the results are available here. That is why I think it is worth trying PGO on frawk.

I can suggest the following things to do:

  • Evaluate PGO's applicability to frawk.
  • If PGO helps to achieve better performance - add a note to frawk's documentation about that (probably somewhere in the README file). In this case, users and maintainers will be aware of another optimization opportunity for frawk.
  • Provide PGO integration into the build scripts. It can help users and maintainers easily apply PGO for their own workloads.

Here are some examples of how PGO is already integrated into other projects' build scripts:

Once PGO has been evaluated, I suggest trying LLVM BOLT as an additional optimization step on top of it.

For Rust projects, I recommend starting with cargo-pgo.

@zamazan4ik
Author

zamazan4ik commented Oct 7, 2023

I did some quick benchmarks with PGO.

Test environment

  • Fedora 38
  • Linux kernel 6.5.5
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Compiler - Rustc 1.73
  • frawk version: the latest for now from the master branch on commit 3288069a9de60ddc65b6e6a57278c6b563ebae72
  • Other details: Disabled Turbo boost

Benchmark setup

As a simple benchmark I use the command yes | head -n 100000000 | ./frawk_release '{ print $0 }' > /dev/null from issue #98. The PGO profile was collected with the same arguments and the same input. The Release build is produced with cargo build --release, and the PGO build with cargo-pgo. All benchmarks were run multiple times on the same hardware/software setup, with the same background "noise" (as far as I can guarantee, of course).

For all builds, I disabled the llvm_backend and unstable features from the defaults. So during the tests I use frawk with the Cranelift backend, built with the stable Rust version.

Results

I got the following results:

hyperfine --warmup 3 --min-runs 10 "yes | head -n 100000000 | ./frawk_release '{ print \$0 }' > /dev/null" "yes | head -n 100000000 | ./frawk_pgo_optimized '{ print \$0 }' > /dev/null"
Benchmark 1: yes | head -n 100000000 | ./frawk_release '{ print $0 }' > /dev/null
  Time (mean ± σ):      7.112 s ±  0.048 s    [User: 7.188 s, System: 0.342 s]
  Range (min … max):    7.039 s …  7.168 s    10 runs

Benchmark 2: yes | head -n 100000000 | ./frawk_pgo_optimized '{ print $0 }' > /dev/null
  Time (mean ± σ):      4.970 s ±  0.019 s    [User: 5.055 s, System: 0.322 s]
  Range (min … max):    4.946 s …  5.006 s    10 runs

Summary
  yes | head -n 100000000 | ./frawk_pgo_optimized '{ print $0 }' > /dev/null ran
    1.43 ± 0.01 times faster than yes | head -n 100000000 | ./frawk_release '{ print $0 }' > /dev/null

Just for reference, the instrumented frawk build (the one used during PGO profile collection) gives the following results (output of time):

time yes | head -n 100000000 | frawk_instrumented '{ print $0 }' > /dev/null
yes  0,01s user 0,07s system 0% cpu 9,657 total
head -n 100000000  0,10s user 0,13s system 2% cpu 9,657 total
frawk_instrumented '{ print $0 }' >   9,59s user 0,10s system 100% cpu 9,666 total
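
The ratios implied by these measurements can be rechecked with a few lines of arithmetic. The sketch below is only an illustration: the function names are made up for this example, and the instrumented figure comes from a single time run rather than a hyperfine mean, so that ratio is a rough estimate.

```rust
/// Ratio of a baseline time to a candidate time
/// (values above 1.0 mean the candidate is faster).
fn speedup(baseline_s: f64, candidate_s: f64) -> f64 {
    baseline_s / candidate_s
}

/// Fraction of wall-clock time saved relative to the baseline.
fn time_saved(baseline_s: f64, candidate_s: f64) -> f64 {
    1.0 - candidate_s / baseline_s
}

fn main() {
    // Mean wall-clock times reported by hyperfine above, in seconds.
    let release = 7.112;
    let pgo = 4.970;
    // Single `time` measurement of the instrumented binary, in seconds.
    let instrumented = 9.666;

    println!("PGO speedup: {:.2}x", speedup(release, pgo));
    println!("time saved: {:.1}%", 100.0 * time_saved(release, pgo));
    println!("instrumentation slowdown: {:.2}x", speedup(instrumented, release));
}
```

With the numbers above this prints a speedup of 1.43x (matching the hyperfine summary), about 30.1% of the run time saved, and an instrumentation slowdown of roughly 1.36x.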

At least according to the simple benchmark above, PGO has a measurable positive effect on frawk performance. I think more robust PGO tests should be performed, at minimum covering the workloads described in the https://github.com/ezrosent/frawk/blob/master/info/performance.md article.

@ezrosent
Owner

ezrosent commented Oct 8, 2023

Those results look very promising! I'm curious though, is there a standard way to deploy PGO-optimized binaries? For example, suppose we had a representative invocation to use for the instrumentation / profiling process: it seems like it would still be hard to deploy this in a portable way (i.e. in a way that users could just cargo build and then get the benefits of PGO).

@zamazan4ik
Author

is there a standard way to deploy PGO-optimized binaries

Good question! As far as I know, there is no standard way to deploy PGO-optimized binaries in the Rust ecosystem. Usually, building a program with PGO is done via project-specific build scripts (e.g. the Rustc project uses PGO in this way). So if you can ship a prebuilt PGO binary, you do all the PGO-related work in your CI setup, and the end user does not need to build with PGO themselves.

If you want to give users an easy way to build with PGO via cargo build, I think the best available option is a build.rs script. Alternatively, write a Makefile with all the PGO steps (e.g. building frawk with cargo-pgo) behind a possible release_with_pgo target.

@ezrosent
Owner

Gotcha. I'll try to keep this in mind next time I do benchmarking for frawk. I'm not seeing a good way to get build.rs to work without requiring users to also install cargo-pgo manually (similarly for a Makefile), but that may be alright if the error messages are good enough.

Either way; this seems super useful. Thanks again for the report!
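
One way to make the "good enough error message" idea concrete is to check for the tool before attempting a PGO build. The sketch below relies on the fact that cargo subcommands are executables named cargo-<name> on PATH; the helper names find_on_path and missing_tool_message are hypothetical and not part of frawk or cargo-pgo, and the check ignores platform details such as the .exe suffix on Windows.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Look for a file called `name` in each directory of a PATH-style string.
/// (Simplification: no `.exe` handling and no executable-bit check.)
fn find_on_path(name: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

/// Build the error text shown when the subcommand binary is missing.
fn missing_tool_message(tool: &str) -> String {
    format!(
        "error: `{tool}` was not found on PATH.\n\
         hint: install it with `cargo install {tool}` and re-run the PGO build."
    )
}

fn main() {
    let path_var = env::var_os("PATH").unwrap_or_default();
    match find_on_path("cargo-pgo", &path_var) {
        Some(path) => println!("found cargo-pgo at {}", path.display()),
        None => eprintln!("{}", missing_tool_message("cargo-pgo")),
    }
}
```

The same check could live in a build script or a small wrapper binary; which of those fits frawk best is left open here.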

@zamazan4ik
Author

I'm not seeing a good way to get build.rs to work without requiring users to also install cargo-pgo manually (similarly for a Makefile), but that may be alright if the error messages are good enough.

There is a way to implement PGO without cargo-pgo: you just need to implement the same logic that cargo-pgo uses in your own build.rs script. cargo-pgo does not involve any kind of magic; it simply applies the workflow described at https://doc.rust-lang.org/rustc/profile-guided-optimization.html . You can pass all the required flags to the compiler manually, without cargo-pgo.
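
As a rough sketch of what "passing the flags manually" looks like, the program below assembles the four steps from the rustc PGO guide: an instrumented build with -Cprofile-generate, a run on a representative workload, a merge with llvm-profdata, and a final build with -Cprofile-use. It is written as a standalone dry-run driver for clarity, and whether the same steps fit comfortably into build.rs is not something it settles. The function names and the /tmp/pgo-data paths are assumptions for illustration, and llvm-profdata has to be available separately (for example via the llvm-tools-preview rustup component).

```rust
use std::process::Command;

/// RUSTFLAGS value for the instrumented build.
fn generate_flags(profile_dir: &str) -> String {
    format!("-Cprofile-generate={profile_dir}")
}

/// RUSTFLAGS value for the final, profile-guided build.
fn use_flags(merged_profile: &str) -> String {
    format!("-Cprofile-use={merged_profile}")
}

/// A `cargo build --release` invocation with the given RUSTFLAGS set.
fn cargo_build(rustflags: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["build", "--release"]).env("RUSTFLAGS", rustflags);
    cmd
}

/// `llvm-profdata merge` combines the raw profiles into one .profdata file.
fn merge_profiles(profile_dir: &str, merged_profile: &str) -> Command {
    let mut cmd = Command::new("llvm-profdata");
    cmd.args(["merge", "-o", merged_profile, profile_dir]);
    cmd
}

fn main() {
    // Hypothetical locations; any writable directory would do.
    let profile_dir = "/tmp/pgo-data";
    let merged = "/tmp/pgo-data/merged.profdata";

    // Dry run: print each step instead of executing it.
    println!("1. {:?}", cargo_build(&generate_flags(profile_dir)));
    println!("2. run the instrumented binary on a representative workload");
    println!("3. {:?}", merge_profiles(profile_dir, merged));
    println!("4. {:?}", cargo_build(&use_flags(merged)));
    // To execute a step for real, call `.status()` on the returned Command.
}
```

Keeping the flag construction in small functions makes it easy to reuse them from a Makefile wrapper or a CI script, and to add project-specific cargo arguments such as feature selection.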

Either way; this seems super useful. Thanks again for the report!

You are welcome!

@ghuls
Contributor

ghuls commented Nov 24, 2023


3 participants