Profile-Guided Optimization (PGO) benchmarks #117

zamazan4ik · 2024-02-10T00:31:49Z

Hi!

I applied Profile-Guided Optimization (PGO) to see whether it can improve llrt performance further (as I have already done for many other projects - see all current results here). I ran some basic benchmarks and want to share the results here.

Test environment

Fedora 39
Linux kernel 6.7.3
AMD Ryzen 9 5900X
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Compiler: rustc 1.76
llrt version: the latest main branch at the time of writing, commit c040bfd05a2be8d3300e7a1bbfc9405c42a865fa
Disabled Turbo boost (for more stable results across benchmark runs)

Benchmark

As a benchmark, I use the same command that I found in the Makefile: llrt fixtures/hello.js. The same scenario is used for the PGO training phase. All PGO steps are done with the cargo-pgo tool: the PGO-instrumented version is built with cargo pgo build, and the PGO-optimized version with cargo pgo optimize build. taskset -c 0 pins the process to a single core to reduce the influence of CPU scheduling on the results.
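The three steps above (instrumented build, training run, optimized build) can be sketched as a small script. This is only an illustration, not the exact commands used for the numbers below: LLRT_BIN is a hypothetical placeholder for wherever cargo places the instrumented binary (the real path depends on the target triple), and the script defaults to a dry run that only prints the commands. It also assumes cargo-pgo is already installed together with the llvm-tools-preview rustup component that it relies on.

```shell
#!/bin/sh
# Sketch of the PGO workflow with cargo-pgo.
# DRY_RUN defaults to 1, so commands are printed, not executed.
# Set DRY_RUN=0 to actually run them from the llrt repository root.

# ASSUMPTION: hypothetical path to the instrumented binary; adjust it to
# the location cargo reports for your target triple.
LLRT_BIN="${LLRT_BIN:-./target/release/llrt}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

pgo_workflow() {
  # 1. Build the instrumented binary.
  run cargo pgo build || return 1
  # 2. Training phase: run the same scenario that is benchmarked.
  run "$LLRT_BIN" fixtures/hello.js || return 1
  # 3. Rebuild using the collected profiles.
  run cargo pgo optimize build
}

pgo_workflow
```

A single training run of hello.js matches the scenario described above; a broader set of training scripts would probably give profiles that generalize better to other workloads.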

Results

I got the following results:

hyperfine -u microsecond -N --warmup=2000 --min-runs 10000 "taskset -c 0 ./llrt_optimized ../fixtures/hello.js" "taskset -c 0 ./llrt_release ../fixtures/hello.js"
Benchmark 1: taskset -c 0 ./llrt_optimized ../fixtures/hello.js
  Time (mean ± σ):     2664.8 µs ±  78.8 µs    [User: 590.1 µs, System: 1943.3 µs]
  Range (min … max):   2478.1 µs … 4486.1 µs    10000 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: taskset -c 0 ./llrt_release ../fixtures/hello.js
  Time (mean ± σ):     2796.1 µs ±  63.6 µs    [User: 601.4 µs, System: 2068.9 µs]
  Range (min … max):   2647.5 µs … 4495.0 µs    10000 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  taskset -c 0 ./llrt_optimized ../fixtures/hello.js ran
    1.05 ± 0.04 times faster than taskset -c 0 ./llrt_release ../fixtures/hello.js

Here, llrt_release is the regular release build and llrt_optimized is the PGO-optimized build.
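As a sanity check, the "1.05 ± 0.04" in the summary can be recomputed from the two means and standard deviations reported above. The sketch below assumes the two measurements are independent and applies first-order error propagation to the ratio; the speedup function is a hypothetical helper written for this illustration, not part of hyperfine.

```shell
#!/bin/sh
# speedup MEAN_FAST SD_FAST MEAN_SLOW SD_SLOW
# Prints "<ratio> <uncertainty>", where ratio = MEAN_SLOW / MEAN_FAST and
# the uncertainty combines both relative standard deviations in quadrature
# (assumes independent measurements).
speedup() {
  awk -v mf="$1" -v sf="$2" -v ms="$3" -v ss="$4" 'BEGIN {
    ratio = ms / mf
    rel = sqrt((sf / mf) ^ 2 + (ss / ms) ^ 2)
    printf "%.2f %.2f\n", ratio, ratio * rel
  }'
}

# Values taken from the hyperfine output above (microseconds).
speedup 2664.8 78.8 2796.1 63.6   # prints "1.05 0.04"
```

The absolute difference between the means is about 131 µs, so the roughly 5% gain is larger than the combined relative uncertainty of about 4%, but not by a wide margin.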

I ran the benchmark multiple times, with different command orders, etc. In all cases, the PGO-optimized version was faster than the regular release version. However, it would be great to run some more precise benchmarks.

Further steps

I suggest the following:

Perform more PGO benchmarks with some more precise performance measurements.
If PGO is worth it, add a note about it to the documentation and, possibly, add an option to the build scripts so that llrt can be PGO-optimized more easily with the existing build infrastructure.
Try to play with Post-Link Optimization (PLO) with tools like LLVM BOLT.

I hope these benchmark results can be interesting to someone.


richarddavison · 2024-02-13T09:35:39Z

This is very interesting! I will rerun the benchmark with PGO (with profile data from test runs) and see the results! PLO is also super interesting but is a different beast! Right now, we use zig as a cross compiler. Since LLRT is a fully static build using musl libc, we can probably use musl sources and clang-15 directly (since it may come with bolt) and apply PGO, PLO, and LTO together 🥇

EricDunaway · 2025-01-03T22:34:50Z

If instrumentation/sampling and testing could be streamlined, it would be interesting to see whether a per-lambda optimization with PGO + BOLT would be beneficial for some use cases, rather than a generic optimization.

richarddavison added the enhancement (New feature or request), rust (Pull requests that update Rust code), and performance labels on Feb 12, 2024
