Profile-Guided Optimization (PGO) improvements #1540

Open
zamazan4ik opened this issue Sep 12, 2023 · 0 comments

Hi!

I recently ran a lot of Profile-Guided Optimization (PGO) benchmarks on different kinds of software; all results available so far are collected at https://github.com/zamazan4ik/awesome-pgo . According to these tests, PGO usually improves performance, so I think it is worth testing for Delta. I ran some benchmarks on my local machine and want to share the results.

Test environment

  • Apple MacBook M1 (full charge, AC connected)
  • macOS 13.4 Ventura
  • Rust: 1.72
  • Latest Delta from the master branch (commit 7375f7a165dabe430e12d531fedd84bb3a027c6b)

Test workload

As a test scenario, I used the make benchmark command. All runs were performed on the same hardware and operating system, with the same background workload (as far as I can guarantee, of course). The measurements were taken with hyperfine. The PGO builds were produced with cargo-pgo.
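For anyone who wants to reproduce this, the build steps look roughly like the sketch below. It is a minimal outline, not the exact script I ran: the function name pgo_build_delta is made up for illustration, the training workload is assumed to be a single run over the benchmark input file that appears in the hyperfine commands, and that file is assumed to exist already.

```shell
#!/bin/sh
# Sketch of a PGO build with cargo-pgo. Call pgo_build_delta from a delta checkout.
# Assumption: the benchmark input already exists at $INPUT (the path used in the
# hyperfine commands in this issue).
INPUT="${INPUT:-/tmp/delta-benchmark-input.gitdiff}"

# Hypothetical helper name; nothing runs until the function is called.
pgo_build_delta() {
    # One-time setup: cargo-pgo relies on llvm-profdata from the LLVM tools component.
    rustup component add llvm-tools-preview || return 1
    cargo install cargo-pgo || return 1

    # Host target triple, e.g. aarch64-apple-darwin on an M1 Mac.
    triple=$(rustc -vV | sed -n 's/^host: //p')

    # 1. Build the instrumented binary.
    cargo pgo build || return 1

    # 2. Run the training workload so the binary writes .profraw files.
    LLVM_PROFILE_FILE="$PWD/target/pgo-profiles/delta_%m_%p.profraw" \
        "target/$triple/release/delta" --no-gitconfig < "$INPUT" > /dev/null || return 1

    # 3. Rebuild with the collected profiles.
    cargo pgo optimize
}
```

A more varied training workload (several diffs, different options) may give a more representative profile than a single run.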

Results

Here are the results. I also include the numbers for the instrumented build so you can estimate how much slower Delta runs in instrumentation mode.

PGO optimized compared to Release builds:

hyperfine --warmup 10 --min-runs 20 'delta_pgo_optimized --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null' 'delta_release --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null'
Benchmark 1: delta_pgo_optimized --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null
  Time (mean ± σ):     405.8 ms ±   2.1 ms    [User: 396.2 ms, System: 38.4 ms]
  Range (min … max):   403.0 ms … 410.1 ms    20 runs

Benchmark 2: delta_release --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null
  Time (mean ± σ):     413.0 ms ±   3.4 ms    [User: 403.6 ms, System: 38.7 ms]
  Range (min … max):   409.7 ms … 422.6 ms    20 runs

Summary
  delta_pgo_optimized --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null ran
    1.02 ± 0.01 times faster than delta_release --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null

PGO-instrumented version:

LLVM_PROFILE_FILE=/Users/zamazan4ik/open_source/delta/target/pgo-profiles/delta_%m_%p.profraw hyperfine --warmup 10 --min-runs 20 'target/aarch64-apple-darwin/release/delta --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null'
Benchmark 1: target/aarch64-apple-darwin/release/delta --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null
  Time (mean ± σ):     575.4 ms ±   4.7 ms    [User: 562.2 ms, System: 44.1 ms]
  Range (min … max):   569.0 ms … 588.7 ms    20 runs

So PGO gives a small improvement, at least on the project's own benchmark. Still, getting a "free" 1-2% of performance is not a bad thing after all :)
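The ratios can be checked directly from the mean times in the hyperfine output. This is just a small arithmetic sketch; the helper name ratio is made up for illustration.

```shell
#!/bin/sh
# ratio A B: print A / B rounded to two decimals (hypothetical helper).
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Mean wall-clock times in ms, copied from the hyperfine output in this issue.
pgo=405.8
release=413.0
instrumented=575.4

echo "PGO speedup over release: $(ratio "$release" "$pgo")x"
echo "Instrumented slowdown vs release: $(ratio "$instrumented" "$release")x"
```

This gives about 1.02x for the PGO build, matching the hyperfine summary, and about 1.39x for the instrumented build, which is the overhead to expect only while collecting profiles.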

Possible further steps

I can suggest the following:

  • Add a note to the Delta documentation (maybe somewhere in the README file) about building with PGO, if you think it's worth it for the project. That way, users and maintainers who build their own Delta binaries will know about PGO as an additional way to optimize the project.
  • Try LLVM BOLT in addition to PGO. However, I do not expect huge improvements from BOLT in this project.