Hi!
Recently I checked Profile-Guided Optimization (PGO) improvements on multiple projects. The results are here. According to my tests, PGO helps achieve better performance in many application domains, including network-oriented software (e.g. see the results for Envoy, HAProxy, httpd). That is why I decided to test PGO on Legba. Here are my results.
Test environment
Fedora 38
Linux kernel 6.5.6
AMD Ryzen 9 5900X
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Compiler - Rustc 1.73
Legba version: the latest main branch at the time of writing, commit 5f0739a974f4ad92c254ddfe37aca033b40600e6
Disabled Turbo boost
Benchmark
For benchmark purposes, I use the "HTTP basic auth" scenario from the test_server directory with the following command line: legba http.basic -t 127.0.0.1:8888 --username admin666 --password ./passwords_1m.txt --concurrency 1. The --concurrency 1 option is used only to reduce the influence of multithreading jitter on the results. As the passwords_1m.txt file I use this list, with the test12345 password moved to the end of the file.
For the PGO training phase, I use exactly the same command but with a smaller password file (1050 passwords + test12345 at the end), just to speed up the training phase.
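A minimal sketch of how the two password files described above could be prepared. The helper names and the file names in the usage note (wordlist.txt, passwords_train.txt) are hypothetical; only passwords_1m.txt and test12345 come from the text.

```shell
#!/bin/sh
# Move one known-good password to the end of a wordlist, so the whole
# list has to be tried before the login succeeds.
# usage: move_to_end SRC DST PASSWORD
move_to_end() {
    # -v: invert match, -x: match whole lines, -F: fixed string, not a regex
    grep -vxF -- "$3" "$1" > "$2"
    printf '%s\n' "$3" >> "$2"
}

# Build a small training list: the first N other entries plus the valid password.
# usage: make_training_list SRC DST N PASSWORD
make_training_list() {
    grep -vxF -- "$4" "$1" | head -n "$3" > "$2"
    printf '%s\n' "$4" >> "$2"
}
```

For example, `move_to_end wordlist.txt passwords_1m.txt test12345` followed by `make_training_list passwords_1m.txt passwords_train.txt 1050 test12345` would give a full list and a 1051-line training list.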
I tested the following Legba configurations:
Release build: cargo build --release
Release + lto = true + codegen-units = 1 (enable LTO): Apply LTO changes to Cargo.toml and then cargo build --release
Release + lto = true + codegen-units = 1 + PGO: cargo pgo build + cargo pgo optimize build. It's done with cargo-pgo.
Release + lto = true + codegen-units = 1 + PGO + BOLT: Also via cargo-pgo
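The PGO and BOLT configurations could be scripted roughly as follows. This is only a sketch: the run, build_pgo and build_pgo_bolt helpers are hypothetical, it assumes cargo-pgo is installed, and it assumes the LTO settings live under [profile.release] in Cargo.toml. The BOLT steps follow the workflow described in the cargo-pgo README as I understand it, so check them against your cargo-pgo version.

```shell
#!/bin/sh
# Sketch only. Assumes Cargo.toml already contains:
#   [profile.release]
#   lto = true
#   codegen-units = 1

# Print each command; skip execution when DRY_RUN=1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# usage: build_pgo BIN [ARGS...]
# BIN is the path of the instrumented binary, ARGS is the training workload.
build_pgo() {
    bin=$1; shift
    run cargo pgo build            # 1. instrumented build
    run "$bin" "$@"                # 2. training run, collects profiles
    run cargo pgo optimize build   # 3. optimized build using the profiles
}

# usage: build_pgo_bolt BIN [ARGS...]
build_pgo_bolt() {
    bin=$1; shift
    run cargo pgo build
    run "$bin" "$@"
    run cargo pgo bolt build --with-pgo      # BOLT-instrumented build on top of PGO
    run "$bin-bolt-instrumented" "$@"        # second training run, for BOLT
    run cargo pgo bolt optimize --with-pgo
}
```

Setting DRY_RUN=1 first prints the commands without running them. In a real run, BIN would be whatever path cargo pgo build reports for the legba binary, and ARGS the http.basic command line with the small password file.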
All benchmarks are done multiple times, on the same machine (with the same hardware/software configuration), with the same background noise (as much as I can guarantee ofc).
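Repeating the measurement can be automated with a small loop. The sketch below is an assumption about how one might do it, not the exact procedure used here; time_runs is a hypothetical helper, and it relies on date +%s, which gives one-second resolution (enough for runs that take minutes).

```shell
#!/bin/sh
# Run a command N times and print the wall-clock time of each run in seconds.
# usage: time_runs N COMMAND [ARGS...]
time_runs() {
    n=$1; shift
    i=1
    while [ "$i" -le "$n" ]; do
        start=$(date +%s)
        "$@" > /dev/null 2>&1      # discard the command's own output
        end=$(date +%s)
        echo "run $i: $((end - start))s"
        i=$((i + 1))
    done
}
```

For example: `time_runs 5 ./target/release/legba http.basic -t 127.0.0.1:8888 --username admin666 --password ./passwords_1m.txt --concurrency 1` (the binary path assumes a default cargo build --release layout).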
At least in the benchmark above, LTO and PGO help with achieving better performance in Legba. However, it seems that LLVM BOLT has no measurable effect in this benchmark.
For reference, here are results for the smaller file with 1051 passwords, so you can estimate how much slower PGO-instrumented Legba is compared to other configurations:
Further steps
I can suggest the following action points:
Perform more PGO benchmarks on Legba in various scenarios. If they show improvements, add a note to the documentation about possible improvements in Legba's performance with PGO.
Providing an easier way (e.g. a build option) to build Legba with PGO can be helpful for end users and maintainers, since they will be able to optimize Legba for their own workloads.
Here are some examples of how PGO optimization is integrated in other projects:
@zamazan4ik thank you for such useful insights! I have to admit i didn't know about PGO and BOLT, so I'll have to study a bit before being able to make any meaningful changes to the build system.
Results
I got the following results:
Release + lto = true + codegen-units = 1: 262s
Release + lto = true + codegen-units = 1 + PGO optimized: 247s
Release + lto = true + codegen-units = 1 + PGO optimized + BOLT optimized: 247s
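To put the 262s and 247s timings in relative terms, the reduction in wall time can be computed as (baseline - optimized) / baseline. A minimal sketch, where improvement is a hypothetical helper name:

```shell
#!/bin/sh
# Relative reduction in wall time, in percent.
# usage: improvement BASELINE_SECONDS OPTIMIZED_SECONDS
improvement() {
    awk -v base="$1" -v opt="$2" \
        'BEGIN { printf "%.1f%%\n", (base - opt) / base * 100 }'
}
```

`improvement 262 247` prints 5.7%, i.e. the PGO-optimized build needs about 5.7% less time than the LTO-only build in this benchmark.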
Here are binary sizes after the strip command:
Also, I measured build time changes between configurations:
Release + lto = true + codegen-units = 1: 6m 57s
Release + lto = true + codegen-units = 1 + PGO instrumented: 11m 14s
Release + lto = true + codegen-units = 1 + PGO optimized: 6m 40s
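Since a full PGO cycle needs both the instrumented and the optimized build, its total build cost can be estimated from the numbers above. The sketch below assumes the two builds run one after the other and ignores the time of the training run itself; to_seconds and pgo_build_ratio are hypothetical helpers.

```shell
#!/bin/sh
# Convert a duration written like "6m 57s" into seconds.
to_seconds() {
    echo "$1" | awk '{ m = $1; s = $2; sub(/m$/, "", m); sub(/s$/, "", s); print m * 60 + s }'
}

# Ratio between a full PGO cycle (instrumented + optimized build)
# and a plain LTO build.
# usage: pgo_build_ratio PLAIN INSTRUMENTED OPTIMIZED
pgo_build_ratio() {
    plain=$(to_seconds "$1")
    inst=$(to_seconds "$2")
    opt=$(to_seconds "$3")
    awk -v p="$plain" -v i="$inst" -v o="$opt" \
        'BEGIN { printf "%.1fx\n", (i + o) / p }'
}
```

`pgo_build_ratio "6m 57s" "11m 14s" "6m 40s"` prints 2.6x, so building with PGO takes roughly 2.6 times as long as the LTO-only build on this machine.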