I recently ran Profile-Guided Optimization (PGO) benchmarks on many projects - the results are available here. That's why I think it's worth trying PGO on Czkawka. I have already run some benchmarks and want to share the results here.
Test environment
Fedora 38
Linux kernel 6.5.5
AMD Ryzen 9 5900X
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Compiler - Rustc 1.73
Czkawka version: the latest master branch at the time of testing, commit 99277b9ea50f2e08dab7343e5cf3b89afa23b769
Disabled Turbo boost
Benchmark setup
For benchmarking purposes, I use the command czkawka_cli dup -d /home/zamazan4ik/open_source/llvm-project. As the benchmark directory, I use a fully cloned LLVM project repo. I test only the CLI version since it's easier to run over an SSH connection to the server - the results should be almost the same for the GUI version too.
The Release build is done with cargo build --release; the PGO and BOLT optimized builds are done with cargo-pgo. PGO and BOLT profiles are collected from the benchmark workload itself.
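For anyone who wants to reproduce the builds, the cargo-pgo steps look roughly like this. It is a minimal sketch, not the exact script used for the numbers in this report. The binary directory (it depends on the host triple), the DRY_RUN switch, and the run helper are assumptions for illustration. The codegen-units=1 and lto = fat settings are passed through Cargo's profile environment variables instead of editing Cargo.toml. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch of a PGO (+ optional BOLT) build with cargo-pgo.
# Prints the commands by default; run with DRY_RUN=0 to execute them.
set -eu

DRY_RUN="${DRY_RUN:-1}"
# Benchmark workload from this report; adjust to your own checkout.
BENCH_DIR="${BENCH_DIR:-/home/zamazan4ik/open_source/llvm-project}"
# Assumption: cargo-pgo writes binaries under target/<host triple>/release.
BIN_DIR="${BIN_DIR:-target/x86_64-unknown-linux-gnu/release}"

# Release profile overrides: codegen-units=1 and fat LTO.
export CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1
export CARGO_PROFILE_RELEASE_LTO=fat

# Hypothetical helper: echo a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

pgo_workflow() {
    # 1. Build the PGO-instrumented binary.
    run cargo pgo build
    # 2. Run the benchmark workload to collect profiles.
    run "$BIN_DIR/czkawka_cli" dup -d "$BENCH_DIR"
    # 3. Rebuild with the collected profiles applied.
    run cargo pgo optimize
}

bolt_workflow() {
    # BOLT on top of PGO: instrument the PGO-optimized binary, run it, optimize.
    run cargo pgo bolt build --with-pgo
    run "$BIN_DIR/czkawka_cli-bolt-instrumented" dup -d "$BENCH_DIR"
    run cargo pgo bolt optimize --with-pgo
}

pgo_workflow
bolt_workflow
```

In a multi-package workspace you will probably want to restrict the build to the CLI package. As far as I know, cargo-pgo forwards the arguments after `--` to cargo, but check its README for the exact form.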
All benchmarks are done multiple times, on the same hardware/software setup, with the same background "noise" (as much as I can guarantee, of course). Between runs, all cached files were cleared.
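The benchmark commands in this report chain the cache cleanup with &&, so the rm -rf is part of the timed command. One alternative (a sketch, not what was used for the numbers here) is Hyperfine's --prepare option, which runs a command before each timing run without measuring it. The run helper and the DRY_RUN switch are assumptions for illustration; by default the script only prints the Hyperfine invocation.

```shell
#!/bin/sh
# Sketch: same four binaries, with the cache cleanup moved to --prepare.
# Prints the command by default; run with DRY_RUN=0 to execute it.
set -eu

DRY_RUN="${DRY_RUN:-1}"
BENCH_DIR="${BENCH_DIR:-/home/zamazan4ik/open_source/llvm-project}"
CACHE_DIR="${CACHE_DIR:-/home/zamazan4ik/.cache/czkawka}"

# Hypothetical helper: echo a command, and execute it only when DRY_RUN=0.
# Note: the echoed line does not show the original quoting.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

bench_builds() {
    run hyperfine --warmup 20 --min-runs 200 \
        --prepare "rm -rf $CACHE_DIR" \
        "./release_czkawka_cli dup -d $BENCH_DIR >> /dev/null" \
        "./release_lto_czkawka_cli dup -d $BENCH_DIR >> /dev/null" \
        "./optimized_lto_czkawka_cli dup -d $BENCH_DIR >> /dev/null" \
        "./bolt_optimized_czkawka_cli dup -d $BENCH_DIR >> /dev/null"
}

bench_builds
```

Here the cache is removed before each run instead of after it, so every run still starts without a Czkawka cache, and the cleanup time no longer counts toward the result.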
hyperfine --warmup 20 --min-runs 200 './release_czkawka_cli dup -d /home/zamazan4ik/open_source/llvm-project >> /dev/null && rm -rf /home/zamazan4ik/.cache/czkawka' './release_lto_czkawka_cli dup -d /home/zamazan4ik/open_source/llvm-project >> /dev/null && rm -rf /home/zamazan4ik/.cache/czkawka' './optimized_lto_czkawka_cli dup -d /home/zamazan4ik/open_source/llvm-project >> /dev/null && rm -rf /home/zamazan4ik/.cache/czkawka' './bolt_optimized_czkawka_cli dup -d /home/zamazan4ik/open_source/llvm-project >> /dev/null && rm -rf /home/zamazan4ik/.cache/czkawka'
Benchmark 1: ./release_czkawka_cli dup -d /home/zamazan4ik/open_source/llvm-project >> /dev/null && rm -rf /home/zamazan4ik/.cache/czkawka
Time (mean ± σ): 139.4 ms ± 4.4 ms [User: 200.2 ms, System: 702.4 ms]
Range (min … max): 133.0 ms … 157.0 ms 200 runs
Benchmark 2: ./release_lto_czkawka_cli dup -d /home/zamazan4ik/open_source/llvm-project >> /dev/null && rm -rf /home/zamazan4ik/.cache/czkawka
Time (mean ± σ): 136.9 ms ± 4.3 ms [User: 185.6 ms, System: 710.0 ms]
Range (min … max): 129.7 ms … 154.7 ms 200 runs
Benchmark 3: ./optimized_lto_czkawka_cli dup -d /home/zamazan4ik/open_source/llvm-project >> /dev/null && rm -rf /home/zamazan4ik/.cache/czkawka
Time (mean ± σ): 133.9 ms ± 4.6 ms [User: 171.7 ms, System: 693.7 ms]
Range (min … max): 126.6 ms … 153.6 ms 200 runs
Benchmark 4: ./bolt_optimized_czkawka_cli dup -d /home/zamazan4ik/open_source/llvm-project >> /dev/null && rm -rf /home/zamazan4ik/.cache/czkawka
Time (mean ± σ): 133.3 ms ± 4.2 ms [User: 163.9 ms, System: 703.0 ms]
Range (min … max): 126.3 ms … 147.5 ms 200 runs
Summary
./bolt_optimized_czkawka_cli dup -d /home/zamazan4ik/open_source/llvm-project >> /dev/null && rm -rf /home/zamazan4ik/.cache/czkawka ran
1.00 ± 0.05 times faster than ./optimized_lto_czkawka_cli dup -d /home/zamazan4ik/open_source/llvm-project >> /dev/null && rm -rf /home/zamazan4ik/.cache/czkawka
1.03 ± 0.05 times faster than ./release_lto_czkawka_cli dup -d /home/zamazan4ik/open_source/llvm-project >> /dev/null && rm -rf /home/zamazan4ik/.cache/czkawka
1.05 ± 0.05 times faster than ./release_czkawka_cli dup -d /home/zamazan4ik/open_source/llvm-project >> /dev/null && rm -rf /home/zamazan4ik/.cache/czkawka
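According to the tests, PGO gives an improvement of a few percent, at least in the benchmark above: about 2% over the LTO build and about 4% over the default Release build (by mean time). However, BOLT adds little on top of PGO here: the 0.6 ms difference is well within the measured deviation.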
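The percentages can be recomputed from the mean times reported by Hyperfine. A small sketch (the improvement_pct helper name is made up for illustration; awk is used only for the floating-point arithmetic):

```shell
#!/bin/sh
# Relative improvement of a new mean time over a baseline, in percent.
improvement_pct() {
    awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", (base - cur) / base * 100 }'
}

# Mean wall-clock times (ms) from the Hyperfine output above.
improvement_pct 139.4 136.9   # default Release -> LTO
improvement_pct 139.4 133.9   # default Release -> LTO + PGO
improvement_pct 136.9 133.9   # LTO -> LTO + PGO
improvement_pct 133.9 133.3   # LTO + PGO -> LTO + PGO + BOLT
```

This prints 1.8, 3.9, 2.2 and 0.4, one value per line.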
Also, for reference, here are the results for the same benchmark with the PGO and BOLT instrumented versions (so you can estimate how much slower Czkawka is in instrumentation mode):
hyperfine --warmup 10 --min-runs 50 './instrumented_lto_czkawka_cli dup -d /home/zamazan4ik/open_source/llvm-project >> /dev/null && rm -rf /home/zamazan4ik/.cache/czkawka' './bolt_instrumented_czkafka_cli dup -d /home/zamazan4ik/open_source/llvm-project >> /dev/null && rm -rf /home/zamazan4ik/.cache/czkawka'
Benchmark 1: ./instrumented_lto_czkawka_cli dup -d /home/zamazan4ik/open_source/llvm-project >> /dev/null && rm -rf /home/zamazan4ik/.cache/czkawka
Time (mean ± σ): 171.2 ms ± 4.5 ms [User: 1081.6 ms, System: 652.1 ms]
Range (min … max): 162.3 ms … 186.6 ms 50 runs
Benchmark 2: ./bolt_instrumented_czkafka_cli dup -d /home/zamazan4ik/open_source/llvm-project >> /dev/null && rm -rf /home/zamazan4ik/.cache/czkawka
Time (mean ± σ): 362.2 ms ± 5.9 ms [User: 1562.3 ms, System: 739.7 ms]
Range (min … max): 349.0 ms … 379.7 ms 50 runs
Summary
./instrumented_lto_czkawka_cli dup -d /home/zamazan4ik/open_source/llvm-project >> /dev/null && rm -rf /home/zamazan4ik/.cache/czkawka ran
2.12 ± 0.07 times faster than ./bolt_instrumented_czkafka_cli dup -d /home/zamazan4ik/open_source/llvm-project >> /dev/null && rm -rf /home/zamazan4ik/.cache/czkawka
Further steps
I can suggest the following things to do:
Evaluate PGO and BOLT applicability to Czkawka in more scenarios.
If PGO helps achieve better performance, add a note about it to Czkawka's documentation (probably somewhere in the README file). That way, users and maintainers will be aware of another optimization opportunity for Czkawka.
Provide PGO integration into the build scripts. It can help users and maintainers easily apply PGO for their own workloads.
Optimize prebuilt binaries with PGO.
Here are some examples of how PGO is already integrated into other projects' build scripts:
In this benchmark, I use 4 build configurations:
Release
Release + codegen-units=1 + lto = fat
Release + codegen-units=1 + lto = fat + PGO
Release + codegen-units=1 + lto = fat + PGO + BOLT
Builds used in the Hyperfine results above:
release_czkawka_cli - default Release build
release_lto_czkawka_cli - Release + codegen-units=1 + lto = fat build
optimized_lto_czkawka_cli - Release + codegen-units=1 + lto = fat + PGO build
bolt_optimized_czkawka_cli - Release + codegen-units=1 + lto = fat + PGO + BOLT build
Builds used in the instrumented benchmark above:
instrumented_lto_czkawka_cli - Release + codegen-units=1 + lto = fat + PGO instrumentation build
bolt_instrumented_czkafka_cli - Release + codegen-units=1 + lto = fat + PGO optimized + BOLT instrumented build
Binary sizes for all binaries with size command: