Evaluate using Profile-Guided Optimization (PGO) and Post Link Optimization (PLO) #4692
Closed
zamazan4ik started this conversation in Ideas
-
This is all very lovely work and a good reference for anyone interested in PGO themselves - but I don't think there's anything actionable that we can do here. We're fundamentally a library that people use in their own programs, so they are in charge of the compile settings used. We do have a few binaries (wgpu-info, naga-cli, etc.), but we don't publish binaries for them, as they're just developer tools. Everyone who uses them uses them through
It's good to know that PGO does help naga (it makes sense, it's very branch heavy).
-
Hi!
Recently I checked Profile-Guided Optimization (PGO) improvements on multiple projects. The results are here. PGO results for LLVM-related tooling are here. According to the tests, PGO usually helps with compilers and compiler-like workloads (such as static analysis tools or code formatters) - e.g. Clang gets +20% compilation speed with PGO. Because of this, I think trying to optimize Wgpu tools like naga-cli would be a good idea. I already did some PGO benchmarks and want to share my results here.
Test environment
I performed tests on two configurations: Linux and macOS.
Linux: trunk branch on commit 611471c4bc8b1adab4d7d96fd2ba8bae82dccb28
macOS: trunk branch on commit 611471c4bc8b1adab4d7d96fd2ba8bae82dccb28
Benchmark
For benchmark purposes, I ran:
cargo bench --workspace --no-fail-fast -p naga --bench criterion
The PGO training phase was done on the same scenario with:
cargo pgo bench -- --workspace --no-fail-fast -p naga --bench criterion
PGO optimization was done with cargo-pgo.
Benchmarks were done on the same machine, with the same background "noise". The benchmark was run multiple times, and the results are reproducible.
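The procedure above can be sketched as a small script. This is only a sketch: the first two commands are the ones quoted in this post, while the third (`cargo pgo optimize bench`) is my assumption about how cargo-pgo re-runs benchmarks with the collected profiles, so please check `cargo pgo optimize --help` before relying on it. The `run` helper, the `pgo_workflow` function and the `DRY_RUN` variable are hypothetical names used for illustration.

```shell
#!/bin/sh
# Sketch of the baseline / training / optimized sequence with cargo-pgo.
# With DRY_RUN=1 (the default here) commands are only printed, not executed.

# Benchmark arguments taken from the post; left unquoted below on purpose
# so that the shell splits them into separate arguments.
BENCH_ARGS="--workspace --no-fail-fast -p naga --bench criterion"

# Hypothetical helper: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

pgo_workflow() {
    # 1. Baseline numbers without PGO.
    run cargo bench $BENCH_ARGS
    # 2. Training: run the same benchmarks instrumented, to collect profiles.
    run cargo pgo bench -- $BENCH_ARGS
    # 3. Assumption: re-run the benchmarks built with the collected profiles.
    run cargo pgo optimize bench -- $BENCH_ARGS
}

pgo_workflow
```

Running it as-is only prints the three commands. Setting `DRY_RUN=0` executes them, which requires cargo-pgo to be installed.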
Results
I got the following results (in cargo bench format):
Linux:
macOS:
At least in the scenario above, PGO helps with achieving better performance.
One benchmark does not work on my setup, but right now I have no idea how to fix it.
Further steps
I can suggest the following action points:
Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.
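For anyone who wants to try BOLT on top of PGO for a binary such as naga-cli, here is a rough sketch of the command sequence. It assumes cargo-pgo's BOLT support (`cargo pgo bolt build` / `cargo pgo bolt optimize` with `--with-pgo`) behaves as its README describes; I have not run this on Wgpu. The `run` helper, the `bolt_workflow` function and the `DRY_RUN` variable are hypothetical names, and the profile-gathering steps are left as comments because the binary paths depend on the target.

```shell
#!/bin/sh
# Sketch of a combined PGO + BOLT sequence with cargo-pgo (assumed CLI).
# With DRY_RUN=1 (the default here) commands are only printed, not executed.

# Hypothetical helper: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

bolt_workflow() {
    # 1. Build a PGO-instrumented binary.
    run cargo pgo build
    # (run the instrumented binary on a representative workload here)

    # 2. Build a PGO-optimized binary that is also instrumented for BOLT.
    run cargo pgo bolt build --with-pgo
    # (run the BOLT-instrumented binary on the same workload here)

    # 3. Produce the final binary optimized with both PGO and BOLT profiles.
    run cargo pgo bolt optimize --with-pgo
}

bolt_workflow
```

As written, the script only prints the commands. BOLT itself has to be installed separately.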
Here are some examples of how PGO optimization is already integrated into other projects:
configure script
I am not sure how critical performance is for Wgpu right now. If it isn't the biggest priority at the moment, this could be a nice feature to get in the future.