Evaluate enabling additional optimization options like LTO, PGO and Post-Link Optimization (PLO) #67
zamazan4ik started this conversation in Ideas
-
Hi there, thank you for the interest. There are a few (relatively) low-hanging fruits at the system level and the algorithm level for vtracer.
-
Hi!
Recently I evaluated Profile-Guided Optimization (PGO) on multiple projects. The results are available here. According to those tests, PGO improves performance in many cases for many applications. Given this, I think optimizing Vtracer with PGO is worth trying. I also found that Vtracer does not enable Link-Time Optimization (LTO) for some reason; enabling it would be a good idea too.
I already did some benchmarks and want to share my results here.
Test environment
`master` branch on commit `74f2a04a17d8c246d80c439fb162780160a7c3e9`
Benchmark
For benchmark purposes, I use the `vtracer --input input.jpg --output output.svg` command from the README file. For PGO I use the cargo-pgo tool, and the same command serves as the workload for the PGO training phase. The PGO-instrumented Vtracer is built with `cargo pgo build`. The PGO-optimized version is built with `cargo pgo optimize build`.
Unfortunately, due to a bug in the Rustc compiler, PGO cannot currently be enabled together with LTO for Vtracer. So I compare three Vtracer versions here: "Vtracer Release", "Vtracer Release with LTO" and "Vtracer Release with PGO". Later, once the bug is fixed, LTO and PGO can be applied to Vtracer at the same time, and that combination should work well. LTO for Vtracer is enabled with the following addition to the root `Cargo.toml`:
All tests are done on the same machine, repeated multiple times (with `hyperfine`), with the same background "noise" (as far as I can guarantee, of course).
As a test input, I use the Sample JPG 5 MiB file from https://sample-videos.com/download-sample-jpg-image.php .
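The three build variants described above could be produced along these lines. This is only a minimal sketch, not the exact setup used for these measurements. The `run` helper, the `DRY_RUN` switch and the function names are made up for illustration. LTO is switched on here through Cargo's `CARGO_PROFILE_RELEASE_LTO` environment variable, which overrides `profile.release.lto` without editing `Cargo.toml`; the value `fat` is an assumption and may differ from the setting used in the tests. The path to the instrumented binary depends on the target triple, so it is passed in as an argument.

```shell
#!/bin/sh
# Sketch: produce the three Vtracer builds compared in this post.
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# "Vtracer Release": plain release build.
build_release() {
    run cargo build --release
}

# "Vtracer Release with LTO".
# ASSUMPTION: "fat" LTO; the exact setting from the post is not shown.
build_release_lto() {
    run env CARGO_PROFILE_RELEASE_LTO=fat cargo build --release
}

# "Vtracer Release with PGO", using cargo-pgo.
# $1: path to the instrumented binary (ASSUMPTION: depends on your target triple)
# $2: training input image
build_release_pgo() {
    run cargo pgo build                          # 1. instrumented build
    run "$1" --input "$2" --output output.svg    # 2. training run, collects profiles
    run cargo pgo optimize build                 # 3. rebuild using the profiles
}
```

For example, `DRY_RUN=1 build_release_pgo target/x86_64-unknown-linux-gnu/release/vtracer input.jpg` prints the three PGO steps without running them; the triple in that path is just a common Linux default.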
Results
I got the following results (in `hyperfine` format):
where:
- `vtracer_release_no_lto` - usual Release
- `vtracer_release_with_lto` - Release with LTO
- `vtracer_optimized_no_lto` - Release with PGO
According to the tests above, LTO and PGO improve Vtracer performance.
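A comparison like this could be reproduced with a single `hyperfine` invocation. The sketch below assumes the three binaries have been copied into the current directory under the names listed above; the `--warmup 3` value, the `run` helper and the `DRY_RUN` switch are illustrative choices, not the settings used for the numbers in this post.

```shell
#!/bin/sh
# Sketch: benchmark the three Vtracer builds against each other.
# Set DRY_RUN=1 to print the command instead of executing it.

run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# $1: input image used for every build
bench_vtracer() {
    args="--input $1 --output output.svg"
    # hyperfine takes each command to compare as one quoted string.
    run hyperfine --warmup 3 \
        "./vtracer_release_no_lto $args" \
        "./vtracer_release_with_lto $args" \
        "./vtracer_optimized_no_lto $args"
}
```

Calling `bench_vtracer input.jpg` runs all three builds on the same image, which keeps the workload identical across variants.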
Some readers may wonder how LTO and PGO perform when the training workload differs from the evaluation workload (a common question in the ML world). So I did a simple measurement with another file from https://sample-videos.com/download-sample-jpg-image.php (the 30 MiB sample). The results are the following:
Just for reference, here is the information about Vtracer slowdown in the PGO instrumentation mode (during the PGO training phase):
Further steps
I can suggest the following action points:
Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with the usual LTO and PGO.
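If BOLT is tried later, cargo-pgo also has BOLT subcommands, so a combined PGO + BOLT cycle might look roughly like the sketch below. It has not been run against Vtracer. It assumes `llvm-bolt` is installed and that the BOLT-instrumented binary sits next to the PGO one with a `-bolt-instrumented` suffix, as the cargo-pgo README describes. The `run` helper, the `DRY_RUN` switch and the function name are hypothetical.

```shell
#!/bin/sh
# Sketch: PGO followed by BOLT, both driven through cargo-pgo.
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# $1: path to the PGO-instrumented binary (ASSUMPTION: depends on your target triple)
# $2: training input image
pgo_bolt_cycle() {
    bin="$1"
    input="$2"
    run cargo pgo build                                    # PGO-instrumented build
    run "$bin" --input "$input" --output output.svg        # gather PGO profiles
    run cargo pgo bolt build --with-pgo                    # BOLT-instrumented build on top of PGO
    # ASSUMPTION: "-bolt-instrumented" suffix, per the cargo-pgo README.
    run "${bin}-bolt-instrumented" --input "$input" --output output.svg
    run cargo pgo bolt optimize --with-pgo                 # final PGO + BOLT binary
}
```

Running it with `DRY_RUN=1` first is a cheap way to check the command sequence before committing to the full training runs.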
Here are some examples of how PGO optimization is integrated in other projects:
`configure` script