Profile-Guided Optimization (PGO) benchmark report #7

zamazan4ik · 2024-06-22T12:56:06Z

Hi!

I was interested in optimizing the library's performance even further. I have evaluated Profile-Guided Optimization (PGO) on many projects - all the results are available at https://github.com/zamazan4ik/awesome-pgo . Since this compiler optimization works well in many places, especially in parsers, I decided to apply it to this project - here are my benchmark results.

Test environment

Fedora 40
Linux kernel 6.9.4
AMD Ryzen 9 5900X
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Compiler: rustc 1.79
lady-deirdre version: the latest master branch at the time of testing, commit 8fed627c54fef9e832a3ff93987d5367d51183ae
Turbo Boost disabled

Benchmark

For benchmarking, I use the benchmarks built into the project. For PGO, I use the cargo-pgo tool. I got the Release results with the taskset -c 0 cargo bench --workspace --all-features command. The PGO training phase is done with taskset -c 0 cargo pgo bench -- --workspace --all-features, and the PGO optimization phase with taskset -c 0 cargo pgo optimize bench -- --workspace --all-features. taskset -c 0 pins the run to CPU core 0, which reduces the OS scheduler's influence on the results. All measurements are done on the same machine, with the same background "noise" (as much as I can guarantee).
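The three phases above can be sketched as a small script. This is only an illustration under stated assumptions: the commands are the ones quoted in this report, while the step and pgo_workflow helpers and the RUN switch are hypothetical names introduced here. By default the script only prints each command; it also assumes cargo-pgo and its LLVM tooling are already installed.

```shell
#!/bin/sh
# Sketch of the benchmark phases from the report.
# PIN and ARGS are copied from the commands quoted above.
# RUN is a hypothetical switch: unless RUN=1, commands are only printed.
PIN="taskset -c 0"
ARGS="--workspace --all-features"

# Print a command and, if RUN=1, execute it.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

pgo_workflow() {
    # 1. Baseline: plain Release benchmarks.
    step $PIN cargo bench $ARGS
    # 2. PGO training: run the instrumented benchmarks to collect profiles.
    step $PIN cargo pgo bench -- $ARGS
    # 3. PGO optimization: rebuild with the profiles and benchmark again.
    step $PIN cargo pgo optimize bench -- $ARGS
}

pgo_workflow
```

Running it with RUN=1 would execute the phases in order; keeping every phase pinned to the same core is what makes the before and after numbers comparable.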

Results

I got the following results:

Release: https://gist.github.com/zamazan4ik/c6b3cf2b3aec993558e5bbf73a998c43
PGO optimized compared to Release: https://gist.github.com/zamazan4ik/1ab804b086615e991aca8bbfe829f37c
(just for reference) PGO instrumented compared to Release: https://gist.github.com/zamazan4ik/544c563492be3df26bcee00a1f816959

According to the results, PGO measurably improves the library's performance in many cases.

Further steps

I can suggest the following action points:

Perform more PGO benchmarks with other datasets (if you are interested enough in it). If it shows improvements - add a note to the documentation (the README file, I guess) about possible improvements in the library's performance with PGO.
You could also try to get insights into how the code can be optimized further, based on the changes the compiler makes with PGO. One way to do this is to compare flamegraphs taken before and after applying PGO.

I would be happy to answer your questions about PGO.

P.S. Please do not treat the issue like a bug or something like that - it's just a benchmark report. Since the "Discussions" functionality is disabled in this repo, I created the Issue instead.

Eliah-Lakhin · 2024-06-22T15:37:19Z

Eliah-Lakhin
Jun 22, 2024
Maintainer

@zamazan4ik Thank you very much for your report!

That's a very interesting result. I didn't realize that Lady Deirdre could perform notably better than reported in my current benchmarks. Nom's and Logos's performance seems to improve as well. But what surprised me is that Tree-Sitter's result stays almost the same. I don't have much experience with PGO, but based on your expertise, do you think it's because PGO optimizations are already applied to the linked C library (TS is a wrapper of a C lib), or is it due to something else?

Ideally, I think that for a fair estimation, the benchmark tests should include more grammars (e.g., Java, C#). However, it is time-consuming, and I'm not sure if I have the time to implement it in the near future. It is certainly on my schedule, and I will consider PGO optimizations as well.

zamazan4ik · 2024-06-22T15:55:23Z

zamazan4ik
Jun 22, 2024
Author

> Thank you very much for your report!

You are welcome!

> But what surprised me is that Tree-Sitter's result stays almost the same. I don't have much experience with PGO, but based on your expertise, do you think it's because PGO optimizations are already applied to the linked C library (TS is a wrapper of a C lib), or is it due to something else?

No, the reason is that cargo-pgo can apply PGO only to Rust code. It doesn't support PGO for libraries written in other languages, such as C. If you want to optimize tree-sitter with PGO, you need to pass the corresponding compiler flags (usually -fprofile-generate/-fprofile-use) manually via environment variables, build.rs, or similar means. I did this for the ast-grep project (which also has a tree-sitter dependency) - the results with some insights are available here: ast-grep/ast-grep#738 . For the lady-deirdre library I didn't do that because the main aim was to measure improvements for lady-deirdre, not for other libraries. In the awesome-pgo repo I also have many benchmarks for other parsers like serde, nom, etc.
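As a rough illustration of the "env variables" route mentioned above, the sketch below prints benchmark commands that carry the C compiler flags for each phase. Everything beyond the two flag names is an assumption: the c_pgo_flags helper and the PROFILE_DIR location are hypothetical, and it presumes the C sources are compiled by a build script that honors the CFLAGS environment variable. How this interacts with cargo-pgo's own instrumentation is not covered here.

```shell
#!/bin/sh
# Hypothetical helper: prints the C compiler flag for a given PGO phase.
# PROFILE_DIR is an assumed location, not something taken from this thread.
PROFILE_DIR="${PROFILE_DIR:-/tmp/c-pgo-profiles}"

c_pgo_flags() {
    case "$1" in
        generate) echo "-fprofile-generate=$PROFILE_DIR" ;;
        use)      echo "-fprofile-use=$PROFILE_DIR" ;;
        *)        echo "unknown phase: $1" >&2; return 1 ;;
    esac
}

# Phase 1: build the C code with instrumentation and run the benchmarks
# to collect profiles. Only printed here, not executed.
echo "CFLAGS=\"$(c_pgo_flags generate)\" cargo bench --workspace --all-features"

# Phase 2: rebuild the C code using the collected profiles.
# Note: with Clang, the raw .profraw files generally have to be merged with
# llvm-profdata before -fprofile-use can read them; GCC reads its .gcda
# files directly.
echo "CFLAGS=\"$(c_pgo_flags use)\" cargo bench --workspace --all-features"
```

The exact flags and profile handling depend on which C compiler the build picks up, so treat the printed commands as a starting point rather than a recipe.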

> Ideally, I think that for a fair estimation, the benchmark tests should include more grammars (e.g., Java, C#). However, it is time-consuming, and I'm not sure if I have the time to implement it in the near future. It is certainly on my schedule, and I will consider PGO optimizations as well.

Agreed. In this case, I can suggest just writing a note somewhere in the documentation that applying PGO to lady-deirdre can help optimize its performance. It can be interesting for some performance-oriented users.

Eliah-Lakhin · 2024-06-22T17:07:15Z

Eliah-Lakhin
Jun 22, 2024
Maintainer

> Agreed. In this case, I can suggest just writing a note somewhere in the documentation that applying PGO to lady-deirdre can help optimize its performance. It can be interesting for some performance-oriented users.

Sounds good. I will add a note in the dedicated section of the Benchmarks README page linking to our conversation. This information should be useful for users.

P.S. I'll also move this conversation to the Discussions that I just enabled. Thank you for your help again!
