Evaluate Profile-Guided Optimization (PGO) #2354

zamazan4ik · 2023-07-29T20:44:19Z

Hi!

The idea is that PGO could improve performance even more! There are many examples of software where PGO helps a lot with performance - you can check them here. For example, the list includes many databases, such as PostgreSQL and ClickHouse.

There are several options. I'd appreciate it if you could provide an easy way to build Qdrant with PGO, so that experienced users can do it on their own for their own usage scenarios. Another option is to optimize the Qdrant build with a generic-enough profile. Providing PGO-optimized binaries could be trickier (since it requires preparing a good-enough profile), but it would be great to see as an option too. Yet another way to use PGO is to optimize your own cloud-based Qdrant installation.

As an additional optimization step, I suggest taking a look at LLVM BOLT. From my experience, though, it is better to start with PGO and then try BOLT.

For Rust projects, I recommend starting with cargo-pgo.
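A minimal sketch of the workflow cargo-pgo is built around, not a tested Qdrant recipe. The `cargo pgo build` and `cargo pgo optimize` subcommands come from the cargo-pgo README; the workload step is a placeholder, since choosing a representative workload is the open question here. The script only prints the commands unless `DRY_RUN=0` is set (the `DRY_RUN` variable and the `run` helper are illustrative, not part of cargo-pgo).

```shell
#!/bin/sh
# Sketch of a cargo-pgo workflow. Prints commands by default (DRY_RUN=1);
# set DRY_RUN=0 to actually execute them inside a Rust project.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

pgo_workflow() {
    # One-time setup: cargo-pgo relies on llvm-profdata from llvm-tools-preview.
    run rustup component add llvm-tools-preview
    run cargo install cargo-pgo

    # 1. Build an instrumented binary.
    run cargo pgo build

    # 2. Placeholder (assumption): exercise the instrumented binary with a
    #    representative workload so that it writes profile data.
    echo "# (run the instrumented binary under a representative workload here)"

    # 3. Rebuild with the collected profiles applied.
    run cargo pgo optimize
}

pgo_workflow
```

The quality of the result depends on step 2: the closer the training workload is to real usage, the more relevant the collected profile.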

agourlay · 2023-08-30T09:37:31Z

This sounds interesting, thank you for raising this issue 👍

What do you miss or need to be able to try cargo-pgo?

zamazan4ik · 2023-08-30T09:46:11Z

> What do you miss or need to be able to try cargo-pgo?

I created this issue as an idea to try PGO on Qdrant. As for applying PGO to Qdrant: if the benchmarks in https://github.com/qdrant/qdrant/tree/master/benches cover Qdrant's functionality well, I think we can try testing PGO on them.

The only thing I miss is free time to do it :) (since I am working on enabling PGO for multiple projects).

agourlay · 2023-08-30T10:09:58Z

> The only thing I miss is free time to do it

I did not mean to put you under pressure to get it done 👍

> since I am working on enabling PGO for multiple projects

It seems you have much more experience with PGO than us for the time being so your input is very valuable.

I think it would help to prioritize this work to have a basic experiment demonstrating how much it costs to setup/maintain for which potential performance gain.

zamazan4ik · 2023-08-30T10:12:55Z

> I think it would help to prioritize this work to have a basic experiment demonstrating how much it costs to setup/maintain for which potential performance gain.

Well, if we are going to use benchmarks as a PGO training and evaluation set, I need to understand how to run the benchmarks with Qdrant. Are they integrated via cargo bench?

agourlay · 2023-08-30T13:21:44Z

We use criterion but bench should work as well AFAIK.

zamazan4ik · 2023-08-30T13:46:58Z

Well, at least cargo bench in the root qdrant repo runs nothing:

Finished bench [optimized] target(s) in 4m 43s
     Running unittests src/main.rs (target/release/deps/qdrant-e69b7ac971f11c78)

running 16 tests
test actix::api::collections_api::tests::timeout_is_deserialized ... ignored
test actix::api::read_params::test::deserialize_empty_string ... ignored
test actix::api::read_params::test::deserialize_empty_value ... ignored
test actix::api::read_params::test::deserialize_factor ... ignored
test actix::api::read_params::test::deserialize_type ... ignored
test actix::api::read_params::test::try_deserialize_factor_0 ... ignored
test actix::tests::test_version ... ignored
test common::helpers::tests::test_is_ready ... ignored
test common::metrics::tests::test_endpoint_whitelists_sorted ... ignored
test consensus::tests::collection_creation_passes_consensus ... ignored
test greeting::tests::test_welcome ... ignored
test settings::tests::test_custom_config ... ignored
test settings::tests::test_default_config ... ignored
test settings::tests::test_no_config_files ... ignored
test settings::tests::test_runtime_development_config ... ignored
test tonic::api::tests::test_validation ... ignored

test result: ok. 0 passed; 0 failed; 16 ignored; 0 measured; 0 filtered out; finished in 0.00s

As far as I can see in https://github.com/qdrant/qdrant/blob/master/benches/search-points/search-points.sh, the benchmarks should be run in a different way:

1. Compile and run a Qdrant instance
2. Run the search-points.sh script
3. Collect the benchmark data

Maybe I will try to do it a bit later :)

agourlay · 2023-08-30T13:53:05Z

Then cargo criterion --all should unleash the kraken 🐙

zamazan4ik · 2023-08-30T13:56:05Z

> Then cargo criterion --all should unleash the kraken

Yeah, it definitely should :) The problem is that https://github.com/Kobzol/cargo-pgo relies on cargo bench integration. So if we want to build a PGO-optimized Qdrant version, we need to do a little more work. It's certainly possible (no rocket science at all), just a bit more work.
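For a server binary that is exercised by an external script rather than by `cargo bench`, PGO can also be driven directly through rustc's `-Cprofile-generate` and `-Cprofile-use` flags, as documented in the rustc book. The following is a minimal sketch under stated assumptions: the profile directory, the `DRY_RUN` switch, and the `run` helper are hypothetical, the workload step is a placeholder, and `llvm-profdata` is assumed to be on `PATH` in a version compatible with rustc's LLVM. Commands are printed rather than executed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of manual PGO via rustc flags. Prints commands by default.
DRY_RUN="${DRY_RUN:-1}"
PROFILE_DIR="${PROFILE_DIR:-/tmp/qdrant-pgo}"   # hypothetical location

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

manual_pgo() {
    # 1. Build an instrumented binary; it writes .profraw files into
    #    PROFILE_DIR when the process exits.
    run env RUSTFLAGS="-Cprofile-generate=$PROFILE_DIR" cargo build --release

    # 2. Placeholder (assumption): start the instrumented server, run the
    #    benchmark script against it, then stop the server gracefully so the
    #    profile data is flushed at exit.
    echo "# (start the instrumented server, run the benchmark script, stop the server)"

    # 3. Merge the raw profiles into a single .profdata file.
    run llvm-profdata merge -o "$PROFILE_DIR/merged.profdata" "$PROFILE_DIR"

    # 4. Rebuild with the merged profile applied.
    run env RUSTFLAGS="-Cprofile-use=$PROFILE_DIR/merged.profdata" cargo build --release
}

manual_pgo
```

This is essentially what cargo-pgo automates; doing it by hand only makes sense when the training workload cannot be expressed as a cargo subcommand.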

agourlay · 2023-08-30T14:48:25Z

If you step into the lib crates, you can run benchmarks directly.

cd lib/segment
cargo bench --bench hnsw_build_graph

zamazan4ik · 2023-08-31T23:23:14Z

@agourlay During testing on a MacBook M1 with macOS 13.4 (Ventura) and Qdrant from the master branch (commit 0f102a5575ac33df03f06563844198f3ea26136b), I get the following error:

cd lib/segment
RUST_BACKTRACE=1 cargo bench --bench hnsw_build_asymptotic
    Finished bench [optimized] target(s) in 0.24s
     Running benches/hnsw_build_asymptotic.rs (/Users/zamazan4ik/open_source/qdrant/target/release/deps/hnsw_build_asymptotic-215d8cb651e6cb54)
Gnuplot not found, using plotters backend
hnsw-index-build-asymptotic/build-n-search-hnsw
                        time:   [26.704 µs 26.792 µs 26.902 µs]
                        change: [-0.6981% -0.2924% +0.1105%] (p = 0.17 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe
hnsw-index-build-asymptotic/build-n-search-hnsw-10x
                        time:   [31.965 µs 32.401 µs 32.844 µs]
                        change: [-1.5955% +0.5111% +2.6498%] (p = 0.64 > 0.05)
                        No change in performance detected.
hnsw-index-build-asymptotic/build-n-search-hnsw-10x-score-point
                        time:   [33.176 µs 33.408 µs 33.735 µs]
                        change: [-0.5501% +0.1561% +0.9129%] (p = 0.67 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe

Benchmarking scoring-vector/score-point: Warming up for 3.0000 sthread 'main' panicked at 'internal error: entered unreachable code: FakeMetric::distance', lib/segment/benches/hnsw_build_asymptotic.rs:113:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:593:5
   1: core::panicking::panic_fmt
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:67:14
   2: <hnsw_build_asymptotic::FakeMetric as segment::spaces::metric::Metric>::distance
   3: segment::vector_storage::raw_scorer::raw_scorer_impl
   4: criterion::bencher::Bencher<M>::iter
   5: <criterion::routine::Function<M,F,T> as criterion::routine::Routine<M,T>>::warm_up
   6: criterion::routine::Routine::sample
   7: criterion::analysis::common
   8: criterion::benchmark_group::BenchmarkGroup<M>::bench_function
   9: hnsw_build_asymptotic::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

error: bench failed, to rerun pass `--bench hnsw_build_asymptotic`

zamazan4ik · 2023-09-01T00:14:46Z

I benchmarked segment with PGO on my MacBook M1 Pro, macOS Ventura 13.4. The background noise was the same across runs. The only disabled benchmark is hnsw_build_asymptotic, since it doesn't work on the current Qdrant version on my machine (see the comment above about the panic).

The results are the following (in cargo bench format):

Release: https://pastebin.com/cKtsaWn8
PGO Instrumentation compared to Release: https://pastebin.com/crsEY7yH
PGO Optimized compared to Release: https://pastebin.com/31TtmCky

According to these microbenchmarks, PGO helps with achieving better performance in almost all cases. However, it would be much more interesting to test Qdrant itself with PGO.
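A comparison of this kind could be reproduced roughly as sketched below. This is an assumption about the procedure, not the exact commands behind the results above: it uses the `cargo pgo bench` and `cargo pgo optimize bench` subcommands from cargo-pgo inside `lib/segment`, and it does not show how a failing benchmark would be excluded. The `DRY_RUN` switch and `run` helper are illustrative; commands are printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: release vs. instrumented vs. PGO-optimized benchmark runs.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

segment_pgo_bench() {
    run cd lib/segment

    # 1. Baseline: regular release benchmarks.
    run cargo bench

    # 2. Instrumented benchmarks; this run also collects the PGO profiles.
    run cargo pgo bench

    # 3. Benchmarks rebuilt with the collected profiles applied.
    run cargo pgo optimize bench
}

segment_pgo_bench
```

By default, Criterion reports each run's change relative to the previous run stored under `target/criterion`, so the order of the steps determines what each report is compared against.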

Additionally, I want to highlight that some third-party dependencies used by Qdrant could be optimized with PGO as well. For example, RocksDB, as a C++ dependency, will not be optimized by cargo pgo, but according to my tests RocksDB benefits from PGO too (roughly 10% in performance).

agourlay · 2023-10-18T08:45:00Z

Thanks for the deep investigation 👍

FYI we have finally fixed the panic in the hnsw_build_asymptotic bench :)

agourlay added the "enhancement" (New feature or request) label on Aug 31, 2023
