Report logging benchmarks for C++/Python/Rust to CI #4100

Wumpf · 2023-10-31T13:22:03Z

Keep things super simple and do end-to-end profing: Profile running a benchmark binary that internally has a bunch of test cases. This way we can integrate results from all benchmarks in the same way into our CI generated benchmark stats.

We execute it with different parameters to check for different test cases.
Basic set of test cases we should start with:

log a lot of scalars with different time
log a point cloud
log an image

In all cases (unless configured otherwise) log to a memory recording. (profiling other parts of the flow should be part of a different Rust benchmark)

Since we want to simply time process from spawn to end must make sure that data generation is super fast. Maybe print out additional timings in each language where appropriate - this is harder to integrate into CI graphs, but nice for debugging.

Ideally same data on all SDKs. There might be variations though in logging flow that don't map to each of them.

emilk · 2023-11-01T08:12:46Z

Very similar to:

Profile the logging SDKs #3084

### What * Part of #4100 Implements SDK sided logging benchmark for C++ & Rust. Kept as simple as possible, meant for whole-process profiling so we capture all sideeffects. This makes tthe data generation ('prepare') inside the benchmark apps ofc quite tricky as it has to be as fast as possible. Additionally, both Rust & Cpp app expose a way to get more fine grained logging. Cpp does this via a simple profiler scope, Rust via Puffin/re_tracing. Logging always happens to a memory recording. Data is currently never passed in in the Rerun format Contains the tree initial benchmarks we wanted to have: * points3d_large_batch * Single batch of 50mio points (color, position, radius, single label) * points3d_many_individual * 1mio individual points with different time stamp each (color, position, radius) * image * log 4 different 16k x 16k RGBA8 images (4GiB of memory!) Running instructions in `main.rs` & `main.cpp`! Timings on my M1 Max in seconds (tests are not perfectly comparable, they do not do the exact same thing. Prepare times are also slightly different and most significant in the _large_batch test) * points3d_large_batch * C++: 0.94s * Rust: 1.34s * points3d_many_individual * C++: 16.86s (⚠️ there's almost certainly some involuntary allocation going on there) * Rust: 2.75s * image * C++: 3.11s * Rust: 1.10s Missing * Python version * utility script for building, running and publishing data ### Checklist * [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md) * [x] I've included a screenshot or gif (if applicable) * [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/4181) (if applicable) * [x] The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG - [PR Build Summary](https://build.rerun.io/pr/4181) - [Docs preview](https://rerun.io/preview/73a3736ac3c0be33fa8d6e6b40a2af243c4aa2d9/docs)  - [Examples preview](https://rerun.io/preview/73a3736ac3c0be33fa8d6e6b40a2af243c4aa2d9/examples)  - [Recent benchmark results](https://ref.rerun.io/dev/bench/) - [Wasm size tracking](https://ref.rerun.io/dev/sizes/)

### What This PR adds the 3 basic benchmarks for Python. One notable different w.r.t the rust version is that the benchmarks use a single recording, but create a fresh memory sink for each iteration (the rust version creates a fresh recording for each iteration as well). This is due to #4410. To run: ``` just py-bench ``` **IMPORTANT**: the python version of `many_individual` runs 100k points instead of 1M for the other benchmarks! * Part of #4100 On my machine: <img width="1590" alt="image" src="https://github.com/rerun-io/rerun/assets/49431240/99a74354-aa09-4267-a0fa-6587ecd9f8e5"> ### Checklist * [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md) * [x] I've included a screenshot or gif (if applicable) * [x] I have tested [app.rerun.io](https://app.rerun.io/pr/4411) (if applicable) * [x] The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG - [PR Build Summary](https://build.rerun.io/pr/4411) - [Docs preview](https://rerun.io/preview/0f8403061c76b2147bebef25cfebd1b0c5e47c73/docs)  - [Examples preview](https://rerun.io/preview/0f8403061c76b2147bebef25cfebd1b0c5e47c73/examples)  - [Recent benchmark results](https://build.rerun.io/graphs/crates.html) - [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)

Wumpf · 2023-12-12T09:48:51Z

We have now benchmarks for Python/Rust/C++ but we still don't upload the results

emilk · 2024-01-16T09:11:29Z

For reference, here are our performance targets:

Tracking issue: short-term performance targets #4423

emilk · 2024-01-16T09:14:15Z

We also want to explicitly benchmark logging scalars, including setting a timeline value for each logged scalar, i.e. something like

for frame_nr in range(0, 1_000_000) {
    rr. set_time_sequence("frame", frame_nr)
    rr.log("scalar", rr.TimeSeriesScalar(sin(frame_nr / 1000.0)))
}

emilk · 2024-01-16T12:53:38Z

just rs-plot-dashboard --num-plots 10 --num-series-per-plot 5 --num-points-per-series 5000 --freq 1000

emilk · 2024-01-16T13:14:25Z

We also want to check the memory use in the viewer when we have logged 100M scalars or so, to measure the RAM overhead.

emilk · 2024-01-16T13:15:35Z

This is closed when we have an easy way to run benchmarks for all languages, and those results are published (perhaps manually) somewhere publicly.

Wumpf added 🧑‍💻 dev experience developer experience (excluding CI) 📉 performance Optimization, memory use, etc labels Oct 31, 2023

Wumpf added this to the 0.11 C++ polish milestone Oct 31, 2023

Wumpf added the 🔨 testing testing and benchmarks label Oct 31, 2023

Wumpf mentioned this issue Oct 31, 2023

Realistic benchmark #359

Closed

emilk mentioned this issue Nov 6, 2023

rr.log(…, rr.Image(…)) slow #3085

Closed

Wumpf self-assigned this Nov 7, 2023

Wumpf mentioned this issue Nov 8, 2023

Simple logging benchmarks for C++ & Rust #4181

Merged

4 tasks

Wumpf removed their assignment Nov 13, 2023

emilk removed this from the 0.11 milestone Nov 21, 2023

abey79 mentioned this issue Nov 30, 2023

Add Python version of the basic benchmarks #4411

Merged

4 tasks

emilk mentioned this issue Dec 4, 2023

Tracking issue: short-term performance targets #4423

Open

emilk changed the title ~~Add logging benchmarks for C++/Python/Rust~~ Report logging benchmarks for C++/Python/Rust to CI Jan 30, 2024

nikolausWest modified the milestone: Triage Jan 30, 2024

emilk modified the milestones: Triage, Spring Cleaning Apr 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Report logging benchmarks for C++/Python/Rust to CI #4100

Report logging benchmarks for C++/Python/Rust to CI #4100

Wumpf commented Oct 31, 2023 •

edited

Loading

emilk commented Nov 1, 2023

Wumpf commented Dec 12, 2023 •

edited

Loading

emilk commented Jan 16, 2024

emilk commented Jan 16, 2024 •

edited

Loading

emilk commented Jan 16, 2024

emilk commented Jan 16, 2024

emilk commented Jan 16, 2024

Report logging benchmarks for C++/Python/Rust to CI #4100

Report logging benchmarks for C++/Python/Rust to CI #4100

Comments

Wumpf commented Oct 31, 2023 • edited Loading

emilk commented Nov 1, 2023

Wumpf commented Dec 12, 2023 • edited Loading

emilk commented Jan 16, 2024

emilk commented Jan 16, 2024 • edited Loading

emilk commented Jan 16, 2024

emilk commented Jan 16, 2024

emilk commented Jan 16, 2024

Wumpf commented Oct 31, 2023 •

edited

Loading

Wumpf commented Dec 12, 2023 •

edited

Loading

emilk commented Jan 16, 2024 •

edited

Loading