Speed up C++ logging for many individual log calls #4287

Closed
Wumpf opened this issue Nov 21, 2023 · 0 comments · Fixed by #4296

Wumpf commented Nov 21, 2023

The recent performance improvement made the C++ SDK a lot faster. But compared to Rust we're still behind for individual log calls (like time series scalars!).

An obvious candidate to improve is not building & sending the schema every time: Right now on every log call we convert the schema to C FFI and then create a Rust/arrow2 representation from it. Add a simple lazy schema registry/handle system for this!

We should do a bit more profiling, though, to get an idea of where the time goes. E.g. there are likely many other needless allocs along the way.

@Wumpf Wumpf added 🚀 performance Optimization, memory use, etc 🌊 C++ API C/C++ API specific labels Nov 21, 2023
@Wumpf Wumpf added this to the 0.11 milestone Nov 21, 2023
@Wumpf Wumpf self-assigned this Nov 21, 2023
Wumpf added a commit that referenced this issue Nov 22, 2023
…ing a component type registry (#4296)

### What

* Fixes #4287
* Follow-up to #4273

As expected, not doing the C++ datatype -> C FFI schema -> Rust datatype
roundtrip for each log call helps perf quite a bit, especially when we
do a lot of smaller log calls.

The registry is a single `RwLock`-protected `Vec` (we never deregister),
which is exposed via a single C entry point.
On the C++ side we use the local `static` variable mechanism for
threadsafe lazy registration (slight codegen adjustment).
Indicator components had some special handling before and were
refactored to fit in this system - in the process I made their arrow
array shared across all instantiations, further cutting down on per-log
work.
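
To illustrate the registration scheme described above, here is a minimal sketch written entirely in C++. It is not the actual Rerun code: `DataTypeDesc`, `ComponentTypeHandle`, `ComponentTypeRegistry`, `global_registry` and `Position3D::type_handle` are hypothetical names, and a `std::shared_mutex`-guarded `std::vector` stands in for the Rust-side `RwLock`-protected `Vec`. The point it shows is that a function-local `static` registers the type once, on first use and in a thread-safe way, so later log calls only pass a small handle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an Arrow datatype/schema description.
struct DataTypeDesc {
    std::string name;
};

// Handle handed back to callers: an index into the registry.
using ComponentTypeHandle = std::uint32_t;

// Append-only registry behind a reader/writer lock.
// Entries are never removed, so handles stay valid forever.
class ComponentTypeRegistry {
  public:
    ComponentTypeHandle register_type(DataTypeDesc desc) {
        std::unique_lock<std::shared_mutex> lock(mutex_); // exclusive (write) lock
        types_.push_back(std::move(desc));
        return static_cast<ComponentTypeHandle>(types_.size() - 1);
    }

    // Returns a copy so no reference outlives the lock.
    DataTypeDesc get(ComponentTypeHandle handle) const {
        std::shared_lock<std::shared_mutex> lock(mutex_); // shared (read) lock
        return types_.at(handle);
    }

    std::size_t size() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return types_.size();
    }

  private:
    mutable std::shared_mutex mutex_;
    std::vector<DataTypeDesc> types_;
};

// Process-wide registry instance (assumption: one registry per process).
inline ComponentTypeRegistry& global_registry() {
    static ComponentTypeRegistry registry;
    return registry;
}

// Hypothetical component type using lazy registration.
struct Position3D {
    static ComponentTypeHandle type_handle() {
        // Function-local static: initialized exactly once, on first call,
        // and the initialization is thread-safe since C++11.
        static const ComponentTypeHandle handle =
            global_registry().register_type(DataTypeDesc{"Position3D"});
        return handle;
    }
};

// Usage: repeated calls reuse the handle registered on first use.
inline void demo() {
    const ComponentTypeHandle first = Position3D::type_handle();
    const ComponentTypeHandle second = Position3D::type_handle();
    assert(first == second);
}
```

With this shape, the per-call cost of describing the type drops to returning an integer; the expensive conversion happens only inside the first `type_handle()` call.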

---

Benchmark results:
* large point cloud: `0.15s` -> `0.14s`
* many points: `7.52s` -> `4.52s`
* large images: `0.57s` -> `0.51s`

Old values are from the previous PR. New values are the median over
three runs, single executable run (this makes more and more of a
difference with all these registries!), timings without the prepare
step, on the same M1 MacBook.
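
For readers who want to take comparable "median over N runs" measurements, a minimal timing helper could look like the sketch below. It is an illustration only, not the `log_benchmark` harness: `median` and `median_seconds` are hypothetical helpers, and the sketch repeats the workload in-process, which may differ from how the numbers above were collected.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Median of a non-empty list of samples
// (averages the two middle values when the count is even).
inline double median(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    if (samples.size() % 2 == 1) {
        return samples[mid];
    }
    return 0.5 * (samples[mid - 1] + samples[mid]);
}

// Runs `run()` `repeats` times and returns the median wall-clock time in seconds.
// Any preparation work should happen before calling this, so it is not timed.
template <typename F>
double median_seconds(F&& run, int repeats = 3) {
    std::vector<double> samples;
    for (int i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        run();
        const auto end = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(end - start).count());
    }
    return median(std::move(samples));
}
```

The median is less sensitive than the mean to a single slow outlier run, which matters when only three samples are taken.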

A quick look at the profiler for running `log_benchmark
points3d_many_individual` in isolation tells us that, of the actual
benchmark running time, we spend:
* 35% of the time in `rr_recording_stream_log` (of which in turn
20%, so 7% overall, is still arrow FFI translation of the array!!)
* 30% in the various `to_data_cell` methods
* 10% in exporting arrow arrays to C FFI
* 6% in setting the time
* the rest in various allocations along the way

(taken via `Instruments` on my Mac)
<img width="969" alt="image"
src="https://github.com/rerun-io/rerun/assets/1220815/5632589f-52b1-4e92-b7a0-1482e69528ad">


---

### Checklist
* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/4296) (if
applicable)
* [x] The PR title and labels are set such as to maximize their
usefulness for the next release's CHANGELOG

- [PR Build Summary](https://build.rerun.io/pr/4296)
- [Docs
preview](https://rerun.io/preview/8bf1ee59d9a2bc5e192c1c8169c98dd40b621100/docs)
<!--DOCS-PREVIEW-->
- [Examples
preview](https://rerun.io/preview/8bf1ee59d9a2bc5e192c1c8169c98dd40b621100/examples)
<!--EXAMPLES-PREVIEW-->
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)