Proof-of-concept: Hand-crafted optimizations to pave the way forward for code-gen #2954

Closed
wants to merge 7 commits into from

@jleibs jleibs commented Aug 9, 2023

What

This demonstrates that, with the right generated deserializer optimizations, the new code-gen can outperform the legacy queries.

Several performance improvements are rolled together here to push the envelope:

  • Avoid dealing with Option on non-nullable components
  • Iterate over direct slices from arrow buffers where possible
  • Optimization for the matched-length joining iterator (this implementation is incorrect but close enough for this profiling)
  • Fix needless allocations of annotationInfo in the no-op case.
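
As a rough illustration of the matched-length joining idea, here is a minimal sketch. It is not the code from this PR: `InstanceKey` and `join` are hypothetical names, and the fast path assumes that two columns of equal length have keys that line up positionally. Equal length alone does not guarantee that, so a real implementation would need a stronger check.

```rust
/// Hypothetical stand-in for a per-instance key; not a real rerun type.
type InstanceKey = u32;

/// Joins each primary value with its component value, if any.
///
/// Fast path (assumption): when both columns have the same length we treat
/// them as positionally aligned and simply zip them.
/// Slow path: merge over the two key lists, which must be sorted ascending.
fn join<'a, P, C>(
    primary_keys: &'a [InstanceKey],
    primary: &'a [P],
    component_keys: &'a [InstanceKey],
    component: &'a [C],
) -> Vec<(&'a P, Option<&'a C>)> {
    if primary.len() == component.len() {
        // Matched lengths: no key comparisons at all.
        return primary.iter().zip(component.iter().map(Some)).collect();
    }

    let mut out = Vec::with_capacity(primary.len());
    let mut j = 0;
    for (key, p) in primary_keys.iter().zip(primary) {
        // Skip component entries whose key is smaller than the current one.
        while j < component_keys.len() && component_keys[j] < *key {
            j += 1;
        }
        let c = if j < component_keys.len() && component_keys[j] == *key {
            Some(&component[j])
        } else {
            None
        };
        out.push((p, c));
    }
    out
}
```

The benefit of the fast path is that the per-element key comparison disappears entirely, which is where a batch query over many instances would otherwise spend its time.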

Using the photogrammetry example as a stress-test:

Previous Baseline (0.8)
image

Before (main):
image

After:
image

Checklist

@jleibs jleibs added the 🚀 performance and do-not-merge labels Aug 9, 2023
.unwrap()
.values()
.as_slice();
let data2: &[[f32; 3]] = bytemuck::cast_slice(data);

This is one of the main things we want to generate for fixed-size arrays of primitives.
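
To make the cast in the snippet above concrete, here is a std-only sketch of the same reinterpretation. The helper `as_vec3_slice` is a hypothetical stand-in for what `bytemuck::cast_slice` does in the diff; it is not part of the PR or of bytemuck, and it returns `None` on a length mismatch, which is a choice made for this sketch.

```rust
/// Reinterprets a flat `&[f32]` as `&[[f32; 3]]` without copying.
/// Returns `None` if the length is not a multiple of 3.
fn as_vec3_slice(flat: &[f32]) -> Option<&[[f32; 3]]> {
    if flat.len() % 3 != 0 {
        return None;
    }
    // SAFETY: `[f32; 3]` has the same alignment as `f32` and consists of
    // exactly three contiguous `f32`s with no padding. The length check
    // above ensures the new slice covers exactly the same memory.
    Some(unsafe {
        std::slice::from_raw_parts(flat.as_ptr().cast::<[f32; 3]>(), flat.len() / 3)
    })
}
```

The returned slice borrows the original buffer, so no per-element deserialization or allocation takes place.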

jleibs added a commit that referenced this pull request Aug 15, 2023
…#2970)

### What

This implements two optimizations:
- The first is an ArrowBuffer optimization that returns the inner Buffer
directly when we know that the type itself is just an array of primitives.
This is useful for zero-copy returns of dense data such as Tensors.
- The second is the set of optimizations from #2954. For this, we identify
cases where we know the inner arrays are not nullable and, instead of
using validity iterators, map directly to slices.
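
The difference between the two access patterns can be sketched as follows. `PrimitiveColumn` is a hypothetical, simplified stand-in for an Arrow primitive array (a `Vec<bool>` mask instead of a packed bitmap); it is not the arrow2 or rerun API.

```rust
/// Hypothetical, simplified stand-in for an Arrow primitive array:
/// a dense value buffer plus an optional validity mask (true = present).
struct PrimitiveColumn {
    values: Vec<f32>,
    validity: Option<Vec<bool>>,
}

impl PrimitiveColumn {
    /// General path: every element is wrapped in `Option` and the
    /// validity mask is consulted per element.
    fn iter_nullable(&self) -> impl Iterator<Item = Option<f32>> + '_ {
        self.values.iter().enumerate().map(move |(i, v)| match &self.validity {
            Some(mask) if !mask[i] => None,
            _ => Some(*v),
        })
    }

    /// Fast path for columns known to be non-nullable: hand out the
    /// underlying slice directly, with no `Option` wrapping.
    fn as_dense_slice(&self) -> &[f32] {
        &self.values
    }
}
```

Callers of the fast path work with a plain `&[f32]`, so the per-element branch on validity and the `Option` unwrapping both disappear.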

Significant speedups for batch queries:

![image](https://github.com/rerun-io/rerun/assets/3312232/7ea1f3a2-a45a-4813-b82c-eaee55914c32)


TODO:
- [x] We should be able to check that the contents don't actually
contain a validity map with nulls and return a deserialization error
in that case.
- [x] Add handling for other ArrowBuffer types.
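
A minimal sketch of the validity check described in the first TODO item, under the assumption that validity is available as a boolean mask. The error type and the function name `dense_or_error` are hypothetical and do not reflect rerun's actual error enum or generated code.

```rust
/// Hypothetical error type for this sketch; not rerun's actual error enum.
#[derive(Debug, PartialEq)]
enum DeserializationError {
    UnexpectedNull { index: usize },
}

/// Returns the dense slice only if the optional validity mask
/// (true = present) marks no element as null. Otherwise reports the
/// index of the first null.
fn dense_or_error<'a>(
    values: &'a [f32],
    validity: Option<&[bool]>,
) -> Result<&'a [f32], DeserializationError> {
    if let Some(mask) = validity {
        if let Some(index) = mask.iter().position(|present| !*present) {
            return Err(DeserializationError::UnexpectedNull { index });
        }
    }
    Ok(values)
}
```

The check runs once per array rather than once per element, so the slice-based fast path stays cheap while nulls in a supposedly non-nullable column are still reported.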

### Checklist
* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/2970) (if
applicable)

- [PR Build Summary](https://build.rerun.io/pr/2970)
- [Docs
preview](https://rerun.io/preview/pr%3Ajleibs%2Fcodegen_optimizations/docs)
- [Examples
preview](https://rerun.io/preview/pr%3Ajleibs%2Fcodegen_optimizations/examples)

jleibs commented Sep 20, 2023

These have all now been addressed properly. Closing.

@jleibs jleibs closed this Sep 20, 2023
@jleibs jleibs deleted the jleibs/more_optimization branch June 14, 2024 13:03