Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce codegen optimizations for primitives and fixed-sized-arrays #2970

Merged
merged 26 commits into from
Aug 15, 2023

Conversation

jleibs
Copy link
Member

@jleibs jleibs commented Aug 12, 2023

What

This implements 2 optimizations:

  • The first is ArrowBuffer optimization returns an inner Buffer directly when we know that the type itself it just an array of primitives. This is useful for zero-copy returns for dense data such as Tensors.
  • The second is the optimizations from: Proof-of-concept: Hand-crafted optimizations to pave the way forward for code-gen #2954 . For this, we identify cases where we know the inner arrays are not nullable and instead of using validity-iterators map directly to slices.

Significant speedups for batch queries:
image

TODO:

  • We should be able to check that the contents don't actually contain a validity map with non-nulls and return a deserialization error in that case.
  • Add handling for other ArrowBuffer types.

Checklist

@jleibs jleibs added codegen/idl 🚀 performance Optimization, memory use, etc labels Aug 14, 2023
@jleibs jleibs marked this pull request as ready for review August 14, 2023 14:35
Copy link
Member

@emilk emilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ArrowBuffer::len is dangerously misnamed (even though it follows what arrow2 does), otherwise LGTM!

crates/re_types/src/arrow_buffer.rs Outdated Show resolved Hide resolved
crates/re_types_builder/src/codegen/rust/serializer.rs Outdated Show resolved Hide resolved
crates/re_types_builder/src/codegen/rust/serializer.rs Outdated Show resolved Hide resolved
crates/re_types_builder/src/codegen/rust/deserializer.rs Outdated Show resolved Hide resolved
crates/re_types_builder/src/codegen/rust/deserializer.rs Outdated Show resolved Hide resolved
crates/re_types/tests/validity.rs Outdated Show resolved Hide resolved
@@ -133,3 +133,20 @@ pub fn quote_fqname_as_type_path(fqname: impl AsRef<str>) -> TokenStream {
let expr: syn::TypePath = syn::parse_str(&fqname).unwrap();
quote!(#expr)
}

pub fn is_backed_by_arrow_buffer(typ: &DataType) -> bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quite unfortunate that we need this both on the arrow and on the idl type :(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was bothered by that as well. I'd like to refactor the codegen to always use our own type-wrappers, but that turned into a bigger change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this mistake in C++ codegen for a while; going to our own type wherever possible (which is everywhere for c++ but won't work for Rust) made it a lot better :)

Comment on lines +995 to +996
let typ = arrow_registry.get(&obj.fqname);
let obj_field = &obj.fields[0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not obj.fields[0].is_nullable, then you don't need the arrow registry

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because if it's a Struct or Union type we can't apply this optimization either. Renaming the check /adding documents to clarify: should_optimize_buffer_slice_deserialize

@Wumpf Wumpf mentioned this pull request Aug 15, 2023
3 tasks
@jleibs jleibs merged commit 0f60eb9 into main Aug 15, 2023
@jleibs jleibs deleted the jleibs/codegen_optimizations branch August 15, 2023 23:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
codegen/idl 🚀 performance Optimization, memory use, etc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants