Commit
…#2970)

### What

This implements 2 optimizations:
- The first is an ArrowBuffer optimization that returns the inner Buffer directly when we know that the type itself is just an array of primitives. This is useful for zero-copy returns of dense data such as Tensors.
- The second is the set of optimizations from #2954. For this, we identify cases where we know the inner arrays are not nullable and, instead of using validity iterators, map directly to slices.

Significant speedups for batch queries:
![image](https://github.com/rerun-io/rerun/assets/3312232/7ea1f3a2-a45a-4813-b82c-eaee55914c32)

TODO:
- [x] We should be able to check that the contents don't actually contain a validity map with non-nulls and return a deserialization error in that case.
- [x] Add handling for other ArrowBuffer types.

### Checklist
* [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/2970) (if applicable)

- [PR Build Summary](https://build.rerun.io/pr/2970)
- [Docs preview](https://rerun.io/preview/pr%3Ajleibs%2Fcodegen_optimizations/docs)
- [Examples preview](https://rerun.io/preview/pr%3Ajleibs%2Fcodegen_optimizations/examples)
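As a rough illustration of the second optimization, the sketch below shows the general idea of skipping per-element validity iteration when a column is known to be non-nullable. It uses only the standard library. `PrimitiveColumn`, `DeserializationError::UnexpectedNull`, and `deserialize_non_nullable` are hypothetical names invented for this example; they are not the generated rerun code or the arrow2 API.

```rust
/// Hypothetical stand-in for a deserialized Arrow primitive array:
/// a dense value buffer plus an optional validity mask.
pub struct PrimitiveColumn {
    pub values: Vec<f32>,
    /// `None` means the column carries no validity mask, i.e. no nulls.
    pub validity: Option<Vec<bool>>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum DeserializationError {
    /// A null was found at `index` in a column that must not contain nulls.
    UnexpectedNull { index: usize },
}

/// Deserializes a column that is declared non-nullable.
pub fn deserialize_non_nullable(
    col: &PrimitiveColumn,
) -> Result<Vec<f32>, DeserializationError> {
    // If a mask is present, verify once up front that it marks no element as null.
    if let Some(mask) = &col.validity {
        if let Some(index) = mask.iter().position(|&valid| !valid) {
            return Err(DeserializationError::UnexpectedNull { index });
        }
    }
    // Fast path: treat the values as a plain slice and copy it in one go,
    // rather than zipping every element with its validity bit.
    Ok(col.values.as_slice().to_vec())
}
```

The null check runs once over the mask, and only when a mask exists, so the common dense case reduces to a single slice copy.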
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
use arrow2::buffer::Buffer; | ||
|
||
/// Convenience wrapper around an arrow [`Buffer`] that is known to contain | ||
/// a primitive type. | ||
/// | ||
/// The arrow2 [`Buffer`] object is internally reference-counted and can be | ||
/// easily converted back to a `&[T]` referencing the underlying storage. | ||
/// This avoids some of the lifetime complexities that would otherwise | ||
/// arise from returning a `&[T]` directly, but is significantly more | ||
/// performant than doing the full allocation necessary to return a `Vec<T>`. | ||
#[derive(Clone, Debug, Default, PartialEq)] | ||
pub struct ArrowBuffer<T>(pub Buffer<T>); | ||
|
||
impl<T> ArrowBuffer<T> { | ||
/// The number of instances of T stored in this buffer. | ||
#[inline] | ||
pub fn num_instances(&self) -> usize { | ||
// WARNING: If you are touching this code, make sure you know what len() actually does. | ||
// | ||
// There is ambiguity in how arrow2 and arrow-rs talk about buffer lengths, including | ||
// some incorrect documentation: https://github.com/jorgecarleitao/arrow2/issues/1430 | ||
// | ||
// Arrow2 `Buffer<T>` is typed and `len()` is the number of units of `T`, but the documentation | ||
// is currently incorrect. | ||
// Arrow-rs `Buffer` is untyped and len() is in bytes, but `ScalarBuffer`s are in units of T. | ||
self.0.len() | ||
} | ||
|
||
#[inline] | ||
pub fn is_empty(&self) -> bool { | ||
self.0.is_empty() | ||
} | ||
} | ||
|
||
impl<T> From<Vec<T>> for ArrowBuffer<T> { | ||
fn from(value: Vec<T>) -> Self { | ||
Self(value.into()) | ||
} | ||
} |
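The doc comment above argues that a reference-counted buffer sidesteps the lifetime issues of returning `&[T]` while avoiding the allocation of a fresh `Vec<T>` on every return. A minimal sketch of that trade-off, using `std::sync::Arc<[T]>` as a stand-in for the arrow2 `Buffer<T>`, might look like the following. `SharedBuffer`, `as_slice`, and `shares_storage_with` are hypothetical names for this illustration and do not appear in the commit.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a reference-counted, typed buffer.
/// Cloning it bumps a reference count instead of copying the elements.
#[derive(Clone, Debug)]
pub struct SharedBuffer<T> {
    data: Arc<[T]>,
}

impl<T> SharedBuffer<T> {
    /// The number of instances of `T` (not bytes) stored in this buffer.
    pub fn num_instances(&self) -> usize {
        self.data.len()
    }

    /// Borrows the underlying storage as a plain slice, without copying.
    pub fn as_slice(&self) -> &[T] {
        &self.data
    }

    /// Returns `true` if both handles point at the same allocation.
    pub fn shares_storage_with(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.data, &other.data)
    }
}

impl<T> From<Vec<T>> for SharedBuffer<T> {
    fn from(value: Vec<T>) -> Self {
        // One-time move of the elements into shared storage;
        // later clones of the handle are cheap.
        Self { data: Arc::from(value) }
    }
}
```

An owned handle like this can be returned from a query and held for as long as the caller needs, while `as_slice` still gives slice-level access to the data.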