Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose ArchetypeView API to deal with raw arrow buffers when join is bypassed #3379

Closed
teh-cmc opened this issue Sep 20, 2023 · 0 comments · Fixed by #5574 or #5687
Closed

Expose ArchetypeView API to deal with raw arrow buffers when join is bypassed #3379

teh-cmc opened this issue Sep 20, 2023 · 0 comments · Fixed by #5574 or #5687
Assignees
Labels
🏹 arrow concerning arrow 🚀 performance Optimization, memory use, etc 📺 re_viewer affects re_viewer itself

Comments

@teh-cmc
Copy link
Member

teh-cmc commented Sep 20, 2023

#3377 introduced an optimization to bypass the instance-key join when two rows share the same buffer of autogenerated instance keys.

That's a nice win, but we could effectively do infinitely better in many cases: since we know that we don't need to do any fancy data manipulation on the optimized path, we could just work directly with the underlying arrow buffer and not deserialize the data at all.

That would be a per-view kind of optimization.

Consider e.g. the colors in a point cloud: if we're on the optimized path, why bother deserializing them at all? just pass them to the renderer as-is.

@teh-cmc teh-cmc added 🏹 arrow concerning arrow 📺 re_viewer affects re_viewer itself 🚀 performance Optimization, memory use, etc labels Sep 20, 2023
@teh-cmc teh-cmc self-assigned this Mar 15, 2024
teh-cmc added a commit that referenced this issue Apr 8, 2024
This implements the new uncached latest-at APIs, and introduces some
basic types for promises.

Tests and benchmarks have been backported from `re_query`.

We already get a pretty decent improvement because the join process
(clamped-zip) is cheaper and we don't need to query for instance keys at
all:
```
group                         re_query                                re_query2
-----                         --------                                ---------
arrow_batch_points2/query     1.39      2.5±0.03µs 379.7 MElem/sec    1.00  1810.6±23.62ns 526.7  MElem/sec
arrow_mono_points2/query      1.44   1082.7±8.66µs 902.0 KElem/sec    1.00    753.6±9.28µs 1295.9 KElem/sec
```

- Fixes #3379
- Part of #1893  

Here's an example/guide of using the new API:
```rust
// First, get the raw results for this query.
//
// Raw here means that these results are neither deserialized, nor resolved/converted.
// I.e. this corresponds to the raw `DataCell`s, straight from our datastore.
let results: LatestAtResults = re_query2::latest_at(
    &store,
    &query,
    &entity_path.into(),
    MyPoints::all_components().iter().cloned(), // no generics!
);

// Then, grab the raw results for each individual components.
//
// This is still raw data, but now a choice has been made regarding the nullability of the
// _component batch_ itself (that says nothing about its _instances_!).
//
// * `get_required` returns an error if the component batch is missing
// * `get_optional` returns an empty set of results if the component if missing
// * `get` returns an option
let points: &LatestAtComponentResults = results.get_required::<MyPoint>()?;
let colors: &LatestAtComponentResults = results.get_optional::<MyColor>();
let labels: &LatestAtComponentResults = results.get_optional::<MyLabel>();

// Then comes the time to resolve/convert and deserialize the data.
// These steps have to be done together for efficiency reasons.
//
// Both the resolution and deserialization steps might fail, which is why this returns a `Result<Result<T>>`.
// Use `PromiseResult::flatten` to simplify it down to a single result.
//
// A choice now has to be made regarding the nullability of the _component batch's instances_.
// Our IDL doesn't support nullable instances at the moment -- so for the foreseeable future you probably
// shouldn't be using anything but `iter_dense`.

let points = match points.iter_dense::<MyPoint>(&mut resolver).flatten() {
    PromiseResult::Pending => {
        // Handle the fact that the data isn't ready appropriately.
        return Ok(());
    }
    PromiseResult::Ready(data) => data,
    PromiseResult::Error(err) => return Err(err.into()),
};

let colors = match colors.iter_dense::<MyColor>(&mut resolver).flatten() {
    PromiseResult::Pending => {
        // Handle the fact that the data isn't ready appropriately.
        return Ok(());
    }
    PromiseResult::Ready(data) => data,
    PromiseResult::Error(err) => return Err(err.into()),
};

let labels = match labels.iter_sparse::<MyLabel>(&mut resolver).flatten() {
    PromiseResult::Pending => {
        // Handle the fact that the data isn't ready appropriately.
        return Ok(());
    }
    PromiseResult::Ready(data) => data,
    PromiseResult::Error(err) => return Err(err.into()),
};

// With the data now fully resolved/converted and deserialized, the joining logic can be
// applied.
//
// In most cases this will be either a clamped zip, or no joining at all.

let color_default_fn = || MyColor::from(0xFF00FFFF);
let label_default_fn = || None;

let results =
    clamped_zip_1x2(points, colors, color_default_fn, labels, label_default_fn).collect_vec();
```


---

Part of a PR series to completely revamp the data APIs in preparation
for the removal of instance keys and the introduction of promises:
- #5573
- #5574
- #5581
- #5605
- #5606
- #5633
- #5673
- #5679
- #5687
- #5755
- TODO
- TODO

Builds on top of the static data PR series:
- #5534
teh-cmc added a commit that referenced this issue Apr 8, 2024
Static-aware, key-less, component-based, cached latest-at APIs.

The overall structure of this new cache is very similar to what we had
before. Effectively it is just an extremely simplified version of
`re_query_cache`.

This introduces a new temporary `re_query_cache2` crate, which won't
ever be published.
It will replace the existing `re_query_cache` crate once all the
necessary features have been backported.

- Fixes #3232
- Fixes #4733
- Fixes #4734
- Part of #3379
- Part of #1893  

Example:
```rust
let caches = re_query_cache2::Caches::new(&store);

// First, get the results for this query.
//
// They might or might not already be cached. We won't know for sure until we try to access
// each individual component's data below.
let results: CachedLatestAtResults = caches.latest_at(
    &store,
    &query,
    &entity_path.into(),
    MyPoints::all_components().iter().cloned(), // no generics!
);

// Then, grab the results for each individual components.
// * `get_required` returns an error if the component batch is missing
// * `get_optional` returns an empty set of results if the component if missing
// * `get` returns an option
//
// At this point we still don't know whether they are cached or not. That's the next step.
let points: &CachedLatestAtComponentResults = results.get_required::<MyPoint>()?;
let colors: &CachedLatestAtComponentResults = results.get_optional::<MyColor>();
let labels: &CachedLatestAtComponentResults = results.get_optional::<MyLabel>();

// Then comes the time to resolve/convert and deserialize the data.
// These steps have to be done together for efficiency reasons.
//
// Both the resolution and deserialization steps might fail, which is why this returns a `Result<Result<T>>`.
// Use `PromiseResult::flatten` to simplify it down to a single result.
//
// A choice now has to be made regarding the nullability of the _component batch's instances_.
// Our IDL doesn't support nullable instances at the moment -- so for the foreseeable future you probably
// shouldn't be using anything but `iter_dense`.
//
// This is the step at which caching comes into play.
//
// If the data has already been accessed with the same nullability characteristics in the
// past, then this will just grab the pre-deserialized, pre-resolved/pre-converted result from
// the cache.
//
// Otherwise, this will trigger a deserialization and cache the result for next time.

let points = match points.iter_dense::<MyPoint>(&mut resolver).flatten() {
    PromiseResult::Pending => {
        // Handle the fact that the data isn't ready appropriately.
        return Ok(());
    }
    PromiseResult::Ready(data) => data,
    PromiseResult::Error(err) => return Err(err.into()),
};

let colors = match colors.iter_dense::<MyColor>(&mut resolver).flatten() {
    PromiseResult::Pending => {
        // Handle the fact that the data isn't ready appropriately.
        return Ok(());
    }
    PromiseResult::Ready(data) => data,
    PromiseResult::Error(err) => return Err(err.into()),
};

let labels = match labels.iter_sparse::<MyLabel>(&mut resolver).flatten() {
    PromiseResult::Pending => {
        // Handle the fact that the data isn't ready appropriately.
        return Ok(());
    }
    PromiseResult::Ready(data) => data,
    PromiseResult::Error(err) => return Err(err.into()),
};

// With the data now fully resolved/converted and deserialized, the joining logic can be
// applied.
//
// In most cases this will be either a clamped zip, or no joining at all.

let color_default_fn = || {
    static DEFAULT: MyColor = MyColor(0xFF00FFFF);
    &DEFAULT
};
let label_default_fn = || None;

let results =
    clamped_zip_1x2(points, colors, color_default_fn, labels, label_default_fn).collect_vec();
```


---

Part of a PR series to completely revamp the data APIs in preparation
for the removal of instance keys and the introduction of promises:
- #5573
- #5574
- #5581
- #5605
- #5606
- #5633
- #5673
- #5679
- #5687
- #5755
- TODO
- TODO

Builds on top of the static data PR series:
- #5534
teh-cmc added a commit that referenced this issue Apr 8, 2024
This implements the new uncached range APIs.

Latest-at & range queries are now much more similar than before and
share a lot of nice traits.

Tests have been backported from `re_query`.

Here's an example/guide of using the new API:
```rust
// First, get the raw results for this query.
//
// Raw here means that these results are neither deserialized, nor resolved/converted.
// I.e. this corresponds to the raw `DataCell`s, straight from our datastore.
let results: RangeResults = re_query2::range(
    &store,
    &query,
    &entity_path.into(),
    MyPoints::all_components().iter().cloned(), // no generics!
);

// Then, grab the raw results for each individual components.
//
// This is still raw data, but now a choice has been made regarding the nullability of the
// _component batch_ itself (that says nothing about its _instances_!).
//
// * `get_required` returns an error if the component batch is missing
// * `get_optional` returns an empty set of results if the component if missing
// * `get` returns an option
let all_points: &RangeComponentResults = results.get_required(MyPoint::name())?;
let all_colors: &RangeComponentResults = results.get_optional(MyColor::name());
let all_labels: &RangeComponentResults = results.get_optional(MyLabel::name());

let all_indexed_points = izip!(
    all_points.iter_indices(),
    all_points.iter_dense::<MyPoint>(&resolver)
);
let all_indexed_colors = izip!(
    all_colors.iter_indices(),
    all_colors.iter_sparse::<MyColor>(&resolver)
);
let all_indexed_labels = izip!(
    all_labels.iter_indices(),
    all_labels.iter_sparse::<MyLabel>(&resolver)
);

let all_frames = range_zip_1x2(all_indexed_points, all_indexed_colors, all_indexed_labels);

// Then comes the time to resolve/convert and deserialize the data, _for each timestamp_.
// These steps have to be done together for efficiency reasons.
//
// Both the resolution and deserialization steps might fail, which is why this returns a `Result<Result<T>>`.
// Use `PromiseResult::flatten` to simplify it down to a single result.
//
// A choice now has to be made regarding the nullability of the _component batch's instances_.
// Our IDL doesn't support nullable instances at the moment -- so for the foreseeable future you probably
// shouldn't be using anything but `iter_dense`.
eprintln!("results:");
for ((data_time, row_id), points, colors, labels) in all_frames {
    let points = match points.flatten() {
        PromiseResult::Pending => {
            // Handle the fact that the data isn't ready appropriately.
            continue;
        }
        PromiseResult::Ready(data) => data,
        PromiseResult::Error(err) => return Err(err.into()),
    };

    let colors = if let Some(colors) = colors {
        match colors.flatten() {
            PromiseResult::Pending => {
                // Handle the fact that the data isn't ready appropriately.
                continue;
            }
            PromiseResult::Ready(data) => data,
            PromiseResult::Error(err) => return Err(err.into()),
        }
    } else {
        vec![]
    };
    let color_default_fn = || Some(MyColor::from(0xFF00FFFF));

    let labels = if let Some(labels) = labels {
        match labels.flatten() {
            PromiseResult::Pending => {
                // Handle the fact that the data isn't ready appropriately.
                continue;
            }
            PromiseResult::Ready(data) => data,
            PromiseResult::Error(err) => return Err(err.into()),
        }
    } else {
        vec![]
    };
    let label_default_fn = || None;

    // With the data now fully resolved/converted and deserialized, the joining logic can be
    // applied.
    //
    // In most cases this will be either a clamped zip, or no joining at all.

    let results = clamped_zip_1x2(points, colors, color_default_fn, labels, label_default_fn)
        .collect_vec();
    eprintln!("{data_time:?} @ {row_id}:\n    {results:?}");
}
```

- Fixes #3379
- Part of #1893  

---

Part of a PR series to completely revamp the data APIs in preparation
for the removal of instance keys and the introduction of promises:
- #5573
- #5574
- #5581
- #5605
- #5606
- #5633
- #5673
- #5679
- #5687
- #5755
- TODO
- TODO

Builds on top of the static data PR series:
- #5534
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏹 arrow concerning arrow 🚀 performance Optimization, memory use, etc 📺 re_viewer affects re_viewer itself
Projects
None yet
1 participant