DataStore changelog 4: add standalone "Custom StoreView" example #4206

Merged: 4 commits into main from cmc/store_changelog_4_custom_storeview, Nov 15, 2023

@teh-cmc commented Nov 12, 2023

Standalone example of how to implement and register custom StoreViews, even from external code.


DataStore changelog PR series:

Checklist

  • I have read and agree to Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I have tested demo.rerun.io (if applicable)
  • The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG

@teh-cmc added the ⛃ re_datastore, do-not-merge, exclude from changelog, and 🔩 data model labels Nov 13, 2023
@teh-cmc force-pushed the cmc/store_changelog_3_store_views branch from c29f8a2 to 65dfa8e on November 13, 2023 16:59
@teh-cmc force-pushed the cmc/store_changelog_4_custom_storeview branch from c0cd7fe to d171205 on November 13, 2023 17:00
@teh-cmc force-pushed the cmc/store_changelog_3_store_views branch from 65dfa8e to 838d2c5 on November 13, 2023 17:08
@teh-cmc force-pushed the cmc/store_changelog_4_custom_storeview branch from d171205 to 0994a17 on November 13, 2023 17:08
@teh-cmc force-pushed the cmc/store_changelog_3_store_views branch from 838d2c5 to 79e1d31 on November 13, 2023 18:07
@teh-cmc force-pushed the cmc/store_changelog_4_custom_storeview branch from 0994a17 to 0fc5593 on November 13, 2023 18:08
@teh-cmc marked this pull request as ready for review November 14, 2023 08:08
@Wumpf self-requested a review November 14, 2023 12:06
@Wumpf left a comment

didn't realize how great a user-facing feature this is until I opened this PR! :)

Can't build objectron locally on this branch right now, getting the usual env import build failure from wasm-bindgen. Hoping that this is resolved in the final PR..?

}

fn on_events(&mut self, events: &[StoreEvent]) {
print!("\x1B[2J\x1B[1;1H"); // terminal clear + cursor reset

I wonder if this will work on windows. It should. But I'm nervous. Will check later and fix for you if necessary :)

}

impl StoreView for Orchestrator {
fn name(&self) -> String {

I should have commented this on an earlier PR, but practically speaking it seems that returning strings is kinda annoying. What do you think about `fn name(&'a self) -> &'a str`?

@teh-cmc (author) replied:

See my earlier comment on your earlier comment!

teh-cmc added a commit that referenced this pull request Nov 15, 2023
…4202)

The upcoming `StoreView` works in global scope: by registering a view
you subscribe to changes to _all_ `DataStore`s, including those that are
yet to be created.
This is very powerful as it allows view & trigger implementers to
build cross-recording indices as well as be notified as soon as new
recordings come in and go out.

But it means that `StoreEvent`s must indicate which `DataStore` they
originate from, which isn't possible today since the stores themselves
don't know who they are to begin with.
This trivial PR plumbs the `StoreId` all the way through so `DataStore`s
know about their own ID.
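
To illustrate why events need to carry the ID of their originating store, here is a minimal sketch of a cross-recording index that answers "which stores have logged data at this entity path?". The `StoreId` and `Event` types below are simplified, hypothetical stand-ins defined only for this example, not the real `re_arrow_store` types, and `EntityToStores` is an illustrative name.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for the real `StoreId`.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct StoreId(pub String);

/// Hypothetical, stripped-down event: only the fields this sketch needs.
#[derive(Debug, Clone)]
pub struct Event {
    pub store_id: StoreId,
    pub entity_path: String,
}

/// Cross-recording index: which stores have seen data for a given entity path?
#[derive(Default)]
pub struct EntityToStores {
    index: BTreeMap<String, BTreeSet<StoreId>>,
}

impl EntityToStores {
    /// Route each event by the `store_id` it carries.
    pub fn on_events(&mut self, events: &[Event]) {
        for event in events {
            self.index
                .entry(event.entity_path.clone())
                .or_default()
                .insert(event.store_id.clone());
        }
    }

    /// All stores that have logged something at `entity_path`, in sorted order.
    pub fn stores_for(&self, entity_path: &str) -> Vec<&StoreId> {
        self.index
            .get(entity_path)
            .map(|stores| stores.iter().collect())
            .unwrap_or_default()
    }
}
```

Without a store ID on each event, a single globally registered listener like this one would have no way to tell recordings apart.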

Also made `StoreGeneration` account for the GC counter while I was at
it.

---

Requires:
- #4215 

`DataStore` changelog PR series:
- #4202
- #4203
- #4205
- #4206
- #4208
- #4209
teh-cmc added a commit that referenced this pull request Nov 15, 2023
Introduces `StoreEvent`, an event that describes the atomic unit of
change in the Rerun `DataStore`: a row has been added to or removed from
the store.

`StoreEvent`s are fired on both the insertion and garbage collection
paths, enabling listeners to build arbitrary, always up-to-date views &
trigger systems.

```rust
/// The atomic unit of change in the Rerun [`DataStore`].
///
/// A [`StoreEvent`] describes the changes caused by the addition or deletion of a
/// [`re_log_types::DataRow`] in the store.
///
/// Methods that mutate the [`DataStore`], such as [`DataStore::insert_row`] and [`DataStore::gc`],
/// return [`StoreEvent`]s that describe the changes.
///
/// Refer to field-level documentation for more details and check out [`StoreDiff`] for a precise
/// definition of what an event involves.
#[derive(Debug, Clone, PartialEq)]
pub struct StoreEvent {
    /// Which [`DataStore`] sent this event?
    pub store_id: StoreId,

    /// What was the store's generation when it sent that event?
    pub store_generation: StoreGeneration,

    /// Monotonically increasing ID of the event.
    ///
    /// This is on a per-store basis.
    ///
    /// When handling a [`StoreEvent`], if this is the first time you process this [`StoreId`] and
    /// the associated `event_id` is not `1`, it means you registered late and missed some updates.
    pub event_id: u64,

    /// What actually changed?
    ///
    /// Refer to [`StoreDiff`] for more information.
    pub diff: StoreDiff,
}

/// Describes an atomic change in the Rerun [`DataStore`]: a row has been added or deleted.
///
/// From a query model standpoint, the [`DataStore`] _always_ operates one row at a time:
/// - The contents of a row (i.e. its columns) are immutable past insertion, by virtue of
///   [`RowId`]s being unique and non-reusable.
/// - Similarly, garbage collection always removes _all the data_ associated with a row in one go:
///   there cannot be orphaned columns. When a row is gone, all data associated with it is gone too.
///
/// Refer to field-level documentation for more information.
#[derive(Debug, Clone, PartialEq)]
pub struct StoreDiff {
    /// Addition or deletion?
    ///
    /// The store's internals are opaque and don't necessarily reflect the query model (e.g. there
    /// might be data in the store that cannot be reached by any query).
    ///
    /// A [`StoreDiff`] answers a logical question: "does there exist a query path which can return
    /// data from that row?".
    pub kind: StoreDiffKind,

    /// What's the row's [`RowId`]?
    ///
    /// [`RowId`]s are guaranteed to be unique within a single [`DataStore`].
    ///
    /// Put another way, the same [`RowId`] can appear at most twice across [`StoreDiff`] events:
    /// one addition and (optionally) one deletion (in that order!).
    pub row_id: RowId,

    /// The [`TimePoint`] associated with that row.
    ///
    /// Since insertions and deletions both work on a row-level basis, this is guaranteed to be the
    /// same value for both the insertion and deletion events (if any).
    pub timepoint: TimePoint,

    /// The [`EntityPath`] associated with that row.
    ///
    /// Since insertions and deletions both work on a row-level basis, this is guaranteed to be the
    /// same value for both the insertion and deletion events (if any).
    pub entity_path: EntityPath,

    /// All the [`DataCell`]s associated with that row.
    ///
    /// Since insertions and deletions both work on a row-level basis, this is guaranteed to be the
    /// same set of values for both the insertion and deletion events (if any).
    pub cells: IntMap<ComponentName, DataCell>,
}
```
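
The `event_id` documentation above suggests a simple check a listener could run on every event. The sketch below tracks the last `event_id` seen per store and flags late registration. `StoreId` is a hypothetical stand-in type, `EventIdTracker` and `Continuity` are names invented for this example, and the `Gap` case additionally assumes that IDs increase by exactly one per event, which the documentation above does not state (it only says "monotonically increasing").

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real `StoreId`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct StoreId(pub String);

/// What a listener learns from the `event_id` of an incoming event.
#[derive(Debug, PartialEq, Eq)]
pub enum Continuity {
    /// The event directly follows the last one seen for that store.
    InOrder,
    /// First event seen for that store, but its ID is not `1`: we registered late.
    RegisteredLate,
    /// The ID is not `previous + 1` (assumption: IDs step by exactly one).
    Gap,
}

/// Remembers the last `event_id` observed for each store.
#[derive(Default)]
pub struct EventIdTracker {
    last_seen: HashMap<StoreId, u64>,
}

impl EventIdTracker {
    /// Records `event_id` for `store_id` and classifies it against the previous one.
    pub fn observe(&mut self, store_id: &StoreId, event_id: u64) -> Continuity {
        let previous = self.last_seen.insert(store_id.clone(), event_id);
        match previous {
            None if event_id == 1 => Continuity::InOrder,
            None => Continuity::RegisteredLate,
            Some(prev) if prev.checked_add(1) == Some(event_id) => Continuity::InOrder,
            Some(_) => Continuity::Gap,
        }
    }
}
```

A listener that gets `RegisteredLate` would typically need to rebuild its state from the store's current contents before trusting incremental updates.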


---

`DataStore` changelog PR series:
- #4202
- #4203
- #4205
- #4206
- #4208
- #4209
@teh-cmc force-pushed the cmc/store_changelog_3_store_views branch from d5b3373 to eb01c98 on November 15, 2023 09:40
Base automatically changed from cmc/store_changelog_3_store_views to main November 15, 2023 09:53
teh-cmc added a commit that referenced this pull request Nov 15, 2023
Introducing the `StoreView` trait and registration system, allowing
anybody to subscribe to `DataStore` changes, even from external code.

`StoreView`s work in global scope: by registering a view you subscribe to
changes to _all_ `DataStore`s, including those that are yet to be
created.
This is very powerful as it allows view & trigger implementers to
build cross-recording indices as well as be notified as soon as new
recordings come in and go out.

```rust
/// A [`StoreView`] subscribes to atomic changes in one or more [`DataStore`]s through [`StoreEvent`]s.
///
/// [`StoreView`]s can be used to build both secondary indices and trigger systems.
pub trait StoreView: std::any::Any + Send + Sync {
    /// Arbitrary name for the view.
    ///
    /// Does not need to be unique.
    fn name(&self) -> String;

    /// Workaround for downcasting support, simply return `self`:
    /// ```ignore
    /// fn as_any(&self) -> &dyn std::any::Any {
    ///     self
    /// }
    /// ```
    fn as_any(&self) -> &dyn std::any::Any;

    /// Workaround for downcasting support, simply return `self`:
    /// ```ignore
    /// fn as_any_mut(&mut self) -> &mut dyn std::any::Any {
    ///     self
    /// }
    /// ```
    fn as_any_mut(&mut self) -> &mut dyn std::any::Any;

    /// The core of this trait: get notified of changes happening in one or more [`DataStore`]s.
    ///
    /// This will be called automatically by the [`DataStore`] itself if the view has been
    /// registered: [`DataStore::register_view`].
    /// Or you might want to feed it [`StoreEvent`]s manually, depending on your use case.
    ///
    /// ## Example
    ///
    /// ```ignore
    /// fn on_events(&mut self, events: &[StoreEvent]) {
    ///     use re_arrow_store::StoreDiffKind;
    ///     for event in events {
    ///         match event.kind {
    ///             StoreDiffKind::Addition => println!("Row added: {}", event.row_id),
    ///             StoreDiffKind::Deletion => println!("Row removed: {}", event.row_id),
    ///         }
    ///     }
    /// }
    /// ```
    fn on_events(&mut self, events: &[StoreEvent]);
}
```
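
As a rough sketch of what implementing the trait looks like, including the `as_any` downcasting workaround, the example below redeclares a trait with the same four methods locally so that it compiles on its own. `StoreEvent` and `StoreDiffKind` here are simplified, hypothetical stand-ins, and `LiveRowCounter` and `broadcast_and_count` are names invented for this example. The real registration entry point is not used.

```rust
use std::any::Any;

/// Hypothetical, stripped-down stand-ins for the real event types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StoreDiffKind {
    Addition,
    Deletion,
}

#[derive(Debug, Clone)]
pub struct StoreEvent {
    pub kind: StoreDiffKind,
    pub row_id: u64,
}

/// Same four methods as the trait quoted above, redeclared locally for this sketch.
pub trait StoreView: Any + Send + Sync {
    fn name(&self) -> String;
    fn as_any(&self) -> &dyn Any;
    fn as_any_mut(&mut self) -> &mut dyn Any;
    fn on_events(&mut self, events: &[StoreEvent]);
}

/// A view that keeps a running count of the rows currently reachable in the store.
#[derive(Default)]
pub struct LiveRowCounter {
    pub live_rows: i64,
}

impl StoreView for LiveRowCounter {
    fn name(&self) -> String {
        "example.LiveRowCounter".to_owned()
    }

    fn as_any(&self) -> &dyn Any {
        self
    }

    fn as_any_mut(&mut self) -> &mut dyn Any {
        self
    }

    fn on_events(&mut self, events: &[StoreEvent]) {
        for event in events {
            match event.kind {
                StoreDiffKind::Addition => self.live_rows += 1,
                StoreDiffKind::Deletion => self.live_rows -= 1,
            }
        }
    }
}

/// Hypothetical registry: broadcasts events to type-erased views, then
/// downcasts to read the counter back out.
pub fn broadcast_and_count(
    views: &mut [Box<dyn StoreView>],
    events: &[StoreEvent],
) -> Option<i64> {
    for view in views.iter_mut() {
        view.on_events(events);
    }
    views
        .iter()
        .find_map(|view| view.as_any().downcast_ref::<LiveRowCounter>())
        .map(|counter| counter.live_rows)
}
```

The `as_any` pair is what lets a caller that only holds a `Box<dyn StoreView>` recover the concrete type and read its state.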


---

`DataStore` changelog PR series:
- #4202
- #4203
- #4205
- #4206
- #4208
- #4209
@teh-cmc force-pushed the cmc/store_changelog_4_custom_storeview branch from 0fc5593 to 5834e90 on November 15, 2023 09:57
@teh-cmc removed the do-not-merge label Nov 15, 2023
@teh-cmc merged commit 412f6c0 into main Nov 15, 2023
30 of 31 checks passed
@teh-cmc deleted the cmc/store_changelog_4_custom_storeview branch November 15, 2023 10:05
teh-cmc added a commit that referenced this pull request Nov 15, 2023
This is mostly preliminary work for #4209, which makes this PR a bit
weird. Basically just trying to offload complexity from #4209.

`TimesPerTimeline` as well as `TimeHistogramPerTimeline` are now living
on their own and are maintained as `StoreView`s, i.e. they react to
changes to the `DataStore` rather than constructing alternate truths.

This is the first step towards turning the `EntityTree` giga-structure
into an event-driven view in the next PR.
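
A minimal sketch of the "react to changes" idea, assuming a much simpler data model than the real `TimesPerTimeline`: the view keeps a per-timeline count of rows at each time value, incrementing on additions and decrementing on deletions. `RowEvent`, `DiffKind`, and `TimeCounts` are hypothetical names for this example, and a `TimePoint` is approximated as a list of `(timeline name, time)` pairs.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DiffKind {
    Addition,
    Deletion,
}

/// Hypothetical, stripped-down event.
pub struct RowEvent {
    pub kind: DiffKind,
    /// (timeline name, time value) pairs, standing in for a `TimePoint`.
    pub timepoint: Vec<(String, i64)>,
}

/// For each timeline, how many live rows exist at each time value.
#[derive(Default)]
pub struct TimeCounts {
    per_timeline: BTreeMap<String, BTreeMap<i64, u64>>,
}

impl TimeCounts {
    /// Applies additions and deletions incrementally instead of rescanning the store.
    pub fn on_events(&mut self, events: &[RowEvent]) {
        for event in events {
            for (timeline, time) in &event.timepoint {
                let counts = self.per_timeline.entry(timeline.clone()).or_default();
                match event.kind {
                    DiffKind::Addition => *counts.entry(*time).or_insert(0) += 1,
                    DiffKind::Deletion => {
                        // Decrement, and forget the time once no row refers to it.
                        let now_empty = match counts.get_mut(time) {
                            Some(count) => {
                                *count = count.saturating_sub(1);
                                *count == 0
                            }
                            None => false,
                        };
                        if now_empty {
                            counts.remove(time);
                        }
                    }
                }
            }
        }
    }

    /// All time values with at least one live row on `timeline`, in ascending order.
    pub fn times(&self, timeline: &str) -> Vec<i64> {
        self.per_timeline
            .get(timeline)
            .map(|counts| counts.keys().copied().collect())
            .unwrap_or_default()
    }
}
```

Because the counts are derived only from events, the view cannot drift from the store as long as it receives every addition and deletion.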

---

`DataStore` changelog PR series:
- #4202
- #4203
- #4205
- #4206
- #4208
- #4209
teh-cmc added a commit that referenced this pull request Nov 15, 2023
Turns the `EntityTree` giga-datastructure into a `StoreView`, meaning it
now reacts to `StoreEvent`s rather than creating alternate truths.

This introduces the notion of cascading side-effects, and more
specifically `ClearCascade`s.
When the `EntityTree` reacts to changes in the store, this might cause
cascading effects (e.g. pending clears), that in turn need to write back
to the store, which in turn sends more events to react to!
The cycle is guaranteed finite because "clears don't get cleared"!

Cascading side-effects have an interesting requirement: they need to log
their cascaded data using a `RowId` _similar_ to the one used in the
original event that caused the cascade (so they get GC'd at roughly the
same pace).
"Similar" in this case means that their `TUID` shares the same
timestamp and that the new `RowId` is strictly greater than the old one.
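
The "similar `RowId`" requirement can be sketched as follows, assuming a TUID is modeled as a timestamp plus a tie-breaking counter. The `RowId` struct and its `cascaded` method are hypothetical and exist only to illustrate the two properties named above (same timestamp, strictly greater ID). The real types and helpers may differ.

```rust
/// Hypothetical stand-in for a TUID-backed `RowId`: a timestamp plus a counter.
/// Field order matters: the derived `Ord` compares `time_ns` first, then `inc`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct RowId {
    pub time_ns: u64,
    pub inc: u64,
}

impl RowId {
    /// Derives an ID for cascaded data: same timestamp, strictly greater than `self`.
    ///
    /// Returns `None` if the counter would overflow, since no strictly greater ID
    /// with the same timestamp exists in that case.
    pub fn cascaded(self) -> Option<Self> {
        Some(Self {
            time_ns: self.time_ns,
            inc: self.inc.checked_add(1)?,
        })
    }
}
```

Keeping the timestamp identical is what makes the cascaded row age out at roughly the same pace as the row that triggered it.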

`PathOp` has finally been annihilated.

According to our new "Clears" & "Time Histograms" test suites, this
behaves exactly like the `main` branch.


---

`DataStore` changelog PR series:
- #4202
- #4203
- #4205
- #4206
- #4208
- #4209
Labels
🔩 data model, exclude from changelog, ⛃ re_datastore