Configurable aggregation behaviors for time series plots #4271

Closed
teh-cmc opened this issue Nov 20, 2023 · 1 comment · Fixed by #4865
Comments

@teh-cmc
Member

teh-cmc commented Nov 20, 2023

Context

With query caching support coming soon ™️, the next bottleneck for displaying many and/or large time series plots is going to be actually rendering them.

While native support for plots in re_renderer seems like the obvious path going forwards in the long term, it'll take a lot of work to get there.
In the meantime, egui_plot is still our best bet.

The plot view currently works by following these rough steps:

  1. Query all the necessary data according to the current visible history query.
    Very very costly as of today, but about to be orders of magnitude cheaper with the introduction of query caching.
  2. Iterate through the data in order to generate the appropriate egui_plot primitives (points & lines).
    This is generally relatively fast in practice, although it can get very costly for degenerate cases (e.g. all points in the plot have different attributes).
  3. Tessellate the egui_plot primitives.
    This happens on the CPU and takes about 3ms for 100k points on my machine IIRC.
  4. Rendering.
    Plain old GPU rendering of the generated triangles... but keep in mind: there can be a lot of overdraw!

Proposal

The proposal is to introduce the notion of aggregation functions (MAX, AVG...) to our plot view.
If you squint at it, you can see aggregation functions as a kind of deterministic, user-controlled LOD mechanism.

The idea is straightforward:

  1. Introduce a way of asking the plot what's the range of a tick on the X axis at the current zoom level.
    • Might or might not exist already, I don't know.
    • Because it's all immediate mode, there's the usual chicken and egg problem, but I'm sure we'll find a way.
  2. Pre-aggregate the query results based on the range retrieved in step 1, so we get a single value per visible tick.
    We're already iterating through all results anyhow, so this won't add much cost.
  3. Only tessellate/render the pre-aggregated results.
    Should bring the rendering costs to near 0.
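
A minimal sketch of the pre-aggregation step, assuming the samples arrive sorted by time and that the plot can report a tick width in the same time unit. The names `TickAggregator` and `aggregate_per_tick` are made up for illustration and are not part of Rerun or egui_plot:

```rust
/// Hypothetical aggregation kinds for this sketch (not an existing Rerun type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TickAggregator {
    Min,
    Max,
    Avg,
}

/// Collapses time-sorted `(time, value)` samples into a single point per tick.
///
/// `tick_range` is the width of one visible tick on the X axis, in the same
/// unit as the sample times (assumed to be provided by the plot).
pub fn aggregate_per_tick(
    samples: &[(i64, f64)],
    tick_range: i64,
    agg: TickAggregator,
) -> Vec<(i64, f64)> {
    assert!(tick_range > 0, "tick_range must be positive");

    let mut out = Vec::new();
    let mut i = 0;
    while i < samples.len() {
        // Index of the tick this sample falls into (floors for negative times too).
        let tick = samples[i].0.div_euclid(tick_range);

        let (mut min, mut max, mut sum) = (f64::INFINITY, f64::NEG_INFINITY, 0.0);
        let mut j = i;
        // Single pass over the run of samples that share the same tick.
        while j < samples.len() && samples[j].0.div_euclid(tick_range) == tick {
            let v = samples[j].1;
            min = min.min(v);
            max = max.max(v);
            sum += v;
            j += 1;
        }

        let value = match agg {
            TickAggregator::Min => min,
            TickAggregator::Max => max,
            TickAggregator::Avg => sum / (j - i) as f64,
        };
        // Emit one point per tick, placed at the start of the tick.
        out.push((tick * tick_range, value));
        i = j;
    }
    out
}
```

Only the returned points would then be handed to tessellation, so the number of primitives is bounded by the number of visible ticks rather than by the number of logged samples.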

The aggregation function used would be configurable via a spaceview setting (blueprint!).
We would provide all the usual suspects: NONE, MIN, MAX, AVG, P90, P95, P99, P999.

NONE matches today's behavior: you get a faithful albeit potentially very messy and very slow representation of your data.
Anything else is a tradeoff in accuracy in favor of performance/visibility. The hover UI would reflect that.
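
To make the list of functions concrete, here is a rough sketch of how one tick's worth of values could be reduced under each setting. The `Aggregation` enum and `reduce` method are hypothetical names, and the percentiles use the nearest-rank definition, which is an assumption; the issue does not say which percentile definition would be used:

```rust
/// Hypothetical setting mirroring the list above (not an existing Rerun type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Aggregation {
    None,
    Min,
    Max,
    Avg,
    P90,
    P95,
    P99,
    P999,
}

impl Aggregation {
    /// Reduces the values that fall within one tick to a single value.
    ///
    /// Returns `None` for `Aggregation::None` (the caller keeps every point)
    /// and for empty input.
    pub fn reduce(self, values: &[f64]) -> Option<f64> {
        if values.is_empty() {
            return None;
        }
        match self {
            Aggregation::None => None,
            Aggregation::Min => Some(values.iter().copied().fold(f64::INFINITY, f64::min)),
            Aggregation::Max => Some(values.iter().copied().fold(f64::NEG_INFINITY, f64::max)),
            Aggregation::Avg => Some(values.iter().sum::<f64>() / values.len() as f64),
            Aggregation::P90 => Some(percentile(values, 0.90)),
            Aggregation::P95 => Some(percentile(values, 0.95)),
            Aggregation::P99 => Some(percentile(values, 0.99)),
            Aggregation::P999 => Some(percentile(values, 0.999)),
        }
    }
}

/// Nearest-rank percentile of a non-empty slice: the smallest value such that
/// at least a fraction `q` of the values are less than or equal to it.
fn percentile(values: &[f64], q: f64) -> f64 {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let rank = (q * sorted.len() as f64).ceil() as usize;
    sorted[rank.clamp(1, sorted.len()) - 1]
}
```

Note that the percentile variants need a sorted copy of each tick's values, so they cost more per tick than `Min`, `Max` or `Avg`, which only need a single pass.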

This feature would still be useful even once we switch to re_renderer-powered plots.

TBD

  • Heuristics: it would probably make sense to default the plots to something other than NONE if there are more than N plots or P points in the recording.
  • How do secondary attributes (color, radius, scattered...) aggregate?
  • How does aggregation behave with continuous/decimal tick ranges?
@teh-cmc teh-cmc added 💬 discussion ui concerns graphical user interface 🚀 performance Optimization, memory use, etc labels Nov 20, 2023
@Wumpf
Member

Wumpf commented Nov 20, 2023

great writeup!
I'd love it if the answer were that aggregations get done on the GPU on the fly. But that incurs extra memory "upload" costs, doesn't relieve pressure on the queries/query cache, and is pretty cumbersome to pull off without compute shaders.

@teh-cmc teh-cmc self-assigned this Nov 24, 2023
@nikolausWest nikolausWest added this to the 0.13 milestone Jan 15, 2024
teh-cmc added a commit that referenced this issue Jan 25, 2024
⚠️ [Try it live!](https://app.rerun.io/pr/4865/index.html?url=https://storage.googleapis.com/rerun-builds/pull_request/4865/plot_gauss2.rrd) ⚠️

Make it so users can configure an aggregation strategy in the rare case
where they either have so much data or are so zoomed out that most of
their plot results in an overdraw blurb.

Because this builds on top of the range cache, the data is neatly laid
out in a memory slice already so this is very cheap to compute.

In my tests, the `MinMax` strategy has worked so well that I've decided
to make it the default in the end... That might be controversial
:no_mouth:.
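
As an illustration only, and not the code from the PR, one way a min/max reduction over a contiguous slice can look is sketched below. The function name `min_max_decimate` and the fixed `chunk` size are assumptions; the idea is that each run of samples is replaced by its extremes, so spikes survive even when many samples collapse into one pixel column:

```rust
/// Hypothetical helper: replaces every run of `chunk` consecutive values with
/// its `(min, max)` pair. The last run may be shorter than `chunk`.
pub fn min_max_decimate(values: &[f64], chunk: usize) -> Vec<(f64, f64)> {
    assert!(chunk > 0, "chunk must be positive");
    values
        .chunks(chunk)
        .map(|run| {
            // One linear pass per run; the slice is read front to back.
            run.iter().fold(
                (f64::INFINITY, f64::NEG_INFINITY),
                |(lo, hi), &v| (lo.min(v), hi.max(v)),
            )
        })
        .collect()
}
```

Drawing both extremes per run keeps the visual envelope of the signal, which an average alone would smooth away.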

`Off` vs. `MinMax`, using the [new gaussian walk
benchmark](#4903):
![image (26)](https://github.com/rerun-io/rerun/assets/2910679/1811becb-d213-44bb-87ea-0e4a7fa058ad)
![image (27)](https://github.com/rerun-io/rerun/assets/2910679/b8d66c92-8719-4de5-a3cb-72c2ea4b1e96)
 


- Fixes #4271 
- DNR: requires #4856

3 participants