Configurable aggregation behaviors for time series plots #4271

teh-cmc · 2023-11-20T12:09:05Z

Context

With query caching support coming soon ™️, the next bottleneck for displaying many and/or large time series plots is going to be actually rendering them.

While native support for plots in re_renderer seems like the obvious path going forwards in the long term, it'll take a lot of work to get there.
In the meantime, egui_plot is still our best bet.

The plot view currently works by following these rough steps:

1. Query all the necessary data according to the current visible history query.
   Very very costly as of today, but about to be orders of magnitude cheaper with the introduction of query caching.
2. Iterate through the data in order to generate the appropriate egui_plot primitives (points & lines).
   This is generally relatively fast in practice, although it can get very costly for degenerate cases (e.g. all points in the plot have different attributes).
3. Tessellate the egui_plot primitives.
   This happens on the CPU and takes about 3ms for 100k points on my machine IIRC.
4. Rendering.
   Plain old GPU rendering of the generated triangles... but keep in mind: there can be a lot of overdraw!

Proposal

The proposal is to introduce the notion of aggregation functions (MAX, AVG...) to our plot view.
If you squint at it, you can see aggregation functions as a kind of deterministic, user-controlled LOD mechanism.

The idea is straightforward:

1. Introduce a way of asking the plot what's the range of a tick on the X axis at the current zoom level.
   - Might or might not exist already, I don't know.
   - Because it's all immediate mode, there's the usual chicken and egg problem, but I'm sure we'll find a way.
2. Pre-aggregate the query results based on the range retrieved in step 1, so we get a single value per visible tick.
   We're already iterating through all results anyhow, so this won't add much cost.
3. Only tessellate/render the pre-aggregated results.
   Should bring the rendering costs to near 0.
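
To make step 2 concrete, here is a rough sketch of the single-pass pre-aggregation in Python (not the viewer's actual code). `x_min` and `tick_width` are hypothetical stand-ins for whatever the plot ends up reporting as the X range of one tick, and the function name is made up for illustration. It assumes the points arrive sorted by time, which is what lets it work in one iteration.

```python
from statistics import fmean


def aggregate_per_tick(points, x_min, tick_width, reduce=fmean):
    """Collapse time-sorted (x, y) points into a single (x, y) per visible tick.

    `x_min` and `tick_width` are assumed inputs: the left edge of the visible
    range and the X extent of one tick at the current zoom level.
    """
    out = []
    bucket_idx = None  # index of the tick currently being filled
    bucket_ys = []     # values seen so far for that tick
    for x, y in points:
        idx = int((x - x_min) // tick_width)
        if idx != bucket_idx and bucket_ys:
            # We crossed into a new tick: flush the previous one.
            out.append((x_min + bucket_idx * tick_width, reduce(bucket_ys)))
            bucket_ys = []
        bucket_idx = idx
        bucket_ys.append(y)
    if bucket_ys:
        out.append((x_min + bucket_idx * tick_width, reduce(bucket_ys)))
    return out


points = [(0, 1.0), (1, 3.0), (2, 5.0), (5, 2.0), (6, 4.0), (9, 7.0)]
aggregated = aggregate_per_tick(points, x_min=0, tick_width=4)
# Six input points become three: one per tick that actually contains data.
```

Only the aggregated points would then be handed to tessellation, so the cost of steps 3 and 4 scales with the number of visible ticks rather than the number of samples.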

The aggregation function used would be configurable via a spaceview setting (blueprint!).
We would provide all the usual suspects: NONE, MIN, MAX, AVG, P90, P95, P99, P999.

NONE matches today's behavior: you get a faithful albeit potentially very messy and very slow representation of your data.
Anything else is a tradeoff in accuracy in favor of performance/visibility. The hover UI would reflect that.
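
As an illustration of what each mode would compute for the samples falling in one tick, here is a small Python sketch. The mode names come from the list above; the `percentile` helper uses the nearest-rank definition, which is an assumption on my part since the proposal does not say which percentile definition would be used.

```python
import math
from statistics import fmean


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# One reducer per mode; each maps the samples of a tick to a single value.
AGGREGATORS = {
    "MIN": min,
    "MAX": max,
    "AVG": fmean,
    "P90": lambda v: percentile(v, 90),
    "P95": lambda v: percentile(v, 95),
    "P99": lambda v: percentile(v, 99),
    "P999": lambda v: percentile(v, 99.9),
}


def aggregate(values, mode):
    """Return the values to draw for one tick under the given mode."""
    if mode == "NONE":
        return list(values)  # faithful: every sample is kept
    return [AGGREGATORS[mode](values)]


bucket = [float(v) for v in range(1, 101)]  # 1.0 .. 100.0
p90 = aggregate(bucket, "P90")
```

Note that every mode other than NONE needs the hover UI to say which reduction produced the displayed value, since the number shown is no longer a logged sample in the AVG case.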

This feature would still be useful even once we switch to re_renderer-powered plots.

TBD

Heuristics: it would probably make sense to default the plots to something other than NONE if there are more than N plots or P points in the recording.
How do secondary attributes (color, radius, scattered...) aggregate?
How does aggregation behave with continuous/decimal tick ranges?
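
For the heuristics question, the default could be as simple as the sketch below. The thresholds and the fallback mode are placeholders picked purely for illustration; none of them are decided here.

```python
def default_aggregation(num_plots, num_points,
                        max_plots=20, max_points=100_000, fallback="AVG"):
    """Pick a default mode: NONE for small recordings, `fallback` otherwise.

    `max_plots` (N), `max_points` (P) and `fallback` are hypothetical values.
    """
    if num_plots > max_plots or num_points > max_points:
        return fallback
    return "NONE"


small = default_aggregation(num_plots=3, num_points=5_000)
large = default_aggregation(num_plots=3, num_points=2_000_000)
```

Whatever the default ends up being, an explicit user setting in the blueprint should still take precedence over it.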


Wumpf · 2023-11-20T12:17:50Z

great writeup!
I'd love it if the answer were that aggregations get done on the GPU on the fly. But that incurs extra memory "upload" costs, doesn't fix pressure on the queries/query cache, and is pretty cumbersome to pull off without compute shaders.

⚠️ [Try it live!](https://app.rerun.io/pr/4865/index.html?url=https://storage.googleapis.com/rerun-builds/pull_request/4865/plot_gauss2.rrd) :warning:

Make it so users can configure an aggregation strategy in the rare case where they either have so much data or are so zoomed out that most of their plot results in an overdraw blurb.

Because this builds on top of the range cache, the data is neatly laid out in a memory slice already so this is very cheap to compute.

In my tests, the `MinMax` strategy has worked so well that I've decided to make it the default in the end... That might be controversial :no_mouth:.

`Off` vs. `MinMax`, using the [new gaussian walk benchmark](#4903):

![image (26)](https://github.com/rerun-io/rerun/assets/2910679/1811becb-d213-44bb-87ea-0e4a7fa058ad)
![image (27)](https://github.com/rerun-io/rerun/assets/2910679/b8d66c92-8719-4de5-a3cb-72c2ea4b1e96)

- Fixes #4271
- DNR: requires #4856
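
For readers wondering why a min/max style reduction holds up so well visually, here is one way such a strategy could look in Python. This is a sketch of the general technique, not the code from #4865: `chunk_size` is a hypothetical stand-in for "how many samples share one horizontal slot at the current zoom level", and the function name is made up.

```python
def min_max_aggregate(xs, ys, chunk_size):
    """Keep only the lowest and highest sample of each run of `chunk_size` samples.

    The two survivors are emitted in time order, so a line drawn through them
    still covers the full vertical extent (the envelope) of the original chunk.
    """
    if chunk_size < 2:
        return list(zip(xs, ys))  # nothing to aggregate
    out = []
    for start in range(0, len(ys), chunk_size):
        chunk = range(start, min(start + chunk_size, len(ys)))
        lo = min(chunk, key=lambda i: ys[i])  # index of the smallest value
        hi = max(chunk, key=lambda i: ys[i])  # index of the largest value
        for i in sorted({lo, hi}):            # one point if lo == hi, else two
            out.append((xs[i], ys[i]))
    return out


xs = list(range(8))
ys = [0.0, 2.0, -1.0, 1.0, 5.0, 4.0, 6.0, 3.0]
reduced = min_max_aggregate(xs, ys, chunk_size=4)
# 8 samples in, 4 samples out; both peaks and both troughs survive.
```

Because the input is already a contiguous, time-sorted slice, each chunk is just an index range and the whole pass is a single linear scan. Unlike an average, spikes are never smoothed away, which is probably why the result looks so close to the unaggregated plot.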

teh-cmc added the 💬 discussion, ui, and 🚀 performance labels Nov 20, 2023

teh-cmc self-assigned this Nov 24, 2023

jleibs mentioned this issue Nov 28, 2023

Allow (some) plots to aggregate / downsample data automatically based on zoom-level. #2015

Closed

nikolausWest added this to the 0.13 milestone Jan 15, 2024

teh-cmc mentioned this issue Jan 18, 2024

Configurable dynamic plot aggregation based on zoom-level #4865

Merged

4 tasks

teh-cmc closed this as completed in #4865 Jan 25, 2024
