Perf and memory issues for >kHz time series data #5904

Closed
trueb2 opened this issue Apr 10, 2024 · 3 comments
Labels
🪳 bug Something isn't working 🐑🐑 duplicate This issue or pull request already exists 🚀 performance Optimization, memory use, etc 📈 plot Plots, charts, graphs, timeseries, …

Comments

@trueb2

trueb2 commented Apr 10, 2024

Rerun looks great! I have many hours of 1+ kHz IMU data to analyze, so I am hoping Rerun can help with that. I know it is early days for kHz data in the time series visualizations, but I saw https://www.rerun.io/blog/fast-plots and figured it would be good to provide some more data.

To Reproduce
Steps to reproduce the behavior:

  1. Install rerun: cargo install rerun
  2. Create a new Rust project (Cargo.toml and main.rs below) ✅
  3. Start the viewer: rerun
  4. Stream 3.3 kHz IMU accelerometer data: cargo run --release
  5. See slow plotting and very high memory usage (>100x the in-RAM size of the data)
  6. Result: ~1 hour to plot and 22 GiB used for a 16 MiB parquet file

Expected behavior
I would expect memory usage 1-10x the size of the accelerometer data in the data frame. Here I had about 5M rows of ts, x, y, z (32 bytes per row), so I expected between 150 MiB and 1.5 GiB.
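
The estimate above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation: it assumes the exact row count from the data frame shape printed further down (4_809_375) and four 64-bit columns per row. The constant and function names are illustrative and are not part of Rerun or Polars.

```rust
// Back-of-the-envelope check of the expected memory range.
// Assumption: row count taken from the data frame shape printed by the program.
const ROWS: u64 = 4_809_375;
// Assumption: ts, x, y, z are each stored as a 64-bit value (8 bytes).
const BYTES_PER_ROW: u64 = 4 * 8;
const MIB: f64 = 1024.0 * 1024.0;
const GIB: f64 = 1024.0 * MIB;

/// Size of the raw columns in bytes.
fn raw_bytes() -> u64 {
    ROWS * BYTES_PER_ROW
}

/// How many times larger the observed memory use is than the raw columns.
fn overhead_factor(observed_bytes: f64) -> f64 {
    observed_bytes / raw_bytes() as f64
}

fn main() {
    let raw = raw_bytes() as f64;
    println!("raw columns: {:.1} MiB", raw / MIB);
    println!("10x budget:  {:.2} GiB", 10.0 * raw / GIB);
    println!(
        "observed 22 GiB is {:.0}x the raw size",
        overhead_factor(22.0 * GIB)
    );
}
```

Under these assumptions the raw columns come to a little under 150 MiB, and the reported 22 GiB is roughly 150 times that, consistent with the ">100x" figure in the steps above.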

Screenshots
[Image: Screenshot 2024-04-10 at 5 23 43 PM]

Desktop:

  • OS: macOS Sonoma 14.3.1

Rerun version
rerun-cli 0.15.0 [rustc 1.77.1 (7cf61ebde 2024-03-27), LLVM 17.0.6] aarch64-apple-darwin

Additional context
Here is the code for the streaming. I can provide larger or smaller IMU accelerometer data at 416 Hz, 833 Hz, 1.6 kHz, 3.3 kHz, or 6.6 kHz if desired. To bog down the UI, I used about 25 min of data, or about 15 MiB of parquet.

The dependencies from Cargo.toml

[package]
name = "imu-xyz"
version = "0.1.0"
edition = "2021"

[dependencies]
itertools = "0.12.1"
polars = { version = "0.38.3", features = ["parquet", "lazy", "timezones"] }
rerun = "0.15.0"

The streaming code from main.rs

use itertools::izip;
use polars::prelude::*;
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // Load data from a parquet file
    let df = LazyFrame::scan_parquet("stream-0x30.pq", Default::default())?.collect()?;
    println!("{:?}", df);

    // Use Rerun to load accelerometer data from parquet
    let recorder = rerun::RecordingStreamBuilder::new("imu_xyz").connect()?;

    let ts = df.column("ts")?;
    let x = df.column("c1")?;
    let y = df.column("c2")?;
    let z = df.column("c3")?;
    for row in izip!(
        ts.datetime()?.into_no_null_iter(),
        x.i64()?.into_no_null_iter(),
        y.i64()?.into_no_null_iter(),
        z.i64()?.into_no_null_iter()
    ) {
        let (ts, x, y, z) = row;
        // The "ts" column is datetime[μs]; multiply by 1000 to get nanoseconds.
        recorder.set_time_nanos("ts", ts * 1000);
        recorder.log("imu/x", &rerun::Scalar::new(x as f64))?;
        recorder.log("imu/y", &rerun::Scalar::new(y as f64))?;
        recorder.log("imu/z", &rerun::Scalar::new(z as f64))?;
    }
    Ok(())
}

The stream-0x30.pq (in a .zip)
stream-0x30.pq.zip

imu-xyz % cargo run --release
    Finished release [optimized] target(s) in 0.72s
     Running `target/release/imu-xyz`
shape: (4_809_375, 4)
┌────────────────────────────────┬─────┬─────┬──────┐
│ ts                             ┆ c1  ┆ c2  ┆ c3   │
│ ---                            ┆ --- ┆ --- ┆ ---  │
│ datetime[μs, UTC]              ┆ i64 ┆ i64 ┆ i64  │
╞════════════════════════════════╪═════╪═════╪══════╡
│ 2024-04-10 21:11:27.020704 UTC ┆ 0   ┆ 0   ┆ 460  │
│ 2024-04-10 21:11:27.021003 UTC ┆ 0   ┆ 0   ┆ 2323 │
│ 2024-04-10 21:11:27.021302 UTC ┆ 0   ┆ 0   ┆ 4384 │
│ 2024-04-10 21:11:27.021601 UTC ┆ 0   ┆ 0   ┆ 4125 │
│ 2024-04-10 21:11:27.021900 UTC ┆ 0   ┆ -1  ┆ 3409 │
│ …                              ┆ …   ┆ …   ┆ …    │
│ 2024-04-10 21:35:24.780093 UTC ┆ 30  ┆ -61 ┆ 4028 │
│ 2024-04-10 21:35:24.780392 UTC ┆ 30  ┆ -61 ┆ 4020 │
│ 2024-04-10 21:35:24.780691 UTC ┆ 30  ┆ -62 ┆ 4031 │
│ 2024-04-10 21:35:24.780990 UTC ┆ 29  ┆ -62 ┆ 4025 │
│ 2024-04-10 21:35:24.781289 UTC ┆ 29  ┆ -62 ┆ 4015 │
└────────────────────────────────┴─────┴─────┴──────┘
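
As a sanity check on the stated 3.3 kHz rate, the first and last timestamps in the printed data frame can be turned into a mean sample rate. This is a minimal sketch using only the values visible in the output above; the helper names `micros_of_day` and `mean_rate_hz` are hypothetical and do not come from any library.

```rust
// Estimate the sample rate from the first and last timestamps printed above.
// Assumption: row count taken from the printed data frame shape.
const ROWS: u64 = 4_809_375;

/// Microseconds since midnight for a wall-clock time.
fn micros_of_day(h: u64, m: u64, s: u64, us: u64) -> u64 {
    ((h * 60 + m) * 60 + s) * 1_000_000 + us
}

/// Mean sample rate in Hz for `rows` samples spanning `span_us` microseconds.
fn mean_rate_hz(rows: u64, span_us: u64) -> f64 {
    (rows - 1) as f64 * 1e6 / span_us as f64
}

fn main() {
    let first = micros_of_day(21, 11, 27, 20_704); // 21:11:27.020704
    let last = micros_of_day(21, 35, 24, 781_289); // 21:35:24.781289
    let span_us = last - first;
    println!("span: {:.1} s", span_us as f64 / 1e6);
    println!("mean rate: {:.0} Hz", mean_rate_hz(ROWS, span_us));
    // Spacing between the first two printed rows: 021003 - 020704 = 299 μs.
    println!("rate from first interval: {:.0} Hz", 1e6 / 299.0);
}
```

Both estimates land in the 3.3 kHz range, and the span of just under 24 minutes matches the "about 25 min of data" mentioned earlier.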
@trueb2 trueb2 added 👀 needs triage This issue needs to be triaged by the Rerun team 🪳 bug Something isn't working labels Apr 10, 2024
@emilk emilk added 🚀 performance Optimization, memory use, etc 📈 plot Plots, charts, graphs, timeseries, … and removed 👀 needs triage This issue needs to be triaged by the Rerun team labels Apr 11, 2024
@teh-cmc
Member

teh-cmc commented Apr 11, 2024

rr.Scalar isn't really designed for large scalar series; rather, it is designed for real-time use cases that are mostly concerned with the last few minutes of data (using the "Visible time range" feature).
The reason for this is that rr.Scalar has access to all the fancy time-related features in Rerun, and those features require a lot of extra memory and CPU overhead to be made possible.

We're always working on improving the performance of rr.Scalar, but it fundamentally isn't compatible with large series with millions of points.

We do have an open proposal for a ScalarChart though, which would be a more feature-limited version of Scalar, with basically no memory or CPU overhead (please add a 👍 if you're interested in something like this, it helps us prioritize!):

@trueb2
Author

trueb2 commented Apr 11, 2024

Thanks for the insight! I saw an issue regarding audio, but many wearable sensor data streams like biopotential and accelerometer are sub-10kHz and have few columns.

I am interested in streaming data for multimodal research streams similar to the linked issue. The common factor is typically that the data was sent over BLE, so data streams are typically transferred at <400 kbps. In the future, I hope that data from multiple sensors could be logged (using a Python or Rust script) to Rerun for easy visualization.

If you would like some more data please let me know. Thanks for all your work on this project!

@emilk
Member

emilk commented Apr 23, 2024

Thanks for the report!

We've summarized this in a tracking issue now, with some concrete issues for moving forward that we'll start working on soon:

@emilk emilk closed this as not planned Apr 23, 2024
@emilk emilk added the 🐑🐑 duplicate This issue or pull request already exists label Apr 23, 2024