-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SDK DataLoader
s 5: customizable (external) loaders for Rust
#5351
Conversation
DataLoader
s 5: customizable (external) loaders for Rust
1489296
to
33c9161
Compare
6aac240
to
d61342b
Compare
Exposes raw `DataLoader`s to the Python SDK through 2 new methods: `RecordingStream.log_file_from_path` & `RecordingStream.log_file_from_contents`. Everything streams asynchronously, as expected. There's no way to configure the behavior of the `DataLoader` at all, yet. That comes next. ```python rr.log_file_from_path(filepath) ``` ![image](https://github.com/rerun-io/rerun/assets/2910679/0fe2d39c-f069-44a6-b836-e31001b3adaa) --- Part of series of PR to expose configurable `DataLoader`s to our SDKs: - #5327 - #5328 - #5330 - #5337 - #5351 - #5355
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the "would you kindly" nature of the arg struct is annoying me a bit, but what can we do other than "handling whatever real users throw at it" :) 🤷
fn timepoint_from_args(args: &Args) -> anyhow::Result<rerun::TimePoint> { | ||
let mut timepoint = rerun::TimePoint::default(); | ||
{ | ||
for time_str in &args.time { | ||
let Some((timeline_name, time)) = time_str.split_once('=') else { | ||
continue; | ||
}; | ||
timepoint.insert( | ||
rerun::Timeline::new_temporal(timeline_name), | ||
time.parse::<i64>()?.into(), | ||
); | ||
} | ||
|
||
for seq_str in &args.sequence { | ||
let Some((seqline_name, seq)) = seq_str.split_once('=') else { | ||
continue; | ||
}; | ||
timepoint.insert( | ||
rerun::Timeline::new_sequence(seqline_name), | ||
seq.parse::<i64>()?.into(), | ||
); | ||
} | ||
} | ||
Ok(timepoint) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feels like somthing we may want to provide as a public utility? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to keep the whole external DataLoader
API surface as small as possible for now -- we know we're going to break things a million times; I'd rather break less...
/// optional prefix for all entity paths | ||
#[argh(option)] | ||
entity_path_prefix: Option<String>, | ||
|
||
/// optional mark data to be logged as timeless | ||
#[argh(switch)] | ||
timeless: bool, | ||
|
||
/// optional timestamps to log at (e.g. `--time sim_time=1709203426`) (repeatable) | ||
#[argh(option)] | ||
time: Vec<String>, | ||
|
||
/// optional sequences to log at (e.g. `--sequence sim_frame=42`) (repeatable) | ||
#[argh(option)] | ||
sequence: Vec<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering if there's a way for data loader authors to "inherit" these arguments. Can clap work with nested arg strucs..?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can do inheritance with clap
(we already do so for connect
, spawn
, etc).
But again I want to keep this whole thing as small and quick-n-dirty as possible, at least for now -- the whole point is to show that's it's just a dumb CLI app and there's no magic involved and one could do the same using any language / CLI library.
|
||
// --- | ||
|
||
/// Recommended settings for the [`DataLoader`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the recommended
method on this, but I'm confused/not convinced that this thing as a whole is "recommended settings". Why not just DataLoaderSettings
and then each field elaborates what effect it expects (already the case)? I get that we're defensive here and want to make clear that a data loader that ignores the settings is still a valid data loader and these are all "would you kindly please"-arguments, but I find baking that into the name goes a bit too far and is more confusing than helping.
... anyways, not a hill I gonna die on so if you prefer this as is keep it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there's something more we could write about the (lack of) consequences if a data loader ignores the args
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure we can rename to DataLoaderSettings
.
I wonder if there's something more we could write about the (lack of) consequences if a data loader ignores the args
We've already done so in a bunch of places, including the guide.
rec = rec.recording_id(recording_id); | ||
}; | ||
|
||
// The most important part of this: log to standard output so the Rerun Viewer can ingest it! | ||
rec.stdout()? | ||
}; | ||
|
||
rec.log_timeless( | ||
rerun::EntityPath::from_file_path(&args.filepath), | ||
// TODO(#3841): really need those send APIs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is example code to be consumed by users it would be nice(r) to write about what they can expect to improve here in the future inead :)
Exposes raw `DataLoader`s to the C++ SDK through 2 new methods: `RecordingStream::log_file_from_path` & `RecordingStream::log_file_from_contents`. Everything streams asynchronously, as expected. There's no way to configure the behavior of the `DataLoader` at all, yet. That comes next. :warning: I'm carrying bytes around in a `string_view` (see `rr_bytes`) and related, and the internet has been unhelpful to tell e whether I should or shouldn't. ```cpp rec.log_file_from_path(filepath); ``` ![image](https://github.com/rerun-io/rerun/assets/2910679/1933f08b-1b2b-483b-9576-ce2e0421cb62) --- Part of series of PR to expose configurable `DataLoader`s to our SDKs: - #5327 - #5328 - #5330 - #5337 - #5351 - #5355 --------- Co-authored-by: Andreas Reich <[email protected]>
Introduces `RecordingStream::clone_weak`, so `DataLoader` threads started from the SDK don't prevent the recording from flushing once `main` goes out of scope, all the while making sure that `DataLoader`s run to completion. ![image](https://github.com/rerun-io/rerun/assets/2910679/94bf7b16-cf9b-40ce-a190-d255328af3f8) - Mitigates #5335 --- Part of series of PR to expose configurable `DataLoader`s to our SDKs: - #5327 - #5328 - #5330 - #5337 - #5351 - #5355
e50f9d3
to
c220ab0
Compare
Introduces the new `DataLoaderSettings` business to Python and update examples accordingly (`external_data_loader` & `log_file`). ```bash python examples/python/external_data_loader/main.py --recording-id this-one --entity-path-prefix a/b/c --time sim_time=1000 --time wall_time=1709204046 --sequence sim_frame=42 examples/python/dna/main.py | rerun - ``` ![image](https://github.com/rerun-io/rerun/assets/2910679/bfda567d-3d16-42cd-be8e-8b1a0767a784) Checks: - [x] external loader ran manually (`python loader | rerun`) - [x] external loader via rerun (`rerun xxx.py`) - [x] log_file with external loader (`log_file xxx.py`) --- Part of series of PR to expose configurable `DataLoader`s to our SDKs: - #5327 - #5328 - #5330 - #5337 - #5351 - #5355 - #5361
This makes the `log_file` APIs behave more like the standard `log` APIs; i.e. they inherit the state of their associated `RecordingStream` (app_id, rec_id, timepoint, etc...). Also inherit the application ID while we're at it. Makes the API much nicer to use _and_ much more consistent with the rest. Checks: - [x] external loader ran manually (`python loader | rerun`) - [x] external loader via rerun (`rerun xxx.py`) - [x] log_file with external loader (`log_file xxx.py`) - [x] external loader ran manually (`loader | rerun`) - [x] external loader via rerun (`rerun xxx.rs`) - [x] log_file with external loader (`log_file xxx.rs`) --- Part of series of PR to expose configurable `DataLoader`s to our SDKs: - #5327 - #5328 - #5330 - #5337 - #5351 - #5355 - #5379 - #5361 - #5388 --------- Co-authored-by: Andreas Reich <[email protected]>
Introduces the new `DataLoaderSettings` business to C++ and update examples accordingly (`external_data_loader` & `log_file`). ```bash ./build/debug/examples/cpp/log_file/example_log_file --recording-id this-one --entity-path-prefix a/b/c --time sim_time=1000 --time wall_time=1709204046 --sequence sim_frame=42 rerun_cpp/tests/main.cpp | rerun - ``` ![image](https://github.com/rerun-io/rerun/assets/2910679/b979a24c-29b6-473b-91b1-de3832bea436) Checks: - [x] external loader ran manually (`loader.exe | rerun`) - [x] external loader via rerun (`rerun xxx.cpp`) - [x] log_file with external loader (`log_file xxx.cpp`) --- Part of series of PR to expose configurable `DataLoader`s to our SDKs: - #5327 - #5328 - #5330 - #5337 - #5351 - #5355 - #5379 - #5361 - #5388
I guess that's good enough 🤷. I don't know, my brain has been completely friend by C++ non-sense all day. This includes a fix to make sure that a viewer that was spawned from the python SDK is still allowed to spawn dataloaders implemented in python (`RERUN_APP_ONLY` shenaniganeries). - Fixes #4526 --- Part of series of PR to expose configurable `DataLoader`s to our SDKs: - #5327 - #5328 - #5330 - #5337 - #5351 - #5355 - #5379 - #5361 - #5388
Adds new
RecommendedLoadSettings
that gets passed to allDataLoader
s -- builtin and external -- in order to customize their behaviors.This includes:
cargo r -p rerun-loader-rust-file -- run_wasm/src/main.rs --recording-id this-one --entity-path-prefix a/b/c --time sim_time=1000 --time wall_time=1709204046 --sequence sim_frame=42 | rerun -
Checks:
loader | rerun
)rerun xxx.rs
)log_file xxx.rs
)Part of series of PR to expose configurable
DataLoader
s to our SDKs:DataLoader
s 1: barebones Rust support #5327DataLoader
s 2: barebones Python support #5328DataLoader
s 3: barebones C and C++ support #5330DataLoader
s 4: working around shutdown brittleness #5337DataLoader
s 5: customizable (external) loaders for Rust #5351DataLoader
s 6: customizable (external) loaders for Python #5355DataLoader
s 7: stateful file logging #5379DataLoader
s 8: customizable (external) loaders for C++ #5361DataLoader
s 9: polish, docs, etc #5388Checklist
main
build: app.rerun.ionightly
build: app.rerun.io