Expand python APIs to support new data API concepts #7455
Comments
Awesome writeup!
One other possible direction to go here: what if you actually express the range or latest-at query as an operation on the `TimeColumnDescriptor`? At that point you know what type it has in the store and can make it ergonomic. It's also symmetric in some way with how we handle other columns. It also avoids mistakes like doing (this is kind of a half-baked idea, so likely mega annoying in some obvious way)
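A rough, hypothetical sketch of what that direction could look like; the `time_column` accessor and the `latest_at`/`range` methods on the descriptor are assumptions, not an existing API:

```python
import rerun as rr

# Hypothetical setup; `load_recording` stands in for however a Dataset is obtained.
dataset = rr.dataframe.load_recording("example.rrd")

# The time column descriptor knows whether its timeline is sequence or temporal,
# so it can interpret plain Python values unambiguously.
frame_nr = dataset.time_column("frame_nr")   # assumed accessor returning a TimeColumnDescriptor
log_time = dataset.time_column("log_time")

# On a sequence timeline, 42 is simply sequence number 42.
query = frame_nr.latest_at(42)

# On a temporal timeline, the descriptor can accept seconds and convert to
# nanoseconds internally, because it knows its own unit.
query = log_time.range(start=1.5, end=2.0)
```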
I think it half-solves the problem, but it still doesn't actually handle

I'm hesitant to pull in something like https://pypi.org/project/custom-literals/, but that's of course the kind of behavior that would really be nice.
Something I'm wondering about is how we handle multiple recording ids here. Multiple rrds could all have the same recording id, so something like this makes sense: `recording = rr.data.load_recording("first.rrd", "second.rrd")`. However, we can't know up front if those files contain one or two recording ids. How do we handle that?

The same goes for application id and any future user-defined ids.
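One purely illustrative shape for this; the plural `load_recordings` name and the fields on the returned handles are assumptions, not part of the proposal:

```python
import rerun as rr

# Loading several .rrd files yields one handle per recording id actually found
# in those files, however many that turns out to be.
recordings = rr.data.load_recordings("first.rrd", "second.rrd")  # assumed plural variant

for rec in recordings:
    # Each handle would know which recording/application it came from.
    print(rec.recording_id, rec.application_id)
```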
Reminder: we still need an API for filtering out all-empty columns. Examples: unused transform components, indicator components, etc.
…he new query property (#7516)

### What

This PR introduces a new `DataframeQueryV2` view property archetype which models the query according to the new dataframe API design (#7455) and the feature we actually want to support in the dataframe view (#7497).

At this point, the new archetype is **NOT** used yet. It just lives alongside the previous iteration, which is still used by the actual view. The swap will occur later.

---

Part of a series to address #6896 and #7498. All PRs:
- #7515
- #7516
- #7527
- #7545
- #7551
- #7572
- #7573

### Checklist
* [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested the web demo (if applicable):
  * Using examples from latest `main` build: [rerun.io/viewer](https://rerun.io/viewer/pr/7516?manifest_url=https://app.rerun.io/version/main/examples_manifest.json)
  * Using full set of examples from `nightly` build: [rerun.io/viewer](https://rerun.io/viewer/pr/7516?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json)
* [x] The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
* [x] If applicable, add a new check to the [release checklist](https://github.com/rerun-io/rerun/blob/main/tests/python/release_checklist)!
* [x] If applicable, I have noted any breaking changes to the log API in `CHANGELOG.md` and the migration guide

- [PR Build Summary](https://build.rerun.io/pr/7516)
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)

To run all checks from `main`, comment on the PR with `@rerun-bot full-check`.
### What

- First pass at implementing APIs for: #7455
- Introduces a new mechanism for directly exposing Rust types into the Python bridge via a `.pyi` definition

Example notebook for testing:

```
pixi run py-build-examples
pixi run -e examples jupyter notebook tests/python/dataframe/examples.ipynb
```

### Future work
- More docs / help strings
- Remaining API features

### Checklist
* [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested the web demo (if applicable):
  * Using examples from latest `main` build: [rerun.io/viewer](https://rerun.io/viewer/pr/7357?manifest_url=https://app.rerun.io/version/main/examples_manifest.json)
  * Using full set of examples from `nightly` build: [rerun.io/viewer](https://rerun.io/viewer/pr/7357?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json)
* [x] The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
* [x] If applicable, add a new check to the [release checklist](https://github.com/rerun-io/rerun/blob/main/tests/python/release_checklist)!
* [x] If applicable, I have noted any breaking changes to the log API in `CHANGELOG.md` and the migration guide

- [PR Build Summary](https://build.rerun.io/pr/7357)
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)

To run all checks from `main`, comment on the PR with `@rerun-bot full-check`.
## Updated Proposal
### Improved concept definitions
- `.filter(TimeRange(start=..., end=...))` (or maybe `.filter_range(start=..., end=...)`)
- `view.using_index_values(self, values: ArrayLike)`
- `.select(Timeline(), "Translation3D")`

(A sketch of how these might compose follows below.)
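A minimal sketch of how these operations might compose; the `load_recording`/`view` entry points, module paths, and constructor arguments below are assumptions layered on top of the fragments above:

```python
import rerun as rr
from rerun.dataframe import TimeRange, Timeline  # assumed import location

recording = rr.dataframe.load_recording("example.rrd")         # assumed loader
view = recording.view(index="frame_nr", contents="/world/**")  # assumed entry point

# Restrict rows to a time range...
ranged = view.filter(TimeRange(start=0, end=100))

# ...or, alternatively, to an explicit set of index values.
sampled = view.using_index_values([0, 10, 20])

# Then pick the output columns.
table = ranged.select(Timeline("frame_nr"), "Translation3D")
```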
### Python APIs
## Original Proposal (archive)
Notes from exploration:
ComponentSelector
### Proposals
Start with Python refinement, and then back-propagate into Rust if we like it.
### Selections
The Python `Dataset` object will internally track a set of columns that will be used for all queries, along with an `Arc<ChunkStore>`.

Introduce new `select_` variant APIs on the `Dataset`:
- `dataset.select_entities(expr: str) -> Dataset`
- `dataset.select_components(components: Sequence[ComponentLike]) -> Dataset`
- `dataset.select_columns(column_selectors: Sequence[ColumnSelector]) -> Dataset`
Each of these can strictly filter/mutate the active set of descriptors relative to the previous step: the first selection draws from the complete set, and each subsequent selection only selects from what remains.
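A hypothetical sketch of that incremental narrowing, assuming a `load_recording` constructor alongside the `select_*` methods listed above:

```python
import rerun as rr

# Assumed constructor returning a Dataset with the complete set of columns active.
dataset = rr.dataframe.load_recording("example.rrd")

# First selection: drawn from the complete set of columns.
dataset = dataset.select_entities("/world/robot/**")

# Second selection: drawn only from the columns that survived the first one.
dataset = dataset.select_components(["Translation3D", "Color"])
```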
### LatestAtQuery and RangeQuery
Our `TimeType` ambiguity continues to torment us.

The most ergonomic option is clearly an API that looks like:
- `LatestAtQuery(timeline: str, at: int | float)`
- `RangeQuery(timeline: str, min: int | float, max: int | float)`
The big challenge here is that sane-looking APIs are ambiguous without knowledge of the timeline.
Concretely:
`LatestAtQuery(timeline, 2.0)` needs to map to the `TimeInt` 2 if the timeline is a sequence, 2000000000 if the timeline is temporal and the user is thinking in seconds, and 2 if the timeline is temporal and the user is thinking in nanos.

TODO: Still not sure what the right answer is here.
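To make the ambiguity concrete, here is a plain-Python illustration (not a proposed API) of how the same user-facing value 2.0 lands on different `TimeInt` values depending on the timeline type and the unit the user has in mind:

```python
value = 2.0

# Sequence timeline: the value is just a sequence number.
as_sequence = int(value)                   # -> 2

# Temporal timeline, user thinking in seconds: convert to nanoseconds.
as_seconds = int(value * 1_000_000_000)    # -> 2_000_000_000

# Temporal timeline, user thinking in nanoseconds: take the value as-is.
as_nanos = int(value)                      # -> 2
```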
If we follow precedent from `TimeRangeBoundary`, this ends up looking something like:

Choice A
Choice B, with some parameter-exploding, could be simplified down to:
Choice C, diverging from what we do in `TimeRangeBoundary`:
### Queries
Since the selection is now carried with the `Dataset`, you can now execute a query directly without providing columns:

- `dataset.latest_at_query(latest_at: LatestAt)`
- `dataset.range_query(range: Range, pov: ComponentSelector)`
This means you can write a query like:
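The original example is not preserved here, but a minimal sketch, assuming the `select_*` and query APIs above (names such as `load_recording` and the `RangeQuery` constructor arguments are assumptions), might look like this:

```python
import rerun as rr
from rerun.dataframe import RangeQuery  # assumed import location

# The selection is carried on the Dataset itself...
dataset = (
    rr.dataframe.load_recording("example.rrd")  # assumed loader
    .select_entities("/world/robot/**")
    .select_components(["Translation3D"])
)

# ...so the query needs no column list, only the range and a point-of-view component.
table = dataset.range_query(
    RangeQuery("frame_nr", min=0, max=100),
    pov="Translation3D",
)
```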
### Column Naming
Selectors/Descriptors will be given a name.
This name will default to one of:
When specifying a component selector, users have the option to call `.rename_as()` to change the name of the component.

These names are also valid INPUT to a `ColumnSelector`.
For example:
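The original example is not preserved here; a hypothetical sketch of `.rename_as()` followed by name-based re-selection, where the `ComponentSelector` constructor arguments and import location are assumptions:

```python
from rerun.dataframe import ComponentSelector  # assumed import location

# `dataset` is assumed to be a Dataset obtained via the selection APIs above.
# Select a component column and give it a friendlier name.
renamed = dataset.select_columns(
    [ComponentSelector("/world/robot", "Translation3D").rename_as("robot_position")]
)

# The new name is also valid input to a ColumnSelector in a later selection.
narrowed = renamed.select_columns(["robot_position"])
```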