Clean up Arrow extension hell, implement `RERUN:component_name` #3360

teh-cmc · 2023-09-19T08:38:23Z

I'll write something down at some point, but for the moment, refer to this thread.

A `TransportChunk` is a `Chunk` that is ready for transport and/or storage. It is very cheap to go from a `Chunk` to a `TransportChunk` and vice versa.

A `TransportChunk` maps 1:1 to a native Arrow `RecordBatch`. It has a stable ABI and can be cheaply sent across process boundaries. `arrow2` has no `RecordBatch` type; we will get one once we migrate to `arrow-rs`.

A `TransportChunk` is self-describing: it contains all the data _and_ metadata needed to index it into storage. We rely heavily on chunk-level and field-level metadata to communicate Rerun-specific semantics over the wire, e.g. whether some columns are already properly sorted.

The Arrow metadata system is fairly limited (it's all untyped strings), but for now that seems good enough. It will be trivial to switch to something else later, if need be.

- Fixes #1760
- Fixes #1692
- Fixes #3360
- Fixes #1696

---

Part of a PR series to implement our new chunk-based data model on the client-side (SDKs):
- #6437
- #6438
- #6439
- #6440
- #6441
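To make the "untyped strings" point above concrete, here is a minimal, std-only sketch of how typed per-column semantics could be round-tripped through string-only field metadata. It is not the actual `TransportChunk` implementation and uses neither `arrow2` nor `arrow-rs`. `FieldSketch`, `ColumnInfo`, `encode_column`, `decode_column`, and the `example:is_sorted` key are hypothetical names introduced for illustration. Only the `RERUN:component_name` key comes from this issue's title, and its use as a field-level key here is an assumption.

```rust
use std::collections::BTreeMap;

/// Arrow-style metadata: untyped string keys mapped to untyped string values.
type Metadata = BTreeMap<String, String>;

/// Key taken from this issue's title; using it as a field-level key is an assumption.
const KEY_COMPONENT_NAME: &str = "RERUN:component_name";
/// Hypothetical key, invented for this sketch.
const KEY_IS_SORTED: &str = "example:is_sorted";

/// Hypothetical stand-in for an Arrow field: a column name plus its metadata.
#[derive(Debug, Clone, PartialEq)]
struct FieldSketch {
    name: String,
    metadata: Metadata,
}

/// Typed view of the semantics we want to carry over the wire.
#[derive(Debug, Clone, PartialEq)]
struct ColumnInfo {
    component_name: Option<String>,
    is_sorted: bool,
}

/// Flatten the typed view into string-only field metadata.
fn encode_column(name: &str, info: &ColumnInfo) -> FieldSketch {
    let mut metadata = Metadata::new();
    if let Some(component) = &info.component_name {
        metadata.insert(KEY_COMPONENT_NAME.to_owned(), component.clone());
    }
    // Booleans have to be stringified, since metadata values are untyped strings.
    metadata.insert(KEY_IS_SORTED.to_owned(), info.is_sorted.to_string());
    FieldSketch {
        name: name.to_owned(),
        metadata,
    }
}

/// Recover the typed view; a missing sortedness key is treated as "not sorted".
fn decode_column(field: &FieldSketch) -> Result<ColumnInfo, String> {
    let is_sorted = match field.metadata.get(KEY_IS_SORTED) {
        None => false,
        Some(raw) => raw
            .parse::<bool>()
            .map_err(|err| format!("bad `{}` value {:?}: {}", KEY_IS_SORTED, raw, err))?,
    };
    Ok(ColumnInfo {
        component_name: field.metadata.get(KEY_COMPONENT_NAME).cloned(),
        is_sorted,
    })
}
```

The cost of the string-only scheme shows up in `decode_column`: every typed value needs explicit parsing and an error path, which is the limitation the description accepts as "good enough" for now.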

teh-cmc added the 🏹 arrow (concerning arrow) and ⛃ re_datastore (affects the datastore itself) labels on Sep 19, 2023

This was referenced Oct 9, 2023

A Component's DataType should embed its metadata #1696

Closed

Tracking issue: Migrate from re_arrow2 to arrow #3741

Open

teh-cmc self-assigned this May 16, 2024

teh-cmc mentioned this issue May 27, 2024

Client-side chunks 2: introduce TransportChunk #6439

Merged

5 tasks

teh-cmc closed this as completed in #6439 May 31, 2024
