Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

batching 4: retire MsgBundle + batching support in transport layer #1679

Merged
merged 8 commits into from
Mar 29, 2023

Conversation

teh-cmc
Copy link
Member

@teh-cmc teh-cmc commented Mar 22, 2023

This PR finally gets rid of MsgBundle entirely, and puts DataTable in charge or serialization/deserialization.
Since DataTable support batches, the transport layer as a whole now supports batches too (which doesn't mean that the SDKs nor the store do!).

A batch always contains the following control columns:

  • rerun.row_id: a RowId uniquely identifying every row in the batch,
  • rerun.timepoint: a TimePoint for each row,
  • rerun.entity_path: the EntityPath that each row relates to,
  • rerun.num_instances: the expected number of instances for each row.

We're not yet in a position to benefit from all the niceties that batching is suppose to offer us; for that we now need to implement the changes to the DataStore on one hand and to the SDKs on the other hand.

Regressions

The new batches carry more information than the old schema (entity_path, num_instances, row_id..) and so it is expected to see some regression in very-micro benchmarks such as arrow_mono_points:

Benchmark suite Current: a8c0983 Previous: f8a263c Ratio
mono_points_arrow/generate_message_bundles 44481992 ns/iter (± 1123759) 47092378 ns/iter (± 853262) 0.94
mono_points_arrow/generate_messages 178549277 ns/iter (± 1487091) 123513169 ns/iter (± 1380848) 1.45
mono_points_arrow/encode_log_msg 217471314 ns/iter (± 725662) 157619842 ns/iter (± 1814858) 1.38
mono_points_arrow/encode_total 444945548 ns/iter (± 2328275) 329033469 ns/iter (± 1723866) 1.35
mono_points_arrow/decode_log_msg 264431976 ns/iter (± 1105351) 180134730 ns/iter (± 875308) 1.47
mono_points_arrow/decode_message_bundles 92561122 ns/iter (± 1346869) 53748777 ns/iter (± 782433) 1.72
mono_points_arrow/decode_total 358486103 ns/iter (± 1978345) 231988721 ns/iter (± 1345301) 1.55

This gets dwarfed to oblivion when you enable batching on this very same benchmark:

-    fn generate_tables() -> Vec<DataTable> {
-        (0..NUM_POINTS)
-            .map(|i| {
-                DataTable::from_rows(
-                    MsgId::ZERO,
-                    [DataRow::from_cells2(
-                        MsgId::ZERO,
-                        entity_path!("points", Index::Sequence(i as _)),
-                        [build_frame_nr(0.into())],
-                        1,
-                        (build_some_point2d(1), build_some_colors(1)),
-                    )],
-                )
-            })
-            .collect()
-    }

+    fn generate_table() -> DataTable {
+        DataTable::from_rows(
+            MsgId::ZERO,
+            (0..NUM_POINTS).map(|i| {
+                DataRow::from_cells2(
+                    MsgId::ZERO,
+                    entity_path!("points", Index::Sequence(i as _)),
+                    [build_frame_nr(0.into())],
+                    1,
+                    (build_some_point2d(1), build_some_colors(1)),
+                )
+            }),
+        )
+    }

image (8)


On top of #1673
Part of #1619

  • Self-review
  • No bench regression
  • No test regression
  • No memory usage regression
  • api_demo_rs looking good
  • api_demo_py looking good
  • TODO all the things that fit into Tracking issue: end-to-end batching #1619
  • open issues for everything else

@teh-cmc teh-cmc changed the title batching 4: retiring MsgBundle + ready to send batches batching 4: retire MsgBundle + ready to send batches Mar 22, 2023
@teh-cmc teh-cmc changed the base branch from main to cmc/datastore/rows_and_tables March 22, 2023 23:25
@teh-cmc teh-cmc force-pushed the cmc/datastore/retire_bundles branch 4 times, most recently from 9e4661e to bc97c77 Compare March 23, 2023 14:19
@teh-cmc teh-cmc force-pushed the cmc/datastore/rows_and_tables branch 8 times, most recently from cedfddc to d33cd0f Compare March 24, 2023 13:43
@teh-cmc teh-cmc force-pushed the cmc/datastore/retire_bundles branch 2 times, most recently from 14693bf to f7e3e9f Compare March 24, 2023 14:27
@teh-cmc teh-cmc changed the title batching 4: retire MsgBundle + ready to send batches batching 4: retire MsgBundle + batching support Mar 24, 2023
@teh-cmc teh-cmc force-pushed the cmc/datastore/retire_bundles branch from f7e3e9f to 37b92b3 Compare March 24, 2023 14:47
@teh-cmc teh-cmc added enhancement New feature or request 🐍 Python API Python logging API 🏹 arrow concerning arrow 🦀 Rust API Rust logging API ⛃ re_datastore affects the datastore itself 🚀 performance Optimization, memory use, etc labels Mar 24, 2023
@teh-cmc teh-cmc force-pushed the cmc/datastore/retire_bundles branch 3 times, most recently from 89c0780 to 03d231d Compare March 25, 2023 11:57
crates/re_data_store/src/log_db.rs Outdated Show resolved Hide resolved
crates/re_data_store/src/log_db.rs Outdated Show resolved Hide resolved
crates/re_data_store/src/log_db.rs Outdated Show resolved Hide resolved
crates/re_log_types/src/arrow_msg.rs Outdated Show resolved Hide resolved
crates/re_log_types/src/lib.rs Outdated Show resolved Hide resolved
Comment on lines -19 to -34
/// The same component were put in the same log message multiple times.
/// E.g. `with_component()` was called multiple times for `Point3D`.
/// We don't support that yet.
#[error(
"All component collections must have exactly one row (i.e. no batching), got {0:?} instead. Perhaps with_component() was called multiple times with the same component type?"
)]
MoreThanOneRow(Vec<(ComponentName, usize)>),

/// Some components had more or less instances than some other.
/// For example, there were `10` positions and `8` colors.
#[error(
"All component collections must share the same number of instances (i.e. row length) \
for a given row, got {0:?} instead"
)]
MismatchedRowLengths(Vec<(ComponentName, u32)>),

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These things are now checked by Data{Cell,Row,Table}

crates/re_sdk/src/msg_sender.rs Outdated Show resolved Hide resolved
rerun_py/src/python_bridge.rs Outdated Show resolved Hide resolved
crates/re_viewer/src/ui/event_log_view.rs Show resolved Hide resolved
@teh-cmc teh-cmc force-pushed the cmc/datastore/rows_and_tables branch 2 times, most recently from d8d507c to 1fa6aa7 Compare March 27, 2023 08:22
@teh-cmc teh-cmc force-pushed the cmc/datastore/rows_and_tables branch from 1fa6aa7 to dc1c47f Compare March 27, 2023 08:23
@teh-cmc teh-cmc force-pushed the cmc/datastore/retire_bundles branch 4 times, most recently from 34c2db8 to cdebe4e Compare March 27, 2023 10:24
Base automatically changed from cmc/datastore/rows_and_tables to main March 27, 2023 13:25
@teh-cmc teh-cmc force-pushed the cmc/datastore/retire_bundles branch from cdebe4e to eefab45 Compare March 27, 2023 13:26
@teh-cmc teh-cmc removed 🐍 Python API Python logging API 🦀 Rust API Rust logging API labels Mar 27, 2023
@teh-cmc teh-cmc force-pushed the cmc/datastore/retire_bundles branch 3 times, most recently from 2923d41 to 50d0477 Compare March 27, 2023 14:30
crates/re_data_store/examples/memory_usage.rs Outdated Show resolved Hide resolved
crates/re_data_store/examples/memory_usage.rs Outdated Show resolved Hide resolved
crates/re_log_types/benches/msg_encode_benchmark.rs Outdated Show resolved Hide resolved
crates/re_log_types/src/arrow_msg.rs Outdated Show resolved Hide resolved
crates/re_log_types/src/arrow_msg.rs Outdated Show resolved Hide resolved
crates/re_log_types/src/data_table.rs Show resolved Hide resolved
crates/re_log_types/src/time_point/mod.rs Outdated Show resolved Hide resolved
crates/re_viewer/src/ui/data_ui/log_msg.rs Show resolved Hide resolved
crates/re_viewer/src/ui/data_ui/log_msg.rs Outdated Show resolved Hide resolved
crates/re_viewer/src/ui/event_log_view.rs Show resolved Hide resolved
@teh-cmc teh-cmc force-pushed the cmc/datastore/retire_bundles branch from 110dfa2 to 1827b2c Compare March 27, 2023 15:13
@teh-cmc teh-cmc changed the title batching 4: retire MsgBundle + batching support batching 4: retire MsgBundle + batching support in transport layer Mar 27, 2023
@teh-cmc teh-cmc marked this pull request as ready for review March 27, 2023 15:48
@emilk emilk self-requested a review March 28, 2023 07:25
crates/re_log_types/src/data_row.rs Outdated Show resolved Hide resolved
#[error("Trying to deserialize data that is missing a column present in the schema: {0:?}")]
MissingColumn(String),

#[error("Trying to deserialize data that cannot possibly be a column: {0:?}")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this error

Copy link
Member Author

@teh-cmc teh-cmc Mar 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to

    #[error("Trying to deserialize column data that doesn't contain any ListArrays: {0:?}")]
    NotAColumn(String),

crates/re_log_types/src/data_table.rs Outdated Show resolved Hide resolved
/// let labels: &[_] = &[Label("hey".into())];
/// DataRow::from_cells2(MsgId::random(), "c", timepoint(2, 1), num_instances, (colors, labels))
/// };
/// let table = DataTable::from_rows(table_id, [row1, row2, row3]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1-indexing? This is not Lua!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eeeh I do think 1-indexing makes more sense when talking about tables: an excel sheet starts a row 1 🙃 but yeah, let's go with what a Rust programmer would expect

crates/re_log_types/src/data_table.rs Outdated Show resolved Hide resolved
crates/re_log_types/src/data_table.rs Show resolved Hide resolved
crates/re_log_types/src/data_table.rs Outdated Show resolved Hide resolved
@teh-cmc teh-cmc merged commit d3b459f into main Mar 29, 2023
@teh-cmc teh-cmc deleted the cmc/datastore/retire_bundles branch March 29, 2023 08:00
@teh-cmc teh-cmc mentioned this pull request Apr 17, 2023
18 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏹 arrow concerning arrow enhancement New feature or request 🚀 performance Optimization, memory use, etc ⛃ re_datastore affects the datastore itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants