-
Notifications
You must be signed in to change notification settings - Fork 648
feat: add proto serialization for FilteredReadExec #5954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
14527fa
cc80861
d0f24d7
3348756
cdf75dd
bd82778
2fd5742
6e4edc2
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,99 @@ | ||
| // SPDX-License-Identifier: Apache-2.0 | ||
| // SPDX-FileCopyrightText: Copyright The Lance Authors | ||
|
|
||
| syntax = "proto3"; | ||
|
|
||
| package lance.datafusion; | ||
|
|
||
| import "table_identifier.proto"; | ||
|
|
||
| message U64Range { | ||
| uint64 start = 1; | ||
| uint64 end = 2; | ||
| } | ||
|
|
||
| message ProjectionProto { | ||
| repeated int32 field_ids = 1; | ||
| bool with_row_id = 2; | ||
| bool with_row_addr = 3; | ||
| bool with_row_last_updated_at_version = 4; | ||
| bool with_row_created_at_version = 5; | ||
| BlobHandlingProto blob_handling = 6; | ||
| } | ||
|
|
||
| message BlobHandlingProto { | ||
| oneof mode { | ||
| // All blobs read as binary | ||
| bool all_binary = 1; | ||
| // Blobs as descriptions, other binary as binary (default) | ||
| bool blobs_descriptions = 2; | ||
| // All binary columns as descriptions | ||
| bool all_descriptions = 3; | ||
| // Specific blobs read as binary, rest as descriptions (non-blob binary stays binary) | ||
| FieldIdSet some_blobs_binary = 4; | ||
| // Specific columns as binary, all other binary as descriptions | ||
| FieldIdSet some_binary = 5; | ||
| } | ||
| } | ||
|
|
||
| message FieldIdSet { | ||
| repeated uint32 field_ids = 1; | ||
| } | ||
|
|
||
| message FilteredReadThreadingModeProto { | ||
| oneof mode { | ||
| uint64 one_partition_multiple_threads = 1; | ||
| uint64 multiple_partitions = 2; | ||
| } | ||
| } | ||
|
|
||
| // Serializable form of FilteredReadOptions. | ||
| message FilteredReadOptionsProto { | ||
| optional U64Range scan_range_before_filter = 1; | ||
| optional U64Range scan_range_after_filter = 2; | ||
| bool with_deleted_rows = 3; | ||
| optional uint32 batch_size = 4; | ||
| optional uint64 fragment_readahead = 5; | ||
| repeated uint64 fragment_ids = 6; | ||
| ProjectionProto projection = 7; | ||
| optional bytes refine_filter_substrait = 8; | ||
| optional bytes full_filter_substrait = 9; | ||
| FilteredReadThreadingModeProto threading_mode = 10; | ||
| optional uint64 io_buffer_size_bytes = 11; | ||
| // Arrow IPC schema for decoding Substrait filters (may be wider than projection). | ||
| optional bytes filter_schema_ipc = 12; | ||
| } | ||
|
|
||
| // Serializable form of FilteredReadPlan (planned/distributed mode). | ||
| // RowAddrTreeMap serialized via its built-in serialize_into/deserialize_from. | ||
| // Per-fragment filters are Substrait-encoded and deduplicated. | ||
| message FilteredReadPlanProto { | ||
| bytes row_addr_tree_map = 1; | ||
| optional U64Range scan_range_after_filter = 2; | ||
| // Arrow IPC schema for decoding Substrait filters (matches the schema used at encode time). | ||
| optional bytes filter_schema_ipc = 3; | ||
| // Per-fragment filter mapping. Key is fragment id, value is a list index into | ||
| // filter_expressions. Multiple fragments can share the same list index when | ||
| // they have the same filter, avoiding duplicate Substrait encoding. | ||
| map<uint32, uint32> fragment_filter_ids = 4; | ||
| // Deduplicated Substrait-encoded filter expressions. Each entry is referenced | ||
| // by one or more values in fragment_filter_ids. | ||
| repeated bytes filter_expressions = 5; | ||
| } | ||
|
|
||
| // Top-level wrapper for FilteredReadExec serialization. | ||
| message FilteredReadExecProto { | ||
| TableIdentifier table = 1; | ||
| FilteredReadOptionsProto options = 2; | ||
| // FilteredRead has two modes | ||
| // Plan-then-execute (distributed): The planner creates a FilteredReadPlan and sends it to a remote executor. | ||
| // Plan-and-execute (local): The executor creates the plan itself at execution time. | ||
| optional FilteredReadPlanProto plan = 3; | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you explain why this is optional?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. FilteredReadExec can run in two modes:
|
||
| // Note: FilteredReadExec.index_input (child ExecutionPlan) is NOT serialized here. | ||
| // DataFusion's PhysicalExtensionCodec handles child plans automatically: it walks | ||
| // the plan tree via children() / with_new_children(), serializes each node, and | ||
| // passes deserialized children back as the `inputs` parameter in try_decode. | ||
| // This means any ExecutionPlan in the tree (including index_input) must also | ||
| // implement try_encode/try_decode in the PhysicalExtensionCodec. | ||
| // TODO: implement serialize/deserialize for lance-specific index input ExecutionPlans. | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| // SPDX-License-Identifier: Apache-2.0 | ||
| // SPDX-FileCopyrightText: Copyright The Lance Authors | ||
|
|
||
| syntax = "proto3"; | ||
|
|
||
| package lance.datafusion; | ||
|
|
||
| // Identifies a Lance dataset for remote reconstruction. | ||
| // | ||
| // Two modes: | ||
| // 1. uri + serialized_manifest (fast): remote executor skips manifest read. | ||
| // 2. uri + version + etag (lightweight): remote executor loads manifest from storage. | ||
| message TableIdentifier { | ||
| string uri = 1; | ||
| uint64 version = 2; | ||
| optional string manifest_etag = 3; | ||
| optional bytes serialized_manifest = 4; | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @wjones127 do you think we need to serialize the manifest_etag and even send the serialized manifest itself? |
||
| } | ||
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| // SPDX-License-Identifier: Apache-2.0 | ||
| // SPDX-FileCopyrightText: Copyright The Lance Authors | ||
|
|
||
| use std::io::Result; | ||
|
|
||
| fn main() -> Result<()> { | ||
| println!("cargo:rerun-if-changed=protos"); | ||
|
|
||
| #[cfg(feature = "protoc")] | ||
| // Use vendored protobuf compiler if requested. | ||
| std::env::set_var("PROTOC", protobuf_src::protoc()); | ||
|
|
||
| let mut prost_build = prost_build::Config::new(); | ||
| prost_build.protoc_arg("--experimental_allow_proto3_optional"); | ||
| prost_build.enable_type_names(); | ||
| prost_build.compile_protos( | ||
| &[ | ||
| "./protos/table_identifier.proto", | ||
| "./protos/filtered_read.proto", | ||
| ], | ||
| &["./protos"], | ||
| )?; | ||
|
|
||
| Ok(()) | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../../protos |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TL;DR does it make sense to make this a
oneofoverbytesandrepeated U64Range? Then we could just use the smaller representation on serialization (easy computation).In my plan_splits PR I merged the
FilteredReadPlanandFilteredReadIntenralPlanbecause IIUC the only reason we have differentiation is that serializing a bitmap is smaller than ranges. I would challenge that under the intuition (or maybe "hope") that we have many dense splits where the seriaization is actually smaller for ranges (ex. all 4k values a range is 16 bytes (u64 start, u64 end) whereas a bitmap is 4k / 8 = 500 bytes). If this is the case, then we would do a series of inefficient serializations to move to bitmap for every query.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason that having FilteredReadPlan as a outside plan is mainly not for serialization / deserialization. I keep FilteredReadPlan with external plan using RoaringBitmap for the following reasons
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I don't have a good sense for where the trade-offs are for the two approaches but it sounds like you've both thought about it. My preference is to move forward with what we've got written and revisit if serialization time seems to be a bottleneck?