ARROW-11719: [Rust][Datafusion] support creating memory table with merged schema #9537
…rged schema
* Added `contains` method for `arrow::datatypes::Schema` and `arrow::datatypes::Field`
* Relax batch schema validation using `contains` check when creating a MemTable in datafusion
alamb
left a comment
This is a cool change @houqp -- thanks.
I am not sure if DataFusion will work with different schemas without some additional modifications. Specifically, when the schemas are actually subsets of each other with different numbers of columns -- here is what I came up with: https://github.com/influxdata/influxdb_iox/blob/main/query/src/provider/adapter.rs#L44-L70
I think the `contains` logic makes sense, and is actually quite interesting -- in IOx, we have similar code to effectively merge schemas. This PR implements a compatible definition of merge.
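To make the concern about batches with fewer columns concrete, here is a minimal sketch of one way to adapt a batch to a wider table schema. It assumes missing columns are padded with nulls and that columns are matched by name. `Batch` and `adapt_batch` are hypothetical, simplified stand-ins; this is not the linked IOx code and not an arrow or DataFusion API.

```rust
// Hypothetical, simplified record batch: named columns of nullable integers.
// A real arrow `RecordBatch` holds typed arrays and a schema instead.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub columns: Vec<(String, Vec<Option<i64>>)>,
}

impl Batch {
    /// Number of rows, taken from the first column (0 for an empty batch).
    fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, |(_, values)| values.len())
    }
}

/// Reorders `batch` to match `target_columns` and fills every column the
/// batch lacks with nulls (assumed padding strategy).
pub fn adapt_batch(target_columns: &[&str], batch: &Batch) -> Batch {
    let rows = batch.num_rows();
    let columns = target_columns
        .iter()
        .map(|name| {
            let values = batch
                .columns
                .iter()
                .find(|(n, _)| n.as_str() == *name)
                .map(|(_, v)| v.clone())
                // Column is absent from the batch: pad with nulls.
                .unwrap_or_else(|| vec![None; rows]);
            (name.to_string(), values)
        })
        .collect();
    Batch { columns }
}
```

With a batch that only has column `a` and a target of `["a", "b"]`, the result keeps `a` unchanged and adds an all-null `b` of the same length.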
rust/arrow/src/datatypes/schema.rs
/// Check to see if `self` is a superset of `other` schema. Here are the comparison rules:
///
/// * for every field `f` in other, the field in self with corresponding index should be a
👍 thank you for the clear comments
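For readers without the diff at hand, the following is a rough sketch of what a superset check along these lines could look like. `Field` and `Schema` here are hypothetical, simplified stand-ins rather than the arrow types. The nullability and metadata rules are assumptions, since the quoted doc comment is cut off; the field-count check is the one discussed in this thread.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for arrow's `Field` and `Schema`;
// the real types carry more information.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: String, // stand-in for arrow's `DataType` enum
    pub nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
    pub metadata: HashMap<String, String>,
}

impl Field {
    /// Assumed rule: same name and type, and `self` is at least as
    /// permissive about nulls as `other`.
    pub fn contains(&self, other: &Field) -> bool {
        self.name == other.name
            && self.data_type == other.data_type
            && (self.nullable || !other.nullable)
    }
}

impl Schema {
    /// `self` is a superset of `other` when the field counts match, each
    /// field contains the field at the same index, and (assumed) every
    /// metadata entry of `other` is present in `self` with the same value.
    pub fn contains(&self, other: &Schema) -> bool {
        self.fields.len() == other.fields.len()
            && self
                .fields
                .iter()
                .zip(other.fields.iter())
                .all(|(mine, theirs)| mine.contains(theirs))
            && other
                .metadata
                .iter()
                .all(|(key, value)| self.metadata.get(key) == Some(value))
    }
}
```

Under these assumptions the relation is not symmetric: a schema with a nullable column contains the same schema with that column non-nullable, but not the other way around.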
@alamb good call, I only assumed that it logically makes sense, but never checked whether it's actually implemented in datafusion myself. I have pushed a commit to check the field count in the …

I am a fan of your SchemaAdapterStream implementation; it looks like it would be useful to include the core of that logic in datafusion as well.
alamb
left a comment
I think this looks great @houqp - thank you. @nevi-me / @jorgecarleitao / @andygrove any comments?
    ],
)?;

match MemTable::try_new(schema2, vec![vec![batch]]) {
👍
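The construction-time validation exercised by a test like this can be sketched as follows. This is an illustrative model only: `Schema`, `Batch`, and this `MemTable` are hypothetical stand-ins (a schema is reduced to name and nullability pairs), and the compatibility rule and error text are assumptions, not DataFusion's actual code.

```rust
// Hypothetical stand-ins: a schema is a list of (column name, nullable)
// pairs, and a batch only carries its schema.
pub type Schema = Vec<(String, bool)>;

pub struct Batch {
    pub schema: Schema,
}

pub struct MemTable {
    pub schema: Schema,
    pub partitions: Vec<Vec<Batch>>,
}

impl MemTable {
    /// Accepts the partitions only if the table schema is a superset of
    /// every batch schema, instead of requiring strict equality.
    pub fn try_new(schema: Schema, partitions: Vec<Vec<Batch>>) -> Result<Self, String> {
        let compatible = |batch: &Batch| {
            // Same number of fields, same names by index, and the table
            // column must be nullable whenever the batch column is.
            schema.len() == batch.schema.len()
                && schema
                    .iter()
                    .zip(&batch.schema)
                    .all(|(table, b)| table.0 == b.0 && (table.1 || !b.1))
        };
        if partitions.iter().flatten().all(compatible) {
            Ok(MemTable { schema, partitions })
        } else {
            Err("Mismatch between schema and batches".to_string())
        }
    }
}
```

In this model a table with a nullable column accepts a batch whose column is non-nullable, while the reverse combination is rejected with an error.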
alamb
left a comment
Sorry for the delay in merging @houqp -- it has been a busy week!