-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-10261: [Rust] [Breaking] Change List datatype to Box<Field> #8608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed all the changes in this PR and they look good to me -- they do what the PR says they do and the rationale for the change on https://issues.apache.org/jira/browse/ARROW-10261 makes sense too
One thought I had that might make the code slightly simpler would be to add a DataType::new_list() function, which would allow code like this:
DataType::List(Box::new(Field::new(
"array",
DataType::Boolean,
true,
))),
to be written like this
DataType::new_list(DataType::Boolean, true)
We can always add such a convenience function later, however. It is definitely not required in this PR
rust/arrow/src/ipc/convert.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there is any reason to leave this commented out? As in why not remove the old version?
I have the same question for the other instances in this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot about it(them), I was struggling with compiling due to lifetime issues, so I commented out those lines in case I needed to revert them. I've now removed them
jorgecarleitao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks a lot, @nevi-me !
This is part 1 of potentially 3 parts, it only changes the `arrow` crate `parquet` depends on `arrow`, and `datafusion` depends on both. `datafusion` will thus only compile after `parquet` is fixed.
69198da to
dd8178c
Compare
Yes @alamb, we can address it as a follow-up. There's more work that I also need to do on top of this, I've hardcoded "item" in a bunch of places, and on the Parquet side, there's some compatibility tests that I need to write. @jorgecarleitao happy that we merge this if/when tests pass, or would you like another reviewer to look at this? |
|
Ready to merge if the tests pass. I went through all the changes and they seem reasonable. Most issues I had were the same that @alamb suggested, and the rest seems straightforward given the changes on the DataType. I have a question related to why do we use a |
We need some indirection to break the potential infinite recursion error[E0072]: recursive type `datatypes::Field` has infinite size
--> arrow\src\datatypes.rs:188:1
|
188 | pub struct Field {
| ^^^^^^^^^^^^^^^^ recursive type has infinite size
189 | name: String,
190 | data_type: DataType,
| ------------------- recursive without indirection
|
= help: insert indirection (e.g., a `Box`, `Rc`, or `&`) at some point to make `datatypes::Field` representable |
This changes a list datatype to use
Box<Field>instead ofBox<DataType>.This change is needed in order to make both Parquet and IPC roundtrips pass.
The C++ implementation uses
Field, as it allows for preservice list field names and nullability.There are some implementation details which we did not cover in this PR, and will work on as a follow-up, and these are: