-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Update Union Array to add UnionMode, match latest Arrow Spec, and rename new -> unsafe new_unchecked()
#885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
arrow/src/array/array_union.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is the API change
arrow/src/array/array_union.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is the core format change -- the length of type_ids must match the length of the values (rather than skipping nulls)
arrow/src/array/data.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can now validate the buffer layout based on DataType!
Codecov Report
@@ Coverage Diff @@
## master #885 +/- ##
==========================================
+ Coverage 82.33% 82.35% +0.02%
==========================================
Files 169 169
Lines 49719 49788 +69
==========================================
+ Hits 40936 41003 +67
- Misses 8783 8785 +2
Continue to review full report at Codecov.
|
3ace1e2 to
ca55696
Compare
new -> unsafe new_unchecked()UnionMode, match latest Arrow Spec, and rename new -> unsafe new_unchecked()
|
Marking PRs that are over a month old as stale -- please let us know if there is additional work planned or if we should close them. |
|
(I do have some idea in my head I can finish this one up) |
8cde1ed to
a094eb7
Compare
a094eb7 to
b481581
Compare
|
Ok, I got the yak shaving bug and cleaned up this PR and marked it ready for review. 🙏 |
|
@jimexist @paddyhoran @nevi-me -- might one of you have time to review this PR? It revamps how UnionArray is supported to conform to the modern Arrow spec, and adds additional validation. I would like to get it in for the 7.0.0 release in a few weeks time |
|
Sorry I've been so quiet on Arrow. I'll find some time to review in the coming days. |
paddyhoran
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @alamb
| /// are used to index into the `child_arrays`. | ||
| /// | ||
| /// The `value_offsets` `Buffer` is only provided in the case of a dense union, sparse unions | ||
| /// should use `None`. If provided the `value_offsets` `Buffer` should contain `i32` values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you removed the period at the end of this line by mistake, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think so. Good catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in c8737a1
(Built on #810, so to see the changes proposed in this PR just look at the last commit)Which issue does this PR close?
Closes #814
Closes #85
Related to #817
Rationale for this change
Follow Arrow spec for
UnionArray, make it possible to validate with generic code added in #810, and conform to Rust safety conventions.See https://arrow.apache.org/docs/format/Columnar.html#union-layout for a description of the Union layout
What changes are included in this PR?
UnionArray::newtounsafe UnionArray::new_unchecked()to follow standard Rust safety conventionsUnionModetoDataType::Unionso the type carries information on expected formatAre there any user-facing changes?
UnionArray::newhas been renamed tounsafe UnionArray::new_unchecked()DenseorSparsewhen creating aDataType::UnionUnionArrayformat is different