-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-11504: [Rust] Added checks to List DataType. #9425
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sunchao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jorgecarleitao for pinging. Left a few comments.
| let values = data.child_data()[0].clone(); | ||
|
|
||
| if let Some(child) = Self::get_type(data.data_type()) { | ||
| assert_eq!(values.data_type(), child, "[Large]ListArray's child datatype does not correspond to the List's datatype"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: long line? I remember we enforce a limit of 100 characters.
|
|
||
| let values = data.child_data()[0].clone(); | ||
|
|
||
| if let Some(child) = Self::get_type(data.data_type()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we have a few tests to cover this? checking error message and all.
Also I'm not sure if assert_eq is good here: IMO assertion should only be used for checking internal logic that developer should follow and which are not exposed to the library users, but in this case it appears not. It's just a nit though since this is already used in multiple places before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you. However, that requires a larger change as we would need to move from From to TryFrom, so for now I just want to avoid unsafe code by panicking everytime something may go wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a mid-way proposal: a31a35a
Basically, to use normal rust handling, but the make the Into implementation expect the result
I actually think using asserts / panics directly (as in this PR) is also fine beacuse:
- it is an improvement over the current behavior (crash / undefined) to get useful error messages (even if it is in a panic :( )
- the use of
ArrayDatain my mind is also an implementation detail of anArrayso most users of Arrow shouldn't be interacting with this code at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW I also tried using TryFrom directly and as @jorgecarleitao suspected there are many kernel implementations that rely in this being infallable.
| false | ||
| } | ||
|
|
||
| fn prefix() -> &'static str { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we won't need prefix anymore with the new is_large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might still need it, we also use it for formatting in Display
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we can drop it, yes. We can merge StringOffset, BinaryOffset and OffsetTrait in a single Trait with this, but I wanted to leave it to another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is one way to remove prefix that does not go as far as @jorgecarleitao suggests to collapse the traits... 8e68e05
|
Nice! I recently ran into the same issue and the assertion about the nested datatypes would have saved me some time debugging. In my testcase an assert_eq was failing, but printed both sides exactly the same. |
|
@jhorstmann , I think that that shows another, separate issue: imo what we currently show in "debug" should be shown in "Display", and "Debug" should actually show the full structure, i.e. the one created by |
nevi-me
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No additional comments, other than @sunchao's
| false | ||
| } | ||
|
|
||
| fn prefix() -> &'static str { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might still need it, we also use it for formatting in Display
d4608a9 to
356c300
Compare
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorgecarleitao do you need help with this PR? I can try and take some of @sunchao 's comments if that would help
|
@alamb that would definitely help. If you have the time, I would really appreciate. |
@jorgecarleitao I will put it on my queue for tomorrow |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR is good to merge as is as it makes the code better than on master (less errors), though it can be further improved.
@sunchao when you get some time I would love your feedback / advice -- should we merge this PR as is? Would you suggest incorporating one/both of the approaches prototyped in #9508 (remove prefix and making a fallable version of ArrayData --> GenericListArray)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM2 - I think we can merge this as it is and solve the comments separately. Thanks!
BTW this needs rebase.
Codecov Report
@@ Coverage Diff @@
## master #9425 +/- ##
==========================================
+ Coverage 82.25% 82.33% +0.07%
==========================================
Files 244 245 +1
Lines 55685 56270 +585
==========================================
+ Hits 45806 46330 +524
- Misses 9879 9940 +61
Continue to review full report at Codecov.
|
|
@nevi-me good plan! |
| "generated_dictionary", | ||
| // "generated_duplicate_fieldnames", | ||
| "generated_interval", | ||
| "generated_large_batch", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nevi-me I don't remember seeing this in the original PR -- was this change intended ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NM I see #9587 now
# Background: Left over cleanups suggested by from @sunchao on #9425 Broken out from #9508 # Rationale: This function is redundant with `OffsetSize::is_large` Closes #9690 from alamb/alamb/remove_prefix Authored-by: Andrew Lamb <[email protected]> Signed-off-by: Andrew Lamb <[email protected]>
…ead of `panic!` # Background: Left over cleanups suggested by from @sunchao on #9425 Broken out from #9508 # Rationale: Don't use panic! directly. However, since the caller of this function still calls `unwrap()`, I am not sure how much of an improvement this change really is. However it may set us up for a more `safe` future eventually Closes #9691 from alamb/alamb/fallable_list_conversion Authored-by: Andrew Lamb <[email protected]> Signed-off-by: Andrew Lamb <[email protected]>
# Background: Left over cleanups suggested by from @sunchao on apache/arrow#9425 Broken out from apache/arrow#9508 # Rationale: This function is redundant with `OffsetSize::is_large` Closes #9690 from alamb/alamb/remove_prefix Authored-by: Andrew Lamb <[email protected]> Signed-off-by: Andrew Lamb <[email protected]>
…ead of `panic!` # Background: Left over cleanups suggested by from @sunchao on apache/arrow#9425 Broken out from apache/arrow#9508 # Rationale: Don't use panic! directly. However, since the caller of this function still calls `unwrap()`, I am not sure how much of an improvement this change really is. However it may set us up for a more `safe` future eventually Closes #9691 from alamb/alamb/fallable_list_conversion Authored-by: Andrew Lamb <[email protected]> Signed-off-by: Andrew Lamb <[email protected]>
This PR fixes a bug on which
GenericListArrayis not validating the datatype passed on tofrom(ArrayData), causing all types of bugs, such as undefined behavior in interpreting the offset buffer.This PR adds this validation, panicking if the DataType does not match.
This PR also fixes casting from and to Lists, which was creating an
ArrayDataout of spec.