fix missing utf8 check for conversion from BinaryViewArray to StringViewArray #9158
#[should_panic(expected = "invalid utf-8 sequence")]
#[test]
fn invalid_array_data() {
This test also fails on main, but I wanted to make it super clear that you can't build an invalid Utf8ViewArray with the ArrayDataBuilder (as expected).
  let views = ScalarBuffer::new(views, offset, len);
  Self {
-     data_type: T::DATA_TYPE,
+     data_type,
@jhorstmann noted that reusing data_type here might be faster as it avoids a call to DataType::drop 🤷
  fn from(data: ArrayData) -> Self {
-     let (_data_type, len, nulls, offset, mut buffers, _child_data) = data.into_parts();
+     let (data_type, len, nulls, offset, mut buffers, _child_data) = data.into_parts();
      assert_eq!(
This is the equivalent check in GenericByteArray:
arrow-rs/arrow-array/src/array/byte_array.rs, line 545 in 90839df:
    assert_eq!(
scovich left a comment:
LGTM
etseidl left a comment:
LGTM. Verified test fails if the assert is removed.
Co-authored-by: Martin Hilton <mhilton@influxdata.com>
One conflict!
Conflict resolved so merging!
Which issue does this PR close?
- closes #9157 (BinaryViewArray to StringViewArray)
Rationale for this change
@jhorstmann found it is possible to bypass utf8 validation by abusing the ArrayData APIs.
What changes are included in this PR?
1. Add an assert to prevent the bypass
2. Add tests
Are these changes tested?
Yes, new unit tests are added.
Are there any user-facing changes?
Error if APIs are misused.
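For readers who want to see the shape of the bypass without pulling in arrow, here is a minimal std-only sketch. It is not the arrow-rs code: `DataType`, `RawData` and `StringView` below are hypothetical stand-ins, with `RawData` loosely playing the role of ArrayData. The idea is that the untyped container only validates UTF-8 when its type tag asks for it, so the typed conversion has to check the tag before trusting the bytes.

```rust
/// Hypothetical type tag (a stand-in, not arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    BinaryView,
    Utf8View,
}

/// Hypothetical untyped container, loosely playing the role of ArrayData.
#[derive(Debug)]
struct RawData {
    data_type: DataType,
    bytes: Vec<u8>,
}

impl RawData {
    /// Validating constructor: UTF-8 is only checked when the tag demands it.
    fn try_new(data_type: DataType, bytes: Vec<u8>) -> Result<Self, String> {
        if data_type == DataType::Utf8View {
            std::str::from_utf8(&bytes)
                .map_err(|e| format!("invalid utf-8 sequence: {e}"))?;
        }
        Ok(Self { data_type, bytes })
    }
}

/// Hypothetical typed wrapper whose invariant is "bytes are valid UTF-8".
#[derive(Debug)]
struct StringView {
    bytes: Vec<u8>,
}

impl StringView {
    fn as_str(&self) -> &str {
        // The sketch re-checks safely; real code would rely on the invariant.
        std::str::from_utf8(&self.bytes).expect("invariant: valid UTF-8")
    }
}

impl From<RawData> for StringView {
    fn from(data: RawData) -> Self {
        // Without this check, BinaryView data (never UTF-8 validated) could
        // be reinterpreted as a StringView, bypassing validation entirely.
        assert_eq!(
            data.data_type,
            DataType::Utf8View,
            "StringView expects Utf8View data, got {:?}",
            data.data_type
        );
        Self { bytes: data.bytes }
    }
}
```

In this sketch, `RawData::try_new(DataType::Utf8View, ..)` rejects invalid bytes up front, while `RawData::try_new(DataType::BinaryView, ..)` accepts them; converting the latter with `StringView::from` then panics on the assert instead of silently producing an invalid string.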