-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Create empty buffer for a buffer specified in the C Data Interface with length zero #8009
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good to me. Thank you @viirya
I don't understand the need for all the force_validate feature flag checking, though I think that could be cleaned up as a follow on PR
| Some(buf) => Ok(buf), | ||
| Some(buf) => { | ||
| // External libraries may use a dangling pointer for a buffer with length 0. | ||
| // We respect the array length specified in the C Data Interface. Actually, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment. I guess in my mind "if the length is incorrect we can't make a good buffer" is "obvious" but maybe it is not for other readers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wanted to say that if there is a question on if the length is zero, but the pointer isn't a dangling pointer. Previously we simply take any non-null pointer as valid one and didn't consider the length. We ignored the possibility of a dangling pointer.
| } | ||
|
|
||
| #[test] | ||
| #[cfg(not(feature = "force_validate"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this can't work with force_validate
I ran the test without this line and it seems to work fine:
diff --git a/arrow-array/src/ffi.rs b/arrow-array/src/ffi.rs
index 2ee2fd379e..5b05bcd81d 100644
--- a/arrow-array/src/ffi.rs
+++ b/arrow-array/src/ffi.rs
@@ -621,7 +621,6 @@ mod tests_to_then_from_ffi {
}
#[test]
- #[cfg(not(feature = "force_validate"))]
fn test_decimal_round_trip() -> Result<()> {
// create an array natively
let original_array = [Some(12345_i128), Some(-12345_i128), None]
@@ -1523,7 +1522,6 @@ mod tests_from_ffi {
}
#[test]
- #[cfg(not(feature = "force_validate"))]
fn test_empty_string_with_non_zero_offset() -> Result<()> {
use super::ImportedArrowArray;
use arrow_buffer::{MutableBuffer, OffsetBuffer};
@@ -1677,7 +1675,6 @@ mod tests_from_ffi {
}
#[test]
- #[cfg(not(feature = "force_validate"))]
fn test_utf8_view_ffi_from_dangling_pointer() {
let empty = GenericByteViewBuilder::<StringViewType>::new().finish();
let buffers = empty.data_buffers().to_vec();cargo test -p arrow-array --features=ffitest result: ok. 185 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 24.13s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If force_validate is enabled, when we call GenericByteViewArray::new_unchecked to create a Utf8View array, even it is "unchecked" one, but force_validate will force to validate the views buffer. But in this test the ScalarBuffer is a invalid one (it uses an unaligned dangling pointer), so there will be a runtime error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran the test without this line and it seems to work fine:
cargo test -p arrow-array --features=ffi
Because cargo test -p arrow-array --features=ffi doesn't enable force_validate so it works. This new test only works when force_validate is not enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because cargo test -p arrow-array --features=ffi doesn't enable force_validate so it works.
Right, so force_validate is not enabled
This new test only works when force_validate is not enabled.
Isn't that same? The test works when force_validate is not enabled 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
Yea, the test works when force_validate is not enabled. So if you run with cargo test -p arrow-array --features=ffi , it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, did you want to ask why I added #[cfg(not(feature = "force_validate"))]? It is because our CI runs arrow-array tests with force_validate enabled. If I don't add this line, CI will run the test and fail. So I must add the line to pass CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I was confused
Co-authored-by: Andrew Lamb <[email protected]>
|
Thanks again @viirya |
|
Thank you @alamb |
Which issue does this PR close?
Rationale for this change
The failure was described in the issue. In short, the buffer pointer of an empty buffer array exported from polars is a dangling pointer not aligned. But currently we take the raw pointer from C Data Interface and check its alignment before interpreting it as
ScalarBuffer. Thus it causes the failure case.What changes are included in this PR?
This patch changes FFI module to create an empty buffer for exported buffer with length zero. As we never dereference the dangling pointer, seems it's not necessary to require the alignment for it.
For non empty buffers, we still keep the alignment check.
Are these changes tested?
Added a unit test with necessary utility functions.
Are there any user-facing changes?
No