Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Jul 28, 2025

Which issue does this PR close?

Rationale for this change

The failure was described in the issue. In short, the buffer pointer of an empty buffer array exported from polars is a dangling pointer not aligned. But currently we take the raw pointer from C Data Interface and check its alignment before interpreting it as ScalarBuffer. Thus it causes the failure case.

What changes are included in this PR?

This patch changes FFI module to create an empty buffer for exported buffer with length zero. As we never dereference the dangling pointer, seems it's not necessary to require the alignment for it.

For non empty buffers, we still keep the alignment check.

Are these changes tested?

Added a unit test with necessary utility functions.

Are there any user-facing changes?

No

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good to me. Thank you @viirya

I don't understand the need for all the force_validate feature flag checking, though I think that could be cleaned up as a follow on PR

Some(buf) => Ok(buf),
Some(buf) => {
// External libraries may use a dangling pointer for a buffer with length 0.
// We respect the array length specified in the C Data Interface. Actually,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment. I guess in my mind "if the length is incorrect we can't make a good buffer" is "obvious" but maybe it is not for other readers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wanted to say that if there is a question on if the length is zero, but the pointer isn't a dangling pointer. Previously we simply take any non-null pointer as valid one and didn't consider the length. We ignored the possibility of a dangling pointer.

}

#[test]
#[cfg(not(feature = "force_validate"))]
Copy link
Contributor

@alamb alamb Jul 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this can't work with force_validate

I ran the test without this line and it seems to work fine:

diff --git a/arrow-array/src/ffi.rs b/arrow-array/src/ffi.rs
index 2ee2fd379e..5b05bcd81d 100644
--- a/arrow-array/src/ffi.rs
+++ b/arrow-array/src/ffi.rs
@@ -621,7 +621,6 @@ mod tests_to_then_from_ffi {
     }

     #[test]
-    #[cfg(not(feature = "force_validate"))]
     fn test_decimal_round_trip() -> Result<()> {
         // create an array natively
         let original_array = [Some(12345_i128), Some(-12345_i128), None]
@@ -1523,7 +1522,6 @@ mod tests_from_ffi {
     }

     #[test]
-    #[cfg(not(feature = "force_validate"))]
     fn test_empty_string_with_non_zero_offset() -> Result<()> {
         use super::ImportedArrowArray;
         use arrow_buffer::{MutableBuffer, OffsetBuffer};
@@ -1677,7 +1675,6 @@ mod tests_from_ffi {
     }

     #[test]
-    #[cfg(not(feature = "force_validate"))]
     fn test_utf8_view_ffi_from_dangling_pointer() {
         let empty = GenericByteViewBuilder::<StringViewType>::new().finish();
         let buffers = empty.data_buffers().to_vec();
cargo test -p arrow-array --features=ffi
test result: ok. 185 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 24.13s

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If force_validate is enabled, when we call GenericByteViewArray::new_unchecked to create a Utf8View array, even it is "unchecked" one, but force_validate will force to validate the views buffer. But in this test the ScalarBuffer is a invalid one (it uses an unaligned dangling pointer), so there will be a runtime error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the test without this line and it seems to work fine:
cargo test -p arrow-array --features=ffi

Because cargo test -p arrow-array --features=ffi doesn't enable force_validate so it works. This new test only works when force_validate is not enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because cargo test -p arrow-array --features=ffi doesn't enable force_validate so it works.

Right, so force_validate is not enabled

This new test only works when force_validate is not enabled.

Isn't that same? The test works when force_validate is not enabled 😕

Copy link
Member Author

@viirya viirya Jul 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Yea, the test works when force_validate is not enabled. So if you run with cargo test -p arrow-array --features=ffi , it works.

Copy link
Member Author

@viirya viirya Jul 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, did you want to ask why I added #[cfg(not(feature = "force_validate"))]? It is because our CI runs arrow-array tests with force_validate enabled. If I don't add this line, CI will run the test and fail. So I must add the line to pass CI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I was confused

@alamb alamb merged commit 499de7d into apache:main Jul 29, 2025
31 checks passed
@alamb
Copy link
Contributor

alamb commented Jul 29, 2025

Thanks again @viirya

@viirya
Copy link
Member Author

viirya commented Jul 29, 2025

Thank you @alamb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Memory pointer from external source (e.g, FFI) is not aligned for empty list[str]

2 participants