Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion .github/workflows/arrow.yml
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,10 @@ jobs:
- name: Test arrow-schema
run: cargo test -p arrow-schema --all-features
- name: Test arrow-array
run: cargo test -p arrow-array --all-features
run: |
cargo test -p arrow-array --all-features
# Disable feature `force_validate`
cargo test -p arrow-array --features=ffi
- name: Test arrow-select
run: cargo test -p arrow-select --all-features
- name: Test arrow-cast
Expand Down
37 changes: 36 additions & 1 deletion arrow-array/src/ffi.rs
Original file line number Diff line number Diff line change
Expand Up @@ -408,7 +408,17 @@ impl ImportedArrowArray<'_> {
.map(|index| {
let len = self.buffer_len(index, variadic_buffer_lens, &self.data_type)?;
match unsafe { create_buffer(self.owner.clone(), self.array, index, len) } {
Some(buf) => Ok(buf),
Some(buf) => {
// External libraries may use a dangling pointer for a buffer with length 0.
// We respect the array length specified in the C Data Interface. Actually,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment. I guess in my mind "if the length is incorrect we can't make a good buffer" is "obvious" but maybe it is not for other readers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wanted to say that if there is a question on if the length is zero, but the pointer isn't a dangling pointer. Previously we simply take any non-null pointer as valid one and didn't consider the length. We ignored the possibility of a dangling pointer.

// if the length is incorrect, we cannot create a correct buffer even if
// the pointer is valid.
if buf.is_empty() {
Ok(MutableBuffer::new(0).into())
} else {
Ok(buf)
}
}
None if len == 0 => {
// Null data buffer, which Rust doesn't allow. So create
// an empty buffer.
Expand Down Expand Up @@ -1296,9 +1306,15 @@ mod tests_to_then_from_ffi {

#[cfg(test)]
mod tests_from_ffi {
#[cfg(not(feature = "force_validate"))]
use std::ptr::NonNull;
use std::sync::Arc;

#[cfg(feature = "force_validate")]
use arrow_buffer::{bit_util, buffer::Buffer};
#[cfg(not(feature = "force_validate"))]
use arrow_buffer::{bit_util, buffer::Buffer, ScalarBuffer};

use arrow_data::transform::MutableArrayData;
use arrow_data::ArrayData;
use arrow_schema::{DataType, Field};
Expand Down Expand Up @@ -1660,6 +1676,25 @@ mod tests_from_ffi {
}
}

#[test]
#[cfg(not(feature = "force_validate"))]
Copy link
Contributor

@alamb alamb Jul 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this can't work with force_validate

I ran the test without this line and it seems to work fine:

diff --git a/arrow-array/src/ffi.rs b/arrow-array/src/ffi.rs
index 2ee2fd379e..5b05bcd81d 100644
--- a/arrow-array/src/ffi.rs
+++ b/arrow-array/src/ffi.rs
@@ -621,7 +621,6 @@ mod tests_to_then_from_ffi {
     }

     #[test]
-    #[cfg(not(feature = "force_validate"))]
     fn test_decimal_round_trip() -> Result<()> {
         // create an array natively
         let original_array = [Some(12345_i128), Some(-12345_i128), None]
@@ -1523,7 +1522,6 @@ mod tests_from_ffi {
     }

     #[test]
-    #[cfg(not(feature = "force_validate"))]
     fn test_empty_string_with_non_zero_offset() -> Result<()> {
         use super::ImportedArrowArray;
         use arrow_buffer::{MutableBuffer, OffsetBuffer};
@@ -1677,7 +1675,6 @@ mod tests_from_ffi {
     }

     #[test]
-    #[cfg(not(feature = "force_validate"))]
     fn test_utf8_view_ffi_from_dangling_pointer() {
         let empty = GenericByteViewBuilder::<StringViewType>::new().finish();
         let buffers = empty.data_buffers().to_vec();
cargo test -p arrow-array --features=ffi
test result: ok. 185 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 24.13s

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If force_validate is enabled, when we call GenericByteViewArray::new_unchecked to create a Utf8View array, even it is "unchecked" one, but force_validate will force to validate the views buffer. But in this test the ScalarBuffer is a invalid one (it uses an unaligned dangling pointer), so there will be a runtime error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the test without this line and it seems to work fine:
cargo test -p arrow-array --features=ffi

Because cargo test -p arrow-array --features=ffi doesn't enable force_validate so it works. This new test only works when force_validate is not enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because cargo test -p arrow-array --features=ffi doesn't enable force_validate so it works.

Right, so force_validate is not enabled

This new test only works when force_validate is not enabled.

Isn't that same? The test works when force_validate is not enabled 😕

Copy link
Member Author

@viirya viirya Jul 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Yea, the test works when force_validate is not enabled. So if you run with cargo test -p arrow-array --features=ffi , it works.

Copy link
Member Author

@viirya viirya Jul 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, did you want to ask why I added #[cfg(not(feature = "force_validate"))]? It is because our CI runs arrow-array tests with force_validate enabled. If I don't add this line, CI will run the test and fail. So I must add the line to pass CI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I was confused

fn test_utf8_view_ffi_from_dangling_pointer() {
let empty = GenericByteViewBuilder::<StringViewType>::new().finish();
let buffers = empty.data_buffers().to_vec();
let nulls = empty.nulls().cloned();

// Create a dangling pointer to a view buffer with zero length.
let alloc = Arc::new(1);
let buffer = unsafe { Buffer::from_custom_allocation(NonNull::<u8>::dangling(), 0, alloc) };
let views = unsafe { ScalarBuffer::new_unchecked(buffer) };

let str_view: GenericByteViewArray<StringViewType> =
unsafe { GenericByteViewArray::new_unchecked(views, buffers, nulls) };
let imported = roundtrip_byte_view_array(str_view);
assert_eq!(imported.len(), 0);
assert_eq!(&imported, &empty);
}

#[test]
fn test_round_trip_byte_view() {
fn test_case<T>()
Expand Down
23 changes: 23 additions & 0 deletions arrow-buffer/src/buffer/scalar.rs
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,19 @@ impl<T: ArrowNativeType> ScalarBuffer<T> {
buffer.slice_with_length(byte_offset, byte_len).into()
}

/// Unsafe function to create a new [`ScalarBuffer`] from a [`Buffer`].
/// Only use for testing purpose.
///
/// # Safety
///
/// This function is unsafe because it does not check if the `buffer` is aligned
pub unsafe fn new_unchecked(buffer: Buffer) -> Self {
Self {
buffer,
phantom: Default::default(),
}
}

/// Free up unused memory.
pub fn shrink_to_fit(&mut self) {
self.buffer.shrink_to_fit();
Expand Down Expand Up @@ -99,6 +112,16 @@ impl<T: ArrowNativeType> ScalarBuffer<T> {
pub fn ptr_eq(&self, other: &Self) -> bool {
self.buffer.ptr_eq(&other.buffer)
}

/// Returns the number of elements in the buffer
pub fn len(&self) -> usize {
self.buffer.len() / std::mem::size_of::<T>()
}

/// Returns if the buffer is empty
pub fn is_empty(&self) -> bool {
self.len() == 0
}
}

impl<T: ArrowNativeType> Deref for ScalarBuffer<T> {
Expand Down
Loading