-
Notifications
You must be signed in to change notification settings - Fork 786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve C Data Interface and Add Integration Testing Entrypoints #5080
Conversation
For now I have tested this locally against the C++ implementation, using these changes to the Archery machinery: |
arrow-integration-testing/src/lib.rs
Outdated
let imported_schema = unsafe { std::ptr::replace(c_schema, FFI_ArrowSchema::empty()) }; | ||
let imported_schema = Schema::try_from(&imported_schema)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern would probably benefit from an API function. It's easy to get it wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this, left some comments
Ok, I've now checked this works properly against C++, Go, Java and C#. Will mark ready for review. |
@tustvold Is it possible to query the amount of memory allocated by Arrow Rust? It is not mandatory but allows checking for memory leaks during integration testing. |
Not easily afaik, you could register a global allocator that supports this, but nothing out of the box. Rust's custom allocators are not yet stable |
37f87cd
to
ddb35fb
Compare
Ok, I see. We can revisit this later. |
@tustvold What would you think of adding a macro like this to ease error reporting? |
No objections from me, although FYI #2725 tracks the broader issue that the error variants in arrow-rs are not particularly meaningful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good, mostly just minor nits, although I think we should probably make from_ffi and from_ffi_and_data_type unsafe before merging
arrow-integration-testing/src/lib.rs
Outdated
Schema::new(fields).with_metadata(schema.metadata().clone()) | ||
} | ||
|
||
struct PartialArrowFile { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could get a doc comment explaining how this differs from ArrowFile? Perhaps LazyArrowFile might be a better name??
Alternatively I wonder if we could just make ArrowFile lazy, and have member functions for decoding a batch to RecordBatch by index or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've renamed to LazyArrowFile, which indeed sounds better. As for ArrowFile, it's marked public, is it ok to change its API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This crate isn't published, so you can go wild 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I did the suggested changes.
ddb35fb
to
4deebe6
Compare
@@ -357,9 +357,11 @@ impl Iterator for ArrowArrayStreamReader { | |||
} | |||
|
|||
let schema_ref = self.schema(); | |||
// NOTE: this parses the FFI_ArrowSchema again on each iterator call; | |||
// should probably use from_ffi_and_data_type() instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a separate issue should be opened for this. @tustvold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #5103
674736f
to
69c2715
Compare
I think I addressed all review comments now. |
Thanks once again 👍 |
…n Rust (#38799) ### Rationale for this change Arrow Rust has added entrypoints for C Data Interface integration testing, so this can now be enabled on our side: apache/arrow-rs#5080 ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: #38798 Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
…ting on Rust (apache#38799) ### Rationale for this change Arrow Rust has added entrypoints for C Data Interface integration testing, so this can now be enabled on our side: apache/arrow-rs#5080 ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: apache#38798 Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
Arrow needs to be upgraded alongside pyo3 because they both link to python. Arrow's upgrade to pyo3 is in master but not yet released as of v51. apache/arrow-rs#5566 NOTE: Arrow marked `ffi::from_ffi` unsafe. apache/arrow-rs#5080
Arrow needs to be upgraded alongside pyo3 because they both link to python. Arrow's upgrade to pyo3 is in master but not yet released as of v51. apache/arrow-rs#5566 NOTE: Arrow marked `ffi::from_ffi` unsafe. apache/arrow-rs#5080
Arrow needs to be upgraded alongside pyo3 because they both link to python. NOTE: Arrow marked `ffi::from_ffi` unsafe. apache/arrow-rs#5080
Arrow needs to be upgraded alongside pyo3 because they both link to python. NOTE: Arrow marked `ffi::from_ffi` unsafe. apache/arrow-rs#5080
Which issue does this PR close?
Closes #4867.
Rationale for this change
Allow Rust to participate in integration testing for the C Data Interface, as explained here:
https://arrow.apache.org/docs/dev/format/Integration.html#example-c-data-interface
What changes are included in this PR?
Add C-compatible entrypoints for the C Data Interface integration testing machinery.
Add support for exporting and importing Intervals over the C Data Interface.
Allow importing from a FFI_ArrowArray and an existing DataType.
Fix handling of
null_count
field in the C ArrowArray struct.Are there any user-facing changes?
No.