-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Avoid a clone when creating StringArray/BinaryArray from ArrayData #9160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| /// | ||
| /// - buffer must contain valid arrow offsets ( [`OffsetBuffer`] ) for the | ||
| /// given length and offset. | ||
| unsafe fn get_offsets_from_buffer<O: ArrowNativeType>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a version of get_offsets immediately above here that takes the Buffers rather than ArrayData. Once I have ported all the arrays, I will remove get_offsets
| value_data, | ||
| data_type: T::DATA_TYPE, | ||
| nulls: data.nulls().cloned(), | ||
| data_type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also saves calling Drop on DataType and a clone of Nulls -- probably not measurable but faster 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes a lot of Drop can be surprisingly slow
etseidl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks @alamb!
…pache#9160) # Which issue does this PR close? - Part of apache#9061 - broken out of apache#9058 # Rationale for this change Let's make arrow-rs the fastest we can and the fewer allocations the better # What changes are included in this PR? Apply pattern from apache#9114 # Are these changes tested? Existing tests # Are there any user-facing changes? No
Which issue does this PR close?
make_array) #9061ArrayData(speed up reading from Parquet reader) #9058Rationale for this change
Let's make arrow-rs the fastest we can and the fewer allocations the better
What changes are included in this PR?
Apply pattern from #9114
Are these changes tested?
Existing tests
Are there any user-facing changes?
No