-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Minor: try and avoid an allocation creating GenericByteViewArray from ArrayData
#9156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
GenericByteViewArray from ArrayData
| impl<T: ByteViewType + ?Sized> From<ArrayData> for GenericByteViewArray<T> { | ||
| fn from(data: ArrayData) -> Self { | ||
| let (_data_type, len, nulls, offset, mut buffers, _child_data) = data.into_parts(); | ||
| let views = buffers.remove(0); // need to maintain order of remaining buffers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the idea is that remove(0) copies all the buffers and then Arc::from allocates/copies it again. Using Arc::from_iter builds the results directly
scovich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, one nit.
I guess it would be hard to measure a benchmark impact from the change, without going to an extreme case?
| // first buffer is views | ||
| let views = buffers[0].clone(); | ||
| // remaining buffers are data buffers | ||
| let buffers = Arc::from_iter(buffers.into_iter().skip(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
| // first buffer is views | |
| let views = buffers[0].clone(); | |
| // remaining buffers are data buffers | |
| let buffers = Arc::from_iter(buffers.into_iter().skip(1)); | |
| // first buffer is views, remaining buffers are data buffers | |
| let buffers = buffers.into_iter(); | |
| let views = buffers.next().unwrap(); // safety: never empty | |
| let buffers = Arc::from_iter(buffers); |
(question tho -- why do we know buffers is never empty?)
Also -- it would be fewer lines of code to fold those two values into the constructor call?
let buffers = buffers.into_iter();
Self {
data_type: T::DATA_TYPE,
views: buffers.next().unwrap(),
buffers: Arc::from_iter(buffers),There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(question tho -- why do we know buffers is never empty?)
Basically because the ArrayData is validated for each Arrow type -- and since it is validated by
arrow-rs/arrow-data/src/data.rs
Line 353 in f122d77
| new_self.validate_data()?; |
there must always be at least buffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's about what I figured, but I couldn't find where the validation actually happened. Thanks for the pointer!
Yeah, thank you. I have found benchmarking anything related to allocations quite tricky |
Good call -- done in 707bf41 |
etseidl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Thanks everyone |
…om `ArrayData` (apache#9156) # Which issue does this PR close? - part of apache#9061 - follow on apache#9114 # Rationale for this change @scovich noted in apache#9114 (comment) that calling `Vec::remove` does an extra copy and that `Vec::from` doesn't actually reuse the allocation the way I thought it did # What changes are included in this PR? Build the Arc for buffers directly # Are these changes tested? BY existing tests # Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. If there are any breaking changes to public APIs, please call them out. -->
Which issue does this PR close?
make_array) #9061make_arrayforStructArrayandGenericByteViewArray#9114Rationale for this change
@scovich noted in #9114 (comment) that calling
Vec::removedoes an extra copy and thatVec::fromdoesn't actually reuse the allocation the way I thought it didWhat changes are included in this PR?
Build the Arc for buffers directly
Are these changes tested?
BY existing tests
Are there any user-facing changes?