-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Avoid a clone when creating RunEndArray from ArrayData
#9189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
843036f to
e19ec7e
Compare
| .try_into() | ||
| .expect("RunArray data should have exactly two child arrays"); | ||
|
|
||
| // deconstruct the run ends child array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is what the current code does too (reaches into the child array data and gets the first buffer)
The current code also checks that there is exactly 1 buffer in the child array
| }; | ||
|
|
||
| let values = make_array(data.child_data()[1].clone()); | ||
| let [run_end_child, values_child]: [ArrayData; 2] = child_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a nice way to check the length and destructure the ArrayData in one command
https://stackoverflow.com/questions/29570607/is-there-a-good-way-to-convert-a-vect-to-an-array
| RunEndBuffer::new_unchecked(scalar, data.offset(), data.len()) | ||
| }; | ||
|
|
||
| let values = make_array(data.child_data()[1].clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is real clone of ArrayData , which allocates a Vec, which is no longer done by this PR
| let values = make_array(values_child); | ||
| Self { | ||
| data_type: data.data_type().clone(), | ||
| data_type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also avoids a DataType::drop which is not likely to make a large difference but is still something
| .try_into() | ||
| .expect("Run ends should have exactly one buffer"); | ||
| let scalar = ScalarBuffer::from(run_end_buffer); | ||
| let run_ends = unsafe { RunEndBuffer::new_unchecked(scalar, offset, len) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note the previous code also uses unsafe to create a RunEndBuffer (which is valid during construction of ArrayData)
|
Thank you @Jefffrey |
Which issue does this PR close?
make_array) #9061ArrayData(speed up reading from Parquet reader) #9058Rationale for this change
Let's make arrow-rs the fastest we can and the fewer allocations the better
What changes are included in this PR?
Apply pattern from #9114
Are these changes tested?
Existing tests
Are there any user-facing changes?
No