alamb commented Jan 15, 2026

Which issue does this PR close?

Rationale for this change

Let's make arrow-rs as fast as we can; the fewer allocations, the better.

What changes are included in this PR?

Apply the pattern from #9114

Are these changes tested?

Existing tests

Are there any user-facing changes?

No

github-actions bot added the "arrow" (Changes to the arrow crate) label Jan 15, 2026
.try_into()
.expect("RunArray data should have exactly two child arrays");

// deconstruct the run ends child array

This is what the current code does too: it reaches into the child array data and takes the first buffer.

The current code also checks that there is exactly one buffer in the child array.

};

let values = make_array(data.child_data()[1].clone());
let [run_end_child, values_child]: [ArrayData; 2] = child_data

This is a nice way to check the length and destructure the ArrayData in one expression:

https://stackoverflow.com/questions/29570607/is-there-a-good-way-to-convert-a-vect-to-an-array
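
For readers unfamiliar with the idiom, here is a minimal sketch of converting a `Vec` into a fixed-size array with `try_into` and destructuring it in the same `let`. The helper name `split_pair` is hypothetical and is not part of arrow-rs; it only illustrates the standard-library conversion the comment above refers to.

```rust
use std::convert::TryInto;
use std::fmt::Debug;

/// Hypothetical helper (not an arrow-rs API): splits a vector that must
/// hold exactly two elements into its two parts, without cloning.
fn split_pair<T: Debug>(children: Vec<T>) -> (T, T) {
    // `Vec<T>` -> `[T; 2]` succeeds only when the length is exactly 2.
    // On failure the original Vec is handed back as the error value,
    // which is why `T: Debug` is needed for `expect`.
    let [first, second]: [T; 2] = children
        .try_into()
        .expect("expected exactly two children");
    (first, second)
}
```

The length check and the move out of the `Vec` happen in a single step, so no element is cloned and no indexing is needed afterwards.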

RunEndBuffer::new_unchecked(scalar, data.offset(), data.len())
};

let values = make_array(data.child_data()[1].clone());

This is a real clone of ArrayData, which allocates a Vec; this PR no longer does that.
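
To make the difference concrete, here is a small sketch that contrasts cloning a child out of a borrowed parent with moving it out of an owned parent. `Node` is a hypothetical stand-in, not `ArrayData`; the only assumption carried over is that the real type owns `Vec`s, so a clone has to allocate new ones.

```rust
/// Hypothetical stand-in for a nested, Vec-owning structure.
/// This is NOT arrow's `ArrayData`; it only mimics the ownership shape.
#[derive(Debug, Clone)]
struct Node {
    buffers: Vec<u64>,
    children: Vec<Node>,
}

/// Borrowing forces a clone, which allocates fresh Vecs for the copy.
fn second_child_by_clone(node: &Node) -> Node {
    node.children[1].clone()
}

/// Taking ownership lets the child be moved out; its existing heap
/// allocations are reused rather than copied.
fn second_child_by_move(node: Node) -> Node {
    let Node { children, .. } = node;
    children
        .into_iter()
        .nth(1)
        .expect("expected at least two children")
}
```

In the moving version the child's `buffers` still point at the same heap allocation as before, whereas the cloned child points at a new one.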

let values = make_array(values_child);
Self {
data_type: data.data_type().clone(),
data_type,

This also avoids a DataType::drop, which is unlikely to make a large difference but is still something.

.try_into()
.expect("Run ends should have exactly one buffer");
let scalar = ScalarBuffer::from(run_end_buffer);
let run_ends = unsafe { RunEndBuffer::new_unchecked(scalar, offset, len) };

Note that the previous code also uses unsafe to create a RunEndBuffer (which is valid, as this is checked during construction of the ArrayData).

alamb marked this pull request as ready for review January 18, 2026 13:09
alamb merged commit 93ebd3a into apache:main Jan 21, 2026
26 checks passed

alamb commented Jan 21, 2026

Thank you @Jefffrey
