Contributor

@alamb alamb commented Jan 13, 2026

Which issue does this PR close?

Rationale for this change

@scovich noted in #9114 (comment) that calling Vec::remove does an extra copy and that Vec::from doesn't actually reuse the allocation the way I thought it did

What changes are included in this PR?

Build the Arc for buffers directly

Are these changes tested?

By existing tests

Are there any user-facing changes?

@github-actions github-actions bot added the "arrow" label (Changes to the arrow crate) Jan 13, 2026
@alamb alamb changed the title from "Minor: try and avoid an allocation" to "Minor: try and avoid an allocation creating GenericByteViewArray from ArrayData" Jan 13, 2026
impl<T: ByteViewType + ?Sized> From<ArrayData> for GenericByteViewArray<T> {
    fn from(data: ArrayData) -> Self {
        let (_data_type, len, nulls, offset, mut buffers, _child_data) = data.into_parts();
        let views = buffers.remove(0); // need to maintain order of remaining buffers
Contributor Author

The idea is that remove(0) copies (shifts) all the remaining buffers down by one, and then Arc::from allocates and copies them again. Using Arc::from_iter builds the result directly.
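To make the comparison concrete, here is a minimal sketch of the two approaches. It assumes a plain `Vec<u8>` as a stand-in for the Arrow buffer type, and the function names `split_with_remove` and `split_with_iter` are hypothetical, not part of arrow-rs. Whether the iterator version really saves an allocation depends on the standard library's handling of exact-size iterators, so treat it as an illustration rather than a measurement.

```rust
use std::sync::Arc;

/// Stand-in for an Arrow buffer (assumption: the real code uses a buffer type from arrow).
type Buf = Vec<u8>;

/// Older shape: `remove(0)` shifts every remaining element left by one slot,
/// then `Arc::from(Vec<_>)` allocates an `Arc<[_]>` and moves the elements into it.
fn split_with_remove(mut buffers: Vec<Buf>) -> (Buf, Arc<[Buf]>) {
    let views = buffers.remove(0); // panics if `buffers` is empty
    (views, Arc::from(buffers))
}

/// Newer shape: consume the Vec through its iterator, take the first element,
/// and collect the rest straight into the `Arc<[_]>`.
fn split_with_iter(buffers: Vec<Buf>) -> (Buf, Arc<[Buf]>) {
    let mut iter = buffers.into_iter();
    let views = iter.next().expect("at least one buffer");
    (views, Arc::from_iter(iter))
}

fn main() {
    let input = vec![vec![0u8], vec![1, 2], vec![3, 4, 5]];
    let (v1, rest1) = split_with_remove(input.clone());
    let (v2, rest2) = split_with_iter(input);
    // Both shapes produce the same result; only the intermediate work differs.
    assert_eq!(v1, v2);
    assert_eq!(rest1, rest2);
    println!("views = {:?}, data buffers = {:?}", v2, rest2);
}
```

Both functions preserve the order of the remaining buffers, which is the property the original `remove(0)` comment was concerned with.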

Contributor

@scovich scovich left a comment

LGTM, one nit.

I guess it would be hard to measure a benchmark impact from the change, without going to an extreme case?

Comment on lines 993 to 996
// first buffer is views
let views = buffers[0].clone();
// remaining buffers are data buffers
let buffers = Arc::from_iter(buffers.into_iter().skip(1));
Contributor

nit:

Suggested change
- // first buffer is views
- let views = buffers[0].clone();
- // remaining buffers are data buffers
- let buffers = Arc::from_iter(buffers.into_iter().skip(1));
+ // first buffer is views, remaining buffers are data buffers
+ let mut buffers = buffers.into_iter();
+ let views = buffers.next().unwrap(); // safety: never empty
+ let buffers = Arc::from_iter(buffers);

(question tho -- why do we know buffers is never empty?)

Also -- it would be fewer lines of code to fold those two values into the constructor call?

let mut buffers = buffers.into_iter();
Self {
    data_type: T::DATA_TYPE,
    views: buffers.next().unwrap(),
    buffers: Arc::from_iter(buffers),
    // ... remaining fields unchanged
}
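Folding the two values into the constructor works because Rust evaluates struct-literal field expressions in the order they are written, so `next()` runs before the iterator is moved into `Arc::from_iter`. A small self-contained sketch of that pattern follows; `ViewArraySketch` and `from_buffers` are hypothetical names with simplified field types, not the real `GenericByteViewArray`.

```rust
use std::sync::Arc;

/// Hypothetical, simplified stand-in for `GenericByteViewArray`.
#[derive(Debug)]
struct ViewArraySketch {
    views: Vec<u8>,
    buffers: Arc<[Vec<u8>]>,
}

impl ViewArraySketch {
    /// Assumes the caller has already validated that `buffers` is non-empty.
    fn from_buffers(buffers: Vec<Vec<u8>>) -> Self {
        let mut buffers = buffers.into_iter();
        Self {
            // Evaluated first: takes the leading (views) element.
            views: buffers.next().expect("validated input has a views buffer"),
            // Evaluated second: consumes whatever the iterator has left.
            buffers: Arc::from_iter(buffers),
        }
    }
}

fn main() {
    let array = ViewArraySketch::from_buffers(vec![vec![7], vec![1, 2], vec![3]]);
    assert_eq!(array.views, vec![7]);
    assert_eq!(array.buffers.len(), 2);
    println!("{array:?}");
}
```

Swapping the two field initializers would not compile, since the iterator would be moved before `next()` is called; the written order is what makes the folded form safe.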

Contributor Author

> (question tho -- why do we know buffers is never empty?)

Basically because the ArrayData is validated for each Arrow type -- and since it is validated by

new_self.validate_data()?;

there must always be at least one buffer
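The general pattern here is that an invariant checked once at construction can justify a later `unwrap`/`expect`. The sketch below illustrates that idea only; `ValidatedBuffers`, `try_new`, and `into_views_and_data` are hypothetical names, and this is not how arrow-rs's `validate_data` is actually implemented.

```rust
/// Hypothetical wrapper whose constructor enforces "at least one buffer".
struct ValidatedBuffers(Vec<Vec<u8>>);

impl ValidatedBuffers {
    /// Rejects empty input up front, so later code can rely on the invariant.
    fn try_new(buffers: Vec<Vec<u8>>) -> Result<Self, String> {
        if buffers.is_empty() {
            return Err("expected at least one buffer (the views buffer)".to_string());
        }
        Ok(Self(buffers))
    }

    /// Splits into (views, data buffers); `expect` cannot fail after `try_new`.
    fn into_views_and_data(self) -> (Vec<u8>, Vec<Vec<u8>>) {
        let mut iter = self.0.into_iter();
        let views = iter.next().expect("checked non-empty in try_new");
        (views, iter.collect())
    }
}

fn main() {
    assert!(ValidatedBuffers::try_new(vec![]).is_err());
    let validated = ValidatedBuffers::try_new(vec![vec![1], vec![2, 3]]).unwrap();
    let (views, data) = validated.into_views_and_data();
    assert_eq!(views, vec![1]);
    assert_eq!(data, vec![vec![2, 3]]);
}
```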

Contributor

That's about what I figured, but I couldn't find where the validation actually happened. Thanks for the pointer!

Contributor Author

alamb commented Jan 13, 2026

> I guess it would be hard to measure a benchmark impact from the change, without going to an extreme case?

Yeah, thank you. I have found benchmarking anything related to allocations quite tricky

Contributor Author

alamb commented Jan 13, 2026

> Also -- it would be fewer lines of code to fold those two values into the constructor call?

Good call -- done in 707bf41

Contributor

@etseidl etseidl left a comment

Looks good now. Thanks @alamb and @scovich

@alamb alamb merged commit 517b553 into apache:main Jan 14, 2026
26 checks passed
Contributor Author

alamb commented Jan 14, 2026

Thanks everyone

@alamb alamb deleted the alamb/more_less_alloc branch January 14, 2026 12:02
Dandandan pushed a commit to Dandandan/arrow-rs that referenced this pull request Jan 15, 2026
…om `ArrayData` (apache#9156)

# Which issue does this PR close?

- part of apache#9061
- follow on apache#9114


# Rationale for this change

@scovich noted in
apache#9114 (comment) that
calling `Vec::remove` does an extra copy and that `Vec::from` doesn't
actually reuse the allocation the way I thought it did


# What changes are included in this PR?

Build the Arc for buffers directly

# Are these changes tested?

By existing tests

# Are there any user-facing changes?

