Add a function to get memory size of array slice #3501

Merged
merged 4 commits into from
Jan 10, 2023


@askoa (Contributor) commented Jan 9, 2023

Which issue does this PR close?

Closes #3407

Rationale for this change

See issue description.

What changes are included in this PR?

Function to calculate buffer size of array slice.

Are there any user-facing changes?

User will get a new API that'll calculate the buffer size of array slice.
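
As a rough illustration of what "the buffer size of an array slice" means, here is a minimal sketch for the two simplest layouts. It is a toy model and not the code added in this PR: `FixedWidthSlice`, `Utf8Slice`, and their methods are hypothetical names introduced only for this example, and validity bitmaps are ignored.

```rust
use std::mem::size_of;

/// Hypothetical model of a slice over a fixed-width (primitive) array.
/// The backing buffer may be much larger than the slice itself.
struct FixedWidthSlice<'a> {
    values: &'a [i32], // entire backing buffer
    offset: usize,     // first logical element of the slice
    len: usize,        // number of logical elements in the slice
}

impl FixedWidthSlice<'_> {
    /// Bytes held by the whole backing buffer.
    fn buffer_memory_size(&self) -> usize {
        self.values.len() * size_of::<i32>()
    }

    /// Bytes needed to represent only the sliced elements.
    fn slice_memory_size(&self) -> usize {
        self.values[self.offset..self.offset + self.len].len() * size_of::<i32>()
    }
}

/// Hypothetical model of a slice over a variable-width (string) array:
/// element `i` occupies `data[offsets[i]..offsets[i + 1]]`.
struct Utf8Slice<'a> {
    offsets: &'a [i32],
    data: &'a [u8],
    offset: usize,
    len: usize,
}

impl Utf8Slice<'_> {
    /// Bytes needed for the slice: `len + 1` offsets plus the referenced data.
    fn slice_memory_size(&self) -> usize {
        let window = &self.offsets[self.offset..self.offset + self.len + 1];
        let offset_bytes = window.len() * size_of::<i32>();
        // Only the bytes between the first and last offset are referenced.
        let start = window[0] as usize;
        let end = window[self.len] as usize;
        offset_bytes + self.data[start..end].len()
    }
}
```

In this model, a 3-element slice of an 8-element `i32` buffer needs 12 bytes even though the backing buffer holds 32, which is the gap a slice-aware size function is meant to capture.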

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 9, 2023
Review thread on arrow-data/src/data.rs (outdated, resolved)

@tustvold (Contributor) left a comment

It is worth noting that this may overestimate the size for sliced ListArray and StructArray, but I'm not sure whether we are being smart when writing those yet. Either way, I think this is a good start.

@viirya may know?
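
One way to picture the possible overestimate for a sliced list is the sketch below. It is an illustration under stated assumptions, not the logic in this PR: `ListSlice`, `whole_child_size`, and `referenced_child_size` are hypothetical names, the child is assumed to be a plain `i32` buffer, and validity bitmaps are ignored.

```rust
use std::mem::size_of;

/// Hypothetical model of a sliced list array with an `i32` child.
/// List `i` covers `child[offsets[i]..offsets[i + 1]]`.
struct ListSlice<'a> {
    offsets: &'a [i32],
    child: &'a [i32], // entire child values buffer
    offset: usize,    // first list in the slice
    len: usize,       // number of lists in the slice
}

impl ListSlice<'_> {
    /// A slice of `len` lists needs `len + 1` offsets.
    fn offsets_bytes(&self) -> usize {
        (self.len + 1) * size_of::<i32>()
    }

    /// Counts the whole child buffer, so it overestimates for a slice.
    fn whole_child_size(&self) -> usize {
        self.offsets_bytes() + self.child.len() * size_of::<i32>()
    }

    /// Counts only the child values the sliced lists actually reference.
    fn referenced_child_size(&self) -> usize {
        let start = self.offsets[self.offset] as usize;
        let end = self.offsets[self.offset + self.len] as usize;
        self.offsets_bytes() + self.child[start..end].len() * size_of::<i32>()
    }
}
```

With offsets `[0, 2, 5, 6, 10]` over a 10-element child, slicing lists 1 and 2 references only child values 2 through 5, so the two estimates differ by the unreferenced 6 values (24 bytes).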

@viirya (Member) commented Jan 10, 2023

For StructArray, its layout is empty, so this basically computes the size over all of its children. It looks like slicing a StructArray already handles its child data correctly? If so, this seems okay.

I'm not sure whether there is an issue with ListArray. It seems to be the same as other types that have one offset buffer plus child data?

@tustvold (Contributor)

The way we propagate the offsets into children is not guaranteed, and is actually an incorrect special case I wish to remove, but yes, ignoring this edge case, the logic here is fine.

As for lists, theoretically the IPC writer could find the min offset in the list slice, and then shift everything down - as is done for StringArray, etc... Not sure if we do this yet...
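
The "find the min offset and shift everything down" idea can be sketched as follows. This is a hypothetical helper for illustration only, not the IPC writer's code (the comment itself is unsure whether the writer does this for lists): `rebase_list_slice` is an invented name, and offsets are assumed to be non-decreasing `i32` values.

```rust
use std::ops::Range;

/// Hypothetical helper: given the full offsets of a list array and a slice
/// of `len` lists starting at `offset`, return
///   * the slice's offsets shifted so they start at zero, and
///   * the range of child values the slice actually references.
fn rebase_list_slice(offsets: &[i32], offset: usize, len: usize) -> (Vec<i32>, Range<usize>) {
    // A slice of `len` lists uses `len + 1` offsets.
    let window = &offsets[offset..offset + len + 1];
    // Offsets are non-decreasing, so the first is the minimum
    // and the last is the maximum.
    let min = window[0];
    let max = window[len];
    let rebased = window.iter().map(|o| o - min).collect();
    (rebased, min as usize..max as usize)
}
```

For offsets `[0, 2, 5, 6, 10]` and a slice of lists 1 and 2, this yields rebased offsets `[0, 3, 4]` and the child range `2..6`, so only four child values would need to be written.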

@tustvold tustvold merged commit a8276c0 into apache:master Jan 10, 2023
@ursabot commented Jan 10, 2023

Benchmark runs are scheduled for baseline = b4abb75 and contender = a8276c0. a8276c0 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@viirya (Member) commented Jan 10, 2023

> As for lists, theoretically the IPC writer could find the min offset in the list slice, and then shift everything down - as is done for StringArray, etc... Not sure if we do this yet...

Oh, I see. I remember doing something with buffer slicing in the IPC writer. I may need to take another look to see whether we do it for list slices.

@alamb (Contributor) commented Jan 11, 2023

Wow, thank you @askoa. This is great.

Successfully merging this pull request may close these issues.

ArrayData::get_slice_memory_size or similar