-
Notifications
You must be signed in to change notification settings - Fork 95
GH-709: Correct length calculation of value buffers of variable-sized arrays #707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
|
This PR should have the 'bug-fix' label, but I don't seem to be able to apply that myself. |
|
arrow-java doesn't support slicing in general, so for offset-based arrays, other code downstream may not work properly if the first offset is nonzero. I think the longer term fix is to either detect this and copy or decide to properly implement slicing. That said fixing an out-of-bounds is always good. |
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Do you mind opening a new issue to link this PR to? The original issue has more discussion. In particular I think the cross-language integration tests need to be improved to cover this case
|
I've created a new issue and update the summary and description of this PR. |
|
Thanks, the CI failures here should not be related but let me dig into what's going on - I may ask you to rebase |
### What changes were proposed in this pull request? This pr aims to upgrade `arrow-java` from 18.2.0 to 18.3.0. ### Why are the changes needed? The new version bring some bug fixes, like: - apache/arrow-java#627 - apache/arrow-java#654 - apache/arrow-java#656 - apache/arrow-java#693 - apache/arrow-java#705 - apache/arrow-java#707 - apache/arrow-java#722 In addition, the new version introduces a cascading upgrade for flatbuffers-java([ from 24.3.25 to 25.1.24 ](apache/arrow-java#600)) the full release note as follows: - https://github.com/apache/arrow-java/releases/tag/v18.3.0 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Pass GitHub Acitons ### Was this patch authored or co-authored using generative AI tooling? No Closes #50892 from LuciferYang/arrow-java-18.3.0. Authored-by: yangjie01 <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request? This pr aims to upgrade `arrow-java` from 18.2.0 to 18.3.0. ### Why are the changes needed? The new version bring some bug fixes, like: - apache/arrow-java#627 - apache/arrow-java#654 - apache/arrow-java#656 - apache/arrow-java#693 - apache/arrow-java#705 - apache/arrow-java#707 - apache/arrow-java#722 In addition, the new version introduces a cascading upgrade for flatbuffers-java([ from 24.3.25 to 25.1.24 ](apache/arrow-java#600)) the full release note as follows: - https://github.com/apache/arrow-java/releases/tag/v18.3.0 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Pass GitHub Acitons ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#50892 from LuciferYang/arrow-java-18.3.0. Authored-by: yangjie01 <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
What's Changed
For variable-size binary layout arrays, BufferImportTypeVisitor currently derives the length of the value buffer by calculating the difference between the last and first offset. When the first offset is not zero, this is actually incorrect and leads to out of bounds errors when attempting to read values from the imported array.
Instead, BufferImportTypeVisitor should simply use the last offset value as the length of the value buffer. This PR makes that change.
Just FYI, I bumped into this issue when attempting to import an array originating from DataFusion. A test query of the form
SELECT column1 FROM VALUES ('a'), ('b'), ('c'), ('d') LIMIT 2 OFFSET 1;returns a slice of the full set of values. The values buffer contains all the original values, and the offsets buffer contains 1 and 2 as values to handle the offset from the query.Closes #709 .