@pepijnve
Contributor

@pepijnve pepijnve commented Apr 10, 2025

What's Changed

For variable-size binary layout arrays, BufferImportTypeVisitor currently derives the length of the value buffer as the difference between the last and the first offset. Offsets are absolute positions in the value buffer, so when the first offset is not zero this length is too short, which leads to out-of-bounds errors when reading values from the imported array.

Instead, BufferImportTypeVisitor should simply use the last offset value as the length of the value buffer. This PR makes that change.

Just FYI, I bumped into this issue when attempting to import an array originating from DataFusion. A test query of the form SELECT column1 FROM VALUES ('a'), ('b'), ('c'), ('d') LIMIT 2 OFFSET 1; returns a slice of the full set of values. The values buffer contains all the original values, and the offsets buffer contains 1 and 2 as values to handle the offset from the query.
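
To make the difference concrete, here is a minimal plain-Java sketch of the two length calculations. It does not use the arrow-java API: the class and method names are hypothetical, and the offsets `{1, 2, 3}` are an assumed encoding of the two-row slice `['b', 'c']` over the full values buffer `abcd`.

```java
import java.nio.charset.StandardCharsets;

// Plain-Java model of a variable-size binary array; NOT the arrow-java API.
// All names here are hypothetical and exist only for illustration.
final class ValueBufferLengthSketch {

  // Previous behaviour as described in this PR: last offset minus first offset.
  static int lengthFromOffsetDifference(int[] offsets) {
    return offsets[offsets.length - 1] - offsets[0];
  }

  // Behaviour after this PR: the last offset is the value buffer length.
  static int lengthFromLastOffset(int[] offsets) {
    return offsets[offsets.length - 1];
  }

  // Reads element i, rejecting reads that extend past the imported length.
  static String read(byte[] values, int importedLength, int[] offsets, int i) {
    int start = offsets[i];
    int end = offsets[i + 1];
    if (end > importedLength) {
      throw new IndexOutOfBoundsException(
          "value " + i + " ends at " + end + " but buffer length is " + importedLength);
    }
    return new String(values, start, end - start, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] values = "abcd".getBytes(StandardCharsets.UTF_8);
    int[] offsets = {1, 2, 3}; // assumed offsets for the slice ['b', 'c']

    int fixed = lengthFromLastOffset(offsets);
    System.out.println(read(values, fixed, offsets, 0));
    System.out.println(read(values, fixed, offsets, 1));

    int old = lengthFromOffsetDifference(offsets);
    try {
      read(values, old, offsets, 1);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("old length fails: " + e.getMessage());
    }
  }
}
```

With the difference-based length the buffer is considered 2 bytes long, so the second value (bytes 2 to 3) falls outside it; with the last offset as the length, both values are readable.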

Closes #709.

@pepijnve
Contributor Author

This PR should have the 'bug-fix' label, but I don't seem to be able to apply that myself.

@lidavidm lidavidm added the bug-fix PRs that fix a bug. label Apr 10, 2025
@github-actions github-actions bot added this to the 18.3.0 milestone Apr 10, 2025
@lidavidm
Member

arrow-java doesn't support slicing in general, so for offset-based arrays, other code downstream may not work properly if the first offset is nonzero. I think the longer-term fix is to either detect this and copy, or decide to properly implement slicing. That said, fixing an out-of-bounds read is always good.
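
As a rough illustration of the "detect this and copy" option, the sketch below rebases the offsets to start at zero and copies only the referenced bytes. It works on plain `int[]` and `byte[]` arrays rather than arrow-java buffers, and every name in it is hypothetical; it is not a proposal for how arrow-java should implement this.

```java
import java.util.Arrays;

// Plain-Java sketch of normalizing a sliced variable-size array so that its
// first offset is zero. NOT the arrow-java API; all names are hypothetical.
final class ZeroBasedSliceSketch {

  // Detects whether the offsets already start at zero (no copy needed).
  static boolean startsAtZero(int[] offsets) {
    return offsets.length == 0 || offsets[0] == 0;
  }

  // Shifts every offset down by the first offset.
  static int[] rebaseOffsets(int[] offsets) {
    if (startsAtZero(offsets)) {
      return offsets.clone();
    }
    int first = offsets[0];
    int[] rebased = new int[offsets.length];
    for (int i = 0; i < offsets.length; i++) {
      rebased[i] = offsets[i] - first;
    }
    return rebased;
  }

  // Copies only the bytes between the first and the last offset.
  static byte[] copyReferencedValues(byte[] values, int[] offsets) {
    if (offsets.length == 0) {
      return new byte[0];
    }
    return Arrays.copyOfRange(values, offsets[0], offsets[offsets.length - 1]);
  }
}
```

After this normalization the difference between the last and the first offset equals the last offset, so downstream code that assumes a zero first offset sees a consistent array, at the cost of one copy of the referenced values.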

Member

@lidavidm lidavidm left a comment

LGTM.

Do you mind opening a new issue to link this PR to? The original issue has more discussion. In particular, I think the cross-language integration tests need to be improved to cover this case.

@pepijnve pepijnve changed the title from "GH-74: Correct length calculation of value buffers of variable-sized arrays" to "GH-709: Correct length calculation of value buffers of variable-sized arrays" Apr 11, 2025
@pepijnve
Contributor Author

I've created a new issue and updated the summary and description of this PR.

@lidavidm
Member

Thanks. The CI failures here should not be related, but let me dig into what's going on. I may ask you to rebase.

@lidavidm lidavidm merged commit 74e8981 into apache:main Apr 14, 2025
22 of 29 checks passed
dongjoon-hyun pushed a commit to apache/spark that referenced this pull request May 15, 2025
### What changes were proposed in this pull request?
This PR aims to upgrade `arrow-java` from 18.2.0 to 18.3.0.

### Why are the changes needed?
The new version brings some bug fixes, such as:

- apache/arrow-java#627
- apache/arrow-java#654
- apache/arrow-java#656
- apache/arrow-java#693
- apache/arrow-java#705
- apache/arrow-java#707
- apache/arrow-java#722

In addition, the new version introduces a cascading upgrade of flatbuffers-java from 24.3.25 to 25.1.24 (apache/arrow-java#600).

The full release notes are available at:
- https://github.com/apache/arrow-java/releases/tag/v18.3.0

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #50892 from LuciferYang/arrow-java-18.3.0.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
yhuang-db pushed a commit to yhuang-db/spark that referenced this pull request Jun 9, 2025
Development

Successfully merging this pull request may close these issues.

Accessing values from imported 'C data interface' array can result in out of bounds reads
