-
Notifications
You must be signed in to change notification settings - Fork 3.9k
GH-40339: [Java] StringView Initial Implementation #40340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
93cdde7 to
4526423
Compare
5942a63 to
4c36aaf
Compare
daae6dc to
184ed70
Compare
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some quick comments. I'm not done looking at this yet.
java/vector/src/main/java/org/apache/arrow/vector/AbstractVariableWidthVector.java
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/AddOrGetResult.java
Outdated
Show resolved
Hide resolved
| setCapacity(len, false); | ||
| System.arraycopy(utf8, start, bytes, 0, len); | ||
| this.length = len; | ||
| super.set(utf8, start, len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're deferring to super, either this should be @Override or we just shouldn't define this in the first place.
|
|
||
| @Override | ||
| public Boolean visit(BaseVariableWidthViewVector left, Void value) { | ||
| throw new UnsupportedOperationException("View vectors are not supported."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really so difficult to implement that we can't include it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not the difficulty, I actually thought about the size of this PR. It is already over 3000 LoC. I was merely trying move possible things to separate PRs.
| } | ||
|
|
||
| @Override | ||
| public Integer visit(ArrowType.BinaryView type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to have to be overhauled, since the buffer count is not fixed anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree, I have created an issue for this. How should we proceed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to look at how/where this is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, I kept this unsupported and I already have created an issue to follow up. I am also not sure about this. But if we think about the layout, and keep the growing external buffer as K variable, the base minimum number of buffers can be defined as 2, which is validity and view buffer. But I am not sure if that actually make sense as layout should have that external buffer represented in some way. Or can we include a callback-like thing to get the precise value? But that is again a bit odd as layout supposed to be like an enum-like class.
Not exactly sure what to do here.
// TODO: #40934
|
@lidavidm Thanks for the quick comments. I will address these. |
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vibhatha can you go over the base implementation again with the spec? It seems to be following the offsets buffer implementation still but none of that applies to view vectors.
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthViewVector.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthViewVector.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthViewVector.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthViewVector.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthViewVector.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthViewVector.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthViewVector.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthViewVector.java
Outdated
Show resolved
Hide resolved
|
@lidavidm the offset usage is incorrect here, it's not needed. I need to fix that. Thanks a lot for catching that. |
4aeb9d2 to
88d7818
Compare
|
@lidavidm I added a few more tests, covering the overwrites for short/long rellocate/no-reallocate with |
|
@github-actions crossbow submit java |
|
Revision: 88d7818 Submitted crossbow builds: ursacomputing/crossbow @ actions-d19dec4add |
|
@lidavidm the |
|
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit a8c4f86. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 15 possible false positives for unstable benchmarks that are known to sometimes produce them. |
### Rationale for this change StringView implementation in Java. This PR only includes the core implementation of StringView ### What changes are included in this PR? - [X] Adding ViewVarBinaryVector - [X] Adding ViewVarCharVector - [X] Adding corresponding test cases in the given scope - [X] Including required implementation extensions with not supported warnings - [X] Interface for Holders ### Non Goals of this PR - [ ] apache#40937 - [ ] apache#40936 - [ ] apache#40932 - [ ] apache#40943 - [ ] apache#40944 - [ ] apache#40942 - [ ] https://github.com/apache/arrow/issues/40945 - [ ] https://github.com/apache/arrow/issues/40941 - [ ] https://github.com/apache/arrow/issues/40946 ### Are these changes tested? Yes. Existing test cases on `VarCharVector` and `VarBinaryVector` are verified with view implementations and additional test cases have also been added to check view functionality. And explitly tests have been added to evaluate the view functionality with `ViewVarCharVector` ### Are there any user-facing changes? Yes, this introduces a new API and some public methods have been included in an interface so that it can be extended to write custom functionality like done for views. * GitHub Issue: apache#40339 Lead-authored-by: Vibhatha Abeykoon <[email protected]> Co-authored-by: vibhatha <[email protected]> Co-authored-by: Vibhatha Lakmal Abeykoon <[email protected]> Signed-off-by: David Li <[email protected]>
### Rationale for this change StringView implementation in Java. This PR only includes the core implementation of StringView ### What changes are included in this PR? - [X] Adding ViewVarBinaryVector - [X] Adding ViewVarCharVector - [X] Adding corresponding test cases in the given scope - [X] Including required implementation extensions with not supported warnings - [X] Interface for Holders ### Non Goals of this PR - [ ] apache#40937 - [ ] apache#40936 - [ ] apache#40932 - [ ] apache#40943 - [ ] apache#40944 - [ ] apache#40942 - [ ] https://github.com/apache/arrow/issues/40945 - [ ] https://github.com/apache/arrow/issues/40941 - [ ] https://github.com/apache/arrow/issues/40946 ### Are these changes tested? Yes. Existing test cases on `VarCharVector` and `VarBinaryVector` are verified with view implementations and additional test cases have also been added to check view functionality. And explitly tests have been added to evaluate the view functionality with `ViewVarCharVector` ### Are there any user-facing changes? Yes, this introduces a new API and some public methods have been included in an interface so that it can be extended to write custom functionality like done for views. * GitHub Issue: apache#40339 Lead-authored-by: Vibhatha Abeykoon <[email protected]> Co-authored-by: vibhatha <[email protected]> Co-authored-by: Vibhatha Lakmal Abeykoon <[email protected]> Signed-off-by: David Li <[email protected]>
Rationale for this change
StringView implementation in Java. This PR only includes the core implementation of StringView
What changes are included in this PR?
Non Goals of this PR
ViewVarCharVector#40937ViewVarBinaryVector#40936RangeEqualsVisitorfor StringView #40943TypeEqualsVisitorfor StringView #40944Are these changes tested?
Yes. Existing test cases on
VarCharVectorandVarBinaryVectorare verified with view implementations and additional test cases have also been added to check view functionality. And explitly tests have been added to evaluate the view functionality withViewVarCharVectorAre there any user-facing changes?
Yes, this introduces a new API and some public methods have been included in an interface so that it can be extended to write custom functionality like done for views.