
@vibhatha
Collaborator

@vibhatha vibhatha commented Mar 4, 2024

Rationale for this change

StringView implementation in Java. This PR only includes the core implementation of StringView.

What changes are included in this PR?

  • Adding ViewVarBinaryVector
  • Adding ViewVarCharVector
  • Adding corresponding test cases in the given scope
  • Including required implementation extensions that throw UnsupportedOperationException for operations not yet supported
  • Interface for Holders

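Non Goals of this PR

  • #40937
  • #40936
  • #40932
  • #40943
  • #40944
  • #40942
  • https://github.com/apache/arrow/issues/40945
  • https://github.com/apache/arrow/issues/40941
  • https://github.com/apache/arrow/issues/40946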

Are these changes tested?

Yes. The existing test cases for VarCharVector and VarBinaryVector have been verified against the view implementations, and additional test cases have been added to check view functionality. Tests have also been added explicitly to evaluate the view functionality of ViewVarCharVector.

Are there any user-facing changes?

Yes. This introduces a new API, and some public methods have been included in an interface so that it can be extended to implement custom functionality, as was done for views.

@vibhatha vibhatha changed the title GH-40339: StringView Initial Implementation for Java GH-40339: [Java] StringView Initial Implementation Mar 12, 2024
@vibhatha vibhatha marked this pull request as ready for review April 2, 2024 10:14
@vibhatha vibhatha requested a review from lidavidm as a code owner April 2, 2024 10:14
@vibhatha vibhatha self-assigned this Apr 2, 2024
Member

@lidavidm lidavidm left a comment


Some quick comments. I'm not done looking at this yet.

```java
setCapacity(len, false);
System.arraycopy(utf8, start, bytes, 0, len);
this.length = len;
super.set(utf8, start, len);
```
Member


If we're deferring to super, either this should be @Override or we just shouldn't define this in the first place.


```java
@Override
public Boolean visit(BaseVariableWidthViewVector left, Void value) {
  throw new UnsupportedOperationException("View vectors are not supported.");
}
```
Member


Is this really so difficult to implement that we can't include it here?

Collaborator Author


It is not the difficulty; I was actually thinking about the size of this PR. It is already over 3000 LoC, so I was merely trying to move whatever I could into separate PRs.
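
For context on what an equality check over views involves, here is a minimal standalone sketch, assuming the 16-byte view layout of the Arrow columnar format (a 4-byte length followed either by up to 12 inline bytes, or by a 4-byte prefix, a buffer index, and an offset). `ViewEqualsSketch`, `makeView`, and `viewsEqual` are hypothetical names, not part of Arrow Java, and views are plain `byte[]` arrays here rather than `ArrowBuf`s.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: comparing two values stored in the 16-byte view layout. */
public class ViewEqualsSketch {
  static final int VIEW_SIZE = 16;
  static final int INLINE_LIMIT = 12;

  /** Builds a view; for long values the bytes must already be at (bufferIndex, offset). */
  public static byte[] makeView(byte[] value, int bufferIndex, int offset) {
    // allocate() zero-fills, so unused inline bytes end up zero-padded
    ByteBuffer view = ByteBuffer.allocate(VIEW_SIZE).order(ByteOrder.LITTLE_ENDIAN);
    view.putInt(value.length);
    if (value.length <= INLINE_LIMIT) {
      view.put(value);          // short value stored inline
    } else {
      view.put(value, 0, 4);    // 4-byte prefix
      view.putInt(bufferIndex); // which data buffer holds the bytes
      view.putInt(offset);      // where they start in that buffer
    }
    return view.array();
  }

  public static boolean viewsEqual(byte[] left, List<byte[]> leftData,
                                   byte[] right, List<byte[]> rightData) {
    ByteBuffer l = ByteBuffer.wrap(left).order(ByteOrder.LITTLE_ENDIAN);
    ByteBuffer r = ByteBuffer.wrap(right).order(ByteOrder.LITTLE_ENDIAN);
    int length = l.getInt(0);
    if (length != r.getInt(0)) {
      return false;
    }
    if (length <= INLINE_LIMIT) {
      // inline bytes plus zero padding occupy bytes 4..16 of the view
      return Arrays.equals(left, 4, VIEW_SIZE, right, 4, VIEW_SIZE);
    }
    // cheap prefix check (bytes 4..8) before touching the data buffers
    if (!Arrays.equals(left, 4, 8, right, 4, 8)) {
      return false;
    }
    byte[] lBuf = leftData.get(l.getInt(8));
    byte[] rBuf = rightData.get(r.getInt(8));
    int lOff = l.getInt(12);
    int rOff = r.getInt(12);
    return Arrays.equals(lBuf, lOff, lOff + length, rBuf, rOff, rOff + length);
  }

  public static void main(String[] args) {
    byte[] value = "a string longer than twelve bytes".getBytes(StandardCharsets.UTF_8);
    byte[] shifted = new byte[3 + value.length]; // same bytes, stored at offset 3
    System.arraycopy(value, 0, shifted, 3, value.length);
    System.out.println(viewsEqual(makeView(value, 0, 0), List.of(value),
        makeView(value, 0, 3), List.of(shifted))); // true
    System.out.println(viewsEqual(makeView("short".getBytes(StandardCharsets.UTF_8), 0, 0), List.of(),
        makeView("shore".getBytes(StandardCharsets.UTF_8), 0, 0), List.of())); // false
  }
}
```

The prefix check lets most mismatches between long values be rejected without reading the data buffers. The inline comparison relies on unused inline bytes being zero, which the format requires.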

```java
}

@Override
public Integer visit(ArrowType.BinaryView type) {
```
Member


This is going to have to be overhauled, since the buffer count is not fixed anymore.

Collaborator Author


Yes, I agree. I have created an issue for this. How should we proceed?

Member


We need to look at how/where this is used.

Collaborator Author


At the moment, I have kept this unsupported and have already created an issue to follow up. I am not sure about this either. If we think about the layout and treat the growing set of external (variadic) data buffers as a variable K, the base minimum number of buffers can be defined as 2: the validity buffer and the view buffer. But I am not sure that actually makes sense, since the layout should represent those external buffers in some way. Alternatively, could we include a callback-like mechanism to get the precise value? That is again a bit odd, since the layout is supposed to be an enum-like class.

Not exactly sure what to do here.
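
The tension described above can be sketched as follows, assuming a hypothetical enum-style `TypeLayout` that only holds per-type information. For view types it can record the two fixed buffers (validity and views), but the total also depends on K, the number of variadic data buffers a particular vector holds, so the count has to be computed from the instance. `ViewBufferCountSketch`, `TypeLayout`, and `bufferCount` are illustrative names, not Arrow Java APIs.

```java
/** Hypothetical sketch: a static, enum-style layout only gives a lower bound for views. */
public class ViewBufferCountSketch {
  public enum TypeLayout {
    VAR_CHAR(3, false),     // validity, offsets, data
    VIEW_VAR_CHAR(2, true); // validity, views, then K variadic data buffers

    final int fixedBuffers;
    final boolean variadic;

    TypeLayout(int fixedBuffers, boolean variadic) {
      this.fixedBuffers = fixedBuffers;
      this.variadic = variadic;
    }
  }

  /** The total needs instance state: how many data buffers this vector currently holds. */
  public static int bufferCount(TypeLayout layout, int variadicBufferCount) {
    if (variadicBufferCount < 0 || (!layout.variadic && variadicBufferCount != 0)) {
      throw new IllegalArgumentException("unexpected variadic buffer count for " + layout);
    }
    return layout.fixedBuffers + variadicBufferCount;
  }

  public static void main(String[] args) {
    System.out.println(bufferCount(TypeLayout.VAR_CHAR, 0));      // 3
    System.out.println(bufferCount(TypeLayout.VIEW_VAR_CHAR, 0)); // 2: all values inline
    System.out.println(bufferCount(TypeLayout.VIEW_VAR_CHAR, 4)); // 6
  }
}
```

This corresponds to defining a base minimum of 2 and adding K once a concrete vector is available; a callback on the layout would move the same computation rather than remove it.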

```java
// TODO: #40934
```

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Apr 2, 2024
@vibhatha
Collaborator Author

vibhatha commented Apr 2, 2024

@lidavidm Thanks for the quick comments. I will address these.

Member

@lidavidm lidavidm left a comment


@vibhatha, can you go over the base implementation again against the spec? It still seems to follow the offsets-buffer implementation, but none of that applies to view vectors.
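
To illustrate why the offsets buffer does not carry over, here is a minimal standalone sketch of the view layout, assuming a single data buffer (buffer index 0) and the 16-byte view format from the Arrow columnar spec. Each element's view stores its own length; values of 12 bytes or fewer are inlined, and longer values are located through a buffer index and offset stored in the view rather than through an offsets buffer. `ViewLayoutSketch`, `Encoded`, `encode`, and `decode` are hypothetical names, not Arrow Java APIs.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical sketch of the view layout: one 16-byte view per element, no offsets buffer. */
public class ViewLayoutSketch {
  static final int VIEW_SIZE = 16;    // every view is 16 bytes
  static final int INLINE_LIMIT = 12; // values of up to 12 bytes live inside the view

  public static final class Encoded {
    public final ByteBuffer views; // one view per element
    public final byte[] data;      // the single data buffer (buffer index 0)

    Encoded(ByteBuffer views, byte[] data) {
      this.views = views;
      this.data = data;
    }
  }

  public static Encoded encode(List<String> values) {
    ByteBuffer views =
        ByteBuffer.allocate(values.size() * VIEW_SIZE).order(ByteOrder.LITTLE_ENDIAN);
    ByteArrayOutputStream data = new ByteArrayOutputStream();
    for (int i = 0; i < values.size(); i++) {
      byte[] bytes = values.get(i).getBytes(StandardCharsets.UTF_8);
      int base = i * VIEW_SIZE;
      views.putInt(base, bytes.length);
      if (bytes.length <= INLINE_LIMIT) {
        // short value: inlined after the length; the rest of the view stays zero
        for (int j = 0; j < bytes.length; j++) {
          views.put(base + 4 + j, bytes[j]);
        }
      } else {
        // long value: 4-byte prefix, then buffer index and offset of the full bytes
        for (int j = 0; j < 4; j++) {
          views.put(base + 4 + j, bytes[j]);
        }
        views.putInt(base + 8, 0);
        views.putInt(base + 12, data.size());
        data.write(bytes, 0, bytes.length);
      }
    }
    return new Encoded(views, data.toByteArray());
  }

  public static String decode(Encoded encoded, int index) {
    int base = index * VIEW_SIZE;
    int length = encoded.views.getInt(base);
    byte[] out = new byte[length];
    if (length <= INLINE_LIMIT) {
      for (int j = 0; j < length; j++) {
        out[j] = encoded.views.get(base + 4 + j);
      }
    } else {
      // the buffer index at base + 8 is always 0 here because there is one data buffer
      int offset = encoded.views.getInt(base + 12);
      System.arraycopy(encoded.data, offset, out, 0, length);
    }
    return new String(out, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    List<String> values = List.of("hi", "a value that is longer than twelve bytes", "");
    Encoded encoded = encode(values);
    for (int i = 0; i < values.size(); i++) {
      System.out.println(decode(encoded, i));
    }
  }
}
```

Because each view is self-describing, an element can point anywhere in any data buffer, in any order, which an offsets-based implementation cannot express.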

@vibhatha
Collaborator Author

vibhatha commented Apr 4, 2024

> @vibhatha can you go over the base implementation again with the spec? It seems to be following the offsets buffer implementation still but none of that applies to view vectors.

Sure, @lidavidm, I will check again.

@vibhatha
Collaborator Author

vibhatha commented Apr 4, 2024

@lidavidm, the offset usage is incorrect here; it's not needed. I need to fix that. Thanks a lot for catching it.

@vibhatha
Collaborator Author

@lidavidm, I added a few more tests covering overwrites of short and long values, with and without reallocation, using both set and setSafe. For setSafe, the tests simply check that the values are correct, since buffer-level evaluation would not be very practical there.
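
A minimal standalone sketch of the overwrite scenarios described above, assuming a toy vector with one growable data buffer. In this model, `set` assumes the data buffer is already large enough and fails otherwise, while `setSafe` reallocates first; overwriting a long value with a short one makes it inline, and the replaced long value's bytes stay in the data buffer unreclaimed. `ViewOverwriteSketch` and its methods are hypothetical and do not mirror the internals of `ViewVarCharVector`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical toy view vector with one data buffer; not ViewVarCharVector's internals. */
public class ViewOverwriteSketch {
  static final int VIEW_SIZE = 16;
  static final int INLINE_LIMIT = 12;

  private final ByteBuffer views;
  private byte[] data;     // single data buffer (buffer index 0)
  private int dataEnd = 0; // next free byte in the data buffer

  public ViewOverwriteSketch(int valueCapacity, int dataCapacity) {
    views = ByteBuffer.allocate(valueCapacity * VIEW_SIZE).order(ByteOrder.LITTLE_ENDIAN);
    data = new byte[dataCapacity];
  }

  /** set-style write: assumes the data buffer is large enough and fails otherwise. */
  public void set(int index, byte[] value) {
    if (value.length > INLINE_LIMIT && dataEnd + value.length > data.length) {
      throw new IndexOutOfBoundsException("data buffer too small; use setSafe");
    }
    write(index, value);
  }

  /** setSafe-style write: reallocates the data buffer first when it is too small. */
  public void setSafe(int index, byte[] value) {
    int needed = dataEnd + value.length;
    if (value.length > INLINE_LIMIT && needed > data.length) {
      data = Arrays.copyOf(data, Math.max(needed, data.length * 2));
    }
    write(index, value);
  }

  private void write(int index, byte[] value) {
    int base = index * VIEW_SIZE;
    for (int i = 0; i < VIEW_SIZE; i++) {
      views.put(base + i, (byte) 0); // clear the old view so inline bytes are zero-padded
    }
    views.putInt(base, value.length);
    if (value.length <= INLINE_LIMIT) {
      for (int i = 0; i < value.length; i++) {
        views.put(base + 4 + i, value[i]);
      }
    } else {
      for (int i = 0; i < 4; i++) {
        views.put(base + 4 + i, value[i]); // prefix
      }
      views.putInt(base + 8, 0);        // buffer index
      views.putInt(base + 12, dataEnd); // offset; an overwritten value's old bytes are not reclaimed
      System.arraycopy(value, 0, data, dataEnd, value.length);
      dataEnd += value.length;
    }
  }

  public byte[] get(int index) {
    int base = index * VIEW_SIZE;
    int length = views.getInt(base);
    if (length > INLINE_LIMIT) {
      int offset = views.getInt(base + 12);
      return Arrays.copyOfRange(data, offset, offset + length);
    }
    byte[] out = new byte[length];
    for (int i = 0; i < length; i++) {
      out[i] = views.get(base + 4 + i);
    }
    return out;
  }

  public int dataCapacity() { return data.length; }

  public int dataUsed() { return dataEnd; }

  public static void main(String[] args) {
    byte[] longValue = "first long value, over 12 bytes".getBytes(StandardCharsets.UTF_8);
    ViewOverwriteSketch vector = new ViewOverwriteSketch(1, 8);
    vector.setSafe(0, longValue);                             // reallocates: 8 -> 31 bytes
    vector.set(0, "short".getBytes(StandardCharsets.UTF_8)); // long -> short: now inline
    System.out.println(new String(vector.get(0), StandardCharsets.UTF_8)); // short
    System.out.println(vector.dataUsed());                    // 31: old bytes still there
  }
}
```

Checking only the values read back, as the tests described above do for setSafe, keeps such tests independent of the growth policy, which is an arbitrary choice (doubling) in this sketch.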

@vibhatha
Collaborator Author

@github-actions crossbow submit java

@github-actions

Revision: 88d7818

Submitted crossbow builds: ursacomputing/crossbow @ actions-d19dec4add

| Task | Status |
| --- | --- |
| java-jars | GitHub Actions |
| verify-rc-source-java-linux-almalinux-8-amd64 | GitHub Actions |
| verify-rc-source-java-linux-conda-latest-amd64 | GitHub Actions |
| verify-rc-source-java-linux-ubuntu-20.04-amd64 | GitHub Actions |
| verify-rc-source-java-linux-ubuntu-22.04-amd64 | GitHub Actions |
| verify-rc-source-java-macos-amd64 | GitHub Actions |

@vibhatha
Collaborator Author

@lidavidm, the java-jars task fails, and the cause appears to be a C++ failure: arrow-filesystem-test is failing.

@vibhatha
Collaborator Author

> #41371 (You can open a new issue for a CI failure. :-)

@kou, it seems to be passing now. Was this fixed in another PR?

@kou
Member

kou commented Apr 26, 2024

#41379

@lidavidm lidavidm merged commit a8c4f86 into apache:main Apr 29, 2024
@lidavidm lidavidm removed the awaiting merge Awaiting merge label Apr 29, 2024
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit a8c4f86.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 15 possible false positives for unstable benchmarks that are known to sometimes produce them.

vibhatha added a commit to vibhatha/arrow that referenced this pull request May 25, 2024
### Rationale for this change 

StringView implementation in Java. This PR only includes the core implementation of StringView.

### What changes are included in this PR?

- [X] Adding ViewVarBinaryVector
- [X] Adding ViewVarCharVector
- [X] Adding corresponding test cases in the given scope
- [X] Including required implementation extensions that throw `UnsupportedOperationException` for operations not yet supported
- [X] Interface for Holders

### Non Goals of this PR

- [ ] apache#40937
- [ ] apache#40936
- [ ] apache#40932
- [ ] apache#40943
- [ ] apache#40944
- [ ] apache#40942
- [ ] https://github.com/apache/arrow/issues/40945
- [ ] https://github.com/apache/arrow/issues/40941
- [ ] https://github.com/apache/arrow/issues/40946

### Are these changes tested?

Yes. The existing test cases for `VarCharVector` and `VarBinaryVector` have been verified against the view implementations, and additional test cases have been added to check view functionality. Tests have also been added explicitly to evaluate the view functionality of `ViewVarCharVector`.

### Are there any user-facing changes?

Yes. This introduces a new API, and some public methods have been included in an interface so that it can be extended to implement custom functionality, as was done for views.

* GitHub Issue: apache#40339

Lead-authored-by: Vibhatha Abeykoon <[email protected]>
Co-authored-by: vibhatha <[email protected]>
Co-authored-by: Vibhatha Lakmal Abeykoon <[email protected]>
Signed-off-by: David Li <[email protected]>
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025