Skip to content

Conversation

@vibhatha
Copy link
Contributor

@vibhatha vibhatha commented Aug 6, 2024

Rationale for this change

getBuffers method provides the capability to clear the buffers in the vector, this has not been properly tested while clear flag is not properly used in the implementation across various types of vectors.

What changes are included in this PR?

Updating the vector getBuffers method to use clear flag as expected and adding corresponding test cases.

Are these changes tested?

Yes, via existing test cases and new test cases.

Are there any user-facing changes?

Yes

@vibhatha vibhatha marked this pull request as ready for review August 6, 2024 11:07
@vibhatha vibhatha requested a review from lidavidm as a code owner August 6, 2024 11:07
@lidavidm
Copy link
Member

lidavidm commented Aug 9, 2024

I think this should be considered a breaking change

@lidavidm
Copy link
Member

lidavidm commented Aug 9, 2024

Also, are we certain the original behavior wasn't intentional? How is clear currently used?

@vibhatha
Copy link
Contributor Author

vibhatha commented Aug 9, 2024

I think this should be considered a breaking change

Yes it is. Sorry I didn't make it.

@vibhatha
Copy link
Contributor Author

vibhatha commented Aug 9, 2024

Also, are we certain the original behavior wasn't intentional? How is clear currently used?

This came up in the LargeListVIewVector review. Up to that point, I also didn't think about it that much, I was actually using the existing logic from other vectors. Clear is just used to determine whether or not to clear the vector but not the FieldVector assoicated with that vector.

@vibhatha
Copy link
Contributor Author

vibhatha commented Aug 9, 2024

If we allow this, we need to document and remind the user that when clear is true, there is a responsibility in clearing the retrieved buffers.

@lidavidm
Copy link
Member

lidavidm commented Aug 9, 2024

Also, are we certain the original behavior wasn't intentional? How is clear currently used?

This came up in the LargeListVIewVector review. Up to that point, I also didn't think about it that much, I was actually using the existing logic from other vectors. Clear is just used to determine whether or not to clear the vector but not the FieldVector assoicated with that vector.

Right, I pointed it out since it didn't make immediate sense. But I'm asking if the codebase gives a reason why clear works this way. It might not be obvious.

@vibhatha
Copy link
Contributor Author

vibhatha commented Aug 9, 2024

Let me investigate this a little more.

@vibhatha
Copy link
Contributor Author

Looking into this, I think, clearing the buffers of the child vector might release memory that is still in use elsewhere. And this might lead to unexpected behavior or errors as we are explicitly enforcing a buffer clear on the extracted ones. Also, I think the clear was only meant to be used to clear the vector's buffer not its child buffers. But I feel like this is not precisely documented, as it only says "Return the underlying buffers associated with this vector".

@vibhatha
Copy link
Contributor Author

Reconsidering my position, I would like to propose a documentation update and reverting the change made for LargeListViewVector in addressing reviews in the previous PR.

@lidavidm
Copy link
Member

Sounds good to me. Thanks for looking!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lidavidm I didn't add the LargeList and LargeListView as LargeList wasn't there in the first place.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Aug 13, 2024
@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting committer review Awaiting committer review awaiting changes Awaiting changes labels Aug 14, 2024
@vibhatha
Copy link
Contributor Author

@lidavidm I updated the PR with the doc change.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Aug 14, 2024
@lidavidm lidavidm merged commit 4d200dc into apache:main Aug 14, 2024
@lidavidm lidavidm removed the awaiting merge Awaiting merge label Aug 14, 2024
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 4d200dc.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
…ag usage (apache#43583)

### Rationale for this change

`getBuffers` method provides the capability to clear the buffers in the vector, this has not been properly tested while clear flag is not properly used in the implementation across various types of vectors. 

### What changes are included in this PR?

Updating the vector `getBuffers` method to use `clear` flag as expected and adding corresponding test cases. 

### Are these changes tested?

Yes, via existing test cases and new test cases. 

### Are there any user-facing changes?

Yes
* GitHub Issue: apache#43577

Authored-by: Vibhatha Abeykoon <[email protected]>
Signed-off-by: David Li <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants