@BryanCutler
Member

This change allows the VectorSchemaRoot/FieldVectors to be closed more than once, even if the allocator has already been closed. Previously, closing created an empty ArrowBuf, which required the allocator to still be open; however, this empty buffer is not needed once the FieldVector has been closed.
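A minimal sketch of the idea in this first version of the change, using hypothetical stand-in classes (`SketchAllocator`, `SketchBuf`, `SketchVector`) rather than Arrow's real `BufferAllocator`/`ArrowBuf`/`FieldVector`: `close()` releases the buffer and drops the reference instead of asking the allocator for a new empty buffer, so a second `close()` never touches a closed allocator.

```java
// Hypothetical stand-ins for BufferAllocator, ArrowBuf and a FieldVector; not Arrow's real classes.
class CloseTwiceDemo {
    public static void main(String[] args) {
        SketchAllocator allocator = new SketchAllocator();
        SketchVector vector = new SketchVector(allocator);
        vector.allocate(64);

        vector.close();     // releases the buffer back to the allocator
        allocator.close();  // succeeds because nothing is outstanding
        vector.close();     // no-op: the closed allocator is never consulted
        System.out.println("outstanding=" + allocator.outstanding());  // prints outstanding=0
    }
}

class SketchBuf {
    private final SketchAllocator owner;
    SketchBuf(SketchAllocator owner) { this.owner = owner; }
    void release() { owner.onRelease(); }
}

class SketchAllocator implements AutoCloseable {
    private int outstanding;
    private boolean closed;

    SketchBuf buffer(int size) {  // size is ignored in this sketch
        if (closed) throw new IllegalStateException("allocator is closed");
        outstanding++;
        return new SketchBuf(this);
    }
    void onRelease() { outstanding--; }
    int outstanding() { return outstanding; }
    boolean isClosed() { return closed; }

    @Override public void close() {
        if (outstanding != 0) throw new IllegalStateException(outstanding + " buffer(s) leaked");
        closed = true;
    }
}

class SketchVector implements AutoCloseable {
    private final SketchAllocator allocator;
    private SketchBuf data;  // null means "nothing to release"

    SketchVector(SketchAllocator allocator) { this.allocator = allocator; }

    void allocate(int size) {
        if (data != null) data.release();  // drop any previous buffer first
        data = allocator.buffer(size);
    }

    @Override public void close() {
        // The earlier behavior created a new empty buffer through the allocator on every close,
        // which fails once the allocator is closed. This sketch only drops the reference.
        if (data != null) {
            data.release();
            data = null;
        }
    }
}
```

Dropping the reference keeps `close()` independent of the allocator's state, at the cost of `data` being null afterwards; the review below discusses that trade-off.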

@BryanCutler
Member Author

cc @julienledem @elahrvivaz

Contributor

@elahrvivaz elahrvivaz left a comment

looks good to me

@wesm
Member

wesm commented Jul 28, 2017

@jacques-n @StevenMPhillips @siddharthteotia do you see any issues with this?

@siddharthteotia
Contributor

@BryanCutler

I would like to know why there is a need to close a vector twice. The current change is structured such that the following works and the last statement doesn't raise an error:

vector.close();
allocator.close();
vector.close();

Once close() has been invoked on the vector and the allocator, I expect the associated resources to be garbage collected properly. Under what conditions do we still need the third statement?

Thanks,
Siddharth

data = null;
}
super.close();
data = null;
Contributor

I would prefer making this change differently, perhaps by allowing the allocator to return an empty buffer even when it is closed. That makes bugs and issues much easier to understand than getting an NPE.
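A rough sketch of that suggestion, with hypothetical classes (`EmptyAwareAllocator`, `SharedBuf`, `EmptyAwareVector`) that are not Arrow's API: the allocator creates one shared empty buffer up front and hands it out even after it is closed, so a closed vector reports capacity 0 instead of throwing an NPE, while a real allocation on a closed allocator still fails with a descriptive `IllegalStateException`.

```java
// Hypothetical sketch of the reviewer's suggestion; these classes are illustrative, not Arrow's API.
class EmptyBufferDemo {
    public static void main(String[] args) {
        EmptyAwareAllocator allocator = new EmptyAwareAllocator();
        EmptyAwareVector vector = new EmptyAwareVector(allocator);
        vector.allocate(32);

        vector.close();
        allocator.close();
        vector.close();                         // safe: getEmpty() does not require an open allocator
        System.out.println(vector.capacity());  // prints 0 rather than throwing an NPE
        try {
            vector.allocate(32);                // a real allocation still fails, with a clear message
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints Attempt to allocate from a closed allocator
        }
    }
}

class SharedBuf {
    final int capacity;
    private final EmptyAwareAllocator owner;  // null marks the shared empty buffer
    private boolean released;

    SharedBuf(int capacity, EmptyAwareAllocator owner) { this.capacity = capacity; this.owner = owner; }

    void release() {
        if (owner != null && !released) {  // releasing the empty buffer is a no-op
            released = true;
            owner.onRelease();
        }
    }
}

class EmptyAwareAllocator implements AutoCloseable {
    private final SharedBuf empty = new SharedBuf(0, null);  // created once, never counted
    private int outstanding;
    private boolean closed;

    SharedBuf getEmpty() { return empty; }  // deliberately no "is open" check

    SharedBuf buffer(int size) {
        if (closed) throw new IllegalStateException("Attempt to allocate from a closed allocator");
        if (size == 0) return empty;
        outstanding++;
        return new SharedBuf(size, this);
    }
    void onRelease() { outstanding--; }
    int outstanding() { return outstanding; }

    @Override public void close() {
        if (outstanding != 0) throw new IllegalStateException(outstanding + " buffer(s) leaked");
        closed = true;
    }
}

class EmptyAwareVector implements AutoCloseable {
    private final EmptyAwareAllocator allocator;
    private SharedBuf data;

    EmptyAwareVector(EmptyAwareAllocator allocator) {
        this.allocator = allocator;
        this.data = allocator.getEmpty();  // never null, so callers never see an NPE
    }

    void allocate(int size) {
        SharedBuf fresh = allocator.buffer(size);  // may throw if the allocator is closed
        data.release();
        data = fresh;
    }

    int capacity() { return data.capacity; }

    @Override public void close() {
        data.release();
        data = allocator.getEmpty();
    }
}
```

Because the empty buffer is never counted as outstanding, handing it out after close does not interfere with the allocator's leak check.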

Member Author

Sure, that sounds fine. As far as I can tell, this would not cause any issues.

@BryanCutler
Member Author

BryanCutler commented Jul 31, 2017

Quoting @siddharthteotia: "I would like to know why is there a need to close a vector twice."

@siddharthteotia, this came from a discussion in Spark. The root and allocator are used in an iterator and are normally closed after the last iteration. However, there are some ways this might not happen (say, if a task is cancelled), so they are also closed in a callback to prevent leaks. Right now an extra flag is used to guard against closing twice, which works, but it would be cleaner to allow close() to be called a second time as a no-op.
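The Spark usage described above can be sketched roughly as follows. Everything here is hypothetical: the list of `Runnable`s stands in for a task-completion callback hook, and `CountingResource` stands in for the VectorSchemaRoot and allocator, using a simple internal flag to model an idempotent `close()`. The point is that once the resource tolerates a second `close()`, the iterator itself no longer needs an "already closed" flag.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of the pattern described above; not Spark's or Arrow's code.
class DoubleCloseDemo {
    public static void main(String[] args) {
        List<Runnable> completionCallbacks = new ArrayList<>();  // stand-in for a task-completion hook
        CountingResource resource = new CountingResource();      // stand-in for root + allocator
        BatchIterator batches = new BatchIterator(3, resource);
        completionCallbacks.add(batches::close);                 // safety net if iteration stops early

        while (batches.hasNext()) {
            batches.next();                                      // normal path closes after the last batch
        }
        completionCallbacks.forEach(Runnable::run);              // closes a second time
        System.out.println("releases=" + resource.releases());  // prints releases=1
    }
}

class CountingResource implements AutoCloseable {
    private int releases;
    private boolean closed;

    @Override public void close() {
        if (closed) return;  // idempotent: later calls do nothing
        closed = true;
        releases++;
    }
    int releases() { return releases; }
}

class BatchIterator implements Iterator<Integer>, AutoCloseable {
    private final int total;
    private final CountingResource resource;
    private int position;

    BatchIterator(int total, CountingResource resource) {
        this.total = total;
        this.resource = resource;
    }

    @Override public boolean hasNext() {
        boolean more = position < total;
        if (!more) close();  // release eagerly once exhausted
        return more;
    }

    @Override public Integer next() {
        if (position >= total) throw new NoSuchElementException();
        return position++;
    }

    @Override public void close() { resource.close(); }  // no "already closed" flag needed here
}
```

If the task is cancelled mid-iteration, only the callback's `close()` runs, so the resource is still released exactly once.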

}

private ArrowBuf createEmpty() {
assertOpen();
Member Author

@BryanCutler BryanCutler Aug 1, 2017

The assertOpen() check did not seem necessary either, because an allocator always holds an instance of an empty buffer. Does this seem OK?

@BryanCutler
Member Author

BryanCutler commented Aug 1, 2017

@jacques-n, I changed this to use an empty buffer instead of assigning null. That essentially made close() the same as clear(), so the override could be removed. Now the superclass method BaseValueVector.close() invokes clear(), which releases any buffer and assigns data to an empty buffer.

It also seems to me that all Nullable vectors would now have a close() method that is identical to clear(). Is that something to remove and clean up here as well? Should closing a vector mean anything different from clearing it?
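One way to picture the resulting hierarchy, as a sketch with hypothetical classes (`BaseSketchVector`, `DataSketchVector`, `NullableSketchVector`, `TrackedBuffer`) that only loosely mirror Arrow's: the base class defines `close()` once as a call to `clear()`, and `clear()` releases the current buffer and swaps in a shared empty buffer. A nullable vector that overrides only `clear()` then inherits the same idempotent `close()`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the close()/clear() hierarchy discussed above; not Arrow's classes.
class CloseClearDemo {
    public static void main(String[] args) {
        AtomicInteger releases = new AtomicInteger();
        NullableSketchVector vector = new NullableSketchVector(releases);
        vector.close();
        vector.close();  // clear() runs again but only finds empty buffers
        System.out.println("releases=" + releases.get());  // prints releases=2 (bits + values)
    }
}

class TrackedBuffer {
    static final TrackedBuffer EMPTY = new TrackedBuffer(null);  // shared; releasing it does nothing
    private final AtomicInteger releaseCounter;

    TrackedBuffer(AtomicInteger releaseCounter) { this.releaseCounter = releaseCounter; }

    void release() {
        if (releaseCounter != null) releaseCounter.incrementAndGet();
    }
}

abstract class BaseSketchVector implements AutoCloseable {
    @Override public void close() { clear(); }  // closing is defined once, as clearing
    abstract void clear();
}

class DataSketchVector extends BaseSketchVector {
    private TrackedBuffer data;

    DataSketchVector(AtomicInteger releaseCounter) { data = new TrackedBuffer(releaseCounter); }

    @Override void clear() {
        data.release();
        data = TrackedBuffer.EMPTY;  // never null, and no allocator call needed
    }
    boolean isEmpty() { return data == TrackedBuffer.EMPTY; }
}

class NullableSketchVector extends BaseSketchVector {
    final DataSketchVector bits;
    final DataSketchVector values;

    NullableSketchVector(AtomicInteger releaseCounter) {
        bits = new DataSketchVector(releaseCounter);
        values = new DataSketchVector(releaseCounter);
    }

    @Override void clear() {  // close() is inherited unchanged
        bits.clear();
        values.clear();
    }
}
```

In this sketch a separate close() override in the nullable class would add nothing, which is the redundancy the question above points at.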

@BryanCutler BryanCutler force-pushed the java-vectorSchemaRoot-close-twice-ARROW-1283 branch from 9dd1f07 to 2921d84 Compare August 3, 2017 18:23
}

// TODO: Nullable vectors extend BaseDataValueVector but do not use the data field
// We should fix the inheritance tree
Member Author

@siddharthteotia I believe you resolved this comment in #892. Does this PR look ok to you?

@wesm
Member

wesm commented Aug 4, 2017

The CI failure is unrelated to these changes. Can this be merged?

Member

@wesm wesm left a comment

+1

@asfgit asfgit closed this in f9d9833 Aug 7, 2017
@BryanCutler
Member Author

Thanks @wesm!

@BryanCutler BryanCutler deleted the java-vectorSchemaRoot-close-twice-ARROW-1283 branch November 7, 2017 23:50
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
This change allows the VectorSchemaRoot/FieldVectors to close more than once, even if the allocator has already been closed.  Before, an empty ArrowBuf was created during closing which required the allocator to not be closed, however this empty buffer is not needed once the FieldVector has been closed.

Author: Bryan Cutler <[email protected]>

Closes apache#898 from BryanCutler/java-vectorSchemaRoot-close-twice-ARROW-1283 and squashes the following commits:

2921d84 [Bryan Cutler] removed resolved comment
3b3718b [Bryan Cutler] Merge remote-tracking branch 'upstream/master' into java-vectorSchemaRoot-close-twice-ARROW-1283
e992fc7 [Bryan Cutler] BaseDataValueVector.close will now just clear, which releases previous and assigns an empty buffer
8ecfce2 [Bryan Cutler] Merge remote-tracking branch 'upstream/master' into java-vectorSchemaRoot-close-twice-ARROW-1283
ca38d3d [Bryan Cutler] use clear to release data, ensure that an empty buffer is never allocated again after closing
10ff7c3 [Bryan Cutler] Added regression test