-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-1283: [JAVA] Allow VectorSchemaRoot to close more than once #898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-1283: [JAVA] Allow VectorSchemaRoot to close more than once #898
Conversation
…ated again after closing
…Root-close-twice-ARROW-1283
elahrvivaz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me
|
@jacques-n @StevenMPhillips @siddharthteotia do you see any issues with this? |
|
I would like to know why is there a need to close a vector twice. The current change is structured such that following works and the last statement doesn't raise an error. vector.close() Once close() operation is invoked on the vector and allocator, I expect the resources associated to be garbage collected properly. Under what conditions, do we still the 3rd statement? Thanks, |
| data = null; | ||
| } | ||
| super.close(); | ||
| data = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer doing this change differently. Maybe by allowing allocator to return an empty buffer even if closed. This is because it makes bugs/issues much easier to understand than getting an NPE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that sounds fine. As far as I can this would not cause any issues.
@siddharthteotia , this came from a discussion in Spark here. The root/allocator are being used in an iterator and are closed normally after the last iteration, however there are some ways this wouldn't happen (say if a task was cancelled), so they are also closed in a callback to prevent any leaks. Right now an extra flag is being used, which is fine but it would be cleaner to just allow them to call close a second time and not do anything. |
…s and assigns an empty buffer
| } | ||
|
|
||
| private ArrowBuf createEmpty() { | ||
| assertOpen(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this did not seem necessary either, because an allocator will always hold an instance of an empty buffer. Does this seem ok?
|
@jacques-n , I made the change to use an empty buffer instead of assigning null. This essentially made It also seems to me that all Nullable vector would now have a |
…Root-close-twice-ARROW-1283
9dd1f07 to
2921d84
Compare
| } | ||
|
|
||
| // TODO: Nullable vectors extend BaseDataValueVector but do not use the data field | ||
| // We should fix the inheritance tree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siddharthteotia I believe you resolved this comment in #892. Does this PR look ok to you?
|
The CI failure is unrelated to these changes. Can this be merged? |
wesm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
Thanks @wesm! |
This change allows the VectorSchemaRoot/FieldVectors to close more than once, even if the allocator has already been closed. Before, an empty ArrowBuf was created during closing which required the allocator to not be closed, however this empty buffer is not needed once the FieldVector has been closed. Author: Bryan Cutler <[email protected]> Closes apache#898 from BryanCutler/java-vectorSchemaRoot-close-twice-ARROW-1283 and squashes the following commits: 2921d84 [Bryan Cutler] removed resolved comment 3b3718b [Bryan Cutler] Merge remote-tracking branch 'upstream/master' into java-vectorSchemaRoot-close-twice-ARROW-1283 e992fc7 [Bryan Cutler] BaseDataValueVector.close will now just clear, which releases previous and assigns an empty buffer 8ecfce2 [Bryan Cutler] Merge remote-tracking branch 'upstream/master' into java-vectorSchemaRoot-close-twice-ARROW-1283 ca38d3d [Bryan Cutler] use clear to release data, ensure that an empty buffer is never allocated again after closing 10ff7c3 [Bryan Cutler] Added regression test
This change allows the VectorSchemaRoot/FieldVectors to close more than once, even if the allocator has already been closed. Before, an empty ArrowBuf was created during closing which required the allocator to not be closed, however this empty buffer is not needed once the FieldVector has been closed.