ARROW-18014: [Java] Implement copy functions for vectors and Table #14389
@davisusanibar If you wouldn't mind providing a review, it would be appreciated.
@lidavidm Are you familiar with the error shown in the Windows Flight tests:
This has been known for a while, but I've never been able to reproduce it.
The resolution would probably be to figure out how to enable allocator logging in CI so that we can properly debug this.
OK, thanks.
See ARROW-18034 / ARROW-18035
lidavidm
left a comment
LGTM, just one typo spotted in a doc comment
 *
 * @param columnName The name of the vector to copy
 * @return A copy of the Vector with the given name
 * @throws IllegalArgumentException if the name is not the name of a vector in the table.
nit: the exception class is different from what's actually thrown
I will fix the exception class issue. Thanks
done
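The contract discussed in this thread (return a copy of the named vector, and throw IllegalArgumentException when the name is unknown) can be illustrated with a minimal sketch. `MiniTable` and `MiniTableDemo` are hypothetical stand-ins built on plain `int[]` columns; they are not Arrow's `Table` or `FieldVector`, and this is not the PR's implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an immutable table; NOT the Arrow Table class.
final class MiniTable {
    private final Map<String, int[]> columns = new LinkedHashMap<>();

    MiniTable(Map<String, int[]> source) {
        // Copy on the way in so later changes to the caller's arrays are not visible here.
        for (Map.Entry<String, int[]> e : source.entrySet()) {
            columns.put(e.getKey(), e.getValue().clone());
        }
    }

    /**
     * Returns a copy of the named column.
     *
     * @throws IllegalArgumentException if no column has the given name
     */
    int[] getVectorCopy(String columnName) {
        int[] column = columns.get(columnName);
        if (column == null) {
            throw new IllegalArgumentException("No column named '" + columnName + "'");
        }
        // Copy on the way out so callers cannot mutate the table's data.
        return column.clone();
    }
}

final class MiniTableDemo {
    public static void main(String[] args) {
        Map<String, int[]> data = new LinkedHashMap<>();
        data.put("id", new int[] {1, 2});
        MiniTable t = new MiniTable(data);

        int[] copy = t.getVectorCopy("id");
        copy[0] = 99; // mutating the copy leaves the table untouched
        System.out.println(Arrays.toString(t.getVectorCopy("id"))); // prints [1, 2]
    }
}
```

The doc comment names the same exception class that the method body throws, which is the mismatch the review nit asked to fix.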
assertEquals(2, copy.getValueCount());
assertEquals(0, copy.getNullCount());
for (int i = 0; i < t.getRowCount(); i++) {
  assertEquals(original.getObject(i), copy.getObject(i));
}
not necessarily an issue for this PR, but there is a VectorValueComparator and perhaps it would be good to integrate it as FieldVector#equals if it isn't already
Good to know. Thank you
I modified the documentation to reflect the correct exception (it should be IllegalArgumentException). This was documented incorrectly in many places in the Row class, so the documentation was corrected there as well. Only documentation was touched in the latest commit.
lidavidm
left a comment
Thank you!
Seems tests need to be realigned too?
 * @return A copy of the Vector with the given name
 * @throws IllegalArgumentException if the name is not the name of a vector in the table.
 */
public FieldVector getVectorCopy(String columnName) {
I have a doubt: is there some base/abstract object that Table and VectorSchemaRoot share?
For example, this method could be helpful on both sides, Table and VectorSchemaRoot.
If by design this was created independently please let me know to consider that at the moment to read *.table package
> I have a doubt: is there some base/abstract object that Table and VectorSchemaRoot share? For example, this method could be helpful on both sides, Table and VectorSchemaRoot.
There's no abstract class, as VSR doesn't inherit from anything. I have considered adding an interface at some point that they both could implement to make it easier to swap one for the other.
> If by design this was created independently please let me know to consider that at the moment to read *.table package
I'm not sure I understand this part of the comment.
> There's no abstract class as VSR doesn't inherit from anything. I have considered adding an interface at some point that they both could implement to make it easier to swap one for the other.
Ok, thank you.
> I'm not sure I understand this part of the comment.
This relates to the first question: VSR and Table are currently growing independently, and at some point they could both implement the same contract.
Got it. Thanks.
Given the possibility of someday having a shared interface, I have tried to write similar methods using the same method signature as much as possible.
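To make the idea of a shared contract concrete, here is a minimal sketch of what such an interface could look like. Everything in it is hypothetical: `ColumnContainer`, `SketchTable`, `SketchRoot`, and `ContainerDemo` are illustration-only names, the classes hold a single integer column, and none of this exists in Arrow or in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical shared contract; no such interface is part of this PR.
interface ColumnContainer {
    int getRowCount();

    List<Integer> getVectorCopy(String columnName);
}

// Immutable, Table-like sketch holding one named column.
final class SketchTable implements ColumnContainer {
    private final String name;
    private final List<Integer> values;

    SketchTable(String name, List<Integer> values) {
        this.name = name;
        this.values = Collections.unmodifiableList(new ArrayList<>(values));
    }

    @Override
    public int getRowCount() {
        return values.size();
    }

    @Override
    public List<Integer> getVectorCopy(String columnName) {
        if (!name.equals(columnName)) {
            throw new IllegalArgumentException("No column named '" + columnName + "'");
        }
        return new ArrayList<>(values);
    }
}

// Mutable, VectorSchemaRoot-like sketch holding one named column.
final class SketchRoot implements ColumnContainer {
    private final String name;
    private final List<Integer> values = new ArrayList<>();

    SketchRoot(String name) {
        this.name = name;
    }

    void append(int value) {
        values.add(value);
    }

    @Override
    public int getRowCount() {
        return values.size();
    }

    @Override
    public List<Integer> getVectorCopy(String columnName) {
        if (!name.equals(columnName)) {
            throw new IllegalArgumentException("No column named '" + columnName + "'");
        }
        return new ArrayList<>(values);
    }
}

final class ContainerDemo {
    // Written once against the interface, so either container can be passed in.
    static int sum(ColumnContainer container, String columnName) {
        int total = 0;
        for (int v : container.getVectorCopy(columnName)) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        SketchRoot root = new SketchRoot("x");
        root.append(1);
        root.append(2);
        SketchTable table = new SketchTable("x", Arrays.asList(1, 2));

        System.out.println(sum(root, "x"));  // prints 3
        System.out.println(sum(table, "x")); // prints 3
    }
}
```

Keeping the method signatures identical on both classes, as described above, is what would let an interface like this be introduced later without changing callers.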
|
Benchmark runs are scheduled for baseline = ea32825 and contender = 0e94da9. 0e94da9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
This PR bumps Apache Arrow version from 10.0.0 to 11.0.0. Main changes related to PyAmber:
## Java/Scala side:
- Distribute Apple M1 compatible JNI libraries via mavencentral ([#14472](apache/arrow#14472)).
- Improve performance by short-circuiting null checks when comparing non-null field types ([#15106](apache/arrow#15106)).
- Extend Table copy functionality, and support returning copies of individual vectors ([#14389](apache/arrow#14389)).
- Several enhancements to dictionary encoding ([#14891](apache/arrow#14891), [#14902](apache/arrow#14902), [#14874](apache/arrow#14874)).
- Extend Table to support additional vector types ([#14573](apache/arrow#14573)).
- Enhance and simplify handling of allocation management by integrating C Data into allocator hierarchy ([#14506](apache/arrow#14506)).
## Python side:
- PyArrow now requires pandas >= 1.0 ([ARROW-18173](https://issues.apache.org/jira/browse/ARROW-18173)).
- Added support for the [DataFrame Interchange Protocol](https://data-apis.org/dataframe-protocol/latest/purpose_and_scope.html) for pyarrow.Table ([GH-33346](apache/arrow#33346)).
- Support for custom metadata of record batches in the IPC read and write APIs ([ARROW-16430](https://issues.apache.org/jira/browse/ARROW-16430)).
- The Time32Scalar, Time64Scalar, Date32Scalar and Date64Scalar classes got a .value attribute to access the underlying integer value, similar to the other date-time related scalars ([ARROW-18264](https://issues.apache.org/jira/browse/ARROW-18264)).
- Casting to string is now supported for duration ([ARROW-15822](https://issues.apache.org/jira/browse/ARROW-15822)) and decimal ([ARROW-17458](https://issues.apache.org/jira/browse/ARROW-17458)) types, which also means those can now be written to CSV.
## Issues fixed:
- Do_action (from the Python server back to the Java client) now returns a stream of results properly, and it alerts when the results are not fully consumed by the client. Such results will be used to send the flow control credits back from the Python side. We limit the results to exactly 1 for now, although it can be a stream.
- Fix a bug in the Python proxy server: when an unregistered action is invoked, it should not parse and return the results.
…pache#14389) Copy methods for the table and for individual vectors are provided with tests Authored-by: Larry White <[email protected]> Signed-off-by: David Li <[email protected]>
Copy methods for the table and for individual vectors are provided with tests