-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-1785: [Format/C++/Java] Remove VectorLayout from serialized schemas #1297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I can take a look at this. My understanding is we want to remove vector layout from metadata and assume buffer ordering (validity, data) and (validity, offset, data) in readers, is that correct? |
|
Right. Unfortunately the Java implementation (including the JSON reader/writer) is a bit intertwined with the Flatbuffers |
|
Gee, also the concurrent patches that touch reader/writer is getting a bit out of control. And this one. cc @siddharthteotia @BryanCutler. |
|
I think here I would like to touch the bare minimum of Java code to remove the dependence on the Flatbuffers. I can try to do this today so not impact the java-vector-refactor branch |
|
This is a fairly invasive refactor. I don't have the skills or time to do the Java work, someone else is going to have to help. Having the Java object model tightly coupled to the Flatbuffers schemas is not great. I hope we can fix that |
|
I can take a look at Java side next week.
…On Sat, Nov 11, 2017 at 5:26 PM Wes McKinney ***@***.***> wrote:
This is a fairly invasive refactor. I don't have the skills or time to do
the Java work, someone else is going to have to help. Having the Java
object model tightly coupled to the Flatbuffers schemas is not great. I
hope we can fix that
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1297 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAwbrLP9Sfnc8pTU3npjsVjhBQRuJucVks5s1h8LgaJpZM4QYr61>
.
|
|
@wesm I took a look today. The change does't seem to be too bad. However, json reader probably needs the new vector class hierarchy (BaseNullableFixedWidthVector, BaseNullableVariableWidthVector) to determine the expected layout. So it's probably best either wait until java-refactor gets merged to master or do it on java-refactor branch. cc @siddharthteotia |
|
Cool, since the merge seems likely this week, we can just wait. thanks! |
|
I'll rebase this, then I think we can do the Java work and merge this |
|
Rebased |
|
Can someone look at this early this coming week? We should not release 0.8.0 without this |
|
@wesm, I am at pydata today but I will try to take a look later today or tomorrow. |
cdefbe3 to
e5ae07e
Compare
|
@siddharthteotia can you review this? This is the last format-related change required for 0.8.0 |
|
Looks like the java side of changes are messed up because of rebase commits? I am seeing code changes (like removal of non-nullable vectors) and those are not the java changes made by this PR. It is hard to make out what are we doing on the JAVA side as far this PR is concerned. I can see that ArrowVectorType and VectorLayout have been removed and we are using BufferType and BufferLayout instead. Is that correct? I believe that these objects are still part of serialized schema -- it's just that names have changed. Instead of ArrowVectorType, we have BufferType and instead of VectorLayout we have BufferLayout. Is this the correct summary of JAVA side changes? |
|
@siddharthteotia
Other changes to Java are to Reader/Writer classes 0- they no longer read/write type layout when reading/writing schema. |
|
Rebased, apologies for the diff noise |
wesm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment. From an Java API perspective these changes appear non-disruptive
| listWriter.allocate(); | ||
|
|
||
| /* the dataVector that backs a listVector will also be a | ||
| /* the dataBuffer that backs a listVector will also be a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe some of these code comment changes aren't quite right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are likely done be the IDE, I will double check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed here wesm#3
…evel. Rename VectorLayout to BufferLayout.
|
+1 |
|
Thanks all!! |
…emas What's here so far removes this code from the Flatbuffers schema and the C++ implementation. This logic is a little bit entangled on the Java side, I need some help from @icexelloss or @BryanCutler or @julienledem or someone else to handle the Java refactoring. It might be easiest to preserve the ArrowVectorType/TypeLayout/VectorLayout objects for now but simply remove the Flatbuffers dependency (we do need the names of the vectors in the JSON files used for integration testing) cc @trxcllnt since we may want to roll in the JS changes in this patch Author: Li Jin <[email protected]> Author: Wes McKinney <[email protected]> Author: Li Jin <[email protected]> Closes apache#1297 from wesm/ARROW-1785 and squashes the following commits: c1e7ea9 [Li Jin] Fix comment 43ff4e3 [Li Jin] Remove Json annotation 95e8736 [Li Jin] (Refactor) Move TypeLayout and VectorLayout from ipc.message to top level. Rename VectorLayout to BufferLayout. 31cbd48 [Li Jin] Fix TypeLayout.java 890a08d [Li Jin] Integration test passing 2024db5 [Wes McKinney] Remove VectorLayout from Flatbuffers, C++ implementation
What's here so far removes this code from the Flatbuffers schema and the C++ implementation. This logic is a little bit entangled on the Java side, I need some help from @icexelloss or @BryanCutler or @julienledem or someone else to handle the Java refactoring. It might be easiest to preserve the ArrowVectorType/TypeLayout/VectorLayout objects for now but simply remove the Flatbuffers dependency (we do need the names of the vectors in the JSON files used for integration testing)
cc @trxcllnt since we may want to roll in the JS changes in this patch