@wesm
Member

@wesm wesm commented Nov 9, 2017

What's here so far removes this code from the Flatbuffers schema and the C++ implementation. This logic is a little bit entangled on the Java side, so I need some help from @icexelloss or @BryanCutler or @julienledem or someone else to handle the Java refactoring. It might be easiest to preserve the ArrowVectorType/TypeLayout/VectorLayout objects for now and simply remove the Flatbuffers dependency (we do need the names of the vectors in the JSON files used for integration testing).

cc @trxcllnt since we may want to roll in the JS changes in this patch

@trxcllnt
Contributor

trxcllnt commented Nov 9, 2017

@wesm I'll regenerate the JS flatbuffers locally, and add to #1294. I have a bunch of little questions about the layouts for each type, but I'll ping you over email/slack to get the details.

@icexelloss
Contributor

I can take a look at this. My understanding is we want to remove vector layout from metadata and assume buffer ordering (validity, data) and (validity, offset, data) in readers, is that correct?
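
To illustrate the idea of an implied buffer ordering, here is a minimal Java sketch. It is not code from this PR or from the Arrow Java library: the class `ImpliedLayoutSketch`, the enums `BufferKind` and `TypeCategory`, and the method `expectedBuffers` are hypothetical names, and only the two orderings mentioned above are covered.

```java
import java.util.Arrays;
import java.util.List;

public class ImpliedLayoutSketch {
    // Hypothetical buffer roles (not the actual Arrow Java enum).
    enum BufferKind { VALIDITY, OFFSET, DATA }

    // Hypothetical type categories, just enough to show the two orderings.
    enum TypeCategory { FIXED_WIDTH, VARIABLE_WIDTH }

    // The reader derives the buffer order from the type alone,
    // instead of reading a layout description from the schema metadata.
    static List<BufferKind> expectedBuffers(TypeCategory category) {
        switch (category) {
            case FIXED_WIDTH:
                return Arrays.asList(BufferKind.VALIDITY, BufferKind.DATA);
            case VARIABLE_WIDTH:
                return Arrays.asList(BufferKind.VALIDITY, BufferKind.OFFSET, BufferKind.DATA);
            default:
                throw new IllegalArgumentException("unknown category: " + category);
        }
    }

    public static void main(String[] args) {
        System.out.println(expectedBuffers(TypeCategory.FIXED_WIDTH));    // [VALIDITY, DATA]
        System.out.println(expectedBuffers(TypeCategory.VARIABLE_WIDTH)); // [VALIDITY, OFFSET, DATA]
    }
}
```

With a fixed mapping like this, writer and reader only have to agree on the type; the serialized schema does not have to carry the layout.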

@wesm
Member Author

wesm commented Nov 10, 2017

Right. Unfortunately the Java implementation (including the JSON reader/writer) is a bit intertwined with the Flatbuffers.

@icexelloss
Contributor

Gee, also the concurrent patches that touch the reader/writer are getting a bit out of control.

#1290
#1259

And this one. cc @siddharthteotia @BryanCutler.

@wesm
Member Author

wesm commented Nov 11, 2017

I think here I would like to touch the bare minimum of Java code to remove the dependence on the Flatbuffers. I can try to do this today so as not to impact the java-vector-refactor branch.

@wesm
Member Author

wesm commented Nov 11, 2017

This is a fairly invasive refactor. I don't have the skills or time to do the Java work, someone else is going to have to help. Having the Java object model tightly coupled to the Flatbuffers schemas is not great. I hope we can fix that

@icexelloss
Contributor

icexelloss commented Nov 12, 2017 via email

@icexelloss
Contributor

@wesm I took a look today. The change doesn't seem to be too bad. However, the JSON reader probably needs the new vector class hierarchy (BaseNullableFixedWidthVector, BaseNullableVariableWidthVector) to determine the expected layout.

So it's probably best to either wait until java-refactor gets merged to master or do it on the java-refactor branch. cc @siddharthteotia

@wesm
Member Author

wesm commented Nov 14, 2017

Cool, since the merge seems likely this week, we can just wait. thanks!

@wesm
Member Author

wesm commented Nov 16, 2017

I'll rebase this, then I think we can do the Java work and merge this

@wesm
Member Author

wesm commented Nov 16, 2017

Rebased

@wesm
Member Author

wesm commented Nov 26, 2017

Can someone look at this early this coming week? We should not release 0.8.0 without this

@icexelloss
Contributor

@wesm, I am at pydata today but I will try to take a look later today or tomorrow.

@wesm wesm force-pushed the ARROW-1785 branch 2 times, most recently from cdefbe3 to e5ae07e Compare November 29, 2017 21:51
@wesm wesm changed the title from "WIP ARROW-1785: [Format/C++/Java] Remove VectorLayout from serialized schemas" to "ARROW-1785: [Format/C++/Java] Remove VectorLayout from serialized schemas" Nov 29, 2017
@wesm
Member Author

wesm commented Nov 29, 2017

@siddharthteotia can you review this? This is the last format-related change required for 0.8.0

@siddharthteotia
Contributor

siddharthteotia commented Dec 1, 2017

Looks like the Java side of the changes is messed up because of rebase commits? I am seeing code changes (like the removal of non-nullable vectors) that are not the Java changes made by this PR. It is hard to make out what we are doing on the Java side as far as this PR is concerned.

I can see that ArrowVectorType and VectorLayout have been removed and we are using BufferType and BufferLayout instead. Is that correct?

I believe that these objects are still part of serialized schema -- it's just that names have changed. Instead of ArrowVectorType, we have BufferType and instead of VectorLayout we have BufferLayout. Is this the correct summary of JAVA side changes?

@icexelloss
Contributor

@siddharthteotia VectorLayout and TypeLayout are no longer part of the serialized schema.

VectorLayout and ArrowVectorType are renamed to BufferLayout and BufferType and they are just internal classes now.

Other changes to Java are to the Reader/Writer classes: they no longer read/write the type layout when reading/writing a schema.
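
As a rough sketch of what "internal classes" can mean here: the layout stays in code and is used only to attach names to positional buffers, for example when producing the JSON files used for integration testing. Everything below is hypothetical and is not the actual Arrow Java API; the class `InternalBufferNamesSketch`, the `jsonName` field, and the `nameBuffers` helper are made up for illustration, and the key names are an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InternalBufferNamesSketch {
    // Hypothetical internal enum: it never goes into the serialized schema,
    // it only supplies a name for each buffer position.
    enum BufferType {
        VALIDITY("VALIDITY"), OFFSET("OFFSET"), DATA("DATA");

        final String jsonName; // assumed key name, for illustration only

        BufferType(String jsonName) {
            this.jsonName = jsonName;
        }
    }

    // Pair buffers (given by position) with names from the implied layout.
    static Map<String, byte[]> nameBuffers(List<BufferType> layout, List<byte[]> buffers) {
        if (layout.size() != buffers.size()) {
            throw new IllegalArgumentException(
                "expected " + layout.size() + " buffers, got " + buffers.size());
        }
        Map<String, byte[]> named = new LinkedHashMap<>();
        for (int i = 0; i < layout.size(); i++) {
            named.put(layout.get(i).jsonName, buffers.get(i));
        }
        return named;
    }

    public static void main(String[] args) {
        List<BufferType> variableWidth =
            Arrays.asList(BufferType.VALIDITY, BufferType.OFFSET, BufferType.DATA);
        List<byte[]> buffers = Arrays.asList(new byte[1], new byte[12], new byte[5]);
        System.out.println(nameBuffers(variableWidth, buffers).keySet()); // [VALIDITY, OFFSET, DATA]
    }
}
```

The size check is the part that matters in this sketch: once the layout is no longer in the metadata, a mismatch between the implied layout and the buffers actually present can only be caught by the reader.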

@wesm
Member Author

wesm commented Dec 2, 2017

Rebased, apologies for the diff noise

Member Author

@wesm wesm left a comment

Minor comment. From a Java API perspective these changes appear non-disruptive.

listWriter.allocate();

- /* the dataVector that backs a listVector will also be a
+ /* the dataBuffer that backs a listVector will also be a
Member Author

Maybe some of these code comment changes aren't quite right?

Contributor

These are likely done by the IDE, I will double check.

Contributor

Fixed here wesm#3

@siddharthteotia
Contributor

+1

@wesm
Member Author

wesm commented Dec 4, 2017

Thanks all!!

@wesm wesm closed this in 611a4b9 Dec 4, 2017
@wesm wesm deleted the ARROW-1785 branch December 4, 2017 22:53
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
…emas

Author: Li Jin <[email protected]>
Author: Wes McKinney <[email protected]>
Author: Li Jin <[email protected]>

Closes apache#1297 from wesm/ARROW-1785 and squashes the following commits:

c1e7ea9 [Li Jin] Fix comment
43ff4e3 [Li Jin] Remove Json annotation
95e8736 [Li Jin] (Refactor) Move TypeLayout and VectorLayout from ipc.message to top level. Rename VectorLayout to BufferLayout.
31cbd48 [Li Jin] Fix TypeLayout.java
890a08d [Li Jin] Integration test passing
2024db5 [Wes McKinney] Remove VectorLayout from Flatbuffers, C++ implementation