Arrow integration #13
val buf = allocator.buffer(numOfRows * field.dataType.defaultSize)
var nullCount = 0
rows.foreach { row =>
rows.zipWithIndex.foreach { case (row, index) =>
I think it would be better to just use a while loop here, since zipWithIndex iterates over the items and copies them into a new array:
var index = 0
while (index < rows.length) {
..
index += 1
}
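To make the trade-off concrete, here is a minimal, self-contained sketch of both loop styles. It is not the PR's conversion code: `LoopSketch`, `nullIndicesZip`, and `nullIndicesWhile` are hypothetical names, and `Option[Int]` stands in for a nullable cell of a row.

```scala
import scala.collection.mutable.ArrayBuffer

object LoopSketch {
  // zipWithIndex first builds a new array of (cell, index) pairs, then iterates it.
  def nullIndicesZip(cells: Array[Option[Int]]): Array[Int] = {
    val out = ArrayBuffer.empty[Int]
    cells.zipWithIndex.foreach { case (cell, index) =>
      if (cell.isEmpty) out += index
    }
    out.toArray
  }

  // The while loop tracks the index by hand and allocates no intermediate pairs.
  def nullIndicesWhile(cells: Array[Option[Int]]): Array[Int] = {
    val out = ArrayBuffer.empty[Int]
    var index = 0
    while (index < cells.length) {
      if (cells(index).isEmpty) out += index
      index += 1
    }
    out.toArray
  }
}
```

Both functions return the same indices; the while-loop version simply avoids the extra array of tuples, which matters when the loop runs once per column over many rows.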
Good point. I am going to refactor this bit of code later to be more efficient. Do you want to wait until that is done or do you want to merge this first?
I'll leave this open for a day or so, so Xusen can have a look since he wrote the conversion code. If you want to update it, go ahead; otherwise I can do it before I merge, no biggie. I would also like to have fillWithArrow and getAndSetToArrow use the same data type cases to avoid duplication, but I can do that later.
if (row.isNullAt(ordinal)) {
  nullCount += 1
  validityMutator.set(index, 0)
  fillArrow(buf, field.dataType)
Just to clarify, so the buffer must contain values at each "null" position? Is the case for StringType below done correctly?
Yes, here is the spec for the Arrow layout:
https://github.com/apache/arrow/blob/master/format/Layout.md#example-layout-int32-array
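As a rough illustration of that layout, the sketch below builds the buffers by hand with plain arrays rather than the Arrow API. `ArrowLayoutSketch`, `int32Layout`, and `stringLayout` are hypothetical names, and writing 0 into null slots is an assumption made for the example: what matters for a fixed-width type is that every row, null or not, occupies a slot in the values buffer. For strings, a null can be written as a zero-length entry, so it repeats the previous offset and adds no bytes to the data buffer.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets

object ArrowLayoutSketch {
  // Returns (validity bitmap, values, null count) for a nullable Int32 column.
  def int32Layout(cells: Array[Option[Int]]): (Array[Byte], Array[Int], Int) = {
    val validity = new Array[Byte]((cells.length + 7) / 8)
    val values = new Array[Int](cells.length) // one slot per row, nulls included
    var nullCount = 0
    var index = 0
    while (index < cells.length) {
      cells(index) match {
        case Some(v) =>
          // Set bit `index` (least-significant-bit numbering) to mark the slot valid.
          validity(index / 8) = (validity(index / 8) | (1 << (index % 8))).toByte
          values(index) = v
        case None =>
          // Validity bit stays 0; the slot keeps a placeholder value (0 here).
          nullCount += 1
      }
      index += 1
    }
    (validity, values, nullCount)
  }

  // Returns (offsets, data) for a nullable UTF-8 string column.
  // The validity bitmap would be built exactly as in int32Layout.
  def stringLayout(cells: Array[Option[String]]): (Array[Int], Array[Byte]) = {
    val offsets = new Array[Int](cells.length + 1)
    val data = new ByteArrayOutputStream()
    var index = 0
    while (index < cells.length) {
      cells(index).foreach { s =>
        val bytes = s.getBytes(StandardCharsets.UTF_8)
        data.write(bytes, 0, bytes.length)
      }
      // A null writes no bytes, so its end offset equals its start offset.
      offsets(index + 1) = data.size()
      index += 1
    }
    (offsets, data.toByteArray)
  }
}
```

For example, the Int32 column `[1, null, 3]` yields a validity byte with bits 0 and 2 set and three value slots, while the string column `["ab", null, "c"]` yields offsets `0, 2, 2, 3` over the bytes of `abc`.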
Thanks @icexelloss, looks good! Just a couple of questions. CC @yinxusen, do you have any questions/comments?
Merged, with a few minor changes to benchmark.py as well. Thanks!
…ark script
Remove arrow-tools dependency
changed zipWithIndex to while loop
modified benchmark to work with Python2 timeit
closes #13
Bryan,
Here are some changes that I made:
(1) Make ArrowSuite pass (fixing the validity map)
(2) Create benchmark.py to benchmark toPandas improvement
This compiles against the most recent Arrow.