Skip to content

Conversation

@icexelloss
Copy link
Collaborator

@icexelloss icexelloss commented Jan 6, 2017

Bryan,

Here are some changes that I made:
(1) Make ArrowSuite pass (fixing the validity map)
(2) Create benchmark.py to benchmark toPandas improvement

This compiles with the most recent arrow

val buf = allocator.buffer(numOfRows * field.dataType.defaultSize)
var nullCount = 0
rows.foreach { row =>
rows.zipWithIndex.foreach { case (row, index) =>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better just use a while loop here since zipWithIndex will iterate and copy the items in an array

var index = 0
while (index < rows.length) {
  ..
  index += 1
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I am going to refactor this bit of code later to be more efficient. Do you want to wait until that is done or do you want to merge this first?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave this open for a day or so, so Xusen can have a look since he wrote the conversion code. If you want to update go ahead or I can before I merge, no biggie. I also would like to have fillWithArrow and getAndSetToArrow use the same data type cases to avoid duplication, but I can do that later.

if (row.isNullAt(ordinal)) {
nullCount += 1
validityMutator.set(index, 0)
fillArrow(buf, field.dataType)
Copy link
Owner

@BryanCutler BryanCutler Jan 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, so the buffer must contain values at each "null" position? Is the case for StringType below done correctly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BryanCutler
Copy link
Owner

Thanks @icexelloss , looks good! Just a couple questions. CC @yinxusen , have any questions/comments?

@BryanCutler
Copy link
Owner

merged with a few minor changes to benchmark.py also. Thanks!

BryanCutler pushed a commit that referenced this pull request Jan 24, 2017
…ark script

Remove arrow-tools dependency

changed zipWithIndex to while loop

modified benchmark to work with Python2 timeit

closes #13
BryanCutler pushed a commit that referenced this pull request Feb 23, 2017
…ark script

Remove arrow-tools dependency

changed zipWithIndex to while loop

modified benchmark to work with Python2 timeit

closes #13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants