Skip to content

Conversation

@BryanCutler
Copy link
Member

What changes were proposed in this pull request?

This change simplifies the ArrowColumnVector ListArray accessor to use provided Arrow APIs available in v0.15.0 to calculate element indices.

Why are the changes needed?

This simplifies the code by avoiding manual calculations on the Arrow offset buffer and makes use of more stable APIs.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests

@BryanCutler
Copy link
Member Author

This also avoid having to deal with an API change in the ArrowBuf class for Arrow 1.0.0 and will allow Arrow-Spark integration tests to run again in preparation for the 1.0.0 release, see apache/arrow#6316.

@BryanCutler BryanCutler requested a review from HyukjinKwon June 24, 2020 03:01
Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this also covers SPARK-31998?

int index = rowId * ListVector.OFFSET_WIDTH;
int start = offsets.getInt(index);
int end = offsets.getInt(index + ListVector.OFFSET_WIDTH);
int start = accessor.getElementStartIndex(rowId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @viirya , I wasn't aware SPARK-31998 was made, but yes that was the motivation for this fix.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@HyukjinKwon HyukjinKwon changed the title [SPARK-32080][SQL] Simplify ArrowColumnVector ListArray accessor [SPARK-32080][SPARK-31998][SQL] Simplify ArrowColumnVector ListArray accessor Jun 24, 2020
@SparkQA
Copy link

SparkQA commented Jun 24, 2020

Test build #124451 has finished for PR 28915 at commit 59dd8ea.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jun 24, 2020

Test build #124470 has finished for PR 28915 at commit 59dd8ea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to master and branch-3.0.

HyukjinKwon pushed a commit that referenced this pull request Jun 24, 2020
…accessor

### What changes were proposed in this pull request?

This change simplifies the ArrowColumnVector ListArray accessor to use provided Arrow APIs available in v0.15.0 to calculate element indices.

### Why are the changes needed?

This simplifies the code by avoiding manual calculations on the Arrow offset buffer and makes use of more stable APIs.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests

Closes #28915 from BryanCutler/arrow-simplify-ArrowColumnVector-ListArray-SPARK-32080.

Authored-by: Bryan Cutler <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit df04107)
Signed-off-by: HyukjinKwon <[email protected]>
@BryanCutler
Copy link
Member Author

Thanks @HyukjinKwon and @viirya !

@BryanCutler BryanCutler deleted the arrow-simplify-ArrowColumnVector-ListArray-SPARK-32080 branch June 24, 2020 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants