Skip to content

Conversation

@rdblue
Copy link
Contributor

@rdblue rdblue commented Jul 10, 2020

This is a refactor similar to #1175 that is intended to make the PartitionKey class used by Spark public. This PR uses a different approach that doesn't require creating sub-classes of PartitionKey and custom Accessor classes. Instead, the public PartitionKey class can partition any StructLike, and a reusable wrapper is added to adapt Spark's InternalRow to StructLike.

This approach separates concerns. Conversion to the internal representations used by Iceberg classes is handled by the row wrapper, which allows the same Accessor class to be used everywhere. That wrapper class could be used to pass rows to an expression evaluator or to PartitionKey, as in #983.

Copy link
Collaborator

@chenjunjiedada chenjunjiedada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, The Flink row wrapper should be very simple since it uses java native types.

Copy link
Member

@openinx openinx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch looks good to me, it don't need the public BasePartitionKey any more, and simplified the logics about Accessor. , Please merge it first, then I cloud provide the flink partitionKey.
Thanks.

@jeffchao
Copy link

Been following this from afar. +1

@danielcweeks
Copy link
Contributor

+1 LGTM

@danielcweeks danielcweeks merged commit 851c62e into apache:master Jul 13, 2020
rdblue added a commit to rdblue/iceberg that referenced this pull request Jul 29, 2020
* Add PartitionKey to the public API.

* Handle null values.
aokolnychyi pushed a commit to aokolnychyi/iceberg that referenced this pull request Aug 18, 2020
* Add PartitionKey to the public API.

* Handle null values.
cmathiesen pushed a commit to ExpediaGroup/iceberg that referenced this pull request Aug 19, 2020
* Add PartitionKey to the public API.

* Handle null values.
rdblue added a commit to rdblue/iceberg that referenced this pull request Aug 19, 2020
* Add PartitionKey to the public API.

* Handle null values.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants