Conversation

@openinx (Member) commented Jul 7, 2020

This patch abstracts the common code of PartitionKey into the newly introduced class BasePartitionKey; both the Spark PartitionKey and the Flink PartitionKey extend this base class. I also added unit tests for the Flink PartitionKey.

@rdblue (Contributor) commented Jul 10, 2020

@openinx, now that the RC for 0.9.0 is out, I should be able to pick back up on Flink reviews tomorrow. I'll probably start with this one since we need to clean this up. Thanks!

@rdblue (Contributor) commented Jul 10, 2020

@openinx, I opened an alternative to this PR, #1195. Please take a look.

This solution looks fairly clean for producing a PartitionKey for a specific format, but it requires building a subclass of PartitionKey for every row representation as well as new Accessor classes. I'd like to make it possible to reuse the existing PartitionKey class as well as the existing Accessor implementations (produced by Schema.accessorForField(id)) that are currently used for expression evaluation.

The approach I took in the other PR is to reuse the existing accessors, which accept a StructLike. To make that work, I just needed to add a wrapper class that adapts Spark's InternalRow to StructLike, and that converts Spark objects to Iceberg's internal representation. I think that's going to be a better long-term approach than multiple PartitionKey classes.
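
A minimal sketch of that wrapper idea, for illustration only: the class name InternalRowWrapper and the conversion details here are assumptions, not necessarily what #1195 actually does.

import java.util.Arrays;

import org.apache.iceberg.StructLike;
import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Adapts Spark's InternalRow to Iceberg's StructLike, so the existing
// PartitionKey and Accessor implementations can read Spark rows directly.
class InternalRowWrapper implements StructLike {
  private final DataType[] types;
  private InternalRow row = null;

  InternalRowWrapper(StructType sparkType) {
    this.types = Arrays.stream(sparkType.fields())
        .map(StructField::dataType)
        .toArray(DataType[]::new);
  }

  // Reuse one wrapper across rows to avoid a per-row allocation.
  InternalRowWrapper wrap(InternalRow newRow) {
    this.row = newRow;
    return this;
  }

  @Override
  public int size() {
    return types.length;
  }

  @Override
  public <T> T get(int pos, Class<T> javaClass) {
    // A full implementation would also convert Spark-internal values
    // (UTF8String, Decimal, byte[]) to Iceberg's internal representation here.
    return javaClass.cast(row.get(pos, types[pos]));
  }

  @Override
  public <T> void set(int pos, T value) {
    throw new UnsupportedOperationException("Wrapper does not support set()");
  }
}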

@openinx openinx changed the title Generilize the BasePartitionkey to abstract the common codes for spark and flink. Flink: add flink row PartitionKey. Jul 14, 2020
@openinx
Copy link
Member Author

openinx commented Jul 14, 2020

Ping @rdblue for review.

}

private static PositionalGetter buildGetter(Type type) {
  if (type instanceof Types.StructType) {

@rdblue (Contributor) commented on this code:

The objects returned by this wrapper need to be Iceberg's internal representation:

  • int for DateType: number of days from epoch
  • long for TimeType: number of microseconds from midnight
  • long for both timestamp types (with and without zone): number of microseconds from epoch
  • ByteBuffer for both fixed(L) and binary types
  • BigDecimal for decimal(P, S)

Because Flink uses the same in-memory representation as Iceberg generics, this should use the same conversions that we use for Record.
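
For illustration, a minimal sketch of the conversions listed above, written against java.time directly; Iceberg's own DateTimeUtil provides equivalent helpers, and the class and method names below are assumptions.

import java.nio.ByteBuffer;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneOffset;

class InternalValueConversions {
  // date -> int: days from 1970-01-01
  static int daysFromDate(LocalDate date) {
    return (int) date.toEpochDay();
  }

  // time -> long: microseconds from midnight
  static long microsFromTime(LocalTime time) {
    return time.toNanoOfDay() / 1_000;
  }

  // timestamp -> long: microseconds from epoch
  static long microsFromTimestamp(LocalDateTime timestamp) {
    long epochSecond = timestamp.toEpochSecond(ZoneOffset.UTC);
    return epochSecond * 1_000_000 + timestamp.getNano() / 1_000;
  }

  // fixed(L) / binary -> ByteBuffer
  static ByteBuffer bufferFromBytes(byte[] bytes) {
    return ByteBuffer.wrap(bytes);
  }
}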

@openinx (Member, author) replied:

Thanks for the details. We discussed this here; maybe you want to take a look :-)

@rdblue (Contributor) replied:

Yes, this needs to convert to the representation that internal classes use.

Iceberg's generic data model is intended for passing data to and from Java applications, which is why it uses friendlier classes. It is up to each data model, whether Iceberg generics or Flink's, to convert to the internal representation; Iceberg core should modify data as little as possible.

import org.junit.Assert;
import org.junit.Test;

public class TestPartitionKey {

@rdblue (Contributor) commented on this code:

Can you add a test based on Spark's TestPartitionValues? That tests every supported type, null values, and different column orders.

partitionKey2.partition(rowWrapper.wrap(row));
Assert.assertEquals(1, partitionKey2.size());
Assert.assertEquals(200, (int) partitionKey2.get(0, Integer.class));
Assert.assertEquals("structType.innerIntegerType=200", partitionKey2.toPath());

@rdblue (Contributor) commented on this code:

Can you split each of these blocks into a separate test case? This method mixes many different cases together, which makes it harder to see what is broken when tests fail: you don't get a picture of what is common across the failed cases, because many of them never run.
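
A sketch of the suggested one-case-per-test structure, reusing the assertions quoted above; the schema, spec, and rowOf helper here are assumptions for illustration, not the PR's actual fixture.

import org.apache.iceberg.PartitionKey;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.StructLike;
import org.apache.iceberg.types.Types;
import org.junit.Assert;
import org.junit.Test;

public class TestPartitionKeySplit {
  // One nested struct column, partitioned by identity on the inner field.
  private static final Schema SCHEMA = new Schema(
      Types.NestedField.required(1, "structType", Types.StructType.of(
          Types.NestedField.optional(2, "innerIntegerType", Types.IntegerType.get()))));

  private static final PartitionSpec SPEC = PartitionSpec.builderFor(SCHEMA)
      .identity("structType.innerIntegerType")
      .build();

  // Tiny StructLike backed by an Object[], standing in for a wrapped row.
  private static StructLike rowOf(Object... values) {
    return new StructLike() {
      @Override
      public int size() {
        return values.length;
      }

      @Override
      public <T> T get(int pos, Class<T> javaClass) {
        return javaClass.cast(values[pos]);
      }

      @Override
      public <T> void set(int pos, T value) {
        throw new UnsupportedOperationException();
      }
    };
  }

  // Each block from the combined method becomes its own test, so a failure
  // pinpoints exactly one case.
  @Test
  public void testNestedIntegerPartition() {
    PartitionKey partitionKey = new PartitionKey(SPEC, SCHEMA);
    partitionKey.partition(rowOf(rowOf(200)));

    Assert.assertEquals(1, partitionKey.size());
    Assert.assertEquals(200, (int) partitionKey.get(0, Integer.class));
    Assert.assertEquals("structType.innerIntegerType=200", partitionKey.toPath());
  }
}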

@rdblue rdblue changed the title Flink: add flink row PartitionKey. Flink: Add wrapper to adapt Row to StructLike Jul 14, 2020
@openinx (Member, author) commented Jul 15, 2020

Addressed all the comments. Please take another look, thanks @rdblue.

@rdblue rdblue closed this Jul 15, 2020
@rdblue rdblue reopened this Jul 15, 2020
@rdblue rdblue merged commit d1b6d16 into apache:master Jul 15, 2020
@rdblue (Contributor) commented Jul 15, 2020

I reopened this to run tests against master with the CI fix. It's passing tests so I'll merge it. Thanks, @openinx!

HotSushi pushed a commit to HotSushi/iceberg that referenced this pull request Jul 23, 2020
@openinx openinx deleted the generalize-partition-key branch August 1, 2020 13:08
cmathiesen pushed a commit to ExpediaGroup/iceberg that referenced this pull request Aug 19, 2020