Flink: support to RowData partition. #1299
Conversation
private static PositionalGetter<?> buildGetter(LogicalType logicalType, Type type) {
  switch (type.typeId()) {
    case STRING:
      return (row, pos) -> row.getString(pos).toString();
Iceberg requires strings to be CharSequence, not necessarily String. So if you have UTF8 data, you can potentially just wrap it to produce a CharSequence rather than building an immutable JVM string.
Not a blocker, just something to keep in mind for the future.
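For illustration, a minimal sketch of such a lazy wrapper, assuming Flink's `StringData` API; the class name is hypothetical and not part of this patch:

```java
import org.apache.flink.table.data.StringData;

// Hypothetical wrapper: defers materializing an immutable JVM String
// until the CharSequence is actually inspected.
class LazyStringData implements CharSequence {
  private final StringData data;
  private String decoded; // decoded on first access

  LazyStringData(StringData data) {
    this.data = data;
  }

  private String str() {
    if (decoded == null) {
      decoded = data.toString();
    }
    return decoded;
  }

  @Override
  public int length() {
    return str().length();
  }

  @Override
  public char charAt(int index) {
    return str().charAt(index);
  }

  @Override
  public CharSequence subSequence(int start, int end) {
    return str().subSequence(start, end);
  }

  @Override
  public String toString() {
    return str();
  }
}
```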
      return (row, pos) -> row.getDecimal(pos, decimalType.getPrecision(), decimalType.getScale()).toBigDecimal();

    case TIME:
      return (row, pos) -> (long) row.getInt(pos);
This needs to be in microseconds, not milliseconds. We should probably include a comment about it as well.
You are right. Flink's time type is in milliseconds, but here we need microseconds. I will add unit tests to address this bug.
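A sketch of the corrected getter, assuming Flink stores TIME as an `int` of milliseconds of the day while Iceberg expects a `long` of microseconds:

```java
case TIME:
  // Flink's TIME is millis-of-day (int); Iceberg expects micros-of-day (long).
  return (row, pos) -> ((long) row.getInt(pos)) * 1000;
```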
      LocalZonedTimestampType lzTs = (LocalZonedTimestampType) logicalType;
      return (row, pos) -> {
        TimestampData timestampData = row.getTimestamp(pos, lzTs.getPrecision());
        return timestampData.getMillisecond() * 1000 + Math.floorDiv(timestampData.getNanoOfMillisecond(), 1000);
Nano of millisecond is always positive, right? In that case there is no need for floorDiv.
It's true.
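Since `getNanoOfMillisecond()` only returns values in `[0, 999_999]`, the getter can use plain division; a sketch of the simplified version:

```java
LocalZonedTimestampType lzTs = (LocalZonedTimestampType) logicalType;
return (row, pos) -> {
  TimestampData timestampData = row.getTimestamp(pos, lzTs.getPrecision());
  // nanoOfMillisecond is non-negative, so plain division is equivalent to floorDiv
  return timestampData.getMillisecond() * 1000 + timestampData.getNanoOfMillisecond() / 1000;
};
```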
  }
}

private static Object transform(Object value, Type type) {
This has the same time bug as the getter. I think it would be better to avoid deriving expected values and just hard-code them. That makes tests easier to read, and more reliable.
An alternative to hard-coding is to validate against a different object model. For example, you could generate the data with Iceberg generics, convert them to RowData, and then validate that the partitions produced from both object models match. That would catch the time bug and would also allow you to avoid needing to generate random RowData.
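A hypothetical sketch of that cross-model test; `RandomGenericData` and `InternalRecordWrapper` come from Iceberg's data module, while `convertToRowData` is an assumed helper that maps a generic `Record` to a Flink `RowData`:

```java
PartitionKey expectedKey = new PartitionKey(spec, schema);
PartitionKey actualKey = new PartitionKey(spec, schema);
InternalRecordWrapper recordWrapper = new InternalRecordWrapper(schema.asStruct());
RowDataWrapper rowDataWrapper = new RowDataWrapper(FlinkSchemaUtil.convert(schema), schema.asStruct());

for (Record record : RandomGenericData.generate(schema, 100, 1234L)) {
  RowData rowData = convertToRowData(record); // assumed helper

  // Partition the same logical row through both object models.
  expectedKey.partition(recordWrapper.wrap(record));
  actualKey.partition(rowDataWrapper.wrap(rowData));

  Assert.assertEquals("Partitions from both object models should match", expectedKey, actualKey);
}
```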
Thanks @openinx! It mostly looks good, except for the time bug. I'd also like to update the tests so that they are more likely to catch similar bugs.
    case TIME:
      // Iceberg's time is in microseconds, while Flink's time is in milliseconds.
      LocalTime localTime = (LocalTime) object;
      return (int) TimeUnit.NANOSECONDS.toMillis(localTime.toNanoOfDay());
Here we truncate the localTime to milliseconds, which erases the microseconds part. That is to say, the partition value will differ between Record and RowData because of the lost microseconds. Should we disable the TIME type as a partition key when using the Flink sink connector, to avoid the partition value mismatch?
The following unit test demonstrates this: https://github.com/apache/iceberg/pull/1299/files#diff-97304b05e2faea4a749031f514361a70R193
I don't think it is necessary to disable it, because any data written by Flink will necessarily be a millisecond-precision value. Partitioning is still correct with respect to the data that was written, because all of the data has millisecond values.
For the same time-typed data, if Flink writes it into an Iceberg table A and Hive MR or Spark reads it, there should be no problem. But if Flink and Spark each write the same data set into different tables A and B, then tables A and B will differ because of the lost microseconds. Those differences sound reasonable, since they come from the different behaviors of different compute engines.
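To make the truncation concrete, a small arithmetic example using `java.time.LocalTime` (the values are illustrative):

```java
LocalTime t = LocalTime.ofNanoOfDay(12_345_678_000L);              // 00:00:12.345678
long flinkMillis = TimeUnit.NANOSECONDS.toMillis(t.toNanoOfDay()); // 12345: the 678 micros are dropped
long icebergMicros = flinkMillis * 1000;                           // 12_345_000, not 12_345_678
```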
Looks good to me. Thanks @openinx!
As we have decided to change from `Row` to `RowData` as the internal row for reading and writing parquet/avro/orc files, in theory we will update all of `FlinkAvroReader`, `FlinkAvroWriter`, `FlinkParquetReaders`, `FlinkParquetWriters`, and `TaskWriterFactory` to use `RowData`.

In this patch, I've introduced a new `RowDataWrapper` to partition `RowData`. In the future (once one of the Flink `RowData` readers and writers for parquet/avro/orc is ready), we will remove the deprecated `RowWrapper`.
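For context, a sketch of how the new wrapper slots into partitioning; the loop and variable names are illustrative rather than the exact call sites in this patch:

```java
RowType rowType = FlinkSchemaUtil.convert(icebergSchema);
RowDataWrapper wrapper = new RowDataWrapper(rowType, icebergSchema.asStruct());
PartitionKey partitionKey = new PartitionKey(spec, icebergSchema);

for (RowData row : rows) {
  // Extract the partition values for this row via the wrapper.
  partitionKey.partition(wrapper.wrap(row));
  // route the row to the task writer responsible for this partition key
}
```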