Conversation

@JingsongLi JingsongLi commented Jul 8, 2020

Like Spark 3, Flink has a Catalog interface, so we can integrate the Iceberg catalog with the Flink catalog. With Iceberg acting as a Flink catalog, users can use Flink DDL to manipulate Iceberg metadata and query Iceberg tables directly.

The mapping between Flink databases and Iceberg namespaces:

  • A base namespace is supplied for a given catalog, so if you have a catalog that supports a 2-level namespace, you supply the first level in the catalog configuration and the second level is exposed as Flink databases (see the sketch after this list).
  • An Iceberg table manages its partitions by itself; Iceberg partitioning is independent of Flink partitioning.
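A minimal configuration sketch of this mapping, assuming the catalog is created through a FlinkCatalogFactory-style factory and registered on a TableEnvironment (tEnv); the property keys shown (catalog-type, uri, warehouse, base-namespace) follow the description above and may differ from the final implementation:

Map<String, String> props = new HashMap<>();
props.put("catalog-type", "hive");
props.put("uri", "thrift://metastore:9083");
props.put("warehouse", "hdfs://nn:8020/warehouse");
props.put("base-namespace", "level1");                     // first level of a 2-level Iceberg namespace

// org.apache.flink.table.catalog.Catalog backed by Iceberg
Catalog catalog = new FlinkCatalogFactory().createCatalog("iceberg_catalog", props);
tEnv.registerCatalog("iceberg_catalog", catalog);
tEnv.useCatalog("iceberg_catalog");

// Flink databases now map to the second namespace level:
tEnv.sqlUpdate("CREATE DATABASE level2");                   // creates Iceberg namespace level1.level2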

This PR solves #1170

This PR depends on:

@rdblue rdblue added this to the Flink Sink milestone Jul 9, 2020
}

/**
* TODO Implement DDL-string parser for PartitionSpec.
Contributor

Can we add partitioning to the Flink DDL parser instead? That seems like a more appropriate place for it.

Otherwise, I'd recommend just using the PartitionSpecParser.fromJson method.
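For reference, a hedged sketch of the fromJson route; the JSON below is only illustrative of Iceberg's internal spec serialization, and how such a string would be surfaced to users (e.g. as a table property) is left open here:

// icebergSchema: the table's org.apache.iceberg.Schema; field id 3 is assumed to be a date/timestamp column
String specJson = "{\"spec-id\": 0, \"fields\": [" +
    "{\"name\": \"dt_day\", \"transform\": \"day\", \"source-id\": 3, \"field-id\": 1000}]}";
PartitionSpec spec = PartitionSpecParser.fromJson(icebergSchema, specJson);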

Contributor Author

I'd prefer adding partitioning to the Flink DDL parser; I'll update the comments.
PartitionSpecParser.fromJson looks very difficult for users to work with.

Contributor Author

After some discussion with Flink developers, we can map Iceberg partition transforms to Flink computed columns and partitioning. We can support this in the future.

Contributor

PartitionSpecParser.fromJson is how we serialize partition specs internally. It isn't great to expose it directly to users, but would at least make it possible to configure partitioning. If you have a different approach, that is much better!

How would the computed column and partition approach work?

Contributor Author

A rough idea: Flink supports computed columns: https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/create.html#create-table

The Flink DDL CREATE TABLE T (pk INT, ... dt STRING, year AS YEAR(dt), month AS MONTH(dt), d AS DAY(dt)) PARTITIONED BY (year, month, d) should be equivalent to the Spark DDL CREATE TABLE T (pk INT, ... dt STRING) PARTITIONED BY (YEAR(dt), MONTH(dt), DAY(dt)).

Computed columns are not stored in the actual data; they are just virtual columns, which means we can map them to the Iceberg partition transforms of an Iceberg table in the Iceberg Flink catalog.
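To make the intended mapping concrete, a rough sketch of the translation the Iceberg Flink catalog could perform (an assumption about future work, not part of this PR): a computed column d AS DAY(dt) listed in PARTITIONED BY (d) would become an Iceberg day transform on the source column.

// Hypothetical translation of: CREATE TABLE T (..., dt STRING, d AS DAY(dt)) PARTITIONED BY (d)
PartitionSpec spec = PartitionSpec.builderFor(icebergSchema)   // icebergSchema: the converted Iceberg Schema
    .day("dt")
    .build();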

Contributor

Good idea, but there are a couple of things to watch out for:

  • Where possible, we avoid exposing the actual partition values, in order to maintain a separation between logical queries and physical layout. That way, the physical layout can change, but the logical queries will continue to work. In this case, we would need to make sure that the computed columns are tracked separately so that we don't drop the day column when the table gets converted to partitioning by hour.
  • Year, month, and day are functions with concrete behavior for Flink SQL, and Iceberg's partitioning may not align with that behavior. So we probably would not want to supply the data for those columns using Iceberg partition values. Instead, I think we should derive them from the dt field.

Contributor Author

@JingsongLi JingsongLi Jul 21, 2020

Good points.

For Flink SQL, computed columns are virtual columns that sources and sinks can simply ignore: a source produces records without the computed columns and the Flink runtime generates them for each input record, while a sink receives records without the computed columns from the runtime.

  • I see, you mean https://iceberg.apache.org/evolution/#partition-evolution : the computed columns should be calculated by the Flink runtime, and Iceberg should just deal with its own physical layout.
  • There are three kinds of transform functions (see the sketch after this list): 1. hour, day, month, and year are the same as Flink's built-in functions. 2. For truncate, Flink also supports the function, but not with string or bytes input, so Iceberg can provide a catalog function (Catalog.getFunction) and users can use iceberg_catalog.truncate to create a computed column. 3. For bucket, Flink has no such function, so Iceberg can provide catalog functions that users call directly.
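A hedged DDL sketch of item 3 (nothing below is implemented by this PR; iceberg_catalog.bucket is a hypothetical catalog function, and tEnv is an assumed TableEnvironment):

tEnv.sqlUpdate(
    "CREATE TABLE t (" +
    "  id BIGINT," +
    "  data STRING," +
    "  bucket_id AS iceberg_catalog.bucket(16, id)" +  // hypothetical catalog function
    ") PARTITIONED BY (bucket_id)");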

@Override
public void alterPartition(
ObjectPath tablePath, CatalogPartitionSpec partitionSpec, CatalogPartition newPartition, boolean ignoreIfNotExists
) throws CatalogException {
Contributor

We prefer two options for formatting argument lists. Either aligned with the first argument:

public void alterPartition(ObjectPath tablePath, CatalogPartitionSpec partitionSpec, CatalogPartition newPartition,
                           boolean ignoreIfNotExists) throws CatalogException {
  ...
}

Or, indented by 2 indents (4 spaces) and aligned with that position:

public void alterPartition(
    ObjectPath tablePath, CatalogPartitionSpec partitionSpec, CatalogPartition newPartition, boolean ignoreIfNotExists)
    throws CatalogException {
  ...
}

throws can be on the next line, indented to the same place.

Contributor Author

Got it.

/**
* Converter between Flink types and Iceberg types.
* The conversion is not a 1:1 mapping, so some information may be lost
* during a back-and-forth conversion.
Contributor

Can you be more specific about this? What is a case where information is lost?

Contributor

If I understand correctly, this is lossy because Iceberg doesn't represent some types that Flink supports, like CHAR(N). Is that right?

Contributor Author

Yes

Contributor Author

Iceberg to Flink: the UUID type will be lost.
Flink to Iceberg: precisions (such as CHAR/VARCHAR lengths) will be lost.
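A minimal sketch of those two directions, assuming a FlinkSchemaUtil-style converter (the class and method names here are illustrative, not necessarily this PR's exact API):

TableSchema flinkSchema = TableSchema.builder()
    .field("id", DataTypes.BIGINT())
    .field("code", DataTypes.CHAR(10))                // fixed-length char in Flink
    .build();

Schema icebergSchema = FlinkSchemaUtil.convert(flinkSchema);
// Flink -> Iceberg: `code` is stored as Iceberg `string`; the length 10 is lost.

TableSchema roundTripped = FlinkSchemaUtil.convert(icebergSchema);
// Iceberg -> Flink: `code` comes back as VARCHAR(2147483647), not CHAR(10);
// an Iceberg `uuid` column would similarly come back as a 16-byte binary rather than a UUID.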

return new VarCharType(VarCharType.MAX_LENGTH);
case UUID:
// UUID length is 16
return new CharType(16);
Contributor

Char? Wouldn't this be fixed-length binary?

Contributor Author

@JingsongLi JingsongLi Jul 13, 2020

I thought UUID should be CHAR(36) because:

  • In Spark, the UUID function returns StringType.
  • In Flink, the UUID function returns CHAR(36).

But you are right: in ORC and Parquet, UUID is just treated as a fixed-length binary.

Contributor

I think either CHAR(36) or VARBINARY(16) would work, but not CHAR(16).

Contributor Author

I'll go with fixed-length BINARY(16).
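So the case quoted above would change along these lines (a sketch of the agreed direction, not the exact final code):

case UUID:
  // a UUID is 16 bytes, so expose it to Flink as a fixed-length binary type
  return new BinaryType(16);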

this.icebergNamespace = Namespace.of(ArrayUtils.concat(baseNamespace, new String[] { DATABASE }));
}

@After
Contributor

Won't this close the catalog after every test method?

Contributor Author

Yes, and every test method will create a new catalog too.
But it seems we could reuse them by catalog name.


@Test
public void testDropNonEmptyNamespace() {
Assume.assumeFalse("Hadoop catalog throws IOException: Directory is not empty.", isHadoopCatalog);
Contributor

This sounds like a bug in the Hadoop catalog. Can we fix it instead of ignoring this test case?

Contributor Author

I'll fix it in this PR. Tell me if I need to create a new PR for the Hadoop catalog fix.

Contributor

I think it would be better to fix the Hadoop catalog in a separate PR and leave this one with the Assume until it is merged.

Contributor Author

I'll create it.


Assert.assertTrue("Namespace should exist", validationNamespaceCatalog.namespaceExists(icebergNamespace));

Assert.assertEquals("Should not list any tables", 0, tEnv.listTables().length);
Contributor

Should this call SHOW TABLES?

Contributor Author

@JingsongLi JingsongLi Jul 13, 2020

Flink 1.10 does not support the SHOW TABLES DDL; it is supported in 1.11.
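For reference, a small sketch of the two options, assuming the test keeps a TableEnvironment named tEnv (executeSql itself only arrived in Flink 1.11):

String[] tables = tEnv.listTables();             // available in Flink 1.10, what this test uses
tEnv.executeSql("SHOW TABLES").print();          // possible once the integration moves to Flink 1.11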


@Test
public void testCreateNamespaceWithLocation() throws Exception {
Assume.assumeFalse("HadoopCatalog does not support namespace locations", isHadoopCatalog);
Contributor

Do we need a test to validate that the CREATE DATABASE statement fails for Hadoop?

rdblue commented Jul 10, 2020

Thanks @JingsongLi, this looks close. I just had a few questions.

@JingsongLi
Contributor Author

Thanks @rdblue for your review, I have addressed your comments.

@JingsongLi
Contributor Author

I updated the branch, but the changes were not synchronized to this PR. Something seems to be wrong with GitHub...

@rdblue rdblue mentioned this pull request Jul 13, 2020

rdblue commented Jul 13, 2020

@JingsongLi, I think this needs to be rebased now that #1180 is in.

Also, should we get #1174 updated for the comments here so we can merge them separately? If we can, I'd prefer to make the commits smaller.

JingsongLi commented Jul 14, 2020

> @JingsongLi, I think this needs to be rebased now that #1180 is in.
>
> Also, should we get #1174 updated for the comments here so we can merge them separately? If we can, I'd prefer to make the commits smaller.

Yes, I think we can. I'll update #1174, create a PR for the HadoopCatalog bug, and create a PR for Flink 1.11.

@JingsongLi JingsongLi force-pushed the catalog branch 3 times, most recently from aed7596 to 6a4a84a Compare July 16, 2020 02:37
@JingsongLi JingsongLi changed the title Integrate Iceberg catalog to Flink catalog Flink: Integrate Iceberg catalog to Flink catalog Jul 16, 2020
protected static ConcurrentMap<String, Catalog> flinkCatalogs;

@BeforeClass
public static void startMetastoreAndSpark() {
Contributor

Nit: the method names weren't updated.

@rdblue rdblue merged commit b8700c3 into apache:master Jul 20, 2020

rdblue commented Jul 20, 2020

Thanks for the updates, @JingsongLi! This looks good to me. I'll merge it.

@JingsongLi
Contributor Author

Thanks @rdblue for your patient review~

cmathiesen pushed a commit to ExpediaGroup/iceberg that referenced this pull request Aug 19, 2020
@JingsongLi JingsongLi deleted the catalog branch November 5, 2020 09:41