
Conversation

@openinx
Member

@openinx openinx commented Dec 23, 2020

To verify that flink SQL can write CDC/upsert events to an iceberg table correctly, I made a few changes to make it work:

  1. Map the flink CREATE TABLE DDL's primary key to iceberg's equality field columns: when creating a flink table with a primary key, it will create an iceberg table whose equality.field.columns property indicates which columns are equality columns. Apache Flink does not support altering the primary key in DDL, so we don't support altering equality field columns in SQL; schema evolution will need the iceberg Java API (see the sketch after this list).

  2. I exposed iceberg format v2 to end users in this patch because I found that the iceberg catalog has no way to share TableTestBase. It's not the right time to expose format v2, so I will try to find another way to make the unit tests work.

  3. Provide a basic unit test to verify that the SQL job works. More unit tests are needed; this is still in progress.
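
A minimal sketch of what item 1 looks like from the SQL side, using the sql() helper from the tests; the table name placeholder, the WITH options, and the toWithClause helper are illustrative assumptions, not part of this patch:

  // Creating a flink table with a primary key through the iceberg catalog ...
  sql("CREATE TABLE %s (id INT, data VARCHAR, PRIMARY KEY(id) NOT ENFORCED) WITH %s",
      tableName, toWithClause(tableProps));
  // ... is expected to leave this property on the underlying iceberg table:
  //   equality.field.columns = id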

@openinx openinx self-assigned this Dec 23, 2020
@openinx openinx marked this pull request as draft December 23, 2020 14:56
@openinx openinx marked this pull request as ready for review December 25, 2020 13:48
}

private static List<String> toEqualityColumns(TableSchema schema) {
  List<String> equalityColumns = Lists.newArrayList();
  schema.getPrimaryKey().ifPresent(uniqueConstraint -> equalityColumns.addAll(uniqueConstraint.getColumns()));
  return equalityColumns;
}
Contributor

It seems like we should track primary key columns in the table format rather than in properties. If we are going to add primary key columns, then we should add them before v2 so that v2 writers are required not to drop them.

}

@Test
public void testSqlChangeLogOnIdKey() throws Exception {
Contributor

These tests look the same as the non-SQL tests. Is there a way to set up the test and expected data for both?

Contributor

@zhangjun0x01 zhangjun0x01 Feb 8, 2021

I read the source code of the flink CDC test class and found that it uses Docker to start a MySQL service to produce the CDC data. It seems there really is no good way to get CDC data in flink tests.

    List<List<Row>> inputRowsPerCheckpoint,
    List<List<Record>> expectedRecordsPerCheckpoint) throws Exception {
  String partitionByCause = partitioned ? "PARTITIONED BY (data)" : "";
  sql("CREATE TABLE %s (id INT, data VARCHAR, PRIMARY KEY(%s) NOT ENFORCED) %s WITH %s",
Contributor

Instead of creating v2 tables using the table property, I think it would be better to load the table just after creation and update it using the API. That way you can test with a v2 table, but we don't need to change any of the core classes or expose the format before it is finalized.

Member Author

OK, I see there's an upgradeToFormatVersion API used to do the upgrade in TestSparkReaderDeletes.java; I can use a similar approach.
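
For reference, a sketch of that approach, mirroring the upgrade in TestSparkReaderDeletes (assuming the test already holds the freshly created Table instance):

  // Bump the new table to format v2 through TableOperations, so the tests can
  // exercise v2 behavior without exposing a format-version table property.
  TableOperations ops = ((BaseTable) table).operations();
  TableMetadata meta = ops.current();
  ops.commit(meta, meta.upgradeToFormatVersion(2));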

if (schema.getPrimaryKey().isPresent()) {
  throw new UnsupportedOperationException("Creating table with primary key is not supported yet.");
}

private static List<String> toEqualityColumns(TableSchema schema) {
Contributor

In the format, I think that equality columns should be tracked by ID rather than by name so that renaming columns doesn't break the primary key metadata.

Member Author

Yes, it should track IDs here. I was thinking this could be a temporary solution before introducing the primary key specification. Considering the renaming issue, it doesn't seem like a good idea to track column names in iceberg properties.

I will create another PR to introduce the primary key specification based on the discussion from here.
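
As a rough illustration of tracking by ID rather than by name (a hypothetical sketch, not what this patch does; icebergSchema is assumed to be the converted iceberg Schema), the names could be resolved once and the field IDs stored instead:

  // Resolve equality column names to field IDs so later renames don't break the metadata.
  Set<Integer> equalityFieldIds = Sets.newHashSet();
  for (String column : equalityColumns) {
    Types.NestedField field = icebergSchema.findField(column);
    Preconditions.checkNotNull(field, "Cannot find column '%s' in schema", column);
    equalityFieldIds.add(field.fieldId());
  }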

List<String> equalityFieldColumns = toEqualityColumns(table.getSchema());
if (!equalityFieldColumns.isEmpty()) {
  properties.put(TableProperties.EQUALITY_FIELD_COLUMNS,
      org.apache.commons.lang.StringUtils.join(equalityFieldColumns, ","));
}
Contributor

We try to avoid using StringUtils and other commons classes because we don't want to leak the dependency. Could you use Guava's Joiner instead?

Member Author

Sounds great. I did not realize that there's a Joiner in Guava. Thanks.
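
The write path with Guava's Joiner would then look roughly like this:

  // Join the equality column names with Guava's Joiner instead of commons-lang.
  properties.put(TableProperties.EQUALITY_FIELD_COLUMNS,
      Joiner.on(',').join(equalityFieldColumns));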

String concatColumns = PropertyUtil.propertyAsString(table.properties(),
    TableProperties.EQUALITY_FIELD_COLUMNS,
    TableProperties.DEFAULT_EQUALITY_FIELD_COLUMNS);
String[] columns = org.apache.commons.lang3.StringUtils.split(concatColumns, ",");
Contributor

Same here. Could you use Splitter instead?
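
A sketch of the read path with Guava's Splitter:

  // Split the comma-separated property value with Guava's Splitter instead of commons-lang3.
  List<String> columns = Splitter.on(',')
      .trimResults()
      .omitEmptyStrings()
      .splitToList(concatColumns);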

@zhangjun0x01
Contributor

Currently, IcebergTableSink only implements AppendStreamTableSink, not UpsertStreamTableSink, but in Flink 1.12 both interfaces have been deprecated. If we want to use Flink SQL to write CDC data to iceberg, I think we need to refactor IcebergTableSink onto the new DynamicTableSink interface first. I have done part of this refactoring work, and after it is completed I will open a PR.
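
A minimal sketch of what a DynamicTableSink-based sink could declare so the planner passes CDC row kinds through to it; the class name and method bodies are illustrative assumptions, not the actual refactor:

  import org.apache.flink.table.connector.ChangelogMode;
  import org.apache.flink.table.connector.sink.DynamicTableSink;
  import org.apache.flink.types.RowKind;

  public class IcebergDynamicSinkSketch implements DynamicTableSink {

    @Override
    public ChangelogMode getChangelogMode(ChangelogMode requestedMode) {
      // Accept inserts, updates and deletes so upsert/CDC streams are allowed.
      return ChangelogMode.newBuilder()
          .addContainedKind(RowKind.INSERT)
          .addContainedKind(RowKind.UPDATE_BEFORE)
          .addContainedKind(RowKind.UPDATE_AFTER)
          .addContainedKind(RowKind.DELETE)
          .build();
    }

    @Override
    public SinkRuntimeProvider getSinkRuntimeProvider(Context context) {
      // The real refactor would bridge to the existing FlinkSink builder here.
      throw new UnsupportedOperationException("sketch only");
    }

    @Override
    public DynamicTableSink copy() {
      return new IcebergDynamicSinkSketch();
    }

    @Override
    public String asSummaryString() {
      return "iceberg-dynamic-sink-sketch";
    }
  }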

@openinx
Member Author

openinx commented Sep 7, 2021

Closed via #2410.

@openinx openinx closed this Sep 7, 2021
@openinx openinx deleted the flink-sql-pk branch September 7, 2021 08:35