
Conversation


@stevenzwu stevenzwu commented Jan 8, 2021

…n extend from

@github-actions github-actions bot added the flink label Jan 8, 2021
@stevenzwu
Contributor Author

@openinx I haven't switched FlinkTestBase to use HadoopCatalog yet. We should probably do that too, maybe leaving only TestFlinkHiveCatalog using HiveCatalog and TestHiveMetastore.

import org.junit.runners.Parameterized;

@RunWith(Parameterized.class)
public abstract class TestFlinkReaderDeletesBase extends DeleteReadTests {
Contributor Author

@stevenzwu stevenzwu Jan 8, 2021

This new base class is extracted out of the TestFlinkInputFormatReaderDeletes. FLIP-27 source will add a test extending from this base class.

Member

OK, sounds good to me to make this a separate base unit test class.


protected static final PartitionSpec SPEC = PartitionSpec.builderFor(SCHEMA)
    .identity("dt")
    .bucket("id", 1)
    .build();
Contributor Author
FLIP-27 source will have a test extending from this base class. Logic for the current source is refactored into the TestFlinkSource class.

@openinx openinx self-requested a review January 8, 2021 06:59
@stevenzwu stevenzwu force-pushed the refactorFlinkTest branch 2 times, most recently from 83017d0 to cdd5580 Compare January 25, 2021 21:59
@stevenzwu stevenzwu changed the title (1) refactor Flink source tests so that future FLIP-27 source test ca… refactor Flink source tests so that future FLIP-27 source test ca… Jan 30, 2021
@stevenzwu stevenzwu force-pushed the refactorFlinkTest branch 3 times, most recently from 0b34805 to df8fe18 Compare February 2, 2021 02:28
@openinx
Member

openinx commented Feb 2, 2021

@stevenzwu, you mean this PR, separated out of the big one (#2105), will be the next one to review? As I understand it, this is just a unit test refactor; we could put the next feature PR up for review and simply improve the unit tests as we iterate on the unified Flink source work.

@stevenzwu
Contributor Author

stevenzwu commented Feb 2, 2021

@openinx yes, please help review this unit test refactoring next. The main reason is to avoid constantly resolving merge conflicts after periodic rebases. The other FLIP-27 source code and tests are more isolated and less likely to hit merge conflicts during what will probably be a long review process.

@stevenzwu stevenzwu force-pushed the refactorFlinkTest branch from f9fda07 to af32922 Compare March 1, 2021 04:37
* because the {@code from} RowData may contain additional columns for position deletes.
* Using {@link RowDataSerializer#copy(RowData, RowData)} will fail the arity check.
*/
public static RowData clone(RowData from, RowData reuse, RowType rowType, TypeSerializer[] fieldSerializers) {
Member

How does the FLIP-27 source use this method? How does it construct its TypeSerializer?

Member

As I understand it, this clone method is really not friendly for developers to use. If we really need a copy that checks the length, how about making this one private and exposing an easier public method:

public static RowData clone(RowData from, RowType rowType)
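A minimal, stdlib-only sketch of the API split suggested here, with a toy `Object[]` row standing in for Flink's RowData (the `RowUtil` class, `cloneRow` names, and the `arity` argument are all hypothetical, not Iceberg's actual API): the length-aware variant stays private, and a simpler overload is all that callers see.

```java
import java.util.Arrays;

// Hypothetical stand-in for the proposed API shape: private checked clone,
// simple public entry point. Object[] plays the role of RowData here.
public class RowUtil {

    // Private workhorse: copies only the first 'arity' fields, since the
    // source row may carry extra trailing columns (e.g. for position deletes).
    private static Object[] cloneRow(Object[] from, Object[] reuse, int arity) {
        Object[] target = (reuse != null && reuse.length == arity) ? reuse : new Object[arity];
        System.arraycopy(from, 0, target, 0, arity);
        return target;
    }

    // Public convenience overload: callers only state how many columns the
    // declared row type has, mirroring the proposed clone(RowData, RowType).
    public static Object[] cloneRow(Object[] from, int declaredFieldCount) {
        return cloneRow(from, null, declaredFieldCount);
    }

    public static void main(String[] args) {
        Object[] row = {1, "a", "extra-position-delete-column"};
        System.out.println(Arrays.toString(cloneRow(row, 2))); // [1, a]
    }
}
```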

Contributor Author

It is used here: https://github.com/stevenzwu/iceberg/blob/flip27IcebergSource/flink/src/main/java/org/apache/iceberg/flink/source/reader/RowDataIteratorBulkFormat.java#L116

The main reason for adding TypeSerializer to the RowDataUtil#clone() method is to avoid constructing it for each clone call. In the constructor of RowDataIteratorBulkFormat, we construct TypeSerializer once from RowType.

Member

@openinx openinx Mar 5, 2021

OK, it seems we clone the RowData iteratively, and I saw that a FieldGetter is created for each row here, which should also not be the expected behavior. We may need to introduce a new RowDataCloner that initializes all of its TypeSerializer and FieldGetter instances once in its constructor; when iterating over the RowData we would just clone row by row without creating any extra instances.

Users wouldn't have to interact with the internal TypeSerializer; they could just use the RowDataCloner.
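The idea above can be sketched without any Flink dependencies. In this toy version (the `RowCloner` name is hypothetical, and `Object[]` rows plus `UnaryOperator` copiers stand in for RowData, TypeSerializer, and FieldGetter), all per-field copiers are built once in the constructor and nothing extra is allocated per row:

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the proposed RowDataCloner: all per-field copiers
// are created once here, not on every clone call.
public class RowCloner {
    private final List<UnaryOperator<Object>> fieldCopiers;

    public RowCloner(List<UnaryOperator<Object>> fieldCopiers) {
        this.fieldCopiers = List.copyOf(fieldCopiers);
    }

    // Clone one row; the loop only reuses the precomputed copiers.
    public Object[] cloneRow(Object[] row) {
        Object[] copy = new Object[fieldCopiers.size()];
        for (int i = 0; i < fieldCopiers.size(); i++) {
            copy[i] = fieldCopiers.get(i).apply(row[i]);
        }
        return copy;
    }

    public static void main(String[] args) {
        RowCloner cloner = new RowCloner(List.of(
            v -> v,                        // immutable field: shallow copy is safe
            v -> ((char[]) v).clone()));   // mutable field: needs a deep copy
        Object[] row = {42, new char[]{'a', 'b'}};
        Object[] copy = cloner.cloneRow(row);
        System.out.println(copy[1] != row[1]); // true: the char[] was deep-copied
    }
}
```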

Member

@openinx openinx Mar 5, 2021

I think we could do the refactor when we review the RowDataIteratorBulkFormat.

helper.appendToTable(RandomGenericData.generate(TestFixtures.SCHEMA, 1, 0L));

TestHelpers.assertRecords(
runWithOptions(ImmutableMap.<String, String>builder()
Member

nit: here we could just use:

ImmutableMap.of("snapshot-id", Long.toString(snapshotId));
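For a single entry, the `of` factory is the terser form. The suggestion above uses Guava's ImmutableMap; as a self-contained illustration, `java.util.Map.of` (JDK 9+) has the same shape (the `SnapshotOptions` class and `forSnapshot` method are made-up names for this sketch):

```java
import java.util.Map;

// Illustrative helper; Map.of stands in for Guava's ImmutableMap.of, which
// the review comment suggests over a one-entry builder chain.
public class SnapshotOptions {
    public static Map<String, String> forSnapshot(long snapshotId) {
        // One call replaces ImmutableMap.<String, String>builder().put(...).build().
        return Map.of("snapshot-id", Long.toString(snapshotId));
    }

    public static void main(String[] args) {
        System.out.println(forSnapshot(42L)); // {snapshot-id=42}
    }
}
```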

Contributor Author

thx. updated

.build()),
expectedRecords, TestFixtures.SCHEMA);
TestHelpers.assertRecords(
runWithOptions(ImmutableMap.<String, String>builder()
Member

nit: ditto.

Contributor Author

thx. updated


protected abstract List<Row> run(FlinkSource.Builder formatBuilder, Map<String, String> sqlOptions, String sqlFilter,
    String... sqlSelectedFields) throws IOException;

protected abstract List<Row> runWithProjection(String... projected) throws Exception;
Member

What's the implementation of those four methods in FLIP-27? It looks like we are just filling in options in TestFlinkSource; will FLIP-27 have different implementations?


@openinx openinx changed the title refactor Flink source tests so that future FLIP-27 source test ca… Flink: Refactor flink source tests for FLIP-27' unified source. Mar 2, 2021
@openinx openinx changed the title Flink: Refactor flink source tests for FLIP-27' unified source. Flink: Refactor flink source tests for FLIP-27 unified source. Mar 2, 2021
@openinx
Member

openinx commented Mar 2, 2021

@stevenzwu PR looks good to me overall; just left a few comments. Sorry for the delay (a bit busy with internal things), and thanks for the great work!

@stevenzwu stevenzwu force-pushed the refactorFlinkTest branch from af32922 to b15a3ba Compare March 4, 2021 18:55
Member

@openinx openinx left a comment

LGTM, got this merged. Thanks @stevenzwu for contributing!

@openinx openinx merged commit 343104c into apache:master Mar 5, 2021
stevenzwu added a commit to stevenzwu/iceberg that referenced this pull request Jul 28, 2021