
Conversation


@stevenzwu stevenzwu commented Jan 8, 2021

…n extend from

@github-actions github-actions bot added the flink label Jan 8, 2021
@stevenzwu
Contributor Author

@openinx I haven't switched FlinkTestBase to use HadoopCatalog yet. We should probably do that too, maybe leaving only TestFlinkHiveCatalog using HiveCatalog and TestHiveMetastore.

import org.junit.runners.Parameterized;

@RunWith(Parameterized.class)
public abstract class TestFlinkReaderDeletesBase extends DeleteReadTests {
Contributor Author

@stevenzwu stevenzwu Jan 8, 2021

This new base class is extracted out of the TestFlinkInputFormatReaderDeletes. FLIP-27 source will add a test extending from this base class.

Member

OK, sounds good to me to make this a separate base unit test class.


protected static final PartitionSpec SPEC = PartitionSpec.builderFor(SCHEMA)
    .identity("dt")
    .bucket("id", 1)
    .build();
Contributor Author
FLIP-27 source will have a test extending from this base class. Logic for the current source is refactored into the TestFlinkSource class.

@openinx openinx self-requested a review January 8, 2021 06:59
@stevenzwu stevenzwu force-pushed the refactorFlinkTest branch 2 times, most recently from 83017d0 to cdd5580 Compare January 25, 2021 21:59
@stevenzwu stevenzwu changed the title (1) refactor Flink source tests so that future FLIP-27 source test ca… refactor Flink source tests so that future FLIP-27 source test ca… Jan 30, 2021
@stevenzwu stevenzwu force-pushed the refactorFlinkTest branch 3 times, most recently from 0b34805 to df8fe18 Compare February 2, 2021 02:28
@openinx
Member

openinx commented Feb 2, 2021

@stevenzwu, you mean this PR, separated out of the big one (#2105), will be the next one to review? As I understand it, this is just a unit test refactor; we could put the next feature PR up for review and simply improve the unit tests as we iterate on the unified Flink source work.

@stevenzwu
Contributor Author

stevenzwu commented Feb 2, 2021

@openinx yes, please help review this unit test refactoring next. The main reason is to avoid constantly resolving merge conflicts after periodic rebases. The other FLIP-27 source code and tests are more isolated and less likely to hit merge conflicts during what will probably be a long review process.

@stevenzwu stevenzwu force-pushed the refactorFlinkTest branch from f9fda07 to af32922 Compare March 1, 2021 04:37
* because the {@code from} RowData may contain additional columns for position deletes.
* Using {@link RowDataSerializer#copy(RowData, RowData)} will fail the arity check.
*/
public static RowData clone(RowData from, RowData reuse, RowType rowType, TypeSerializer[] fieldSerializers) {
Member

How does the FLIP-27 source use this method? How does it construct its TypeSerializer?

Member

As I understand it, this clone method is really not friendly for developers to use. If we really need a copy that checks the length, how about making this one private and exposing an easier public method:

public static RowData clone(RowData from, RowType rowType)
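A minimal, stdlib-only sketch of the API split suggested here, with a toy `Object[]` row standing in for Flink's RowData (the `RowUtil` class, `cloneRow` names, and the `arity` argument are all hypothetical, not Iceberg's actual API): the length-aware variant stays private, and a simpler overload is all that callers see.

```java
import java.util.Arrays;

// Hypothetical stand-in for the proposed API shape: private checked clone,
// simple public entry point. Object[] plays the role of RowData here.
public class RowUtil {

    // Private workhorse: copies only the first 'arity' fields, since the
    // source row may carry extra trailing columns (e.g. for position deletes).
    private static Object[] cloneRow(Object[] from, Object[] reuse, int arity) {
        Object[] target = (reuse != null && reuse.length == arity) ? reuse : new Object[arity];
        System.arraycopy(from, 0, target, 0, arity);
        return target;
    }

    // Public convenience overload: callers only state how many columns the
    // declared row type has, mirroring the proposed clone(RowData, RowType).
    public static Object[] cloneRow(Object[] from, int declaredFieldCount) {
        return cloneRow(from, null, declaredFieldCount);
    }

    public static void main(String[] args) {
        Object[] row = {1, "a", "extra-position-delete-column"};
        System.out.println(Arrays.toString(cloneRow(row, 2))); // [1, a]
    }
}
```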

Contributor Author

It is used here: https://github.com/stevenzwu/iceberg/blob/flip27IcebergSource/flink/src/main/java/org/apache/iceberg/flink/source/reader/RowDataIteratorBulkFormat.java#L116

The main reason for adding TypeSerializer to the RowDataUtil#clone() method is to avoid constructing it for each clone call. In the constructor of RowDataIteratorBulkFormat, we construct TypeSerializer once from RowType.

Member

@openinx openinx Mar 5, 2021

OK, it seems we clone the RowData iteratively, and I saw that a FieldGetter is created for each row here, which should also not be the expected behavior. We may need to introduce a new RowDataCloner that initializes all of its TypeSerializer and FieldGetter instances once in its constructor; when iterating over the RowData we would just clone row by row without creating any extra instances.

Users wouldn't have to interact with the internal TypeSerializer; they could just use the RowDataCloner.
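The idea above can be sketched without any Flink dependencies. In this toy version (the `RowCloner` name is hypothetical, and `Object[]` rows plus `UnaryOperator` copiers stand in for RowData, TypeSerializer, and FieldGetter), all per-field copiers are built once in the constructor and nothing extra is allocated per row:

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the proposed RowDataCloner: all per-field copiers
// are created once here, not on every clone call.
public class RowCloner {
    private final List<UnaryOperator<Object>> fieldCopiers;

    public RowCloner(List<UnaryOperator<Object>> fieldCopiers) {
        this.fieldCopiers = List.copyOf(fieldCopiers);
    }

    // Clone one row; the loop only reuses the precomputed copiers.
    public Object[] cloneRow(Object[] row) {
        Object[] copy = new Object[fieldCopiers.size()];
        for (int i = 0; i < fieldCopiers.size(); i++) {
            copy[i] = fieldCopiers.get(i).apply(row[i]);
        }
        return copy;
    }

    public static void main(String[] args) {
        RowCloner cloner = new RowCloner(List.of(
            v -> v,                        // immutable field: shallow copy is safe
            v -> ((char[]) v).clone()));   // mutable field: needs a deep copy
        Object[] row = {42, new char[]{'a', 'b'}};
        Object[] copy = cloner.cloneRow(row);
        System.out.println(copy[1] != row[1]); // true: the char[] was deep-copied
    }
}
```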

Member

@openinx openinx Mar 5, 2021

I think we could do the refactor when we review the RowDataIteratorBulkFormat.

helper.appendToTable(RandomGenericData.generate(TestFixtures.SCHEMA, 1, 0L));

TestHelpers.assertRecords(
runWithOptions(ImmutableMap.<String, String>builder()
Member

nit: here we could just use:

ImmutableMap.of("snapshot-id", Long.toString(snapshotId));
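For a single entry, the `of` factory is the terser form. The suggestion above uses Guava's ImmutableMap; as a self-contained illustration, `java.util.Map.of` (JDK 9+) has the same shape (the `SnapshotOptions` class and `forSnapshot` method are made-up names for this sketch):

```java
import java.util.Map;

// Illustrative helper; Map.of stands in for Guava's ImmutableMap.of, which
// the review comment suggests over a one-entry builder chain.
public class SnapshotOptions {
    public static Map<String, String> forSnapshot(long snapshotId) {
        // One call replaces ImmutableMap.<String, String>builder().put(...).build().
        return Map.of("snapshot-id", Long.toString(snapshotId));
    }

    public static void main(String[] args) {
        System.out.println(forSnapshot(42L)); // {snapshot-id=42}
    }
}
```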

Contributor Author

thx. updated

.build()),
expectedRecords, TestFixtures.SCHEMA);
TestHelpers.assertRecords(
runWithOptions(ImmutableMap.<String, String>builder()
Member

nit: ditto.

Contributor Author

thx. updated


protected abstract List<Row> run(FlinkSource.Builder formatBuilder, Map<String, String> sqlOptions, String sqlFilter,
    String... sqlSelectedFields) throws IOException;

protected abstract List<Row> runWithProjection(String... projected) throws Exception;
Member

What's the implementation of those four methods in FLIP-27? It looks like we are just filling in options in TestFlinkSource; will FLIP-27 have different implementations?


@openinx openinx changed the title refactor Flink source tests so that future FLIP-27 source test ca… Flink: Refactor flink source tests for FLIP-27' unified source. Mar 2, 2021
@openinx openinx changed the title Flink: Refactor flink source tests for FLIP-27' unified source. Flink: Refactor flink source tests for FLIP-27 unified source. Mar 2, 2021
@openinx
Member

openinx commented Mar 2, 2021

@stevenzwu PR looks good to me overall; just left a few comments. Sorry for the delay (a bit busy with internal things), and thanks for the great work!

@stevenzwu stevenzwu force-pushed the refactorFlinkTest branch from af32922 to b15a3ba Compare March 4, 2021 18:55
Member

@openinx openinx left a comment

LGTM, got this merged. Thanks @stevenzwu for contributing!

@openinx openinx merged commit 343104c into apache:master Mar 5, 2021
stevenzwu added a commit to stevenzwu/iceberg that referenced this pull request Jul 28, 2021