@Guosmilesmile
Contributor

When RewriteDataFiles executes rewrite tasks for a PlannedGroup, if the table is detected to support RowLineage, it rewrites the schema to add ROW_ID and LAST_UPDATED_SEQUENCE_NUMBER. It then reads the newly added ROW_ID and LAST_UPDATED_SEQUENCE_NUMBER fields and writes the lineage information into the merged DataFiles.

This PR relies on PR #14148.
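
For readers skimming the thread, the schema step described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: the class `LineageProjectionSketch`, the helper `rewriteSchema`, and the use of plain column-name lists in place of an Iceberg `Schema` are all hypothetical. It assumes row lineage is available from format version 3 onward and uses the reserved column names `_row_id` and `_last_updated_sequence_number`.

```java
import java.util.ArrayList;
import java.util.List;

public class LineageProjectionSketch {
  // Assumed reserved names of the lineage metadata columns.
  static final String ROW_ID = "_row_id";
  static final String LAST_UPDATED_SEQUENCE_NUMBER = "_last_updated_sequence_number";

  // Hypothetical check: row lineage is assumed to exist for format version 3 and above.
  static boolean supportsRowLineage(int formatVersion) {
    return formatVersion >= 3;
  }

  // Returns the columns the rewrite would read: the table columns, plus the
  // lineage columns when the table supports row lineage.
  static List<String> rewriteSchema(List<String> columns, int formatVersion) {
    List<String> result = new ArrayList<>(columns);
    if (supportsRowLineage(formatVersion)) {
      result.add(ROW_ID);
      result.add(LAST_UPDATED_SEQUENCE_NUMBER);
    }
    return result;
  }

  public static void main(String[] args) {
    // V2 table: schema is left unchanged.
    System.out.println(rewriteSchema(List.of("id", "data"), 2));
    // V3 table: lineage columns are appended so they can be carried into the merged files.
    System.out.println(rewriteSchema(List.of("id", "data"), 3));
  }
}
```

Because the lineage columns are read along with the data, the writer can copy them into the merged DataFiles instead of having them reassigned.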

Comment on lines 126 to 127
return createTable("2");
return createTable("3");
Contributor

Do we need to keep tests for V2 too?

Contributor Author

I reverted this change and now use table version V3 only in the lineage test case.

But I have a question: OperatorTestBase.createTable() currently uses version 2 by default. Should we run all the unit tests against V3 too?

Contributor

Maybe in another PR. We need to come up with a set of tests that we want to run against multiple table versions, so that we have test coverage for all of the supported spec versions.

@Guosmilesmile
Contributor Author

Rebased on main since the reader PR #14148 has been merged.

@Guosmilesmile Guosmilesmile requested a review from pvary September 26, 2025 13:37

protected static Table createTable() {
// only test V2 tables as compaction doesn't support V3 with row lineage
return createTable("2");
Contributor

If you take a look at TestHelpers.V3_AND_ABOVE, then we need to transition these tests to that in the long run.

Maybe the "2" should be an int instead.

Contributor Author

In the long run, it does need to be changed to TestHelpers.V3_AND_ABOVE, but quite a few test cases are involved. Moreover, some V3 features are not yet supported in Flink and may require compatibility handling. Is it possible to open a separate PR to handle this?

Contributor

I agree that we need to do it in a separate PR. The point here is just to use an int instead of a String to store the version.

Contributor Author

OK, I have changed it now.
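
As a rough illustration of the int-versus-String point discussed in this thread, the sketch below keeps the format version as an int and converts it to a string only where table properties are built. The class `FormatVersionSketch` and the helper `tableProperties` are hypothetical and are not the actual `OperatorTestBase` code; the property key `format-version` is the one Iceberg uses for the table format version.

```java
import java.util.Map;

public class FormatVersionSketch {
  // Hypothetical helper: accepts the version as an int, so callers cannot
  // pass a malformed string, and converts it only at the properties boundary.
  static Map<String, String> tableProperties(int formatVersion) {
    if (formatVersion < 1) {
      throw new IllegalArgumentException("Invalid format version: " + formatVersion);
    }
    return Map.of("format-version", Integer.toString(formatVersion));
  }

  public static void main(String[] args) {
    System.out.println(tableProperties(2)); // prints {format-version=2}
    System.out.println(tableProperties(3)); // prints {format-version=3}
  }
}
```

An int also makes range-based test parameterization (for example, "V3 and above") easier to express than string comparison would.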

@pvary
Contributor

pvary commented Sep 26, 2025

@mxm: Could you please take a look?

@Guosmilesmile
Contributor Author

Noting a CI failure: https://github.com/apache/iceberg/actions/runs/18073439985/job/51426507107?pr=14149


TestFlinkTableSink > testIcebergSinkDifferentDAG() > catalogName=testhadoop_basenamespace, baseNamespace=l0.l1, format=PARQUET, isStreaming=false, useV2Sink=true FAILED
    java.lang.RuntimeException: Failed to collect table result
        at org.apache.iceberg.flink.TestBase.sql(TestBase.java:105)
        at org.apache.iceberg.flink.TestFlinkTableSink.testIcebergSinkDifferentDAG(TestFlinkTableSink.java:304)

        Caused by:
        org.apache.flink.table.api.TableException: Failed to wait job finish
            at app//org.apache.flink.table.api.internal.InsertResultProvider.hasNext(InsertResultProvider.java:85)
            at app//org.apache.flink.table.api.internal.InsertResultProvider.access$200(InsertResultProvider.java:37)
            at 

@github-actions

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Oct 29, 2025
@Guosmilesmile
Contributor Author

Keep it alive

@github-actions github-actions bot removed the stale label Oct 30, 2025
}

@Test
void testRewriteUnpartitionedPreserveLineage() throws Exception {
Contributor

Do we want to add tests for TestDataFileRewriteRunner?

Contributor Author

I have added a unit test for the V3 table in TestDataFileRewriteRunner.

@pvary pvary merged commit 4e68ff0 into apache:main Nov 6, 2025
18 checks passed
@pvary
Contributor

pvary commented Nov 6, 2025

Merged to main.
Thanks @Guosmilesmile for the progress on V3!
Great to see this moving forward!
