
Conversation

@openinx (Member) commented Oct 25, 2021

This PR makes the Flink CI work against both Flink 1.12.5 and 1.13.2. Related issue: #3183

Currently, the same Flink module works fine with both Flink 1.12.5 and 1.13.2 now that PR #3116 has been merged, so we don't have to create separate modules for Flink 1.12.x and Flink 1.13.x. In fact, this Flink approach is similar to the Hive approach in the current Iceberg repo: we use the same module to work against both Hive 2 and Hive 3.


project(':iceberg-flink') {
  project.ext {
    flinkVersion = System.getProperty("flinkVersions") != null ? System.getProperty("flinkVersions") : ""
@openinx (Member Author):

Actually, I'm a little confused about the system properties flinkVersions, sparkVersions, and hiveVersions. Take the Spark build as an example: currently the candidate values for sparkVersions are 2.4,3.0,3.2, so when building the Spark tests, the build includes the build.gradle of the spark2.4, spark3.0, and spark3.2 modules. But for the spark3.0 module, we need to build the tests against both spark3.0.x and spark3.1.x. Will we then introduce another, similar system property, sparkBuild, as @jackye1995 proposed in #3237?
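For context, a rough sketch of how a comma-separated *Versions property can drive module selection in settings.gradle (illustrative only, not the exact build logic in the repo; the module paths are placeholders):

// Sketch: parse the comma-separated system property into a list of MAJOR.MINOR
// values and include one engine module per requested version.
def sparkVersions = (System.getProperty("sparkVersions") ?: "").split(",").findAll { !it.isEmpty() }

if (sparkVersions.contains("2.4")) {
  include ':iceberg-spark2'
}
if (sparkVersions.contains("3.0")) {
  include ':iceberg-spark3'
}
if (sparkVersions.contains("3.2")) {
  include ':iceberg-spark:iceberg-spark-3.2'
}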

@openinx (Member Author), Oct 25, 2021:

I'm wondering why we don't assign a specific engine version, MAJOR.MINOR.BUGFIX (rather than a MAJOR.MINOR version list), to a flinkVersion property (rather than flinkVersions) to build all of the versions' unit tests. In fact, in flink-ci.yml, -DflinkVersions=${{ matrix.flink }} also assigns one single value to flinkVersions (matrix.flink is a single value rather than a version list, right?).

Contributor:

when building the Spark tests, the build includes the build.gradle of the spark2.4, spark3.0, and spark3.2 modules. But for the spark3.0 module, we need to build the tests against both spark3.0.x and spark3.1.x

Not anymore. We're going to add a Spark 3.1 build to test directly rather than reusing the same module for different versions.

I'm wondering why we don't assign a specific engine version, MAJOR.MINOR.BUGFIX (rather than a MAJOR.MINOR version list)

This assumes that bugfix versions are compatible with one another, so we only need one build for all of them. I really doubt that we want to test against all patch/bugfix versions. That's mostly extra effort spent finding problems in the upstream project, and it isn't worth the time it would take in every Iceberg PR.

@openinx requested a review from rdblue on October 25, 2021 08:10
matrix:
  jvm: [8, 11]
-  flink: ['1.13']
+  flink: ['1.12.5', '1.13.2']
Contributor:

As I said in another comment, these should be minor versions only. The patch/bugfix version is the specific dependency that we built against, but we don't want to parameterize here.
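As a rough illustration of that split (the property wiring below is an assumption for the sketch, not code from this PR): the CI matrix would carry only the MAJOR.MINOR value, and the build would map it to the single pinned patch release used for the ${flinkVersion} dependency coordinates.

// Sketch: the matrix passes a minor version such as 1.12 or 1.13 via -DflinkVersions;
// the build pins the concrete patch release it actually compiles against.
def flinkMinor = System.getProperty("flinkVersions") ?: "1.13"
def flinkVersion = flinkMinor == "1.12" ? "1.12.5" : "1.13.2"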

compileOnly "org.apache.flink:flink-streaming-java_2.12:${flinkVersion}:tests"
compileOnly "org.apache.flink:flink-table-api-java-bridge_2.12:${flinkVersion}"
compileOnly "org.apache.flink:flink-table-planner-blink_2.12:${flinkVersion}"
compileOnly "org.apache.flink:flink-table-planner_2.12:${flinkVersion}"
Contributor:

The flinkVersions property is used only to determine which modules to add to the build. We don't want to use it to set the Flink version because it could be a list that adds more than one set of Flink modules in the future.

If you want to use the flinkVersions property, then I think we should add multiple Flink modules, for 1.12, 1.13, 1.14, etc. I'm not sure how we want to manage those. For Spark, we are choosing to copy the code from one version to the next so they are independent. That would make sense for Flink as well. And because Flink only has 2 supported versions at a time, it wouldn't be that much work to maintain.

Contributor:

I also think this would make sense, as there are certain interfaces in 1.13 that extend both the new and the old interface. But when the old interface is implemented, it is preferred.

Specifically, for Flink, I am thinking of FlinkCatalogFactory. When upgrading from 1.12 to 1.13, we had to implement the old 1.12 interface (the interface as of 1.13 mixes in the "new" interface and keeps the deprecated one for compatibility reasons).

However, it doesn't give users the ability to choose which path / implementation they'd like to follow, so we had to choose between continuing to support 1.12 and not yet taking advantage of the richer feature support brought by later versions.

If we had modules split by minor version (1.12, 1.13), then we could implement these small but important differences differently for each version.

@openinx (Member Author):

then I think we should add multiple Flink modules, for 1.12, 1.13, 1.14, etc. I'm not sure how we want to manage those. For Spark, we are choosing to copy the code from one version to the next so they are independent. That would make sense for Flink as well. And because Flink only has 2 supported versions at a time, it wouldn't be that much work to maintain.

I agree it's friendlier for Iceberg users if we maintain the Iceberg integration against multiple engine versions, but I have doubts about our current approach to accomplishing that goal. The current spark3.0.x and spark3.1.x integrations share all of their code, yet we are planning to copy the integration code into a separate module for spark3.1.x. This kind of code splitting does isolate the integration differences between different MAJOR.MINOR Spark versions, but it is really tiring to copy every newly introduced feature into every version. Take the "Core: Enable ORC delete writer in the v2 write path" feature as an example: we need to enable the ORC delete writer for spark2.4, spark3.0, spark3.2, and maybe spark3.1 in the future.

The Iceberg integrations for Flink 1.12 and Flink 1.13 also share the same code now. Since we can use a few techniques to hide the subtle differences between minor versions, is it really necessary to copy the same code just to build and test each minor version?

@openinx (Member Author):

For different Flink minor versions, I would recommend reusing the existing modules as much as possible, until we hit a minor version where we have to copy the code in order to develop against new Flink interfaces.

Contributor:

I think we have some options here other than copying the code like Spark is doing. What about creating modules for each Flink version and adding a shared source directory and a version-specific source directory?

flink/src/main/java // <-- common code
flink/v1.12/src/main/java // <-- 1.12 code
flink/v1.13/src/main/java // <-- 1.13 code

Then in flink/build.gradle, create iceberg-flink-1.12, iceberg-flink-1.13, and the runtime modules. The main modules have 2 source directories, flink/src/main/java and the version-specific one. Then as the code changes you can move files around. For example, to use a new interface in 1.13, you'd move the code from common into 1.12 and implement the new interface specifically in 1.13.
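A minimal sketch of how that two-source-directory wiring could look in flink/build.gradle, assuming the directory layout above (not the actual change made in this PR):

// Sketch: the 1.13 module compiles the shared sources plus its version-specific sources.
project(':iceberg-flink:iceberg-flink-1.13') {
  sourceSets {
    main {
      java.srcDirs = ["$rootDir/flink/src/main/java", "$rootDir/flink/v1.13/src/main/java"]
    }
    test {
      java.srcDirs = ["$rootDir/flink/src/test/java", "$rootDir/flink/v1.13/src/test/java"]
    }
  }
}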

Would that work, @openinx?

@openinx (Member Author):

Let's try to update the PR.

}

project(':iceberg-flink:iceberg-flink-1.13') {
  apply plugin: 'scala'
Contributor:

Was this accidentally copied from Spark? The build.gradle for Flink doesn't currently use Scala.

Contributor:

I diffed the builds and everything looks fine except for this. I think it was probably an accident so I removed it to get the CI to run without it.

@rdblue (Contributor) commented Oct 31, 2021

Looks like all of the Flink tests complete in about a minute. The problem is that the iceberg-flink module is no longer tested by the Flink CI workflow. In settings.gradle, it is now always included in the project. I have a quick fix for it.
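Purely as a hypothetical sketch (not the actual fix), a guard of this shape in settings.gradle would tie the shared module back to the flinkVersions property:

// Hypothetical sketch: only include the shared Flink module when at least one
// Flink version is requested via -DflinkVersions.
def flinkVersions = (System.getProperty("flinkVersions") ?: "").split(",").findAll { !it.isEmpty() }

if (!flinkVersions.isEmpty()) {
  include ':iceberg-flink'
}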

@rdblue merged commit 7f520b0 into apache:master on Oct 31, 2021
@rdblue (Contributor) commented Oct 31, 2021

CI is passing, so I'll merge this. Thanks @openinx! Great to have both Flink 1.12 and 1.13 working!

@openinx (Member Author) commented Nov 1, 2021

Thanks @rdblue for the changes!
