Build: Flink CI work against both flink 1.12.5 & 1.13.2 #3364
Conversation
flink/build.gradle
Outdated
project(':iceberg-flink') {
  project.ext {
    flinkVersion = System.getProperty("flinkVersions") != null ? System.getProperty("flinkVersions") : ""
Actually, I'm a little confused about the system properties flinkVersions, sparkVersions, and hiveVersions. Let's take the Spark build as an example: currently, the candidate values for sparkVersions are 2.4, 3.0, and 3.2. So when building the Spark tests, it will add the build.gradle from the spark2.4, spark3.0, and spark3.2 modules. But for the spark3.0 module, we will need to build the tests against both spark3.0.x and spark3.1.x. Will we then introduce another similar system property, sparkBuild, like @jackye1995 proposed in #3237?
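For illustration, here is a minimal sketch of how a comma-separated sparkVersions property could be used only to decide which engine modules join the build. The module names and parsing below are assumptions based on this thread, not the actual Iceberg build scripts:
// settings.gradle sketch -- hypothetical module names; real names may differ.
// -DsparkVersions=2.4,3.0,3.2 selects which Spark modules are part of the build.
List<String> sparkVersions = (System.getProperty("sparkVersions") ?: "").split(",") as List

if (sparkVersions.contains("2.4")) {
  include ':iceberg-spark2'
}
if (sparkVersions.contains("3.0")) {
  include ':iceberg-spark3'
}
if (sparkVersions.contains("3.2")) {
  include ':iceberg-spark-3.2'
}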
I'm wondering why we don't assign the specific engine version MAJOR.MINOR.BUGFIX (rather than a MAJOR.MINOR version list) to flinkVersion (rather than flinkVersions) to build each version's unit tests. In fact, in flink-ci.yml, the -DflinkVersions=${{ matrix.flink }} is also assigning one single value (matrix.flink is a single value rather than a version list, right?) to flinkVersions.
when building the spark tests, it will add the build.gradle from spark2.4 module, spark3.0 module and spark3.2 module. But for the spark3.0 module, we will need to build the test against both spark3.0.x and spark3.1.x
Not anymore. We're going to add a Spark 3.1 build to test directly rather than reusing the same module for different versions.
I'm wondering why we don't assign the specific engine version MAJOR.MINOR.BUGFIX (rather than a MAJOR.MINOR version list)
This is assuming that bugfix versions are compatible with one another, so we only need one build for all of them. I really doubt that we want to test against all patch/bugfix versions. That's mostly effort spent finding problems in the upstream project, and not worth the time it would take in every Iceberg PR.
.github/workflows/flink-ci.yml
Outdated
  matrix:
    jvm: [8, 11]
-   flink: ['1.13']
+   flink: ['1.12.5', '1.13.2']
As I said in another comment, these should be minor versions only. The patch/bugfix version is the specific dependency that we built against, but we don't want to parameterize here.
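As a sketch of that split (the constants below are assumptions for illustration, not the actual build script): the CI matrix would only carry 1.12 and 1.13, while the exact patch release the build compiles against stays pinned in Gradle:
// build.gradle sketch -- hypothetical property names; the pinned patch versions
// are what the build compiles against, independent of the CI matrix entry.
ext {
  flink112Version = '1.12.5'
  flink113Version = '1.13.2'
}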
flink/build.gradle
Outdated
compileOnly "org.apache.flink:flink-streaming-java_2.12:${flinkVersion}:tests"
compileOnly "org.apache.flink:flink-table-api-java-bridge_2.12:${flinkVersion}"
compileOnly "org.apache.flink:flink-table-planner-blink_2.12:${flinkVersion}"
compileOnly "org.apache.flink:flink-table-planner_2.12:${flinkVersion}"
The flinkVersions property is used only to determine which modules to add to the build. We don't want to use it to set the Flink version because it could be a list to add more than one set of Flink modules in the future.
If you want to use the flinkVersions property, then I think we should add multiple Flink modules, for 1.12, 1.13, 1.14, etc. I'm not sure how we want to manage those. For Spark, we are choosing to copy the code from one version to the next so they are independent. That would make sense for Flink as well. And because Flink only has 2 supported versions at a time, it wouldn't be that much work to maintain.
I also think this would make sense, as there are certain interfaces in 1.13 that extend both the new and the old interface, and when the old interface is implemented, that is the path that gets used.
Specifically, for Flink, I am thinking of FlinkCatalogFactory. When upgrading from 1.12 to 1.13, we had to implement the old 1.12 interface (the interface as of 1.13 mixes in the "new" interface and maintains the deprecated interface for compatibility reasons).
However, that doesn't give users the ability to choose which path/implementation they'd like to follow. So we had to choose between continuing to support 1.12 and not yet taking advantage of the richer feature support brought by later versions.
If we had modules split by at least minor version (1.12, 1.13), then we could implement these small but important differences differently for each version.
then I think we should add multiple Flink modules, for 1.12, 1.13, 1.14, etc. I'm not sure how we want to manage those. For Spark, we are choosing to copy the code from one version to the next so they are independent. That would make sense for Flink as well. And because Flink only has 2 supported versions at a time, it wouldn't be that much work to maintain.
I agree it's friendlier for Iceberg users if we maintain the Iceberg integration against multiple engine versions, but I have doubts about our current approach to accomplishing that goal. The current Spark 3.0.x and 3.1.x integrations share all of their code, yet we are planning to copy the integration code into a separate module for Spark 3.1.x. This kind of code splitting does isolate the integration differences between different MAJOR.MINOR Spark versions, but it's really tedious to copy every newly introduced feature into every version. Take the "Core: Enable ORC delete writer in the v2 write path" feature as an example: we need to enable the ORC delete writer for spark2.4, spark3.0, spark3.2, and maybe spark3.1 in the future.
The Iceberg integrations for Flink 1.12 and Flink 1.13 are also sharing the same code now. Since we can use some methods to hide the subtle differences between minor versions, is it really necessary to copy the same code just to build the tests for each minor version?
For different Flink minor versions, I would recommend reusing existing modules as much as possible, until we reach a minor version where we have to copy code in order to develop against new Flink interfaces.
I think we have some options here other than copying the code like Spark is doing. What about creating modules for each Flink version and adding a shared source directory and a version-specific source directory?
flink/src/main/java // <-- common code
flink/v1.12/src/main/java // <-- 1.12 code
flink/v1.13/src/main/java // <-- 1.13 code
Then in flink/build.gradle, create iceberg-flink-1.12, iceberg-flink-1.13, and the runtime modules. The main modules have 2 source directories, flink/src/main/java and the version-specific one. Then as the code changes you can move files around. For example, to use a new interface in 1.13, you'd move the code from common into 1.12 and implement the new interface specifically in 1.13.
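A minimal sketch of how that wiring could look in Gradle (the paths follow the layout above; the module name and exact syntax are assumptions, not the merged build script):
// flink/build.gradle sketch: each versioned module compiles the shared sources
// plus its own version-specific sources.
project(':iceberg-flink:iceberg-flink-1.13') {
  sourceSets {
    main {
      java {
        srcDir "${rootDir}/flink/src/main/java"        // common code
        srcDir "${rootDir}/flink/v1.13/src/main/java"  // 1.13-specific code
      }
    }
  }
}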
Would that work, @openinx?
Let's try to update the PR.
flink/v1.13/build.gradle
Outdated
}

project(':iceberg-flink:iceberg-flink-1.13') {
  apply plugin: 'scala'
Was this accidentally copied from Spark? The build.gradle for Flink doesn't currently use Scala.
I diffed the builds and everything looks fine except for this. I think it was probably an accident so I removed it to get the CI to run without it.
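For reference, a sketch of the project block from the diff above with the stray plugin dropped; the surrounding dependencies are elided and the comments are illustrative only:
// flink/v1.13/build.gradle sketch -- the Flink build is Java-only, so no Scala plugin.
project(':iceberg-flink:iceberg-flink-1.13') {
  // apply plugin: 'scala'   <-- removed; likely copied by accident from the Spark build
  // ... Flink dependencies unchanged
}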
Looks like all of the Flink tests complete in about a minute. The problem is that the
CI is passing, so I'll merge this. Thanks @openinx! Great to have both Flink 1.12 and 1.13 working!
Thanks @rdblue for the changes!
This PR makes the Flink CI work against both Flink 1.12.5 and 1.13.2. Related issue: #3183
Currently, the same Flink module works fine with both Flink 1.12.5 and 1.13.2 after merging PR #3116, so we don't have to create separate modules for Flink 1.12.x and 1.13.x. In fact, this approach is similar to the Hive approach in the current Iceberg repo, where we use the same module to work against both Hive 2 and Hive 3.
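A minimal sketch of that single-module approach, based on the build.gradle excerpt earlier in this conversation; the default version and the dependency list are assumptions, not the exact merged script:
// flink/build.gradle sketch -- one module, with the Flink version taken from the
// flinkVersions system property, similar to how the Hive module handles Hive 2 and Hive 3.
project(':iceberg-flink') {
  def flinkVersion = System.getProperty("flinkVersions") ?: "1.13.2"

  dependencies {
    compileOnly "org.apache.flink:flink-streaming-java_2.12:${flinkVersion}"
    compileOnly "org.apache.flink:flink-table-api-java-bridge_2.12:${flinkVersion}"
  }
}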