Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#30789] Add support for Flink 1.18 #31062

Merged
merged 7 commits into from
May 7, 2024
Merged

Conversation

thebozzcl
Copy link
Contributor

@thebozzcl thebozzcl commented Apr 19, 2024

This PR tries to solve #30789 by adding a runner for Flink 1.18. So far I haven't identified any changes that need to be done beyond the version bump.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

Copy link
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@thebozzcl
Copy link
Contributor Author

As I thought, seems there's a flaky test: https://github.com/apache/beam/actions/runs/8760406645/job/24045305260

 --- FAIL: TestElementChan (0.00s)
    --- FAIL: TestElementChan/FillBufferThenAbortThenRead (0.00s)
        datamgr_test.go:412: got sum 5, count 5, want sum 20, count 20
FAIL

I'm gonna take a look at it to see if I can identify what caused the previous run to fail. That being said, Go is not my forte.

@Abacn
Copy link
Contributor

Abacn commented Apr 22, 2024

Thanks for the work! Yeah the flaky go SDK test is irrelevant

CHANGES.md Outdated Show resolved Hide resolved
@@ -1,12 +1,12 @@
{
"name": "apache-beam",
"version": "2.54.0-SNAPSHOT",
"version": "2.57.0-SNAPSHOT",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though this is an unrelated change -- should this version bump be automated as part of cut-release-branch script? cc: @robertwb

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to remove this unrelated change. We can merge this afterwards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove it, but I didn't change it myself... won't it just get bumped again automatically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd, I pushed a commit but it's not showing directly in the PR's file changes. It's reflected properly in my branch and in the commit's changes, though.

Copy link
Contributor

@je-ik je-ik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thebozzcl would you please remove the unrelated change to (possibly) different PR? This can be merged then (after checks pass).

@@ -1,12 +1,12 @@
{
"name": "apache-beam",
"version": "2.54.0-SNAPSHOT",
"version": "2.57.0-SNAPSHOT",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to remove this unrelated change. We can merge this afterwards.

@@ -101,5 +101,5 @@ jobs:
with:
gradle-command: :sdks:java:testing:tpcds:run
arguments: |
-Ptpcds.runner=:runners:flink:1.17 \
-Ptpcds.runner=:runners:flink:1.18 \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not putting comments at once, but just realize that these github action changes will apply to both the master branch and the ongoing release process, but the current pending release-2.56.0 branch do not have :runners:flink:1.18 project which will causing issue

To avoid issue we can leave .github/ content as is

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might merge this after the release of 2.56.0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in no rush to get this merged, so I'd be okay with that.

@@ -6,7 +6,7 @@
"packages": {
"": {
"name": "apache-beam",
"version": "2.54.0-SNAPSHOT",
"version": "2.57.0-SNAPSHOT",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange, there is still this change, which might cause the failure of typescript check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I finally understood what happened here: there's two lines with the package version, and I failed to notice the second one.

Copy link
Contributor

@je-ik je-ik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for updating the PR, @thebozzcl. One last thing, can you please squash the commits? I'll merge this once the release of 2.56.0 is out and once the checks pass.

Copy link
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @jrmccluskey for label go.
R: @Abacn for label build.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@je-ik
Copy link
Contributor

je-ik commented Apr 30, 2024

R: @je-ik

Copy link
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@je-ik
Copy link
Contributor

je-ik commented May 7, 2024

Looks like this is all green and ready to merge. Can you please squash the commits to single commit? Thanks. 👍

@je-ik je-ik self-requested a review May 7, 2024 08:30
@je-ik je-ik mentioned this pull request May 7, 2024
3 tasks
@Abacn
Copy link
Contributor

Abacn commented May 7, 2024

Actually committer can "squash and merge" using GitHub website. No further action needed for PR author.

@je-ik je-ik merged commit 45fe4f9 into apache:master May 7, 2024
77 checks passed
@je-ik
Copy link
Contributor

je-ik commented May 7, 2024

@Abacn Cool, I was not aware of that 👍
@thebozzcl Thanks!

@thebozzcl
Copy link
Contributor Author

@Abacn nice, I wasn't aware of that either! @je-ik thanks for taking care of it!

@shunping
Copy link
Contributor

shunping commented May 12, 2024

The task :runners:flink:1.18:runQuickstartJavaFlinkLocal has failed to run due to this commit, which leads to the recent failure of our github action "https://github.com/apache/beam/actions/workflows/beam_PostRelease_NightlySnapshot.yml".

Below is the latest error from the log:

2024-05-10T16:28:07.4342882Z Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionConfig
2024-05-10T16:28:07.4344273Z 	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
2024-05-10T16:28:07.4345556Z 	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
2024-05-10T16:28:07.4346644Z 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
2024-05-10T16:28:07.4347805Z 	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
2024-05-10T16:28:07.4348942Z 	at java.lang.Class.forName0(Native Method)
2024-05-10T16:28:07.4349661Z 	at java.lang.Class.forName(Class.java:348)
2024-05-10T16:28:07.4350979Z 	at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:78)
2024-05-10T16:28:07.4352850Z 	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1988)
2024-05-10T16:28:07.4354599Z 	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852)
2024-05-10T16:28:07.4356870Z 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
2024-05-10T16:28:07.4358955Z 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
2024-05-10T16:28:07.4360298Z 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
2024-05-10T16:28:07.4362035Z 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
2024-05-10T16:28:07.4364329Z 	at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:539)
2024-05-10T16:28:07.4367296Z 	at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:527)
2024-05-10T16:28:07.4368928Z 	at org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:67)
2024-05-10T16:28:07.4370661Z 	at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:101)
2024-05-10T16:28:07.4373136Z 	at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:122)
2024-05-10T16:28:07.4375581Z 	at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:379)
2024-05-10T16:28:07.4376991Z 	at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:356)
2024-05-10T16:28:07.4379145Z 	at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:128)
2024-05-10T16:28:07.4382049Z 	at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:100)
2024-05-10T16:28:07.4384288Z 	at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112)
2024-05-10T16:28:07.4385674Z 	... 4 more

+@damccorm

@thebozzcl
Copy link
Contributor Author

thebozzcl commented May 12, 2024

Hmm. I'm gonna try looking into it, but I have limited connectivity at the moment and might not be able to do much until later this week. What's the preferred way to deal with this? Do we want to roll back?

je-ik added a commit to je-ik/beam that referenced this pull request May 13, 2024
@je-ik
Copy link
Contributor

je-ik commented May 13, 2024

I can confirm the failure locally, we should revert this until we can make the check pass.

#31262

@je-ik
Copy link
Contributor

je-ik commented May 13, 2024

Correction. The test locally fails even before this commit (e.g. tested 49056fd7ed111c4c9ebb9236ab447d39f0aa86f3 with 1.17). The check seems to have independent issues.

@thebozzcl
Copy link
Contributor Author

I took a "quick" look (it took a while to dl all the dependencies in a connection that's not much better than dial-up). I'm kind of baffled by this issue - I checked the flink-core library and org.apache.flink.api.common.ExecutionConfig hasn't been moved around at all since 1.14.

We should probably create a separate ticket for this.

@Abacn
Copy link
Contributor

Abacn commented May 13, 2024

@je-ik PostRelease test is different from other tests. It downloads the latest Nightly Snapshot and run some validation pipelines. It does not build beam on the current branch, and that is why you see "The test locally fails even before this commit".

@shunping
Copy link
Contributor

@je-ik PostRelease test is different from other tests. It downloads the latest Nightly Snapshot and run some validation pipelines. It does not build beam on the current branch, and that is why you see "The test locally fails even before this commit".

@je-ik, @Abacn: Given the task :runners:flink:1.18:runQuickstartJavaFlinkLocal is the failed task and it was introduced by this commit, can we revert it and see if the test is fixed first? We can look into other test issues if they emerge after that.

@shunping
Copy link
Contributor

Looks like someone have reported this issue on Flink 1.18 before.

@Abacn
Copy link
Contributor

Abacn commented May 13, 2024

I agree revert at the change moment

:runners:flink:1.17:runQuickstartJavaFlinkLocal passed on Beam 2.56.0:

:runners:flink:1.17:runQuickstartJavaFlinkLocal (as well as :1.18:runQuickstartJavaFlinkLocal) failed on master:

This means current master branch is failing Flink runner validation for both Flink 1.18 and under 1.18, possibly caused by this PR.

If we revert the change and wait for another nightly snapshot built, and the PostRelease test is back green, we can confirm

@Abacn
Copy link
Contributor

Abacn commented May 13, 2024

if revert, there is another PR need to be reverted at the same time : #31217

This one included a typo fix which shouldn't be reverted : https://github.com/apache/beam/pull/31217/files#diff-07d5f4d101410e8cf42b7241fe72074d3c8003866e2b077f51388e1283ebe442R337

@Abacn
Copy link
Contributor

Abacn commented May 13, 2024

Checking this PR again, it does not include non-trivial change. I doubt that revert it will make the flinkQuickstart back green again though

@Abacn Abacn mentioned this pull request May 13, 2024
3 tasks
Abacn added a commit that referenced this pull request May 13, 2024
je-ik added a commit to je-ik/beam that referenced this pull request May 23, 2024
@kennknowles
Copy link
Member

Just for posterity and everyone else getting notifications. As release manager I learned just now that we need a tickets such as this whenever we add a new version: https://issues.apache.org/jira/browse/INFRA-25880

These can take a few days to get resolved, so it is good to do it right away as part of adding the new version support.

@Abacn
Copy link
Contributor

Abacn commented Jun 18, 2024

A proposal to use single dockerhub repository for Flink job server container: #31631

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants