Skip to content

Conversation

@kbendick
Copy link
Contributor

@kbendick kbendick commented Apr 6, 2022

In #4364, we fixed an issue that was causing data correctness bugs when using upsert mode when writing to v2 table's with Flink.

The fix for the issue did not resolve the issue in Flink 1.12: #4418

Because Flink 1.12 has been deprecated, it was decided that we should log a warning if their table is in Flink's 1.12's upsert mode which asks them to upgrade Flink versions to something supported for the upcoming patch release, Iceberg 0.13.2.

@openinx
Copy link
Member

openinx commented Apr 8, 2022

Because Flink 1.12 has been deprecated, it was decided that we should disable upsert mode in Flink 1.12 for the upcoming patch release, Iceberg 0.13.2.

Will it be better to remove the flink 1.12 in the next iceberg release 0.14.0 ? As iceberg 0.13.2 is a bugfix release for the 0.13.x, Clearing the support for flink 1.12 does not sounds reasonable for me. I would like to port the #4418 into the flink 1.12, and once we decided to remove this we just need to remove the whole flink/v1.12 directory.

@rdblue
Copy link
Contributor

rdblue commented Apr 10, 2022

@openinx, I think this is serious enough that we should not keep writing data in the 0.13.2 release if we can't fix it. Since we can't fix it in Flink 1.12, I think the right option is to disable it and ask the user to upgrade. If you want to fix it for the release, that would be great. But since we will probably remove support for 1.12 in the next release anyway, I don't think it makes much sense to spend time on it.

@openinx
Copy link
Member

openinx commented Apr 11, 2022

Since we can't fix it in Flink 1.12, I think the right option is to disable it and ask the user to upgrade.

Sorry, maybe I missed something. Is there any blocker that we cannot just fix this bug in flink 1.12 ? In my context, I think the approach to fix the bug in fink 1.12 is just porting the patch from flink1.13/1.14 to flink1.12, I usually just export it to a patch file and replace the v1.14 to v1.12 globally.

I can also understand that we are planing to remove the flink 1.12 in the next iceberg release 0.14.0. But in iceberg release 0.13.2, I don't think we should remove flink 1.12 support because it's a bugfix release. So for the release 0.13.2, the correct approach is fixing this bug or disabling the UPSERT write feature. For me, fixing this bug in flink 1.12 seems more reasonable.

@rdblue
Copy link
Contributor

rdblue commented Apr 11, 2022

Is there any blocker that we cannot just fix this bug in flink 1.12 ? In my context, I think the approach to fix the bug in fink 1.12 is just porting the patch from flink1.13/1.14 to flink1.12, I usually just export it to a patch file and replace the v1.14 to v1.12 globally.

My understanding is that there is a bug in Flink 1.12 that prevents us from fixing this, but I'm not sure what that is. I think we'll need @kbendick to jump in and give the details.

I agree with you, if we can easily adapt the fix for 1.13 and 1.14, then we should just fix it!

@kbendick kbendick added this to the Iceberg 0.13.2 Release milestone Apr 11, 2022
@kbendick kbendick force-pushed the disable-upsert-in-flink-1.12 branch from 22c1b33 to afbfd65 Compare April 11, 2022 22:18
@kbendick
Copy link
Contributor Author

Is there any blocker that we cannot just fix this bug in flink 1.12 ? In my context, I think the approach to fix the bug in fink 1.12 is just porting the patch from flink1.13/1.14 to flink1.12, I usually just export it to a patch file and replace the v1.14 to v1.12 globally.

My understanding is that there is a bug in Flink 1.12 that prevents us from fixing this, but I'm not sure what that is. I think we'll need @kbendick to jump in and give the details.

I agree with you, if we can easily adapt the fix for 1.13 and 1.14, then we should just fix it!

@rdblue is correct. The patch from #4364 didn't work with Flink 1.12. The problem still persists. #4418 did not pass all unit tests and I was not able to resolve them. So I agree with Ryan that it's probably better to disable this one feature instead of continue to allow users to get incorrect results.

If they're somehow ok with the results they're getting, they can remain on Iceberg 0.13.1.

I repushed the branch from #4418 that originally had the patch for 1.12 . Let me try using DATE instead of TO_DATE to see if that resolves the issue. Otherwise, I think it's best to continue to disable this one feature in 1.12, given that we'll be removing 1.12 in the next major release anyway.

But if we can't get it to work, I don't think we should block the patch release on fixing it given that technically Flink 1.12 is deprecated upstream anyway and we've removed support in Iceberg 0.14.0-SNAPSHOT too.

@kbendick
Copy link
Contributor Author

kbendick commented Apr 15, 2022

@openinx I left a note on the original PR to backport the fix to Flink 1.12 of which tests are failing (and repushed the branch so you could see for yourself) here #4418

I was unable to find any difference int he code between the two when I was working on it before, but if you're able to make it work then by all means that would be great.

@kbendick kbendick changed the title Flink - Disallow upsert mode for Flink 1.12 due to correctness issues Flink - Log warning when in table upsert mode for Flink 1.12 due to correctness issues May 12, 2022
@kbendick kbendick force-pushed the disable-upsert-in-flink-1.12 branch 2 times, most recently from 324dfc7 to 27bc9c5 Compare May 12, 2022 02:26
@kbendick
Copy link
Contributor Author

As Flink 1.12 has been removed form master, this should likely be cherry-picked directly onto the branch for Flink release-base-0.13 or generally only targeting 0.13.2 and any later patch releases in the 0.13.x series (which are not likely to happen).

cc @rdblue @nastra throwing an exception in Flink for 0.13.2 has been changed to logging an error.

Please let me know if you'd like anything changed =)

String deprecationNotice =
"This job is running in upsert mode. Upsert mode should not be used with Flink 1.12 due to correctness " +
"issues. Please upgrade to Flink 1.13+ if upsert mode is truly needed. If upsert mode is not needed, " +
"set the table property `write.upsert.enabled` to 'false' and don't use `upsert(true)` while " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think including the part about "if truly needed" isn't necessary. I think people only use upsert if it is needed. It would simplify the message to remove that part of it.

Also, can we be more specific about the problem? For example, "Upsert mode should not be used with Flink 1.12 because it will write incorrect delete file metadata, which could prevent deletes from being correctly applied. Upgrading to Flink 1.13+ is recommended. To safely use Flink 1.12, set manifest metrics to counts only."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I kept the language similat to the message before, but happy to update it however.

I can see what you mean by if truly needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we mention that Flink 1.12 won't be supported in the next major Iceberg release? Support is also dropped upstream entirely (they won't even cherry-pick to 1.13 now), so it might be worth mentioning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Movedd this to an inline log (vs the variable), as the 1.12 CI isn't running as it's not in master, but I got a checkstyle exception using a non-constant expression in a log statement.

@kbendick
Copy link
Contributor Author

cc @rdblue @nastra the requested changes have been made =)

I also ran ./gradlew clean -DflinkVersions=1.12,1.13,1.14 build -x javadoc -x integrationTest and it ran to completion successfully (keep in mind 1.12 isn't in master CI - though it will be in your cherry-pick branch).

@kbendick kbendick changed the base branch from master to 0.13.x May 12, 2022 06:30
@kbendick
Copy link
Contributor Author

I changed the base of this to be 0.13.x.

Will work on fixes.

@kbendick kbendick changed the base branch from 0.13.x to master May 12, 2022 06:32
@kbendick
Copy link
Contributor Author

I changed the base of this to be 0.13.x.
Will work on fixes.

Just kidding! Changing the target branch to 0.13.x from master made the diff massive. I'll open a new PR and apply a patch from here to there.

@rdblue
Copy link
Contributor

rdblue commented May 12, 2022

I merged the warning PR, so I don't think there is any more to do here. I'll close this one. Thanks, @kbendick!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants