-
Notifications
You must be signed in to change notification settings - Fork 3k
[0.13] Flink upsert delete file metadata backports #4786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[0.13] Flink upsert delete file metadata backports #4786
Conversation
|
@kbendick could you please double-check if there are any other Flink changes that need to be backported? |
|
|
||
| @Override | ||
| protected StructLike asStructLikeKey(RowData data) { | ||
| return wrapper.wrap(data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kbendick should this throw an exception like in
iceberg/flink/v1.12/flink/src/main/java/org/apache/iceberg/flink/sink/BaseDeltaTaskWriter.java
Line 109 in 910f271
| throw new UnsupportedOperationException("Not implemented for Flink 1.12 during PR review"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep it this way upon initial inspection.
It needs to be implemented for the API and it will possibly get called. Keeping it the same as asStructLike seems safest for backwards compatibility.
34edff3 to
275dce6
Compare
kbendick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. I wanted to take some time to verify some of the changes in core didn’t cause issues because we didn’t backport.
TLDR is that it shouldn’t (and we’re emphatically alerting people to upgrade anyway).
|
The test issue looks like a github problem. I tried to cancel, but it looks like I can't right now. I'll check back in a few hours. It should time out. |
|
Yes the job is timing out but it's unclear why because the run itself doesn't show any output. Locally it helped to increase the Heap size for Gradle (since the build didn't progress as well locally) |
|
So GH Actions use the settings in master for security purposes. Mayhe merging in the heap increase first will help? I doubt the tests are running with the heap increase if it’s in this PR. Just a guess but the tests didn’t complete without increased heap for me as well. Also, since this is a release branch, feel free to make the heap slightly bigger and/or tune the forked JVM’s GC settings in the settings.gradle (or wherever). But I think running with the heap settings already merged in will help the most. I think you can try it in your fork if you run the tests there. |
|
Oh and the other thing would be to add class unloading into the test JVM settings. I helped tune the JVM for CI in Spark and class unloading, a fixed MaxMetaSpaceSize (to force class unloading), and somewhat larger heaps helped quite a bit for a situation that had gotten really out of hand. Also my only meaningful contribution to the Spark repo and the credit got misattributed as I never added that email to my GitHub 😂 But class unloading really helps due to Flink’s inverted class loader. And out JVM prior to this change (and even still really) is very small in my opinion. Again, since this is a release branch and not one we’ll work off of ever again very likely, feel free to be liberal in updating the JVM for test purposes! They just need to run enough to make the release! |
|
The only other issue I can think of is a GitHub issue with the JVM cache as the problem is sticky for Flink 1.13. But the most recent attempt did run somewhat so I don’t think that’s it. I think it’s just poor memory utilization / availability combined with some somewhat heavy Flink SQL upsert tests. |
Co-authored-by: liliwei <[email protected]>
Co-authored-by: liliwei <[email protected]>
… one checkpoint cycle (apache#4189)
275dce6 to
5ea6b7f
Compare
|
I think I found the issue (which can easily be debugged locally by adding |
+1 to better timeout resolution. It's possible there's something else we need to cherry-pick given the the underlying exception for that one test case. I'll run locally and we'll see. We can also try to schedule time to talk if you'd like. I'm very interested in getting this out the door. Though tests do seem to be passing now so potentially you already got to it. 🙂 |

This backports #4417 / #3834 (because #4364 depends on it) / #4364 / #4189 (because CI was failing without this test fix) to the 0.13.x branch