partial-progress.max-failed-commits incorrectly compares the failureCommit value #12076
Comments
cc: @manuzhang, the commit owner, for visibility.
Thanks for reporting this issue; I will submit a fix shortly.
I'm not sure I understand this. Can you elaborate?
@manuzhang Thanks a lot!
@RussellSpitzer Basically, this line: https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java#L380
maxCommits is not the actual total commit count but the value from the configuration, which makes failedCommits higher than expected. We could either get the actual total commit count or directly collect the failureCommit count within the error-handling block: https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/actions/BaseCommitService.java#L232-L234
Got it! That makes sense.
@ruotianwang Are you referring to the case where there are fewer commits than maxCommits because a file group rewrite failed?
int groupsPerCommit = IntMath.divide(ctx.totalGroupCount(), maxCommits, RoundingMode.CEILING);
Given the above algorithm, I can't think of an example where the number of commits is smaller than maxCommits without any file group failure.
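To make the quoted arithmetic concrete, here is a small sketch. It assumes (this is not confirmed in the thread) that rewritten groups are committed groupsPerCommit at a time, so the number of commits actually made is the ceiling of totalGroupCount / groupsPerCommit. Guava's IntMath.divide(..., RoundingMode.CEILING) is replaced by plain integer ceiling division to keep the example dependency-free, and CommitCountSketch with its methods are hypothetical names.

```java
// Hypothetical sketch: how many commits are made for a given number of file groups,
// ASSUMING groups are batched groupsPerCommit at a time and no group fails.
final class CommitCountSketch {

  // Ceiling division for positive ints; stands in for
  // IntMath.divide(a, b, RoundingMode.CEILING).
  static int ceilDiv(int a, int b) {
    return (a + b - 1) / b;
  }

  // Mirrors the quoted line:
  // int groupsPerCommit = IntMath.divide(totalGroupCount, maxCommits, RoundingMode.CEILING);
  static int groupsPerCommit(int totalGroupCount, int maxCommits) {
    return ceilDiv(totalGroupCount, maxCommits);
  }

  // Assumed number of commits when every group is rewritten successfully.
  // Requires totalGroupCount >= 1 and maxCommits >= 1 (otherwise groupsPerCommit is 0).
  static int expectedCommits(int totalGroupCount, int maxCommits) {
    return ceilDiv(totalGroupCount, groupsPerCommit(totalGroupCount, maxCommits));
  }

  public static void main(String[] args) {
    int maxCommits = 10; // value of partial-progress.max-commits
    for (int groups : new int[] {1, 3, 11, 100}) {
      System.out.printf("groups=%d -> commits=%d%n", groups, expectedCommits(groups, maxCommits));
    }
  }
}
```

Under that assumption, 1 or 3 file groups with max-commits set to 10 produce 1 or 3 commits, and 11 groups produce 6 (2 groups per commit). In each case the commit count is below maxCommits even though nothing failed, which is the situation the reporter describes.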
@manuzhang This maxCommits is taken directly from the configuration: https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java#L458-L460
I think the edge case is when … Here is what I saw in our application log:
The error I saw
And another table's job
The error I saw
You can see that the actual total number of rewritten file groups is 1 and 3, respectively, even though partial-progress.max-commits is configured as 10.
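A minimal sketch of the arithmetic behind the false positive, using the numbers reported above (max-commits configured as 10, with 1 and 3 commits that all succeed). It contrasts the quoted formula maxCommits - succeededCommits with the first alternative suggested in the thread, which subtracts from the number of commits actually attempted. FailedCommitArithmetic and its methods are hypothetical names, not Iceberg code.

```java
// Hypothetical sketch comparing the quoted failed-commit formula with one
// based on the number of commits actually attempted.
final class FailedCommitArithmetic {

  // The formula quoted in the issue: uses the configured maxCommits.
  static int failedFromConfig(int maxCommits, int succeededCommits) {
    return maxCommits - succeededCommits;
  }

  // Alternative from the thread: use the number of commits actually attempted.
  static int failedFromAttempts(int attemptedCommits, int succeededCommits) {
    return attemptedCommits - succeededCommits;
  }

  public static void main(String[] args) {
    int maxCommits = 10; // partial-progress.max-commits as configured
    for (int attempted : new int[] {1, 3}) {
      int succeeded = attempted; // every commit succeeded
      System.out.printf(
          "attempted=%d, succeeded=%d -> config-based=%d, attempt-based=%d%n",
          attempted,
          succeeded,
          failedFromConfig(maxCommits, succeeded),
          failedFromAttempts(attempted, succeeded));
    }
  }
}
```

With zero real failures, the configuration-based formula reports 9 and 7 failed commits, so any partial-progress.max-failed-commits value below those numbers would be exceeded, while the attempt-based count stays at 0.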
I've submitted #12120 to fix this. @ruotianwang @RussellSpitzer please help review, thanks!
Apache Iceberg version
1.7.1 (latest release)
Query engine
Spark
Please describe the bug 🐞
While using partial-progress.max-failed-commits, we found that the threshold check's false-positive rate is too high. After taking a deeper look: PR #9611 first counts succeededCommits each time a commit succeeds, then calculates
int failedCommits = maxCommits - commitService.succeededCommits();
However, I've found a couple of cases where, even though we defined the partial-progress.max-commits value, Iceberg internally packs the file groups into fewer commits than max-commits; for example, the actual number of file groups can be smaller than maxCommits. In that case, the threshold check above is wrong. The suggested solution is that, instead of counting succeeded commits, we should directly collect the failed commit count and compare it against the threshold.
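One way to read the suggested fix, sketched under assumptions: record each failure where the commit exception is caught (the thread points to the error-handling block in BaseCommitService) and compare that count against the threshold. CommitOutcomeTracker and its methods are hypothetical and are not Iceberg's BaseCommitService API, nor are they taken from #12120. The AtomicInteger counters assume commits may complete on more than one thread, and the strict > comparison is an assumed threshold semantics.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: count failed commits where the failure is handled,
// instead of deriving them from the configured maxCommits.
final class CommitOutcomeTracker {
  private final AtomicInteger succeeded = new AtomicInteger();
  private final AtomicInteger failed = new AtomicInteger();

  // Runs one commit attempt and records its outcome. The exception is swallowed
  // only to illustrate partial progress; a real service would at least log it.
  void attempt(Runnable commit) {
    try {
      commit.run();
      succeeded.incrementAndGet();
    } catch (RuntimeException e) {
      failed.incrementAndGet();
    }
  }

  int succeededCommits() {
    return succeeded.get();
  }

  int failedCommits() {
    return failed.get();
  }

  // ASSUMED semantics: the check trips only when real failures exceed the limit.
  boolean exceeds(int maxFailedCommits) {
    return failed.get() > maxFailedCommits;
  }

  public static void main(String[] args) {
    CommitOutcomeTracker tracker = new CommitOutcomeTracker();
    // Three successful commits and one simulated failure; the configured
    // max-commits value plays no role in the count.
    for (int i = 0; i < 3; i++) {
      tracker.attempt(() -> {});
    }
    tracker.attempt(() -> {
      throw new IllegalStateException("simulated commit failure");
    });
    System.out.println(
        "succeeded=" + tracker.succeededCommits() + ", failed=" + tracker.failedCommits());
    System.out.println("exceeds(0)=" + tracker.exceeds(0) + ", exceeds(1)=" + tracker.exceeds(1));
  }
}
```

Because the failure count comes only from commits that actually threw, it no longer depends on how many commits the file groups were packed into.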
Willingness to contribute