API: Define RepairManifests action interface #14341
Conversation
Co-authored-by: Amogh Jahagirdar <[email protected]>
Co-authored-by: Mathew Fournier <[email protected]>
| * Configuration method for removing duplicate file entries and removing files which no longer
| * exist in storage
[doubt] My understanding is that it's fine to ignore duplicate files, since it may happen that a writer committed the same file twice (maybe it didn't checkpoint correctly). But can we just ignore files which are no longer there in storage? Let's say someone accidentally deleted a file; removing its reference is like acknowledging this is incorrect and moving on. Should ignoring deleted files be a separate API, rather than both being under the same API?
Hey @singhpk234!! In regards to duplicates, I've seen that if we have duplicates committed across snapshots, it results in duplicate rows in Spark. As for duplicates in the same operation, they are deduplicated with the DataFileSet during the operation. But from the previous PR, there are some users who can't even read their tables right now because their table has duplicate manifests across snapshots. For deleted files, the FileIO will throw an exception when it can't open a missing file, but that's at the engine level.
I agree that duplicates and missing files are different types of issues, but I still think they should be handled together in one action. The main goal is to restore the table to a queryable state and return the manifests to the most correct state possible. Most users just want to be able to query their table again; that's the priority. Once the table is readable, they can inspect it, understand what went wrong, and decide whether it's worth doing a deeper fix.
But, I'd like to hear what others think about this!
Yeah, fundamentally the goal of this repair procedure is to unblock users so they can access their tables.
When there are files referenced in metadata that don't actually exist in storage (typically caused by object storage retention policies), there's a choice:
- Either the file can be recovered. If it can be, I think we should always try for that.
- Or the file cannot be recovered, in which case we should unblock the user's table by removing that file, and indicate to the user that a file was missing, could not be recovered, and was removed as a result.
We could treat best effort file recovery as separate but I don't really see the value in that. If you're running this procedure, you're probably running it because you hit some failure and you invoke this procedure with the intent to get the table back to a readable state.
The key thing is that the output should indicate what happened, so that a user can at least reason about the fact that after this procedure there may be more records (in the case of recovery) than before.
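For readers skimming the thread, here is a minimal sketch of how such a missing-file check could be done with the existing FileIO API. This is an illustration only, not code from this PR; the helper class and method names are made up.

```java
import org.apache.iceberg.DataFile;
import org.apache.iceberg.Table;
import org.apache.iceberg.io.InputFile;

// Hypothetical helper for illustration; not part of the proposed interface.
class MissingFileCheckSketch {
  // Returns true if the data file referenced in metadata can still be opened from storage.
  static boolean existsInStorage(Table table, DataFile file) {
    InputFile input = table.io().newInputFile(file.location());
    return input.exists();
  }
}
```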
I covered missing files in my reply above, but for duplicates, the same principle applies.
We know that duplicate entries are a problem; no client ever intends to do that, it just happens (zombies in KC, etc.) due to cases we don't handle yet. Even though the table is still readable, we know it's a correctness issue in this case because scan planning would return the files twice. We could maybe address this in the scan planning implementation, but that is just more state that would need to be retained in nodes doing planning, and it still really just masks the issue for other implementations. So I think the right solution here is also to just fix up the manifest and report to the user.
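To make the scan-planning point concrete, here is a rough sketch of dropping duplicate data file entries while rewriting a single manifest. It is my own illustration using iceberg-core's ManifestFiles helpers, not this PR's implementation, and it assumes format version 2 and the table's current spec for simplicity.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.ManifestFile;
import org.apache.iceberg.ManifestFiles;
import org.apache.iceberg.ManifestReader;
import org.apache.iceberg.ManifestWriter;
import org.apache.iceberg.Table;
import org.apache.iceberg.io.OutputFile;

// Illustration only: rewrite one manifest, keeping the first entry seen per data file path.
class DedupManifestSketch {
  static ManifestFile rewriteWithoutDuplicates(
      Table table, ManifestFile manifest, OutputFile output, long snapshotId) throws IOException {
    Set<String> seen = new HashSet<>();
    // format version 2 and the table's current spec are assumed here
    ManifestWriter<DataFile> writer = ManifestFiles.write(2, table.spec(), output, snapshotId);
    try (ManifestReader<DataFile> reader = ManifestFiles.read(manifest, table.io())) {
      for (DataFile file : reader) {
        if (seen.add(file.location())) {
          writer.add(file); // keep the first occurrence of each path
        }
        // later duplicates of the same path are dropped
      }
    } finally {
      writer.close();
    }
    return writer.toManifestFile(); // valid only after the writer is closed
  }
}
```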
| /** Returns the number of manifest entries for which stats were incorrect */
| long entryStatsIncorrectCount();
|
| /** Returns the number of manifest entries for which stats were corrected */
| long entryStatsRepairedCount();
Hm, when would these two not be the same? If it's incorrect, we'd always correct it, no?
Yeah, truly we only care about what was corrected here. I initially had the distinction because if there were failures (though I'm not sure that's a real case here), we could document them. But more importantly, for the case where we run without the repair flag, we could report how many incorrect entry stats you have.
But looking back now, it wouldn't make sense to differentiate, since if repair fails the operation would likely fail entirely rather than partially succeed, and in a normal op the incorrect count is the same as the repaired count.
| Iterable<ManifestFile> rewrittenManifests();
|
| /** Returns the number of duplicate file entries that were removed from manifests */
| long duplicateFilesRemoved();
|
| /** Returns the number of missing file references that were removed from manifests */
| long missingFilesRemoved();
|
| /** Returns the number of missing files that were successfully recovered. */
| long missingFilesRecovered();
I think last time most of the debate was on the output, and I'd like to advance that conversation a bit more. The way I look at this is: as a user, I really want to see the numbers of files that were repaired in the different categories.
I don't think we particularly care about the actual paths of data files, and besides, that result set could be unfortunately large. Even if we do output, for example, which files were removed because they just don't exist and could not be recovered, that's not actionable output to a user; that file is just gone.
The numbers would clearly indicate how many files have been recovered, how many files have not been recovered, and how many duplicates existed.
Then we could surface the higher level in the tree: which manifests were rewritten. If someone really cares about which particular files were there, they could go through those. And even later on in the API it's not a one-way door; if there's a need to see which particular files, we could add an option to output those files to some other file.
I agree with this approach. For the most part, the user will need actionable information to help them understand the impact of running the repair operation. Whether they just want counts or are curious about the actual fixes, we can take a step up the tree and surface the rewritten manifests, which encapsulate the leaves (the content files) that the operation touched.
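Assembling the fragments quoted in this thread, the result shape under discussion looks roughly like the following. This is a sketch of the proposed interface as quoted above, not necessarily its final form.

```java
import org.apache.iceberg.ManifestFile;

/** Sketch of the proposed RepairManifests.Result, assembled from the fragments quoted above. */
interface Result {
  /** Returns rewritten manifests. */
  Iterable<ManifestFile> rewrittenManifests();

  /** Returns the number of duplicate file entries that were removed from manifests. */
  long duplicateFilesRemoved();

  /** Returns the number of missing file references that were removed from manifests. */
  long missingFilesRemoved();

  /** Returns the number of missing files that were successfully recovered. */
  long missingFilesRecovered();

  /** Returns the number of manifest entries for which stats were corrected. */
  long entryStatsRepairedCount();
}
```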
| /** The action result that contains a summary of the execution. */
| interface Result {
|   /** Returns rewritten manifests. */
|   Iterable<ManifestFile> rewrittenManifests();
I went back and forth on this, but I do think returning the whole manifest is reasonable, though it's not completely perfect for someone who wants to identify the number of records that may be impacted (because the rolled-up manifest summaries could contain files that never needed to be repaired).
Alternatively, we could just return the paths and record-level counts, but I don't think we need to complicate it.
Besides, those would be rolled up in the produced snapshot anyway. Furthermore, we can always add that info later.
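As a rough illustration of the roll-up point, assuming the result exposes rewrittenManifests() as quoted, a caller could approximate the affected row counts from the manifest summaries. Note this over-counts, since a rewritten manifest also carries entries that never needed repair; the helper below is hypothetical.

```java
import org.apache.iceberg.ManifestFile;

// Hypothetical caller-side helper, not part of the proposed API.
class RepairResultSummarySketch {
  // Sums the row counts recorded in the rewritten manifests' summaries.
  static long approximateAffectedRows(Iterable<ManifestFile> rewrittenManifests) {
    long rows = 0L;
    for (ManifestFile manifest : rewrittenManifests) {
      // these summary counts are nullable on ManifestFile
      if (manifest.addedRowsCount() != null) {
        rows += manifest.addedRowsCount();
      }
      if (manifest.existingRowsCount() != null) {
        rows += manifest.existingRowsCount();
      }
    }
    return rows;
  }
}
```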
| *
| * @return this for method chaining
| */
| RepairManifests repairFileEntries();
From the description, the interface only handles the missing or duplicated data files and does not include the missing manifest files.
This bug (#12792) accidentally deleted a manifest file, resulting in a corrupted table. I think we also need a separate interface, repairManifestEntries(), to fix corrupted tables caused by manifest issues.
Hey @hangc0276, this is a great point about the scope of this operation. You're right that the current RepairManifests action focuses on repairing issues within manifests (duplicate data file entries, missing data files, incorrect stats), but it doesn't handle the case where manifest files themselves are missing or corrupted.
This issue seems a level above what we're addressing here, unless we expand the scope to make this operation more about repairing metadata in general.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

I'm still watching this, and hoping it will land. (i.e. not-stale)

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
We've been seeing a need for repair-manifests functionality at AWS to fix some corrupted Iceberg tables, and we've been using a workaround based on the original PR to fix them. @amogh-jahagirdar mentioned it'd be good for someone to drive this forward since he doesn't currently have the bandwidth, so I've picked up the work on RepairManifests (#10784), which was originally opened by @tabmatfournier in #10445.
You can check out the original discussions in #10784.
In this reopened PR, I've updated RepairManifests so that the result returns counts instead of full paths for duplicated, recovered, and removed files. This helps avoid extremely large lists in the results when huge portions of the table's manifests have been impacted, for instance when users have S3 retention policies or partitioned paths that are deleted from storage. In most cases, I think it's safer to return stats based on the count of operations than to inspect each file. The updated result still includes the new set of manifest lists, so users can inspect the changes if needed.
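For anyone following along, a hypothetical end-to-end invocation might look like the sketch below once an engine-side implementation lands. The repairManifests(table) entrypoint is assumed by analogy with existing actions such as rewriteManifests(table); only repairFileEntries() and the count getters come from the interface discussed in this PR.

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.actions.RepairManifests; // package assumed, matching other actions
import org.apache.iceberg.spark.actions.SparkActions;

class RepairManifestsUsageSketch {
  static void repair(Table table) {
    RepairManifests.Result result =
        SparkActions.get()
            .repairManifests(table) // assumed entrypoint, not defined in this PR
            .repairFileEntries() // opt in to duplicate/missing-file handling (from this PR)
            .execute();

    System.out.printf(
        "duplicates removed: %d, missing removed: %d, missing recovered: %d%n",
        result.duplicateFilesRemoved(),
        result.missingFilesRemoved(),
        result.missingFilesRecovered());
  }
}
```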