@raunaqmorarka
Member

Description

Avoids coordinator OOM and too many small manifests

Additional context and related issues

Fixes #26323

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Iceberg
* Fix coordinator crashes from running OPTIMIZE_MANIFESTS on partitioned tables. ({issue}`26323`)

@cla-bot cla-bot bot added the cla-signed label Sep 11, 2025
@raunaqmorarka raunaqmorarka requested a review from ebyhr September 11, 2025 21:32
@github-actions github-actions bot added the iceberg Iceberg connector label Sep 11, 2025
Copilot AI left a comment

Pull Request Overview

This PR fixes coordinator crashes from running OPTIMIZE_MANIFESTS on partitioned Iceberg tables by limiting the number of manifest files that can be created during the optimization process.

  • Updated clustering logic to hash partition values and limit buckets to 16 to prevent OOM issues
  • Changed from using only the first partition field to considering all partition fields for better data locality
  • Updated test expectations to reflect the new manifest file count limits
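The bucketing idea summarized in the bullets above can be sketched as follows. This is a minimal illustration only, not the code from `IcebergMetadata.java`: the class name, the `bucketFor` helper, and the use of `List.hashCode` are assumptions, and the bucket count of 16 is taken from the review summary.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class PartitionBucketing {
    // Assumed cap on the number of clusters (value taken from the review summary)
    static final int MAX_BUCKETS = 16;

    // Hypothetical helper: hashes all partition values together and maps the
    // result to a bucket in [0, bucketCount). floorMod keeps the result
    // non-negative even when the hash code is negative.
    static int bucketFor(List<Object> partitionValues, int bucketCount) {
        return Math.floorMod(Objects.hashCode(partitionValues), bucketCount);
    }

    public static void main(String[] args) {
        // Arrays.asList tolerates null partition values, unlike List.of
        List<Object> a = Arrays.asList("2025-09-11", 42, null);
        List<Object> b = Arrays.asList("2025-09-11", 42, null);
        // Equal partition tuples always land in the same bucket
        System.out.println(bucketFor(a, MAX_BUCKETS) == bucketFor(b, MAX_BUCKETS));
    }
}
```

Because every partition tuple maps to one of a fixed number of buckets, the number of concurrently open manifest writers stays bounded no matter how many distinct partitions the table has.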

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| IcebergMetadata.java | Modified clustering logic to hash all partition fields and limit buckets to 16 |
| TestIcebergOptimizeManifestsProcedure.java | Updated test to verify new manifest count limits and improved test coverage |

Avoids coordinator OOM and too many small manifests
}
long totalManifestsSize = manifests.stream().mapToLong(ManifestFile::length).sum();
// Having too many open manifest writers can potentially cause OOM on the coordinator
long targetManifestClusters = Math.min(((totalManifestsSize + manifestTargetSizeBytes - 1) / manifestTargetSizeBytes), 100);
Contributor

Would it make sense to make this 100 configurable?

Member Author

That's mainly there as a safety net. It should be rare to reach that number. We've seen coordinator OOM with a large number of open manifest writers here, so we don't want to expose it as a configurable option to users.
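For readers following the thread, the arithmetic in the quoted line is a ceiling division capped at 100. A small standalone sketch, assuming an 8 MiB target manifest size purely for illustration (the class and method names are hypothetical, not from the PR):

```java
public class ManifestClusterTarget {
    // Cap mirroring the literal 100 in the quoted snippet
    static final long MAX_CLUSTERS = 100;

    // Number of manifests needed to hold totalManifestsSize at the target
    // size, rounded up, and never more than MAX_CLUSTERS.
    static long targetClusters(long totalManifestsSize, long manifestTargetSizeBytes) {
        long needed = (totalManifestsSize + manifestTargetSizeBytes - 1) / manifestTargetSizeBytes;
        return Math.min(needed, MAX_CLUSTERS);
    }

    public static void main(String[] args) {
        long target = 8L * 1024 * 1024; // assumed 8 MiB target size
        // 20 MiB of manifests: ceil(20 / 8) = 3 clusters
        System.out.println(targetClusters(20L * 1024 * 1024, target));
        // 10,000 MiB of manifests: 1250 would be needed, so the cap applies
        System.out.println(targetClusters(10_000L * 1024 * 1024, target));
    }
}
```

With these assumed sizes the cap only takes effect once the combined manifest size exceeds 800 MiB, which is consistent with the comment that reaching it should be rare.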

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

@raunaqmorarka raunaqmorarka merged commit de10150 into trinodb:master Sep 15, 2025
83 of 84 checks passed
@github-actions github-actions bot added this to the 477 milestone Sep 15, 2025
Labels

cla-signed iceberg Iceberg connector

Development

Successfully merging this pull request may close these issues.

OOME in Trino coordinator when performing Iceberg OPTIMIZE_MANIFESTS operations

4 participants