-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Limit number of manifest files written by OPTIMIZE_MANIFESTS #26616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes coordinator crashes from running OPTIMIZE_MANIFESTS on partitioned Iceberg tables by limiting the number of manifest files that can be created during the optimization process.
- Updated clustering logic to hash partition values and limit buckets to 16 to prevent OOM issues
- Changed from using only the first partition field to considering all partition fields for better data locality
- Updated test expectations to reflect the new manifest file count limits
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| IcebergMetadata.java | Modified clustering logic to hash all partition fields and limit buckets to 16 |
| TestIcebergOptimizeManifestsProcedure.java | Updated test to verify new manifest count limits and improved test coverage |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
Avoids coordinator OOM and too many small manifests
c88733c to
11ff09a
Compare
| } | ||
| long totalManifestsSize = manifests.stream().mapToLong(ManifestFile::length).sum(); | ||
| // Having too many open manifest writers can potentially cause OOM on the coordinator | ||
| long targetManifestClusters = Math.min(((totalManifestsSize + manifestTargetSizeBytes - 1) / manifestTargetSizeBytes), 100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to make this 100 configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's mainly there as a safety net. It should be rare to reach that number. We've seen coordinator OOM with large number of open manifest writers here, so don't want to expose it as a configureable to users.
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Description
Avoids coordinator OOM and too many small manifests
Additional context and related issues
Fixes #26323
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: