Contributor

@grantatspothero grantatspothero commented May 21, 2025

Description

Reduces coordinator memory usage during the Iceberg remove orphan files procedure for tables with a large number of snapshots.

See the following code comment for a description of the fix:

    /**
     * Use instead of loadAllManifestsFromSnapshot when loading manifests from multiple distinct snapshots
     * Each BaseSnapshot object caches manifest files separately, so loading manifests from multiple distinct snapshots
     * results in O(num_snapshots^2) copies of the same manifest file metadata in memory
     */
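
To make the O(num_snapshots^2) figure in the comment above concrete, here is a minimal sketch that counts retained manifest entries. It does not use the Iceberg API: `ManifestCachingSketch` and all of its methods are hypothetical names, and it assumes an append-only table with manifest merging disabled, so that snapshot i references manifests 0 through i.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ManifestCachingSketch
{
    // Hypothetical stand-in for reading one snapshot's manifest list.
    // Assumption: snapshot i references manifests 0..i (append-only, no manifest merging).
    static List<String> manifestPathsOf(int snapshotIndex)
    {
        List<String> paths = new ArrayList<>();
        for (int m = 0; m <= snapshotIndex; m++) {
            paths.add("manifest-" + m + ".avro");
        }
        return paths;
    }

    // Models per-snapshot caching: every snapshot keeps its own list alive.
    static long retainedWithPerSnapshotCache(int snapshots)
    {
        List<List<String>> caches = new ArrayList<>();
        for (int s = 0; s < snapshots; s++) {
            caches.add(manifestPathsOf(s));
        }
        long retained = 0;
        for (List<String> cache : caches) {
            retained += cache.size();
        }
        return retained;
    }

    // Models reading without a per-snapshot cache: each list is folded into
    // one shared set and then becomes garbage.
    static long retainedWithSharedSet(int snapshots)
    {
        Set<String> distinct = new HashSet<>();
        for (int s = 0; s < snapshots; s++) {
            distinct.addAll(manifestPathsOf(s));
        }
        return distinct.size();
    }

    public static void main(String[] args)
    {
        System.out.println(retainedWithPerSnapshotCache(1000)); // 500500
        System.out.println(retainedWithSharedSet(1000)); // 1000
    }
}
```

With 1000 snapshots the cached variant retains 1 + 2 + ... + 1000 = 500500 entries, while the shared set retains 1000. The real objects are manifest file metadata rather than strings, so the absolute sizes differ, but the growth pattern is the same.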

Additional context and related issues

To show decreased memory usage:

  1. Created a table with 1000 snapshots and 1000 manifest files, then ran the remove orphan files procedure:
    @Test
    public void testRemoveOrphanFilesOnTableWithManySnapshots()
    {
        String tableName = "test_remove_orphan_files_on_table_with_many_snapshots_" + randomNameSuffix();
        String fullyQualifiedTableName = "iceberg.%s.%s".formatted(getSession().getSchema().orElseThrow(), tableName);

        // Create a table with 1000 snapshots; manifest merging is disabled so each write adds a new manifest file
        assertUpdate("SET SESSION iceberg.merge_manifests_on_write = false");
        assertUpdate("CREATE TABLE %s (a) AS VALUES 0".formatted(fullyQualifiedTableName), 1);
        for (int i = 1; i < 1000; i++) {
            assertUpdate("INSERT INTO %s VALUES %d".formatted(fullyQualifiedTableName, i), 1);
        }

        assertQuerySucceeds("ALTER TABLE %s EXECUTE REMOVE_ORPHAN_FILES".formatted(fullyQualifiedTableName));
        assertThat(computeScalar("SELECT count(*) FROM %s".formatted(fullyQualifiedTableName))).isEqualTo(1000L);
        assertUpdate("DROP TABLE %s".formatted(fullyQualifiedTableName));
    }
  2. Took a heap dump, but it did not show a difference in memory usage attributed to the caching of manifest files on the BaseSnapshot object. The root cause was incorrect attribution of the list's memory: BaseSnapshot#cacheManifests calls ManifestLists.read, which returns a LinkedList, and the heap dump's attribution logic credited only the first node of that LinkedList to BaseSnapshot rather than all of its nodes.
  3. Manually inspected table.snapshots() in a debugger to confirm that the manifest caching was no longer occurring.

(before change)
[Screenshot 2025-05-21 at 3:03:13 PM]
(after change)
[Screenshot 2025-05-21 at 3:06:21 PM]

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Iceberg
* Reduce memory usage of remove_orphan_files procedure. ({issue}`25847`)

@cla-bot cla-bot bot added the cla-signed label May 21, 2025
@github-actions github-actions bot added the iceberg Iceberg connector label May 21, 2025
@grantatspothero grantatspothero force-pushed the gn/removeOrphanFilesMemoryLeak branch from b5f7198 to 6cce0e7 Compare May 21, 2025 15:38
@grantatspothero grantatspothero force-pushed the gn/removeOrphanFilesMemoryLeak branch from 6cce0e7 to a7c53f8 Compare May 21, 2025 20:16
Member

@raunaqmorarka raunaqmorarka left a comment

minor comments, lgtm

Member

Can this move to ManifestUtils?

Contributor Author

@grantatspothero grantatspothero May 22, 2025

Note the org.apache.iceberg package.

The intention of this class is to use the package-private method ManifestLists.read.

We want the behavior of BaseSnapshot.allManifests() without caching, but Iceberg does not support that. My workaround works fine for v2 Iceberg tables, but I'm concerned it might not work with v1 Iceberg tables. See:
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/BaseSnapshot.java#L174-L186

Member

Is it possible to restrict the fix to V2 Iceberg tables while we look for a solution for V1 tables in https://apache-iceberg.slack.com/archives/C03LG1D563F/p1747921709647279?

Member

@grantatspothero I've pushed changes here to restrict the fix to v2 tables, PTAL

Member

Not something we have to do in this PR, but can we start moving these types of utilities out into a separate utility class? We have lots of utilities throughout with similar or near-identical functionality repeated, and I imagine most occasional contributors (like myself) don't even know they exist because they are placed in the calling class as private methods.

@raunaqmorarka raunaqmorarka requested review from Copilot and ebyhr May 22, 2025 02:43

@raunaqmorarka raunaqmorarka changed the title Fix memory leak in Iceberg remove orphan files procedure Reduce memory usage of remove_orphan_files procedure May 22, 2025
@raunaqmorarka raunaqmorarka added the bug Something isn't working label May 22, 2025
@grantatspothero grantatspothero force-pushed the gn/removeOrphanFilesMemoryLeak branch 3 times, most recently from c715bb8 to 347f370 Compare May 22, 2025 14:10
@github-actions

This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.

@github-actions github-actions bot added the stale label Jun 23, 2025
@github-actions github-actions bot removed the stale label Jun 24, 2025
@github-actions github-actions bot added the stale label Jul 16, 2025
@raunaqmorarka raunaqmorarka force-pushed the gn/removeOrphanFilesMemoryLeak branch from 347f370 to a279f4e Compare August 1, 2025 12:04
Copilot AI left a comment

Pull Request Overview

This PR reduces memory usage during Iceberg's remove orphan files procedure by avoiding duplicate caching of manifest files when processing tables with many snapshots. The fix introduces a new method to load manifests directly from manifest lists instead of going through snapshot objects that cache manifests independently.

  • Introduces a new ManifestUtils class with a memory-efficient method to read manifest files
  • Modifies the orphan file removal procedure to use direct manifest list reading for better memory efficiency
  • Maintains backward compatibility for V1 tables with embedded manifest lists

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| ManifestUtils.java | New utility class providing memory-efficient manifest file reading |
| IcebergMetadata.java | Updated orphan file removal logic to use the new manifest reading approach |

Comments suppressed due to low confidence (1)

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java:2265

  • [nitpick] The variable name allManifests is somewhat ambiguous. Consider renaming it to manifestFiles or manifests to be more concise and clear.
            List<ManifestFile> allManifests;

@raunaqmorarka raunaqmorarka force-pushed the gn/removeOrphanFilesMemoryLeak branch from a279f4e to d436fc9 Compare August 1, 2025 15:22
@raunaqmorarka raunaqmorarka force-pushed the gn/removeOrphanFilesMemoryLeak branch from d436fc9 to a5b0e00 Compare August 1, 2025 16:13
@github-actions github-actions bot removed the stale label Aug 1, 2025
    validMetadataFileNames.add(fileName(snapshot.manifestListLocation()));
    String manifestListLocation = snapshot.manifestListLocation();
    List<ManifestFile> allManifests;
    if (manifestListLocation != null) {
Contributor Author

I do not think we have tests using v1 iceberg format for remove orphan files. We should add at least one as a sanity check.

Member

This is a special case of V1 tables where write.manifest-lists.enabled is set to false. However, the Iceberg library removed the ability to write embedded manifest lists a few years ago (apache/iceberg#5773), so we don't have a way to produce such data in the tests.
It might be possible to generate such data with an old version and test against that pre-generated dataset, but I think that is overkill at the moment. The fallback logic looks safe enough to me.
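
The shape of the fallback discussed in this thread can be sketched as follows. This is an illustration under assumptions, not the merged code: `SnapshotView`, `cachedAllManifests`, and `loadManifests` are hypothetical stand-ins so the example compiles without Iceberg on the classpath, and manifests are modeled as plain path strings. The only behavior taken from the diff is the null check on the manifest list location.

```java
import java.util.List;
import java.util.function.Function;

class ManifestLoadingSketch
{
    // Hypothetical stand-in for the two snapshot accessors the fallback needs.
    interface SnapshotView
    {
        // Null when the manifests are embedded in table metadata (old V1 tables).
        String manifestListLocation();

        // Stands in for the existing path that caches manifests on the snapshot.
        List<String> cachedAllManifests();
    }

    static List<String> loadManifests(SnapshotView snapshot, Function<String, List<String>> manifestListReader)
    {
        String location = snapshot.manifestListLocation();
        if (location != null) {
            // Separate manifest list file: read it directly, nothing is cached on the snapshot.
            return manifestListReader.apply(location);
        }
        // Embedded manifest list: keep the previous behavior unchanged.
        return snapshot.cachedAllManifests();
    }

    public static void main(String[] args)
    {
        SnapshotView embedded = new SnapshotView()
        {
            @Override
            public String manifestListLocation()
            {
                return null;
            }

            @Override
            public List<String> cachedAllManifests()
            {
                return List.of("embedded-manifest.avro");
            }
        };
        System.out.println(loadManifests(embedded, location -> List.of("read-from-" + location)));
    }
}
```

Keeping the embedded case on the old path means tables that cannot be produced in tests any more see no behavior change.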

Contributor Author

@grantatspothero grantatspothero left a comment

This is fine as it fixes a real issue, but I'm not happy with adding separate code paths for different Iceberg spec versions.

For context, I tried asking Iceberg maintainers for advice on what to do here:
https://apache-iceberg.slack.com/archives/C03LG1D563F/p1747921709647279

The suggestion was to implement a per table metadata cache of manifest files. I spent a few hours doing this and then had to pivot to other things. If someone is familiar with the Avro reader Iceberg uses, I would love to pair with them.
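
One possible reading of that suggestion, sketched here purely as an assumption, is a shared cache that interns manifest entries by path so every snapshot's list points at the same object for a given manifest. `ManifestInternSketch` and `ManifestEntry` are hypothetical names and this is not how Iceberg or Trino implement anything today.

```java
import java.util.HashMap;
import java.util.Map;

class ManifestInternSketch
{
    // Hypothetical stand-in for manifest file metadata.
    static final class ManifestEntry
    {
        final String path;
        final long length;

        ManifestEntry(String path, long length)
        {
            this.path = path;
            this.length = length;
        }
    }

    private final Map<String, ManifestEntry> byPath = new HashMap<>();

    // Returns the first entry seen for this path, so later duplicates can be garbage collected.
    ManifestEntry intern(ManifestEntry entry)
    {
        ManifestEntry existing = byPath.putIfAbsent(entry.path, entry);
        return existing != null ? existing : entry;
    }

    int distinctEntries()
    {
        return byPath.size();
    }
}
```

Under this sketch the per-snapshot lists still hold O(num_snapshots^2) references in total, but each reference is a pointer to one shared entry rather than a separate copy of the metadata.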

@raunaqmorarka
Member

> This is fine as it fixes a real issue but I'm not happy with adding separate code paths for different iceberg spec versions.

I don't see the point of doing extra work for V1 tables. Why would anyone still be using those?

Contributor Author

grantatspothero commented Aug 1, 2025

> > This is fine as it fixes a real issue but I'm not happy with adding separate code paths for different iceberg spec versions.
>
> I don't see the point of doing extra work for V1 tables. Why would anyone still be using those?

This would break remove orphan files for v1 Iceberg tables. The Trino docs say "The connector supports Apache Iceberg table spec versions 1 and 2": https://trino.io/docs/current/connector/iceberg.html

Member

raunaqmorarka commented Aug 1, 2025

> > > This is fine as it fixes a real issue but I'm not happy with adding separate code paths for different iceberg spec versions.
> >
> > I don't see the point of doing extra work for V1 tables. Why would anyone still be using those?
>
> This would break remove orphan files for v1 Iceberg tables. The Trino docs say "The connector supports Apache Iceberg table spec versions 1 and 2": https://trino.io/docs/current/connector/iceberg.html

I'm not talking about dropping support, I'm talking about doing extra work to have the same code path or same fix for V1.
The intent of the current change is to leave everything as it was for V1. If someone comes with an issue for V1, I'd rather tell them to update to V2 than write more code for V1.

@raunaqmorarka raunaqmorarka merged commit a0992f1 into trinodb:master Aug 1, 2025
62 of 65 checks passed
@github-actions github-actions bot added this to the 477 milestone Aug 1, 2025