
Improve performance of reading iceberg table with many equality delete files#17115

Merged
findepi merged 1 commit into trinodb:master from Heltman:iceberg-deletefile-optimize
Sep 26, 2023

Conversation

@Heltman
Contributor

@Heltman Heltman commented Apr 19, 2023

Description

Fixes #17114

Additional context and related issues

If a split has many delete files, chaining predicates with RowPredicate.and will create a deep call stack. This PR compacts all StructLikeSets into a collection to reduce the stack depth.

The stack depth is only a hidden danger. The real problem is that the multiple StructLikeSets of a split are not merged according to their equality field IDs, so too many StructLikeSets are generated, which makes filtering very inefficient.

The main change is to group delete files by their equality field IDs, so that a single StructLikeSet collects the deleted rows for each group.
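The grouping described above can be sketched as follows. This is a minimal illustration using plain Java collections and hypothetical simplified types (a delete row is modeled as a list of values), not the actual Trino/Iceberg StructLikeSet API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the core idea: delete rows from all files that share the same
// equality field IDs are collected into one shared set, instead of one
// set per delete file.
public class DeleteGrouping
{
    // Each entry models one delete file: its equality field IDs and its rows
    public static Map<List<Integer>, Set<List<Object>>> groupByFieldIds(
            List<Map.Entry<List<Integer>, List<List<Object>>>> deleteFiles)
    {
        Map<List<Integer>, Set<List<Object>>> grouped = new HashMap<>();
        for (Map.Entry<List<Integer>, List<List<Object>>> file : deleteFiles) {
            // Files with identical field IDs feed the same set, so a key
            // deleted in several files is stored and checked only once
            grouped.computeIfAbsent(file.getKey(), ids -> new HashSet<>())
                    .addAll(file.getValue());
        }
        return grouped;
    }
}
```

With this shape, a row scanned from a data file needs one set lookup per distinct field-ID group, rather than one per delete file.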

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Section
* Improve performance of reading iceberg table with many equality delete files. ({issue}`17114`)

@cla-bot cla-bot bot added the cla-signed label Apr 19, 2023
@Heltman Heltman requested a review from electrum April 19, 2023 06:26
@github-actions github-actions bot added the iceberg Iceberg connector label Apr 19, 2023
Member

@alexjo2144 alexjo2144 left a comment


The core change here, reducing the number of StructLikeSets created seems like a good idea to me. Just a couple code simplification questions.

Just for reference here's a thread on this from the original PR: #13219 (comment)
cc: @findepi

@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch from cd0d4d8 to 4f8e9ca Compare April 20, 2023 04:59
@xiacongling
Contributor

Great job on a quick fix of the issue, @Heltman @alexjo2144! It may significantly improve Trino's performance on Iceberg delete file filtering.

IMHO, the patch made by @alexjo2144 makes fewer code changes and seems easier to understand, but some problems may not be completely resolved:

  1. EqualityDeleteSets of delete files can be grouped by their schemas. Since the equality field IDs come from the table schema, for two delete files with the same fields in different order, the projection structs will be identical. Using a set is preferred, and it is what Iceberg does in its org.apache.iceberg.data.DeleteFilter.
  2. Chaining RowPredicates with RowPredicate.and leads to recursive method calls, which hurts performance and risks stack overflow when too many delete files are present. Since delete file grouping can significantly reduce the number of delete filters, this improvement may not seem as important. However, the delete filter is applied to every row scanned from data files, so even the smallest enhancement may improve query efficiency. Would you consider adding this change as well, @alexjo2144? Are there any stats that could be provided to prove the performance improvement, @Heltman?
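Point 2 can be illustrated with plain java.util.function.Predicate standing in for Trino's RowPredicate (a hypothetical simplified sketch; rows are modeled as long arrays, and the names below are not from the PR):

```java
import java.util.List;
import java.util.function.Predicate;

// Why chaining with and() is risky: each and() wraps the previous predicate,
// so testing a single row recurses once per delete filter, while iterating a
// flat list of predicates keeps the stack depth constant.
public class PredicateChaining
{
    public static Predicate<long[]> chained(List<Predicate<long[]>> filters)
    {
        Predicate<long[]> result = row -> true;
        for (Predicate<long[]> filter : filters) {
            // One extra stack frame per filter at test() time
            result = result.and(filter);
        }
        return result;
    }

    public static boolean flat(List<Predicate<long[]>> filters, long[] row)
    {
        // Equivalent result, constant stack depth regardless of filter count
        for (Predicate<long[]> filter : filters) {
            if (!filter.test(row)) {
                return false;
            }
        }
        return true;
    }
}
```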

@alexjo2144
Member

Looks like using a Set is safe there in the Iceberg Spark implementation because they do an additional projection step to ensure that the readers return the Schema stored in the Set there, even if it doesn't match the file schema. We don't have that additional projection here yet, so it will cause problems in Trino.

It needs some clean-up but here's a test case illustrating the problem: https://gist.github.com/alexjo2144/8a80ff5146ab3c82fa0c5fc5b4f33e66

So we either need to add the additional projection, or use an ordered Collection like a List.

@Heltman
Contributor Author

Heltman commented Apr 25, 2023

@alexjo2144 Schema mismatch is indeed a problem, but fortunately the additional projection step implemented by Spark is also easy to implement in Trino. We only need to follow Spark's implementation and arrange the fields in order when reading the delete files. Trino's ParquetReader already has the additional projection (same as Spark). Please check the new commit; I fixed this problem and added your test case.
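The projection step being discussed can be sketched like this (plain collections and hypothetical simplified types, not Trino's actual ParquetReader projection):

```java
import java.util.List;
import java.util.stream.Collectors;

// Rows read from a delete file whose columns appear in file order are
// reordered to match the canonical equality field ID order from table
// metadata, so rows from differently-ordered delete files land in the
// same set and compare equal.
public class DeleteProjection
{
    public static List<Object> project(
            List<Integer> canonicalFieldIds, // order from table metadata
            List<Integer> fileFieldIds,      // order in this delete file
            List<Object> fileRow)
    {
        return canonicalFieldIds.stream()
                .map(id -> fileRow.get(fileFieldIds.indexOf(id)))
                .collect(Collectors.toList());
    }
}
```

Without this reordering, two delete files listing the same fields in different order would produce structurally different rows, which is exactly the problem the gist above demonstrates.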

@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch 2 times, most recently from c571e0a to fb9c955 Compare June 8, 2023 08:05
@findinpath
Contributor

findinpath commented Jun 8, 2023

The stack depth is only a hidden danger. The real problem is that the multiple StructLikeSets of a split are not merged according to their equality field IDs, so too many StructLikeSets are generated, which makes filtering very inefficient.

Can you please provide a test case in your PR that showcases this situation?

I tried with the code from master and see only one StructLikeSet with 7 elements (same as in your submission).

@Test
public void testMultipleDeletes()
        throws Exception
{
    String tableName = "test_equality_deletes_different_schemas_" + randomNameSuffix();
    assertUpdate("CREATE TABLE " + tableName + " AS SELECT * FROM tpch.tiny.nation", 25);
    Table icebergTable = updateTableToV2(tableName);
    Assertions.assertThat(icebergTable.currentSnapshot().summary().get("total-equality-deletes")).isEqualTo("0");
    Path metadataDir = new Path(metastoreDir.toURI());
    TrinoFileSystem fs = HDFS_FILE_SYSTEM_FACTORY.create(SESSION);

    String deleteFile1 = "delete_file_" + UUID.randomUUID();
    List<String> firstDeleteFileColumns = ImmutableList.of("regionkey");
    Schema deleteRowSchema = icebergTable.schema().select(firstDeleteFileColumns);
    List<Integer> equalityFieldIds = firstDeleteFileColumns.stream()
            .map(name -> deleteRowSchema.findField(name).fieldId())
            .collect(toImmutableList());
    Parquet.DeleteWriteBuilder writerBuilder = Parquet.writeDeletes(new ForwardingFileIo(fs).newOutputFile(new Path(metadataDir, deleteFile1).toString()))
            .forTable(icebergTable)
            .rowSchema(deleteRowSchema)
            .createWriterFunc(GenericParquetWriter::buildWriter)
            .equalityFieldIds(equalityFieldIds)
            .overwrite();
    EqualityDeleteWriter<Record> writer = writerBuilder.buildEqualityWriter();

    Record dataDelete = GenericRecord.create(deleteRowSchema);
    try (Closeable ignored = writer) {
        writer.write(dataDelete.copy(ImmutableMap.of("regionkey", 1L)));
        writer.write(dataDelete.copy(ImmutableMap.of("regionkey", 2L)));
        writer.write(dataDelete.copy(ImmutableMap.of("regionkey", 3L)));
        writer.write(dataDelete.copy(ImmutableMap.of("regionkey", 4L)));
        writer.write(dataDelete.copy(ImmutableMap.of("regionkey", 5L)));
        writer.write(dataDelete.copy(ImmutableMap.of("regionkey", 6L)));
        writer.write(dataDelete.copy(ImmutableMap.of("regionkey", 7L)));
    }
    icebergTable.newRowDelta().addDeletes(writer.toDeleteFile()).commit();

    assertQuery("SELECT * FROM " + tableName, "SELECT * FROM nation WHERE regionkey > 7");
    assertUpdate("DROP TABLE " + tableName);
}

@Heltman
Contributor Author

Heltman commented Jun 8, 2023

Looks like using a Set is safe there in the Iceberg Spark implementation because they do an additional projection step to ensure that the readers return the Schema stored in the Set there, even if it doesn't match the file schema. We don't have that additional projection here yet, so it will cause problems in Trino.

It needs some clean-up but here's a test case illustrating the problem: https://gist.github.com/alexjo2144/8a80ff5146ab3c82fa0c5fc5b4f33e66

So we either need to add the additional projection, or use an ordered Collection like a List.

@findinpath, please check below.

@Heltman
Contributor Author

Heltman commented Jun 8, 2023

The stack depth is only a hidden danger. The real problem is that the multiple StructLikeSets of a split are not merged according to their equality field IDs, so too many StructLikeSets are generated, which makes filtering very inefficient.

Can you pls provide a test case in your PR which can be used to showcase this situation?

It is difficult to simulate this performance problem, because it requires many delete files where the same columns have been updated many times.

Imagine two delete files, each with 10,000 rows, 9,000 of which are the same. Originally each scanned row would have to be checked against 20,000 entries, but after merging we only need to match against 11,000.
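A toy sketch of that arithmetic, modeling the deleted keys as plain Long values rather than Iceberg structs:

```java
import java.util.HashSet;
import java.util.Set;

// Two delete files with heavy overlap collapse into one much smaller merged
// set, so each scanned row is checked against far fewer entries.
public class MergedDeleteSize
{
    public static int mergedSize(Set<Long> fileA, Set<Long> fileB)
    {
        Set<Long> merged = new HashSet<>(fileA);
        merged.addAll(fileB); // keys deleted in both files are stored once
        return merged.size();
    }
}
```

For two 10,000-entry files sharing 9,000 keys, the merged set holds 11,000 entries instead of the 20,000 checked when each file keeps its own set.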

@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch from fb9c955 to 66fe0b4 Compare June 9, 2023 03:27
@Heltman
Contributor Author

Heltman commented Jun 9, 2023

@findinpath, I added testMultipleEqualityDeletes; the delete files are compacted:
[screenshot]

@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch from 66fe0b4 to 852299d Compare June 9, 2023 04:00
Contributor

@findinpath findinpath left a comment


Overall there are some cosmetic improvements needed, and testing coverage for nested deletes is missing.

Great work! 👍

@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch 2 times, most recently from 552b092 to 58a765d Compare August 7, 2023 09:49
@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch 2 times, most recently from d9193b1 to 947faab Compare August 31, 2023 12:19
@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch 2 times, most recently from c1a4fa9 to 7a23964 Compare August 31, 2023 15:03
@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch from 7a23964 to 541cbb4 Compare September 4, 2023 14:28
@alexjo2144
Member

Pending the last couple comments from Marius, this looks good to me.
It does need a rebase though.

@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch from 541cbb4 to 3e0278f Compare September 18, 2023 03:22
@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch 2 times, most recently from ff47acc to 6ed8f78 Compare September 18, 2023 06:34
@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch from 6ed8f78 to 69ca01b Compare September 21, 2023 07:19
Member


The call site looks like new EqualityDeleteSet(deleteSchema, schemaFromHandles(readColumns)).
Are the constructor arg names right? Is the call site right?

Contributor Author


I suggest public EqualityDeleteSet(Schema deleteSchema, Schema fileSchema), because the first one is the schema from Iceberg metadata, and the second is the schema from the file being read (Parquet, ORC, etc.).

Contributor


@Heltman yes, public EqualityDeleteSet(Schema deleteSchema, Schema fileSchema) should be fine. Let's go forward with this suggestion.

Contributor Author


Finally we reached a consensus: public EqualityDeleteSet(Schema deleteSchema, Schema dataSchema) is a good idea. deleteSchema comes from Iceberg metadata, dataSchema comes from the equality delete file.

@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch from 69ca01b to 62276ce Compare September 26, 2023 11:58
@findinpath findinpath requested a review from findepi September 26, 2023 12:06
@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch from 62276ce to a3d08f2 Compare September 26, 2023 12:11
@findepi findepi merged commit 04ac9ba into trinodb:master Sep 26, 2023
@findepi
Member

findepi commented Sep 26, 2023

Merged, thanks!

@findepi findepi changed the title Improve performance of reading iceberg table with many delete files Improve performance of reading iceberg table with many equality delete files Sep 26, 2023
@github-actions github-actions bot added this to the 427 milestone Sep 26, 2023

Labels

cla-signed iceberg Iceberg connector

Development

Successfully merging this pull request may close these issues.

Read Iceberg v2 table with many delete file is very slowly

6 participants